Data Engineering Best Practices: Architect Robust and Cost-Effective Data Solutions in the Cloud Era
Explore modern data engineering techniques and best practices to build scalable, efficient, and future-proof data processing systems across cloud platforms. Key Features: Architect and engineer optimized data solutions in the cloud with best practices for performance and cost-effectiveness. Explore des...
Other Authors: | , |
---|---|
Format: | E-book |
Language: | English |
Published: | Birmingham, England: Packt Publishing, [2024] |
Edition: | First edition |
Subjects: | |
View at Universitat Ramon Llull Library: | https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009853634306719 |
Table of Contents:
- Cover
- Title Page
- Copyright and Credits
- Contributors
- Table of Contents
- Preface
- Chapter 1: Overview of the Business Problem Statement
- What is the business problem statement?
- Anti-patterns to avoid
- Patterns in the future-proof architecture
- Future-proofing is …
- Organization into zone considerations
- Cloud limitations
- The Intelligence Age
- Use case definitions
- The mission, the vision, and the strategy
- Principles and the development life cycle
- The architecture definition, best practices, and key considerations
- The DataOps convergence
- Summary
- Chapter 2: A Data Engineer's Journey - Background Challenges
- Challenge #1 - platform architectures change rapidly
- Platform architectures in the 21st century
- Impacts on business strategy
- A flexible software development life cycle to manage platform risk
- Challenge #2 - Total cost of ownership (TCO) is high
- ETL architecture costs are high!
- Buy versus build choices impact a solution's longevity
- Challenge #3 - Evolving data repository patterns - identifying big rocks for data engineers
- Intake, integration, and storage challenges in data engineering
- Identifying the big rocks to be placed first into your design
- Being able to handle technology hype
- Summary
- Chapter 3: A Data Engineer's Journey - IT's Vision and Mission
- The vision
- Develop the IT engineering vision
- Vision summary
- The mission and the IT strategy
- IT's vision
- IT's mission
- IT mission summary
- Principles, frameworks, and best practices
- The architecture reflects the vision
- Principles summary
- Data engineering patterns for IT operability
- What patterns are required and how are they specified?
- Pattern summary
- Summary
- Chapter 4: Architecture Principles
- Architecture principles overview
- Architecture foundation
- Data lake, mesh, and fabric
- Data immutability
- Third party tool, cloud platform-as-a-service (PaaS), and framework integrations
- Data mesh principles
- Data mesh metadata
- Data semantics in the data mesh
- Data mesh, security, and tech stack considerations
- What are the key foundational takeaways?
- Architecture principles in depth
- Principle #1 - Data lake as a centerpiece? No, implement the data journey!
- Principle #2 - A data lake's immutable data is to remain explorable
- Principle #3 - A data lake's immutable data remains available for analytics
- Principle #4 - A data lake's sources are discoverable
- Principle #5 - A data lake's tooling should be consistent with the architecture
- Principle #6 - A data mesh defines data to be governed by domain-driven ownership
- Principle #7 - A data mesh defines the data and derives insights as a product
- Principle #8 - A data mesh defines data, information, and insights to be self-service
- Principle #9 - A data mesh implements a federated governance processing system
- Principle #10 - Metadata is associated with datasets and is relevant to the business
- Principle #11 - Dataset lineage and at-rest metadata is subject to life cycle governance
- Principle #12 - Datasets and metadata require cataloging and discovery services
- Principle #13 - Semantic metadata guarantees correct business understanding at all stages in the data journey
- Principle #14 - Data big rock architecture choices (time series, correction processing, security, privacy, and so on) are to be handled in the design early
- Principle #15 - Implement foundational capabilities in the architecture framework first
- Summary
- Chapter 5: Architecture Framework - Conceptual Architecture Best Practices
- Conceptual architecture overview
- Best practice organization
- How does the conceptual architecture align with the logical architecture and physical architecture?
- Conceptual architecture best practices
- Conceptual architecture description
- Conceptual architecture glossary
- What are the data architecture's key issues identified in the conceptual architecture?
- Best practice composition of the conceptual architecture
- Conceptual to logical architecture mapping
- Summary
- Chapter 6: Architecture Framework - Logical Architecture Best Practices
- Logical architecture overview
- Organizing best practices
- How does the logical architecture align with the conceptual and physical architecture?
- Detailed capabilities of the ingestion zones
- ETL data pipelines
- Bronze standard datasets
- Detailed capabilities of the transformation zones
- Data quality features
- Data lake house and warehouse
- Gold and silver standard datasets
- Detailed capabilities of the consumption zones
- Data analytics
- Accessing silver standard datasets from the consumption zone
- Trade-offs between public cloud, on-premises, and multi-cloud
- Cost of ingest or egress for cloud data
- Cost of a dedicated network line to the point of service
- Cost of provisioning
- Cost of monitoring and observability
- Hybrid or multi-cloud choices!
- The benefits of a multi-cloud strategy
- Summary
- Chapter 7: Architecture Framework - Physical Architecture Best Practices
- Physical architecture overview
- Best practice organization
- How does the physical architecture align with the logical and conceptual architecture?
- How should the physical architecture align with the operational processes/capabilities of the solution?
- Examples of physical reference architectures
- Summary
- Chapter 8: Software Engineering Best Practice Considerations
- SBP 1 - follow the architecture!
- The core value of architectural integrity
- The downstream impact of deviating
- Ensuring adherence in your data engineering team
- Continuous evolution and architecture
- Conclusion
- SBP 2 - implement Agile methodology for your organization!
- Introduction to Agile methodology
- Agile principles and their significance in data engineering
- Benefits of implementing Agile in data engineering
- Challenges and considerations in Agile data engineering
- Steps to implement Agile in data engineering
- Tools and Agile practices tailored for data engineering
- Conclusion
- SBP 3 - generate objectives and key results (OKRs)!
- Introduction and deep dive into OKRs
- Crafting data-centric OKRs
- Potential challenges with OKRs in data engineering
- Reviewing and iterating on OKRs in a data context
- SBP 4 - implement data as a product!
- SBP 5 - implement shift left testing (SLT) processes!
- Understanding SLT
- Benefits of SLT in data engineering
- Implementing shift left testing
- Specific shift left testing strategies for data engineering
- Challenges in shift left testing for data engineering
- Tools and technologies to facilitate shift left in data engineering
- Synergy with other data best practices
- SBP 6 - implement the difficult first!
- The philosophy of tackling the hard tasks first
- How data engineers can prioritize difficult tasks
- Implementing difficult data tasks
- Synergy with other data best practices
- Conclusion
- SBP 7 - avoid premature optimization
- The true cost of premature optimization
- Recognizing and avoiding the trap in data engineering
- Balancing performance needs and over-optimization in data engineering
- Synergy with other data best practices
- SBP 8 - automate cloud code snippet deployments with standard deployment scripted wrappers
- The importance of deployment automation
- The deployment model choices
- Benefits of using scripted deployment wrappers
- Version control - ensuring consistency and traceability
- Relevance to data engineering in cloud environments
- Practical implementation steps
- Challenges and precautions
- Synergy with other software and data best practices
- SBP 9 - define and implement NFRs first
- Distinguishing functional (FRs) from non-functional requirements (NFRs)
- Relevance to data engineering
- Key NFRs in cloud data engineering
- Defining and implementing NFRs
- Risks of neglecting early implementation of NFRs
- SBP 10 - implement data journey journaling to facilitate future problem resolution
- Relevance to data engineering
- Challenges and considerations
- SBP 11 - implement data journey pipelines that are experimental first!
- Enabling data pipeline experimentation as datasets are readied
- Releasing data like code
- Challenges and considerations
- SBP 12 - choose languages with solid reasoning
- Key languages in data engineering and their roles
- The pressures and limitations imposed by PaaS offerings
- Pitfalls to avoid
- SBP 13 - drive scripting and PaaS code with parameterization using a secure configuration management repository tool
- The power of parameterization and configuration management
- The growth of configuration complexity
- Why parameterize?
- Configuration management repositories and configuration management databases (CMDBs)
- Best practices for secure configuration management
- SBP 14 - be prepared to prune dead code over time
- The accumulation of dead code in software and PaaS systems
- The unique challenge of PaaS service configurations
- Pruning dead code
- SBP 15 - if it doesn't fit, don't force it - use a microservice
- PaaS and its boundaries
- Microservices as a contingency strategy
- Challenges and considerations of this dual approach