Data Engineering Best Practices: Architect Robust and Cost-Effective Data Solutions in the Cloud Era

Explore modern data engineering techniques and best practices to build scalable, efficient, and future-proof data processing systems across cloud platforms. Key Features: Architect and engineer optimized data solutions in the cloud with best practices for performance and cost-effectiveness; Explore des...

Full description

Bibliographic Details
Other Authors: Schiller, Richard J. (author); LaRochelle, David (author)
Format: eBook
Language: English
Published: Birmingham, England : Packt Publishing, [2024]
Edition: First edition
Subjects:
View at Biblioteca Universitat Ramon Llull: https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009853634306719
Table of Contents:
  • Cover
  • Title Page
  • Copyright and Credits
  • Contributors
  • Table of Contents
  • Preface
  • Chapter 1: Overview of the Business Problem Statement
  • What is the business problem statement?
  • Anti-patterns to avoid
  • Patterns in the future-proof architecture
  • Future-proofing is …
  • Organization into zone considerations
  • Cloud limitations
  • The Intelligence Age
  • Use case definitions
  • The mission, the vision, and the strategy
  • Principles and the development life cycle
  • The architecture definition, best practices, and key considerations
  • The DataOps convergence
  • Summary
  • Chapter 2: A Data Engineer's Journey - Background Challenges
  • Challenge #1 - platform architectures change rapidly
  • Platform architectures in the 21st century
  • Impacts on business strategy
  • A flexible software development life cycle to manage platform risk
  • Challenge #2 - Total cost of ownership (TCO) is high
  • ETL architecture costs are high!
  • Buy versus build choices impact a solution's longevity
  • Challenge #3 - Evolving data repository patterns - identifying big rocks for data engineers
  • Intake, integration, and storage challenges in data engineering
  • Identifying the big rocks to be placed first into your design
  • Being able to handle technology hype
  • Summary
  • Chapter 3: A Data Engineer's Journey - IT's Vision and Mission
  • The vision
  • Develop the IT engineering vision
  • Vision summary
  • The mission and the IT strategy
  • IT's vision
  • IT's mission
  • IT mission summary
  • Principles, frameworks, and best practices
  • The architecture reflects the vision
  • Principles summary
  • Data engineering patterns for IT operability
  • What patterns are required and how are they specified?
  • Pattern summary
  • Summary
  • Chapter 4: Architecture Principles
  • Architecture principles overview
  • Architecture foundation
  • Data lake, mesh, and fabric
  • Data immutability
  • Third party tool, cloud platform-as-a-service (PaaS), and framework integrations
  • Data mesh principles
  • Data mesh metadata
  • Data semantics in the data mesh
  • Data mesh, security, and tech stack considerations
  • What are the key foundational takeaways?
  • Architecture principles in depth
  • Principle #1 - Data lake as a centerpiece? No, implement the data journey!
  • Principle #2 - A data lake's immutable data is to remain explorable
  • Principle #3 - A data lake's immutable data remains available for analytics
  • Principle #4 - A data lake's sources are discoverable
  • Principle #5 - A data lake's tooling should be consistent with the architecture
  • Principle #6 - A data mesh defines data to be governed by domain-driven ownership
  • Principle #7 - A data mesh defines the data and derives insights as a product
  • Principle #8 - A data mesh defines data, information, and insights to be self-service
  • Principle #9 - A data mesh implements a federated governance processing system
  • Principle #10 - Metadata is associated with datasets and is relevant to the business
  • Principle #11 - Dataset lineage and at-rest metadata is subject to life cycle governance
  • Principle #12 - Datasets and metadata require cataloging and discovery services
  • Principle #13 - Semantic metadata guarantees correct business understanding at all stages in the data journey
  • Principle #14 - Data big rock architecture choices (time series, correction processing, security, privacy, and so on) are to be handled in the design early
  • Principle #15 - Implement foundational capabilities in the architecture framework first
  • Summary
  • Chapter 5: Architecture Framework - Conceptual Architecture Best Practices
  • Conceptual architecture overview
  • Best practice organization
  • How does the conceptual architecture align with the logical architecture and physical architecture?
  • Conceptual architecture best practices
  • Conceptual architecture description
  • Conceptual architecture glossary
  • What are the data architecture's key issues identified in the conceptual architecture?
  • Best practice composition of the conceptual architecture
  • Conceptual to logical architecture mapping
  • Summary
  • Chapter 6: Architecture Framework - Logical Architecture Best Practices
  • Logical architecture overview
  • Organizing best practices
  • How does the logical architecture align with the conceptual and physical architecture?
  • Detailed capabilities of the ingestion zones
  • ETL data pipelines
  • Bronze standard datasets
  • Detailed capabilities of the transformation zones
  • Data quality features
  • Data lake house and warehouse
  • Gold and silver standard datasets
  • Detailed capabilities of the consumption zones
  • Data analytics
  • Accessing silver standard datasets from the consumption zone
  • Trade-offs between public cloud, on-premises, and multi-cloud
  • Cost of ingest or egress for cloud data
  • Cost of a dedicated network line to the point of service
  • Cost of provisioning
  • Cost of monitoring and observability
  • Hybrid or multi-cloud choices!
  • The benefits of a multi-cloud strategy
  • Summary
  • Chapter 7: Architecture Framework - Physical Architecture Best Practices
  • Physical architecture overview
  • Best practice organization
  • How does the physical architecture align with the logical and conceptual architecture?
  • How should the physical architecture align with the operational processes/capabilities of the solution?
  • Examples of physical reference architectures
  • Summary
  • Chapter 8: Software Engineering Best Practice Considerations
  • SBP 1 - follow the architecture!
  • The core value of architectural integrity
  • The downstream impact of deviating
  • Ensuring adherence in your data engineering team
  • Continuous evolution and architecture
  • Conclusion
  • SBP 2 - implement Agile methodology for your organization!
  • Introduction to Agile methodology
  • Agile principles and their significance in data engineering
  • Benefits of implementing Agile in data engineering
  • Challenges and considerations in Agile data engineering
  • Steps to implement Agile in data engineering
  • Tools and Agile practices tailored for data engineering
  • Conclusion
  • SBP 3 - generate objectives and key results (OKRs)!
  • Introduction and deep dive into OKRs
  • Crafting data-centric OKRs
  • Potential challenges with OKRs in data engineering
  • Reviewing and iterating on OKRs in a data context
  • SBP 4 - implement data as a product!
  • SBP 5 - implement shift left testing (SLT) processes!
  • Understanding SLT
  • Benefits of SLT in data engineering
  • Implementing shift left testing
  • Specific shift left testing strategies for data engineering
  • Challenges in shift left testing for data engineering
  • Tools and technologies to facilitate shift left in data engineering
  • Synergy with other data best practices
  • SBP 6 - implement the difficult first!
  • The philosophy of tackling the hard tasks first
  • How data engineers can prioritize difficult tasks
  • Implementing difficult data tasks
  • Synergy with other data best practices
  • Conclusion
  • SBP 7 - avoid premature optimization
  • The true cost of premature optimization
  • Recognizing and avoiding the trap in data engineering
  • Balancing performance needs and over-optimization in data engineering
  • Synergy with other data best practices
  • SBP 8 - automate cloud code snippet deployments with standard deployment scripted wrappers
  • The importance of deployment automation
  • The deployment model choices
  • Benefits of using scripted deployment wrappers
  • Version control - ensuring consistency and traceability
  • Relevance to data engineering in cloud environments
  • Practical implementation steps
  • Challenges and precautions
  • Synergy with other software and data best practices
  • SBP 9 - define and implement NFRs first
  • Distinguishing functional (FRs) from non-functional requirements (NFRs)
  • Relevance to data engineering
  • Key NFRs in cloud data engineering
  • Defining and implementing NFRs
  • Risks of neglecting early implementation of NFRs
  • SBP 10 - implement data journey journaling to facilitate future problem resolution
  • Relevance to data engineering
  • Challenges and considerations
  • SBP 11 - implement data journey pipelines that are experimental first!
  • Enabling data pipeline experimentation as datasets are readied
  • Releasing data like code
  • Challenges and considerations
  • SBP 12 - choose languages with solid reasoning
  • Key languages in data engineering and their roles
  • The pressures and limitations imposed by PaaS offerings
  • Pitfalls to avoid
  • SBP 13 - drive scripting and PaaS code with parameterization using a secure configuration management repository tool
  • The power of parameterization and configuration management
  • The growth of configuration complexity
  • Why parameterize?
  • Configuration management repositories and configuration management databases (CMDBs)
  • Best practices for secure configuration management
  • SBP 14 - be prepared to prune dead code over time
  • The accumulation of dead code in software and PaaS systems
  • The unique challenge of PaaS service configurations
  • Pruning dead code
  • SBP 15 - if it doesn't fit, don't force it - use a microservice!
  • PaaS and its boundaries
  • Microservices as a contingency strategy
  • Challenges and considerations of this dual approach