Data Engineering Best Practices: Architect Robust and Cost-Effective Data Solutions in the Cloud Era

Explore modern data engineering techniques and best practices to build scalable, efficient, and future-proof data processing systems across cloud platforms. Key Features: Architect and engineer optimized data solutions in the cloud with best practices for performance and cost-effectiveness; Explore des...

Full description

Bibliographic Details
Other Authors: Schiller, Richard J. (author); LaRochelle, David (author)
Format: eBook
Language: English
Published: Birmingham, England : Packt Publishing, [2024]
Edition: First edition
Subjects:
View at Biblioteca Universitat Ramon Llull: https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009853634306719
Table of Contents:
  • Cover
  • Title Page
  • Copyright and Credits
  • Contributors
  • Table of Contents
  • Preface
  • Chapter 1: Overview of the Business Problem Statement
  • What is the business problem statement?
  • Anti-patterns to avoid
  • Patterns in the future-proof architecture
  • Future-proofing is …
  • Organization into zone considerations
  • Cloud limitations
  • The Intelligence Age
  • Use case definitions
  • The mission, the vision, and the strategy
  • Principles and the development life cycle
  • The architecture definition, best practices, and key considerations
  • The DataOps convergence
  • Summary
  • Chapter 2: A Data Engineer's Journey - Background Challenges
  • Challenge #1 - platform architectures change rapidly
  • Platform architectures in the 21st century
  • Impacts on business strategy
  • A flexible software development life cycle to manage platform risk
  • Challenge #2 - Total cost of ownership (TCO) is high
  • ETL architecture costs are high!
  • Buy versus build choices impact a solution's longevity
  • Challenge #3 - Evolving data repository patterns - identifying big rocks for data engineers
  • Intake, integration, and storage challenges in data engineering
  • Identifying the big rocks to be placed first into your design
  • Being able to handle technology hype
  • Summary
  • Chapter 3: A Data Engineer's Journey - IT's Vision and Mission
  • The vision
  • Develop the IT engineering vision
  • Vision summary
  • The mission and the IT strategy
  • IT's vision
  • IT's mission
  • IT mission summary
  • Principles, frameworks, and best practices
  • The architecture reflects the vision
  • Principles summary
  • Data engineering patterns for IT operability
  • What patterns are required and how are they specified?
  • Pattern summary
  • Summary
  • Chapter 4: Architecture Principles
  • Architecture principles overview
  • Architecture foundation
  • Data lake, mesh, and fabric
  • Data immutability
  • Third party tool, cloud platform-as-a-service (PaaS), and framework integrations
  • Data mesh principles
  • Data mesh metadata
  • Data semantics in the data mesh
  • Data mesh, security, and tech stack considerations
  • What are the key foundational takeaways?
  • Architecture principles in depth
  • Principle #1 - Data lake as a centerpiece? No, implement the data journey!
  • Principle #2 - A data lake's immutable data is to remain explorable
  • Principle #3 - A data lake's immutable data remains available for analytics
  • Principle #4 - A data lake's sources are discoverable
  • Principle #5 - A data lake's tooling should be consistent with the architecture
  • Principle #6 - A data mesh defines data to be governed by domain-driven ownership
  • Principle #7 - A data mesh defines the data and derives insights as a product
  • Principle #8 - A data mesh defines data, information, and insights to be self-service
  • Principle #9 - A data mesh implements a federated governance processing system
  • Principle #10 - Metadata is associated with datasets and is relevant to the business
  • Principle #11 - Dataset lineage and at-rest metadata is subject to life cycle governance
  • Principle #12 - Datasets and metadata require cataloging and discovery services
  • Principle #13 - Semantic metadata guarantees correct business understanding at all stages in the data journey
  • Principle #14 - Data big rock architecture choices (time series, correction processing, security, privacy, and so on) are to be handled in the design early
  • Principle #15 - Implement foundational capabilities in the architecture framework first
  • Summary
  • Chapter 5: Architecture Framework - Conceptual Architecture Best Practices
  • Conceptual architecture overview
  • Best practice organization
  • How does the conceptual architecture align with the logical architecture and physical architecture?
  • Conceptual architecture best practices
  • Conceptual architecture description
  • Conceptual architecture glossary
  • What are the data architecture's key issues identified in the conceptual architecture?
  • Best practice composition of the conceptual architecture
  • Conceptual to logical architecture mapping
  • Summary
  • Chapter 6: Architecture Framework - Logical Architecture Best Practices
  • Logical architecture overview
  • Organizing best practices
  • How does the logical architecture align with the conceptual and physical architecture?
  • Detailed capabilities of the ingestion zones
  • ETL data pipelines
  • Bronze standard datasets
  • Detailed capabilities of the transformation zones
  • Data quality features
  • Data lake house and warehouse
  • Gold and silver standard datasets
  • Detailed capabilities of the consumption zones
  • Data analytics
  • Accessing silver standard datasets from the consumption zone
  • Trade-offs between public cloud, on-premises, and multi-cloud
  • Cost of ingest or egress for cloud data
  • Cost of a dedicated network line to the point of service
  • Cost of provisioning
  • Cost of monitoring and observability
  • Hybrid or multi-cloud choices!
  • The benefits of a multi-cloud strategy
  • Summary
  • Chapter 7: Architecture Framework - Physical Architecture Best Practices
  • Physical architecture overview
  • Best practice organization
  • How does the physical architecture align with the logical and conceptual architecture?
  • How should the physical architecture align with the operational processes/capabilities of the solution?
  • Examples of physical reference architectures
  • Summary
  • Chapter 8: Software Engineering Best Practice Considerations
  • SBP 1 - follow the architecture!
  • The core value of architectural integrity
  • The downstream impact of deviating
  • Ensuring adherence in your data engineering team
  • Continuous evolution and architecture
  • Conclusion
  • SBP 2 - implement Agile methodology for your organization!
  • Introduction to Agile methodology
  • Agile principles and their significance in data engineering
  • Benefits of implementing Agile in data engineering
  • Challenges and considerations in Agile data engineering
  • Steps to implement Agile in data engineering
  • Tools and Agile practices tailored for data engineering
  • Conclusion
  • SBP 3 - generate objectives and key results (OKRs)!
  • Introduction and deep dive into OKRs
  • Crafting data-centric OKRs
  • Potential challenges with OKRs in data engineering
  • Reviewing and iterating on OKRs in a data context
  • SBP 4 - implement data as a product!
  • SBP 5 - implement shift left testing (SLT) processes!
  • Understanding SLT
  • Benefits of SLT in data engineering
  • Implementing shift left testing
  • Specific shift left testing strategies for data engineering
  • Challenges in shift left testing for data engineering
  • Tools and technologies to facilitate shift left in data engineering
  • Synergy with other data best practices
  • SBP 6 - implement the difficult first!
  • The philosophy of tackling the hard tasks first
  • How data engineers can prioritize difficult tasks
  • Implementing difficult data tasks
  • Synergy with other data best practices
  • Conclusion
  • SBP 7 - avoid premature optimization
  • The true cost of premature optimization
  • Recognizing and avoiding the trap in data engineering
  • Balancing performance needs and over-optimization in data engineering
  • Synergy with other data best practices
  • SBP 8 - automate cloud code snippet deployments with standard deployment scripted wrappers
  • The importance of deployment automation
  • The deployment model choices
  • Benefits of using scripted deployment wrappers
  • Version control - ensuring consistency and traceability
  • Relevance to data engineering in cloud environments
  • Practical implementation steps
  • Challenges and precautions
  • Synergy with other software and data best practices
  • SBP 9 - define and implement NFRs first
  • Distinguishing functional (FRs) from non-functional requirements (NFRs)
  • Relevance to data engineering
  • Key NFRs in cloud data engineering
  • Defining and implementing NFRs
  • Risks of neglecting early implementation of NFRs
  • SBP 10 - implement data journey journaling to facilitate future problem resolution
  • Relevance to data engineering
  • Challenges and considerations
  • SBP 11 - implement data journey pipelines that are experimental first!
  • Enabling data pipeline experimentation as datasets are readied
  • Releasing data like code
  • Challenges and considerations
  • SBP 12 - choose languages with solid reasoning
  • Key languages in data engineering and their roles
  • The pressures and limitations imposed by PaaS offerings
  • Pitfalls to avoid
  • SBP 13 - drive scripting and PaaS code with parameterization using a secure configuration management repository tool
  • The power of parameterization and configuration management
  • The growth of configuration complexity
  • Why parameterize?
  • Configuration management repositories and configuration management databases (CMDBs)
  • Best practices for secure configuration management
  • SBP 14 - be prepared to prune dead code over time
  • The accumulation of dead code in software and PaaS systems
  • The unique challenge of PaaS service configurations
  • Pruning dead code
  • SBP 15 - if it doesn't fit, don't force it - use a microservice!
  • PaaS and its boundaries
  • Microservices as a contingency strategy
  • Challenges and considerations of this dual approach