Data Observability for Data Engineering: Proactive Strategies for Ensuring Data Accuracy and Addressing Broken Data Pipelines

Discover actionable steps to maintain healthy data pipelines to promote data observability within your teams with this essential guide to elevating data engineering practices.

Key Features
Learn how to monitor your data pipelines in a scalable way
Apply real-life use cases and projects to gain hands-...


Bibliographic Details
Other Authors: Pinto, Michele, 1975- author; Khammal, Sammy El, author
Format: E-book
Language: English
Published: Birmingham, England : Packt Publishing Ltd [2023]
Edition: First edition
Subjects:
View in Biblioteca Universitat Ramon Llull: https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009790330506719
Table of Contents:
  • Cover
  • Title Page
  • Copyright and Credits
  • Contributors
  • Table of Contents
  • Preface
  • Part 1: Introduction to Data Observability
  • Chapter 1: Fundamentals of Data Quality Monitoring
  • Learning about the maturity path of data in companies
  • Identifying information bias in data
  • Data producers
  • Data consumers
  • The relationship between producers and consumers
  • Asymmetric information among stakeholders
  • Exploring the seven dimensions of data quality
  • Accuracy
  • Completeness
  • Consistency
  • Conformity
  • Integrity
  • Timeliness
  • Uniqueness
  • Consequences of data quality issues
  • Turning data quality into SLAs
  • An agreement as a starting point
  • The incumbent responsibilities of producers
  • Considerations for SLOs and SLAs
  • Indicators of data quality
  • Data source metadata
  • Schema
  • Lineage
  • Application
  • Statistics and KPIs
  • Examples of SLAs, SLOs, and SLIs
  • Alerting on data quality issues
  • Using indicators to create rules
  • The data scorecard
  • Summary
  • Chapter 2: Fundamentals of Data Observability
  • Technical requirements
  • From data quality monitoring to data observability
  • Three principles of data observability
  • Data observability in IT observability
  • Key components of data observability
  • The contract between the application owner and the marketing team
  • Observing a timeliness issue
  • Observing a completeness issue
  • Observing a change in data distribution
  • Data observability in the enterprise ecosystem
  • Measuring the return on investment - defining the goals
  • Summary
  • Part 2: Implementing Data Observability
  • Chapter 3: Data Observability Techniques
  • Analyzing the data
  • Monitoring data asynchronously
  • Monitoring data synchronously
  • Analyzing the application
  • The anatomy of an external analyzer
  • Pros and cons of the application analyzer method
  • Advantages
  • Disadvantages
  • Principles of monkey patching for data observability
  • Wrapping the function
  • Consolidating the findings
  • Pros and cons of the monkey patching method
  • Advanced techniques for data observability - distributed tracing
  • Summary
  • Chapter 4: Data Observability Elements
  • Technical requirements
  • Prerequisites and installation requirements
  • Kensu - a data observability framework
  • kensu-py - an overview of the monkey patching technique
  • Static and dynamic elements
  • Defining the data observability context
  • Application or process
  • Code base
  • Code version
  • Project
  • Environment
  • User
  • Timestamp
  • The application run
  • Getting the metadata of the data sources
  • Data source
  • Schema
  • Mastering lineage
  • Types of lineage and dependencies
  • Lineage run
  • What's in the log?
  • Computing observability metrics
  • What's in the log?
  • Data observability for AI models
  • Model method
  • Model training
  • Model metrics
  • What's in the log?
  • The feedback loop in data observability
  • Summary
  • Chapter 5: Defining Rules on Indicators
  • Technical requirements
  • Determining SLOs
  • Project versus data source SLOs
  • Use case
  • Turning SLOs into rules
  • Different types of rules
  • Implementation of the rules
  • Project - continuous validation of the data
  • Concepts of CI/CD
  • Deploying the rules in a CI/CD pipeline
  • Summary
  • Part 3: How to Adopt Data Observability in Your Organization
  • Chapter 6: Root Cause Analysis
  • Data incident management
  • Detecting the issue
  • Impact analysis
  • Root cause analysis
  • Troubleshooting
  • Preventing further issues
  • Applying the method - a practical example
  • Anomaly detection
  • Simple indicator deterministic cases
  • Multiple indicators deterministic cases
  • Time series analysis
  • Case study
  • Summary
  • Chapter 7: Optimizing Data Pipelines
  • Concepts of data pipelines and data architecture
  • What is a data pipeline?
  • Defining the types of data pipelines
  • The properties of a data pipeline
  • Rationalizing the costs
  • Data pipeline costs
  • Using data observability to rationalize costs
  • Summary
  • Chapter 8: Organizing Data Teams and Measuring the Success of Data Observability
  • Defining and understanding data teams
  • The roles of a data team
  • Organizing a data team
  • Data mesh, data quality, and data observability - a virtuous circle
  • Data mesh
  • Building the virtuous circle
  • The first steps toward data observability and how to measure success
  • Measuring success
  • Summary
  • Part 4: Appendix
  • Chapter 9: Data Observability Checklist
  • Challenges of implementing data observability
  • Costs
  • Overhead
  • Security
  • Complexity increase
  • Legacy system
  • Information overload
  • Checklist to implement data observability
  • Start with the right data or application
  • Choosing the right data observability tool
  • Selecting the metrics to follow
  • Compute the return on investment
  • Scaling with data observability
  • Summary
  • Chapter 10: Pathway to Data Observability
  • Technical roadmap to include data observability
  • Allocating the right resources to your data observability project
  • Defining clear objectives with the team
  • Choosing a data pipeline
  • Setting success criteria with the team and stakeholders
  • Implementing data observability in applications
  • Continuously improving observability
  • Scaling data observability
  • Using observability for data catalogs
  • Using observability to ensure ML and AI reliability
  • Using observability to complete a data quality management program
  • Implementing data observability in a project
  • Resources and the first pipeline
  • Success criteria for PetCie's implementation
  • The implementation phase at PetCie
  • Continuously improving observability at PetCie
  • Deploying observability at scale at PetCie
  • Outcomes
  • Summary
  • Index
  • Other Books You May Enjoy