Data Observability for Data Engineering: Proactive Strategies for Ensuring Data Accuracy and Addressing Broken Data Pipelines

Discover actionable steps to maintain healthy data pipelines to promote data observability within your teams with this essential guide to elevating data engineering practices. Key Features: Learn how to monitor your data pipelines in a scalable way. Apply real-life use cases and projects to gain hands-...

Bibliographic Details
Other Authors:	Pinto, Michele, 1975- (author), Khammal, Sammy El (author)
Format:	eBook
Language:	English
Published:	Birmingham, England : Packt Publishing Ltd [2023]
Edition:	First edition
Subjects:	Data mining. Database management. Digital libraries. Semantic Web.
See on Biblioteca Universitat Ramon Llull:	https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009790330506719

Table of Contents:

Cover
Title Page
Copyright and Credits
Contributors
Table of Contents
Preface
Part 1: Introduction to Data Observability
Chapter 1: Fundamentals of Data Quality Monitoring
Learning about the maturity path of data in companies
Identifying information bias in data
Data producers
Data consumers
The relationship between producers and consumers
Asymmetric information among stakeholders
Exploring the seven dimensions of data quality
Accuracy
Completeness
Consistency
Conformity
Integrity
Timeliness
Uniqueness
Consequences of data quality issues
Turning data quality into SLAs
An agreement as a starting point
The incumbent responsibilities of producers
Considerations for SLOs and SLAs
Indicators of data quality
Data source metadata
Schema
Lineage
Application
Statistics and KPIs
Examples of SLAs, SLOs, and SLIs
Alerting on data quality issues
Using indicators to create rules
The data scorecard
Summary
Chapter 2: Fundamentals of Data Observability
Technical requirements
From data quality monitoring to data observability
Three principles of data observability
Data observability in IT observability
Key components of data observability
The contract between the application owner and the marketing team
Observing a timeliness issue
Observing a completeness issue
Observing a change in data distribution
Data observability in the enterprise ecosystem
Measuring the return on investment - defining the goals
Summary
Part 2: Implementing Data Observability
Chapter 3: Data Observability Techniques
Analyzing the data
Monitoring data asynchronously
Monitoring data synchronously
Analyzing the application
The anatomy of an external analyzer
Pros and cons of the application analyzer method
Advantages
Disadvantages
Principles of monkey patching for data observability
Wrapping the function
Consolidating the findings
Pros and cons of the monkey patching method
Advanced techniques for data observability - distributed tracing
Summary
Chapter 4: Data Observability Elements
Technical requirements
Prerequisites and installation requirements
Kensu - a data observability framework
kensu-py - an overview of the monkey patching technique
Static and dynamic elements
Defining the data observability context
Application or process
Code base
Code version
Project
Environment
User
Timestamp
The application run
Getting the metadata of the data sources
Data source
Schema
Mastering lineage
Types of lineage and dependencies
Lineage run
What's in the log?
Computing observability metrics
What's in the log?
Data observability for AI models
Model method
Model training
Model metrics
What's in the log?
The feedback loop in data observability
Summary
Chapter 5: Defining Rules on Indicators
Technical requirements
Determining SLOs
Project versus data source SLOs
Use case
Turning SLOs into rules
Different types of rules
Implementation of the rules
Project - continuous validation of the data
Concepts of CI/CD
Deploying the rules in a CI/CD pipeline
Summary
Part 3: How to adopt Data Observability in your organization
Chapter 6: Root Cause Analysis
Data incident management
Detecting the issue
Impact analysis
Root cause analysis
Troubleshooting
Preventing further issues
Applying the method - a practical example
Anomaly detection
Simple indicator deterministic cases
Multiple indicators deterministic cases
Time series analysis
Case study
Summary
Chapter 7: Optimizing Data Pipelines
Concepts of data pipelines and data architecture
What is a data pipeline?
Defining the types of data pipelines
The properties of a data pipeline
Rationalizing the costs
Data pipeline costs
Using data observability to rationalize costs
Summary
Chapter 8: Organizing Data Teams and Measuring the Success of Data Observability
Defining and understanding data teams
The roles of a data team
Organizing a data team
Data mesh, data quality, and data observability - a virtuous circle
Data mesh
Building the virtuous circle
The first steps toward data observability and how to measure success
Measuring success
Summary
Part 4: Appendix
Chapter 9: Data Observability Checklist
Challenges of implementing data observability
Costs
Overhead
Security
Complexity increase
Legacy system
Information overload
Checklist to implement data observability
Start with the right data or application
Choosing the right data observability tool
Selecting the metrics to follow
Compute the return on investment
Scaling with data observability
Summary
Chapter 10: Pathway to Data Observability
Technical roadmap to include data observability
Allocating the right resources to your data observability project
Defining clear objectives with the team
Choosing a data pipeline
Setting success criteria with the team and stakeholders
Implementing data observability in applications
Continuously improving observability
Scaling data observability
Using observability for data catalogs
Using observability to ensure ML and AI reliability
Using observability to complete a data quality management program
Implementing data observability in a project
Resources and the first pipeline
Success criteria for PetCie's implementation
The implementation phase at PetCie
Continuously improving observability at PetCie
Deploying observability at scale at PetCie
Outcomes
Summary
Index
Other Books You May Enjoy
