Data engineering on Azure

Bibliographic Details
Other Authors:	Riscutia, Vlad (author)
Format:	eBook
Language:	English
Published:	Shelter Island, New York : Manning [2021]
Subjects:	Microsoft Azure SQL Database. Cloud computing. Database management.
See on Biblioteca Universitat Ramon Llull:	https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009634690206719

Table of Contents:

Intro
inside front cover
Data Platform Architecture
Data Engineering on Azure
Copyright
dedication
brief contents
contents
front matter
preface
acknowledgments
about this book
about the author
about the cover illustration
1 Introduction
1.1 What is data engineering?
1.2 Who this book is for
1.3 What is a data platform?
1.3.1 Anatomy of a data platform
1.3.2 Infrastructure as code, codeless infrastructure
1.4 Building in the cloud
1.4.1 IaaS, PaaS, SaaS
1.4.2 Network, storage, compute
1.4.3 Getting started with Azure
1.4.4 Interacting with Azure
1.5 Implementing an Azure data platform
Summary
Part 1 Infrastructure
2 Storage
2.1 Storing data in a data platform
2.1.1 Storing data across multiple data fabrics
2.1.2 Having a single source of truth
2.2 Introducing Azure Data Explorer
2.2.1 Deploying an Azure Data Explorer cluster
2.2.2 Using Azure Data Explorer
2.2.3 Working around query limits
2.3 Introducing Azure Data Lake Storage
2.3.1 Creating an Azure Data Lake Storage account
2.3.2 Using Azure Data Lake Storage
2.3.3 Integrating with Azure Data Explorer
2.4 Ingesting data
2.4.1 Ingestion frequency
2.4.2 Load type
2.4.3 Restatements and reloads
Summary
3 DevOps
3.1 What is DevOps?
3.1.1 DevOps in data engineering
3.2 Introducing Azure DevOps
3.2.1 Using the az azure-devops extension
3.3 Deploying infrastructure
3.3.1 Exporting an Azure Resource Manager template
3.3.2 Creating Azure DevOps service connections
3.3.3 Deploying Azure Resource Manager templates
3.3.4 Understanding Azure Pipelines
3.4 Deploying analytics
3.4.1 Using Azure DevOps marketplace extensions
3.4.2 Storing everything in Git; deploying everything automatically
Summary
4 Orchestration
4.1 Ingesting the Bing COVID-19 open dataset
4.2 Introducing Azure Data Factory
4.2.1 Setting up the data source
4.2.2 Setting up the data sink
4.2.3 Setting up the pipeline
4.2.4 Setting up a trigger
4.2.5 Orchestrating with Azure Data Factory
4.3 DevOps for Azure Data Factory
4.3.1 Deploying Azure Data Factory from Git
4.3.2 Setting up access control
4.3.3 Deploying the production data factory
4.3.4 DevOps for the Azure Data Factory recap
4.4 Monitoring with Azure Monitor
Summary
Part 2 Workloads
5 Processing
5.1 Data modeling techniques
5.1.1 Normalization and denormalization
5.1.2 Data warehousing
5.1.3 Semistructured data
5.1.4 Data modeling recap
5.2 Identity keyrings
5.2.1 Building an identity keyring
5.2.2 Understanding keyrings
5.3 Timelines
5.3.1 Building a timeline view
5.3.2 Using timelines
5.4 Continuous data processing
5.4.1 Tracking processing functions in Git
5.4.2 Keyring building in Azure Data Factory
5.4.3 Scaling out
Summary
6 Analytics
6.1 Structuring storage
6.1.1 Providing development data
6.1.2 Replicating production data
6.1.3 Providing read-only access to the production data
6.1.4 Storage structure recap
6.2 Analytics workflow
6.2.1 Prototyping
6.2.2 Development and user acceptance testing
6.2.3 Production
6.2.4 Analytics workflow recap
6.3 Self-serve data movement
6.3.1 Support model
6.3.2 Data contracts
6.3.3 Pipeline validation
6.3.4 Postmortems
6.3.5 Self-serve data movement recap
Summary
7 Machine learning
7.1 Training a machine learning model
7.1.1 Training a model using scikit-learn
7.1.2 High spender model implementation
7.2 Introducing Azure Machine Learning
7.2.1 Creating a workspace
7.2.2 Creating an Azure Machine Learning compute target
7.2.3 Setting up Azure Machine Learning storage
7.2.4 Running ML in the cloud
7.2.5 Azure Machine Learning recap
7.3 MLOps
7.3.1 Deploying from Git
7.3.2 Storing pipeline IDs
7.3.3 DevOps for Azure Machine Learning recap
7.4 Orchestrating machine learning
7.4.1 Connecting Azure Data Factory with Azure Machine Learning
7.4.2 Machine learning orchestration
7.4.3 Orchestrating recap
Summary
Part 3 Governance
8 Metadata
8.1 Making sense of the data
8.2 Introducing Azure Purview
8.3 Maintaining a data inventory
8.3.1 Setting up a scan
8.3.2 Browsing the data dictionary
8.3.3 Data dictionary recap
8.4 Managing a data glossary
8.4.1 Adding a new glossary term
8.4.2 Curating terms
8.4.3 Custom templates and bulk import
8.4.4 Data glossary recap
8.5 Understanding Azure Purview's advanced features
8.5.1 Tracking lineage
8.5.2 Classification rules
8.5.3 REST API
8.5.4 Advanced features recap
Summary
9 Data quality
9.1 Testing data
9.1.1 Availability tests
9.1.2 Correctness tests
9.1.3 Completeness tests
9.1.4 Detecting anomalies
9.1.5 Testing data recap
9.2 Running data quality checks
9.2.1 Testing using Azure Data Factory
9.2.2 Executing tests
9.2.3 Creating and using a template
9.2.4 Running data quality checks recap
9.3 Scaling out data testing
9.3.1 Supporting multiple data fabrics
9.3.2 Testing at rest and during movement
9.3.3 Authoring tests
9.3.4 Storing tests and results
Summary
10 Compliance
10.1 Data classification
10.1.1 Feature data
10.1.2 Telemetry
10.1.3 User data
10.1.4 User-owned data
10.1.5 Business data
10.1.6 Data classification recap
10.2 Changing classification through processing
10.2.1 Aggregation
10.2.2 Anonymization
10.2.3 Pseudonymization
10.2.4 Masking
10.2.5 Processing classification changes recap
10.3 Implementing an access model
10.3.1 Security groups
10.3.2 Securing Azure Data Explorer
10.3.3 Access model recap
10.4 Complying with GDPR and other considerations
10.4.1 Data handling
10.4.2 Data subject requests
10.4.3 Other considerations
Summary
11 Distributing data
11.1 Data distribution overview
11.2 Building a data API
11.2.1 Introducing Azure Cosmos DB
11.2.2 Populating the Cosmos DB collection
11.2.3 Retrieving data
11.2.4 Data API recap
11.3 Serving machine learning
11.4 Sharing data for bulk copy
11.4.1 Separating compute resources
11.4.2 Introducing Azure Data Share
11.4.3 Sharing data for bulk copy recap
11.5 Data sharing best practices
Summary
Appendix A. Azure services
Azure Storage
Azure SQL
Azure Synapse Analytics
Azure Data Explorer
Azure Databricks
Azure Cosmos DB
Appendix B. KQL quick reference
Common query reference
SQL to KQL
Appendix C. Running code samples
index
inside back cover
