MCA Microsoft Certified Associate Azure Data Engineer Study Guide: Exam DP-203
Prepare for the Azure Data Engineering certification--and an exciting new career in analytics--with this must-have study aide. In the MCA Microsoft Certified Associate Azure Data Engineer Study Guide: Exam DP-203, accomplished data engineer and tech educator Benjamin Perkins delivers a hands-on, prac...
Main author: | Benjamin Perkins
---|---
Format: | E-book
Language: | English
Published: | Newark : John Wiley & Sons, Incorporated, 2023
Edition: | 1st ed.
Subjects: |
View at Biblioteca Universitat Ramon Llull: | https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009769038006719
Table of Contents:
- Cover Page
- Title Page
- Copyright Page
- Acknowledgments
- About the Author
- About the Technical Editor
- Contents at a Glance
- Contents
- Table of Exercises
- Introduction
- Part I Azure Data Engineer Certification and Azure Products
- Chapter 1 Gaining the Azure Data Engineer Associate Certification
- The Journey to Certification
- How to Pass Exam DP-203
- Understanding the Exam Expectations and Requirements
- Use Azure Daily
- Read Azure Articles to Stay Current
- Have an Understanding of All Azure Products
- Azure Product Name Recognition
- Azure Data Analytics
- Azure Synapse Analytics
- Azure Databricks
- Azure HDInsight
- Azure Analysis Services
- Azure Data Factory
- Azure Event Hubs
- Azure Stream Analytics
- Other Products
- Azure Storage Products
- Azure Data Lake Storage
- Azure Storage
- Other Products
- Azure Databases
- Azure Cosmos DB
- Azure SQL Server Products
- Additional Azure Databases
- Other Products
- Azure Security
- Azure Active Directory
- Role-Based Access Control
- Attribute-Based Access Control
- Azure Key Vault
- Other Products
- Azure Networking
- Virtual Networks
- Other Products
- Azure Compute
- Azure Virtual Machines
- Azure Virtual Machine Scale Sets
- Azure App Service Web Apps
- Azure Functions
- Azure Batch
- Azure Management and Governance
- Azure Monitor
- Azure Purview
- Azure Policy
- Azure Blueprints (Preview)
- Azure Lighthouse
- Azure Cost Management and Billing
- Other Products
- Summary
- Exam Essentials
- Review Questions
- Chapter 2 CREATE DATABASE dbName
- The Brainjammer
- A Historical Look at Data
- Variety
- Velocity
- Volume
- Data Locations
- Data File Formats
- Data Structures, Types, and Concepts
- Data Structures
- Data Types and Management
- Data Concepts
- Data Programming and Querying for Data Engineers
- Data Programming
- Querying Data
- Understanding Big Data Processing
- Big Data Stages
- ETL, ELT, ELTL
- Analytics Types
- Big Data Layers
- Summary
- Exam Essentials
- Review Questions
- Part II Design and Implement Data Storage
- Chapter 3 Data Sources and Ingestion
- Where Does Data Come From?
- Design a Data Storage Structure
- Design an Azure Data Lake Solution
- Recommended File Types for Storage
- Recommended File Types for Analytical Queries
- Design for Efficient Querying
- Design for Data Pruning
- Design a Folder Structure That Represents the Levels of Data Transformation
- Design a Distribution Strategy
- Design a Data Archiving Solution
- Design a Partition Strategy
- Design a Partition Strategy for Files
- Design a Partition Strategy for Analytical Workloads
- Design a Partition Strategy for Efficiency and Performance
- Design a Partition Strategy for Azure Synapse Analytics
- Identify When Partitioning Is Needed in Azure Data Lake Storage Gen2
- Design the Serving/Data Exploration Layer
- Design Star Schemas
- Design Slowly Changing Dimensions
- Design a Dimensional Hierarchy
- Design a Solution for Temporal Data
- Design for Incremental Loading
- Design Analytical Stores
- Design Metastores in Azure Synapse Analytics and Azure Databricks
- The Ingestion of Data into a Pipeline
- Azure Synapse Analytics
- Azure Data Factory
- Azure Databricks
- Event Hubs and IoT Hub
- Azure Stream Analytics
- Apache Kafka for HDInsight
- Migrating and Moving Data
- Summary
- Exam Essentials
- Review Questions
- Chapter 4 The Storage of Data
- Implement Physical Data Storage Structures
- Implement Compression
- Implement Partitioning
- Implement Sharding
- Implement Different Table Geometries with Azure Synapse Analytics Pools
- Implement Data Redundancy
- Implement Distributions
- Implement Data Archiving
- Azure Synapse Analytics Develop Hub
- Implement Logical Data Structures
- Build a Temporal Data Solution
- Build a Slowly Changing Dimension
- Build a Logical Folder Structure
- Build External Tables
- Implement File and Folder Structures for Efficient Querying and Data Pruning
- Implement a Partition Strategy
- Implement a Partition Strategy for Files
- Implement a Partition Strategy for Analytical Workloads
- Implement a Partition Strategy for Streaming Workloads
- Implement a Partition Strategy for Azure Synapse Analytics
- Design and Implement the Data Exploration Layer
- Deliver Data in a Relational Star Schema
- Deliver Data in Parquet Files
- Maintain Metadata
- Implement a Dimensional Hierarchy
- Create and Execute Queries by Using a Compute Solution That Leverages SQL Serverless and Spark Cluster
- Recommend Azure Synapse Analytics Database Templates
- Implement Azure Synapse Analytics Database Templates
- Additional Data Storage Topics
- Storing Raw Data in Azure Databricks for Transformation
- Storing Data Using Azure HDInsight
- Storing Prepared, Trained, and Modeled Data
- Summary
- Exam Essentials
- Review Questions
- Part III Develop Data Processing
- Chapter 5 Transform, Manage, and Prepare Data
- Ingest and Transform Data
- Transform Data Using Azure Synapse Pipelines
- Transform Data Using Azure Data Factory
- Transform Data Using Apache Spark
- Transform Data Using Transact-SQL
- Transform Data Using Stream Analytics
- Cleanse Data
- Split Data
- Shred JSON
- Encode and Decode Data
- Configure Error Handling for the Transformation
- Normalize and Denormalize Values
- Transform Data by Using Scala
- Perform Exploratory Data Analysis
- Transformation and Data Management Concepts
- Transformation
- Data Management
- Azure Databricks
- Data Modeling and Usage
- Data Modeling with Machine Learning
- Usage
- Summary
- Exam Essentials
- Review Questions
- Chapter 6 Create and Manage Batch Processing and Pipelines
- Design and Develop a Batch Processing Solution
- Design a Batch Processing Solution
- Develop Batch Processing Solutions
- Create Data Pipelines
- Handle Duplicate Data
- Handle Missing Data
- Handle Late-Arriving Data
- Upsert Data
- Configure the Batch Size
- Configure Batch Retention
- Design and Develop Slowly Changing Dimensions
- Design and Implement Incremental Data Loads
- Integrate Jupyter/IPython Notebooks into a Data Pipeline
- Revert Data to a Previous State
- Handle Security and Compliance Requirements
- Design and Create Tests for Data Pipelines
- Scale Resources
- Design and Configure Exception Handling
- Debug Spark Jobs Using the Spark UI
- Implement Azure Synapse Link and Query the Replicated Data
- Use PolyBase to Load Data to a SQL Pool
- Read from and Write to a Delta Table
- Manage Batches and Pipelines
- Trigger Batches
- Schedule Data Pipelines
- Validate Batch Loads
- Implement Version Control for Pipeline Artifacts
- Manage Data Pipelines
- Manage Spark Jobs in a Pipeline
- Handle Failed Batch Loads
- Summary
- Exam Essentials
- Review Questions
- Chapter 7 Design and Implement a Data Stream Processing Solution
- Develop a Stream Processing Solution
- Design a Stream Processing Solution
- Create a Stream Processing Solution
- Process Time Series Data
- Design and Create Windowed Aggregates
- Process Data Within One Partition
- Process Data Across Partitions
- Upsert Data
- Handle Schema Drift
- Configure Checkpoints/Watermarking During Processing
- Replay Archived Stream Data
- Design and Create Tests for Data Pipelines
- Monitor for Performance and Functional Regressions
- Optimize Pipelines for Analytical or Transactional Purposes
- Scale Resources
- Design and Configure Exception Handling
- Handle Interruptions
- Ingest and Transform Data
- Transform Data Using Azure Stream Analytics
- Monitor Data Storage and Data Processing
- Monitor Stream Processing
- Summary
- Exam Essentials
- Review Questions
- Part IV Secure, Monitor, and Optimize Data Storage and Data Processing
- Chapter 8 Keeping Data Safe and Secure
- Design Security for Data Policies and Standards
- Design a Data Auditing Strategy
- Design a Data Retention Policy
- Design for Data Privacy
- Design to Purge Data Based on Business Requirements
- Design Data Encryption for Data at Rest and in Transit
- Design Row-Level and Column-Level Security
- Design a Data Masking Strategy
- Design Access Control for Azure Data Lake Storage Gen2
- Implement Data Security
- Implement a Data Auditing Strategy
- Manage Sensitive Information
- Implement a Data Retention Policy
- Encrypt Data at Rest and in Motion
- Implement Row-Level and Column-Level Security
- Implement Data Masking
- Manage Identities, Keys, and Secrets Across Different Data Platform Technologies
- Implement Access Control for Azure Data Lake Storage Gen2
- Implement Secure Endpoints (Private and Public)
- Implement Resource Tokens in Azure Databricks
- Load a DataFrame with Sensitive Information
- Write Encrypted Data to Tables or Parquet Files
- Develop a Batch Processing Solution
- Handle Security and Compliance Requirements
- Design and Implement the Data Exploration Layer
- Browse and Search Metadata in Microsoft Purview Data Catalog
- Push New or Updated Data Lineage to Microsoft Purview
- Summary
- Exam Essentials
- Review Questions
- Chapter 9 Monitoring Azure Data Storage and Processing
- Monitoring Data Storage and Data Processing
- Implement Logging Used by Azure Monitor