Azure Storage, Streaming, and Batch Analytics: A Guide for Data Engineers
Other Authors: | |
---|---|
Format: | eBook |
Language: | English |
Published: | Shelter Island, NY : Manning Publications Co, [2020] |
Subjects: | |
View at Universitat Ramon Llull Library: | https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009631241106719 |
Table of Contents:
- Intro
- Azure Storage, Streaming, and Batch Analytics
- Copyright
- dedication
- brief contents
- contents
- front matter
- preface
- acknowledgements
- about this book
- Who should read this book
- How this book is organized: a roadmap
- About the code
- Author online
- about the author
- about the cover illustration
- 1 What is data engineering?
- 1.1 What is data engineering?
- 1.2 What do data engineers do?
- 1.3 How does Microsoft define data engineering?
- 1.3.1 Data acquisition
- 1.3.2 Data storage
- 1.3.3 Data processing
- 1.3.4 Data queries
- 1.3.5 Orchestration
- 1.3.6 Data retrieval
- 1.4 What tools does Azure provide for data engineering?
- 1.5 Azure Data Engineers
- 1.6 Example application
- Summary
- 2 Building an analytics system in Azure
- 2.1 Fundamentals of Azure architecture
- 2.1.1 Azure subscriptions
- 2.1.2 Azure regions
- 2.1.3 Azure naming conventions
- 2.1.4 Resource groups
- 2.1.5 Finding resources
- 2.2 Lambda architecture
- 2.3 Azure cloud services
- 2.3.1 Azure analytics system architecture
- 2.3.2 Event Hubs
- 2.3.3 Stream Analytics
- 2.3.4 Data Lake Storage
- 2.3.5 Data Lake Analytics
- 2.3.6 SQL Database
- 2.3.7 Data Factory
- 2.3.8 Azure PowerShell
- 2.4 Walk-through of processing a series of event data records
- 2.4.1 Hot path
- 2.4.2 Cold path
- 2.4.3 Choosing abstract Azure services
- 2.5 Calculating cloud hosting costs
- 2.5.1 Event Hubs
- 2.5.2 Stream Analytics
- 2.5.3 Data Lake Storage
- 2.5.4 Data Lake Analytics
- 2.5.5 SQL Database
- 2.5.6 Data Factory
- Summary
- 3 General storage with Azure Storage accounts
- 3.1 Cloud storage services
- 3.1.1 Before you begin
- 3.2 Creating an Azure Storage account
- 3.2.1 Using Azure portal
- 3.2.2 Using Azure PowerShell
- 3.2.3 Azure Storage replication
- 3.3 Storage account services
- 3.3.1 Blob storage
- 3.3.2 Creating a Blobs service container
- 3.3.3 Blob tiering
- 3.3.4 Copy tools
- 3.3.5 Queues
- 3.3.6 Creating a queue
- 3.3.7 Azure Storage queue options
- 3.4 Storage account access
- 3.4.1 Blob container security
- 3.4.2 Designing Storage account access
- 3.5 Exercises
- 3.5.1 Exercise 1
- 3.5.2 Exercise 2
- Summary
- 4 Azure Data Lake Storage
- 4.1 Create an Azure Data Lake store
- 4.1.1 Using Azure portal
- 4.1.2 Using Azure PowerShell
- 4.2 Data Lake store access
- 4.2.1 Access schemes
- 4.2.2 Configuring access
- 4.2.3 Hierarchy structure in the Data Lake store
- 4.3 Storage folder structure and data drift
- 4.3.1 Hierarchy structure revisited
- 4.3.2 Data drift
- 4.4 Copy tools for Data Lake stores
- 4.4.1 Data Explorer
- 4.4.2 ADLCopy tool
- 4.4.3 Azure Storage Explorer tool
- 4.5 Exercises
- 4.5.1 Exercise 1
- 4.5.2 Exercise 2
- Summary
- 5 Message handling with Event Hubs
- 5.1 How does an Event Hub work?
- 5.2 Collecting data in Azure
- 5.3 Create an Event Hubs namespace
- 5.3.1 Using Azure PowerShell
- 5.3.2 Throughput units
- 5.3.3 Event Hub geo-disaster recovery
- 5.3.4 Failover with geo-disaster recovery
- 5.4 Creating an Event Hub
- 5.4.1 Using Azure portal
- 5.4.2 Using Azure PowerShell
- 5.4.3 Shared access policy
- 5.5 Event Hub partitions
- 5.5.1 Multiple consumers
- 5.5.2 Why specify a partition?
- 5.5.3 Why not specify a partition?
- 5.5.4 Event Hubs message journal
- 5.5.5 Partitions and throughput units
- 5.6 Configuring Capture
- 5.6.1 File name formats
- 5.6.2 Secure access for Capture
- 5.6.3 Enabling Capture
- 5.6.4 The importance of time
- 5.7 Securing access to Event Hubs
- 5.7.1 Shared Access Signature policies
- 5.7.2 Writing to Event Hubs
- 5.8 Exercises
- 5.8.1 Exercise 1
- 5.8.2 Exercise 2
- 5.8.3 Exercise 3
- Summary
- 6 Real-time queries with Azure Stream Analytics
- 6.1 Creating a Stream Analytics service
- 6.1.1 Elements of a Stream Analytics job
- 6.1.2 Create an ASA job using the Azure portal
- 6.1.3 Create an ASA job using Azure PowerShell
- 6.2 Configuring inputs and outputs
- 6.2.1 Event Hub job input
- 6.2.2 ASA job outputs
- 6.3 Creating a job query
- 6.3.1 Starting the ASA job
- 6.3.2 Failure to start
- 6.3.3 Output exceptions
- 6.4 Writing job queries
- 6.4.1 Window functions
- 6.4.2 Machine learning functions
- 6.5 Managing performance
- 6.5.1 Streaming units
- 6.5.2 Event ordering
- 6.6 Exercises
- 6.6.1 Exercise 1
- 6.6.2 Exercise 2
- Summary
- 7 Batch queries with Azure Data Lake Analytics
- 7.1 U-SQL language
- 7.1.1 Extractors
- 7.1.2 Outputters
- 7.1.3 File selectors
- 7.1.4 Expressions
- 7.2 U-SQL jobs
- 7.2.1 Selecting the biometric data files
- 7.2.2 Schema extraction
- 7.2.3 Aggregation
- 7.2.4 Writing files
- 7.3 Creating a Data Lake Analytics service
- 7.3.1 Using Azure portal
- 7.3.2 Using Azure PowerShell
- 7.4 Submitting jobs to ADLA
- 7.4.1 Using Azure portal
- 7.4.2 Using Azure PowerShell
- 7.5 Efficient U-SQL job executions
- 7.5.1 Monitoring a U-SQL job
- 7.5.2 Analytics units
- 7.5.3 Vertexes
- 7.5.4 Scaling the job execution
- 7.6 Using Blob Storage
- 7.6.1 Constructing Blob file selectors
- 7.6.2 Adding a new data source
- 7.6.3 Filtering rowsets
- 7.7 Exercises
- 7.7.1 Exercise 1
- 7.7.2 Exercise 2
- Summary
- 8 U-SQL for complex analytics
- 8.1 Data Lake Analytics Catalog
- 8.1.1 Simplifying U-SQL queries
- 8.1.2 Simplifying data access
- 8.1.3 Loading data for reuse
- 8.2 Window functions
- 8.3 Local C# functions
- 8.4 Exercises
- 8.4.1 Exercise 1
- 8.4.2 Exercise 2
- Summary
- 9 Integrating with Azure Data Lake Analytics
- 9.1 Processing unstructured data
- 9.1.1 Azure Cognitive Services
- 9.1.2 Managing assemblies in the Data Lake
- 9.1.3 Image data extraction with Advanced Analytics
- 9.2 Reading different file types
- 9.2.1 Adding custom libraries with a Catalog
- 9.2.2 Creating a catalog database
- 9.2.3 Building the U-SQL DataFormats solution
- 9.2.4 Code folders
- 9.2.5 Using custom assemblies
- 9.3 Connecting to remote sources
- 9.3.1 External databases
- 9.3.2 Credentials
- 9.3.3 Data Source
- 9.3.4 Tables and views
- 9.4 Exercises
- 9.4.1 Exercise 1
- 9.4.2 Exercise 2
- Summary
- 10 Service integration with Azure Data Factory
- 10.1 Creating an Azure Data Factory service
- 10.2 Secure authentication
- 10.2.1 Azure Active Directory integration
- 10.2.2 Azure Key Vault
- 10.3 Copying files with ADF
- 10.3.1 Creating a Files storage container
- 10.3.2 Adding secrets to AKV
- 10.3.3 Creating a Files storage linkedservice
- 10.3.4 Creating an ADLS linkedservice
- 10.3.5 Creating a pipeline and activity
- 10.3.6 Creating a scheduled trigger
- 10.4 Running an ADLA job
- 10.4.1 Creating an ADLA linkedservice
- 10.4.2 Creating a pipeline and activity
- 10.5 Exercises
- 10.5.1 Exercise 1
- 10.5.2 Exercise 2
- Summary
- 11 Managed SQL with Azure SQL Database
- 11.1 Creating an Azure SQL Database
- 11.1.1 Create a SQL Server and SQLDB
- 11.2 Securing SQLDB
- 11.3 Availability and recovery
- 11.3.1 Restoring and moving SQLDB
- 11.3.2 Database safeguards
- 11.3.3 Creating alerts for SQLDB
- 11.4 Optimizing costs for SQLDB
- 11.4.1 Pricing structure
- 11.4.2 Scaling SQLDB
- 11.4.3 Serverless
- 11.4.4 Elastic Pools
- 11.5 Exercises
- 11.5.1 Exercise 1
- 11.5.2 Exercise 2
- 11.5.3 Exercise 3
- 11.5.4 Exercise 4
- Summary
- 12 Integrating Data Factory with SQL Database
- 12.1 Before you begin
- 12.2 Importing data with external data sources
- 12.2.1 Creating a database scoped credential
- 12.2.2 Creating an external data source
- 12.2.3 Creating an external table
- 12.2.4 Importing Blob files
- 12.3 Importing file data with ADF
- 12.3.1 Authenticating between ADF and SQLDB
- 12.3.2 Creating SQL Database linkedservice
- 12.3.3 Creating datasets
- 12.3.4 Creating a copy activity and pipeline
- 12.4 Exercises
- 12.4.1 Exercise 1
- 12.4.2 Exercise 2
- 12.4.3 Exercise 3
- Summary
- 13 Where to go next
- 13.1 Data catalog
- 13.1.1 Data Catalog as a service
- 13.1.2 Data locations
- 13.1.3 Data definitions
- 13.1.4 Data frequency
- 13.1.5 Business drivers
- 13.2 Version control and backups
- 13.2.1 Blob Storage
- 13.2.2 Data Lake Storage
- 13.2.3 Stream Analytics
- 13.2.4 Data Lake Analytics
- 13.2.5 Data Factory configuration files
- 13.2.6 SQL Database
- 13.3 Microsoft certifications
- 13.4 Signing off
- Summary
- appendix A. Setting up Azure services through PowerShell
- A.1 Setting up Azure PowerShell
- A.2 Create a subscription
- A.3 Azure naming conventions
- A.4 Setting up common Azure resources using PowerShell
- A.4.1 Creating a new resource group
- A.4.2 Creating a new Azure Active Directory user
- A.4.3 Creating a new Azure Active Directory group
- A.5 Setting up Azure services using PowerShell
- A.5.1 Creating a new Storage account
- A.5.2 Creating a new Data Lake store
- A.5.3 Create new Event Hub
- A.5.4 Create new Stream Analytics job
- A.5.5 Create new Data Lake Analytics account
- A.5.6 Create new SQL Server and Database
- A.5.7 Create a new Data Factory service
- A.5.8 Creating a new App registration
- A.5.9 Creating a new key vault
- A.5.10 Create new SQL Server and Database with lookup data
- appendix B. Configuring the Jonestown Sluggers analytics system
- B.1 Solution design
- B.1.1 Hot path
- B.1.2 Cold path
- B.2 Naming convention