Azure Storage, Streaming, and Batch Analytics: a guide for data engineers

Bibliographic Details
Other Authors: Nuckolls, Richard L. (author)
Format: eBook
Language: English
Published: Shelter Island, NY : Manning Publications Co., [2020]
Subjects:
View at the Universitat Ramon Llull Library: https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009631241106719
Table of Contents:
  • Intro
  • Azure Storage, Streaming, and Batch Analytics
  • Copyright
  • dedication
  • brief contents
  • contents
  • front matter
  • preface
  • acknowledgements
  • about this book
  • Who should read this book
  • How this book is organized: a roadmap
  • About the code
  • Author online
  • about the author
  • about the cover illustration
  • 1 What is data engineering?
  • 1.1 What is data engineering?
  • 1.2 What do data engineers do?
  • 1.3 How does Microsoft define data engineering?
  • 1.3.1 Data acquisition
  • 1.3.2 Data storage
  • 1.3.3 Data processing
  • 1.3.4 Data queries
  • 1.3.5 Orchestration
  • 1.3.6 Data retrieval
  • 1.4 What tools does Azure provide for data engineering?
  • 1.5 Azure Data Engineers
  • 1.6 Example application
  • Summary
  • 2 Building an analytics system in Azure
  • 2.1 Fundamentals of Azure architecture
  • 2.1.1 Azure subscriptions
  • 2.1.2 Azure regions
  • 2.1.3 Azure naming conventions
  • 2.1.4 Resource groups
  • 2.1.5 Finding resources
  • 2.2 Lambda architecture
  • 2.3 Azure cloud services
  • 2.3.1 Azure analytics system architecture
  • 2.3.2 Event Hubs
  • 2.3.3 Stream Analytics
  • 2.3.4 Data Lake Storage
  • 2.3.5 Data Lake Analytics
  • 2.3.6 SQL Database
  • 2.3.7 Data Factory
  • 2.3.8 Azure PowerShell
  • 2.4 Walk-through of processing a series of event data records
  • 2.4.1 Hot path
  • 2.4.2 Cold path
  • 2.4.3 Choosing abstract Azure services
  • 2.5 Calculating cloud hosting costs
  • 2.5.1 Event Hubs
  • 2.5.2 Stream Analytics
  • 2.5.3 Data Lake Storage
  • 2.5.4 Data Lake Analytics
  • 2.5.5 SQL Database
  • 2.5.6 Data Factory
  • Summary
  • 3 General storage with Azure Storage accounts
  • 3.1 Cloud storage services
  • 3.1.1 Before you begin
  • 3.2 Creating an Azure Storage account
  • 3.2.1 Using Azure portal
  • 3.2.2 Using Azure PowerShell
  • 3.2.3 Azure Storage replication
  • 3.3 Storage account services
  • 3.3.1 Blob storage
  • 3.3.2 Creating a Blobs service container
  • 3.3.3 Blob tiering
  • 3.3.4 Copy tools
  • 3.3.5 Queues
  • 3.3.6 Creating a queue
  • 3.3.7 Azure Storage queue options
  • 3.4 Storage account access
  • 3.4.1 Blob container security
  • 3.4.2 Designing Storage account access
  • 3.5 Exercises
  • 3.5.1 Exercise 1
  • 3.5.2 Exercise 2
  • Summary
  • 4 Azure Data Lake Storage
  • 4.1 Create an Azure Data Lake store
  • 4.1.1 Using Azure portal
  • 4.1.2 Using Azure PowerShell
  • 4.2 Data Lake store access
  • 4.2.1 Access schemes
  • 4.2.2 Configuring access
  • 4.2.3 Hierarchy structure in the Data Lake store
  • 4.3 Storage folder structure and data drift
  • 4.3.1 Hierarchy structure revisited
  • 4.3.2 Data drift
  • 4.4 Copy tools for Data Lake stores
  • 4.4.1 Data Explorer
  • 4.4.2 ADLCopy tool
  • 4.4.3 Azure Storage Explorer tool
  • 4.5 Exercises
  • 4.5.1 Exercise 1
  • 4.5.2 Exercise 2
  • Summary
  • 5 Message handling with Event Hubs
  • 5.1 How does an Event Hub work?
  • 5.2 Collecting data in Azure
  • 5.3 Create an Event Hubs namespace
  • 5.3.1 Using Azure PowerShell
  • 5.3.2 Throughput units
  • 5.3.3 Event Hub geo-disaster recovery
  • 5.3.4 Failover with geo-disaster recovery
  • 5.4 Creating an Event Hub
  • 5.4.1 Using Azure portal
  • 5.4.2 Using Azure PowerShell
  • 5.4.3 Shared access policy
  • 5.5 Event Hub partitions
  • 5.5.1 Multiple consumers
  • 5.5.2 Why specify a partition?
  • 5.5.3 Why not specify a partition?
  • 5.5.4 Event Hubs message journal
  • 5.5.5 Partitions and throughput units
  • 5.6 Configuring Capture
  • 5.6.1 File name formats
  • 5.6.2 Secure access for Capture
  • 5.6.3 Enabling Capture
  • 5.6.4 The importance of time
  • 5.7 Securing access to Event Hubs
  • 5.7.1 Shared Access Signature policies
  • 5.7.2 Writing to Event Hubs
  • 5.8 Exercises
  • 5.8.1 Exercise 1
  • 5.8.2 Exercise 2
  • 5.8.3 Exercise 3
  • Summary
  • 6 Real-time queries with Azure Stream Analytics
  • 6.1 Creating a Stream Analytics service
  • 6.1.1 Elements of a Stream Analytics job
  • 6.1.2 Create an ASA job using the Azure portal
  • 6.1.3 Create an ASA job using Azure PowerShell
  • 6.2 Configuring inputs and outputs
  • 6.2.1 Event Hub job input
  • 6.2.2 ASA job outputs
  • 6.3 Creating a job query
  • 6.3.1 Starting the ASA job
  • 6.3.2 Failure to start
  • 6.3.3 Output exceptions
  • 6.4 Writing job queries
  • 6.4.1 Window functions
  • 6.4.2 Machine learning functions
  • 6.5 Managing performance
  • 6.5.1 Streaming units
  • 6.5.2 Event ordering
  • 6.6 Exercises
  • 6.6.1 Exercise 1
  • 6.6.2 Exercise 2
  • Summary
  • 7 Batch queries with Azure Data Lake Analytics
  • 7.1 U-SQL language
  • 7.1.1 Extractors
  • 7.1.2 Outputters
  • 7.1.3 File selectors
  • 7.1.4 Expressions
  • 7.2 U-SQL jobs
  • 7.2.1 Selecting the biometric data files
  • 7.2.2 Schema extraction
  • 7.2.3 Aggregation
  • 7.2.4 Writing files
  • 7.3 Creating a Data Lake Analytics service
  • 7.3.1 Using Azure portal
  • 7.3.2 Using Azure PowerShell
  • 7.4 Submitting jobs to ADLA
  • 7.4.1 Using Azure portal
  • 7.4.2 Using Azure PowerShell
  • 7.5 Efficient U-SQL job executions
  • 7.5.1 Monitoring a U-SQL job
  • 7.5.2 Analytics units
  • 7.5.3 Vertexes
  • 7.5.4 Scaling the job execution
  • 7.6 Using Blob Storage
  • 7.6.1 Constructing Blob file selectors
  • 7.6.2 Adding a new data source
  • 7.6.3 Filtering rowsets
  • 7.7 Exercises
  • 7.7.1 Exercise 1
  • 7.7.2 Exercise 2
  • Summary
  • 8 U-SQL for complex analytics
  • 8.1 Data Lake Analytics Catalog
  • 8.1.1 Simplifying U-SQL queries
  • 8.1.2 Simplifying data access
  • 8.1.3 Loading data for reuse
  • 8.2 Window functions
  • 8.3 Local C# functions
  • 8.4 Exercises
  • 8.4.1 Exercise 1
  • 8.4.2 Exercise 2
  • Summary
  • 9 Integrating with Azure Data Lake Analytics
  • 9.1 Processing unstructured data
  • 9.1.1 Azure Cognitive Services
  • 9.1.2 Managing assemblies in the Data Lake
  • 9.1.3 Image data extraction with Advanced Analytics
  • 9.2 Reading different file types
  • 9.2.1 Adding custom libraries with a Catalog
  • 9.2.2 Creating a catalog database
  • 9.2.3 Building the U-SQL DataFormats solution
  • 9.2.4 Code folders
  • 9.2.5 Using custom assemblies
  • 9.3 Connecting to remote sources
  • 9.3.1 External databases
  • 9.3.2 Credentials
  • 9.3.3 Data Source
  • 9.3.4 Tables and views
  • 9.4 Exercises
  • 9.4.1 Exercise 1
  • 9.4.2 Exercise 2
  • Summary
  • 10 Service integration with Azure Data Factory
  • 10.1 Creating an Azure Data Factory service
  • 10.2 Secure authentication
  • 10.2.1 Azure Active Directory integration
  • 10.2.2 Azure Key Vault
  • 10.3 Copying files with ADF
  • 10.3.1 Creating a Files storage container
  • 10.3.2 Adding secrets to AKV
  • 10.3.3 Creating a Files storage linkedservice
  • 10.3.4 Creating an ADLS linkedservice
  • 10.3.5 Creating a pipeline and activity
  • 10.3.6 Creating a scheduled trigger
  • 10.4 Running an ADLA job
  • 10.4.1 Creating an ADLA linkedservice
  • 10.4.2 Creating a pipeline and activity
  • 10.5 Exercises
  • 10.5.1 Exercise 1
  • 10.5.2 Exercise 2
  • Summary
  • 11 Managed SQL with Azure SQL Database
  • 11.1 Creating an Azure SQL Database
  • 11.1.1 Create a SQL Server and SQLDB
  • 11.2 Securing SQLDB
  • 11.3 Availability and recovery
  • 11.3.1 Restoring and moving SQLDB
  • 11.3.2 Database safeguards
  • 11.3.3 Creating alerts for SQLDB
  • 11.4 Optimizing costs for SQLDB
  • 11.4.1 Pricing structure
  • 11.4.2 Scaling SQLDB
  • 11.4.3 Serverless
  • 11.4.4 Elastic Pools
  • 11.5 Exercises
  • 11.5.1 Exercise 1
  • 11.5.2 Exercise 2
  • 11.5.3 Exercise 3
  • 11.5.4 Exercise 4
  • Summary
  • 12 Integrating Data Factory with SQL Database
  • 12.1 Before you begin
  • 12.2 Importing data with external data sources
  • 12.2.1 Creating a database scoped credential
  • 12.2.2 Creating an external data source
  • 12.2.3 Creating an external table
  • 12.2.4 Importing Blob files
  • 12.3 Importing file data with ADF
  • 12.3.1 Authenticating between ADF and SQLDB
  • 12.3.2 Creating SQL Database linkedservice
  • 12.3.3 Creating datasets
  • 12.3.4 Creating a copy activity and pipeline
  • 12.4 Exercises
  • 12.4.1 Exercise 1
  • 12.4.2 Exercise 2
  • 12.4.3 Exercise 3
  • Summary
  • 13 Where to go next
  • 13.1 Data catalog
  • 13.1.1 Data Catalog as a service
  • 13.1.2 Data locations
  • 13.1.3 Data definitions
  • 13.1.4 Data frequency
  • 13.1.5 Business drivers
  • 13.2 Version control and backups
  • 13.2.1 Blob Storage
  • 13.2.2 Data Lake Storage
  • 13.2.3 Stream Analytics
  • 13.2.4 Data Lake Analytics
  • 13.2.5 Data Factory configuration files
  • 13.2.6 SQL Database
  • 13.3 Microsoft certifications
  • 13.4 Signing off
  • Summary
  • appendix A. Setting up Azure services through PowerShell
  • A.1 Setting up Azure PowerShell
  • A.2 Create a subscription
  • A.3 Azure naming conventions
  • A.4 Setting up common Azure resources using PowerShell
  • A.4.1 Creating a new resource group
  • A.4.2 Creating a new Azure Active Directory user
  • A.4.3 Creating a new Azure Active Directory group
  • A.5 Setting up Azure services using PowerShell
  • A.5.1 Creating a new Storage account
  • A.5.2 Creating a new Data Lake store
  • A.5.3 Create new Event Hub
  • A.5.4 Create new Stream Analytics job
  • A.5.5 Create new Data Lake Analytics account
  • A.5.6 Create new SQL Server and Database
  • A.5.7 Create a new Data Factory service
  • A.5.8 Creating a new App registration
  • A.5.9 Creating a new key vault
  • A.5.10 Create new SQL Server and Database with lookup data
  • appendix B. Configuring the Jonestown Sluggers analytics system
  • B.1 Solution design
  • B.1.1 Hot path
  • B.1.2 Cold path
  • B.2 Naming convention