Azure Storage, Streaming, and Batch Analytics: a guide for data engineers

Bibliographic Details
Other Authors: Nuckolls, Richard L. (author)
Format: eBook
Language: English
Published: Shelter Island, NY : Manning Publications Co., [2020]
Subjects:
View at the Universitat Ramon Llull Library: https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009631241106719
Table of Contents:
  • Intro
  • Azure Storage, Streaming, and Batch Analytics
  • Copyright
  • dedication
  • brief contents
  • contents
  • front matter
  • preface
  • acknowledgements
  • about this book
  • Who should read this book
  • How this book is organized: a roadmap
  • About the code
  • Author online
  • about the author
  • about the cover illustration
  • 1 What is data engineering?
  • 1.1 What is data engineering?
  • 1.2 What do data engineers do?
  • 1.3 How does Microsoft define data engineering?
  • 1.3.1 Data acquisition
  • 1.3.2 Data storage
  • 1.3.3 Data processing
  • 1.3.4 Data queries
  • 1.3.5 Orchestration
  • 1.3.6 Data retrieval
  • 1.4 What tools does Azure provide for data engineering?
  • 1.5 Azure Data Engineers
  • 1.6 Example application
  • Summary
  • 2 Building an analytics system in Azure
  • 2.1 Fundamentals of Azure architecture
  • 2.1.1 Azure subscriptions
  • 2.1.2 Azure regions
  • 2.1.3 Azure naming conventions
  • 2.1.4 Resource groups
  • 2.1.5 Finding resources
  • 2.2 Lambda architecture
  • 2.3 Azure cloud services
  • 2.3.1 Azure analytics system architecture
  • 2.3.2 Event Hubs
  • 2.3.3 Stream Analytics
  • 2.3.4 Data Lake Storage
  • 2.3.5 Data Lake Analytics
  • 2.3.6 SQL Database
  • 2.3.7 Data Factory
  • 2.3.8 Azure PowerShell
  • 2.4 Walk-through of processing a series of event data records
  • 2.4.1 Hot path
  • 2.4.2 Cold path
  • 2.4.3 Choosing abstract Azure services
  • 2.5 Calculating cloud hosting costs
  • 2.5.1 Event Hubs
  • 2.5.2 Stream Analytics
  • 2.5.3 Data Lake Storage
  • 2.5.4 Data Lake Analytics
  • 2.5.5 SQL Database
  • 2.5.6 Data Factory
  • Summary
  • 3 General storage with Azure Storage accounts
  • 3.1 Cloud storage services
  • 3.1.1 Before you begin
  • 3.2 Creating an Azure Storage account
  • 3.2.1 Using Azure portal
  • 3.2.2 Using Azure PowerShell
  • 3.2.3 Azure Storage replication
  • 3.3 Storage account services
  • 3.3.1 Blob storage
  • 3.3.2 Creating a Blobs service container
  • 3.3.3 Blob tiering
  • 3.3.4 Copy tools
  • 3.3.5 Queues
  • 3.3.6 Creating a queue
  • 3.3.7 Azure Storage queue options
  • 3.4 Storage account access
  • 3.4.1 Blob container security
  • 3.4.2 Designing Storage account access
  • 3.5 Exercises
  • 3.5.1 Exercise 1
  • 3.5.2 Exercise 2
  • Summary
  • 4 Azure Data Lake Storage
  • 4.1 Create an Azure Data Lake store
  • 4.1.1 Using Azure portal
  • 4.1.2 Using Azure PowerShell
  • 4.2 Data Lake store access
  • 4.2.1 Access schemes
  • 4.2.2 Configuring access
  • 4.2.3 Hierarchy structure in the Data Lake store
  • 4.3 Storage folder structure and data drift
  • 4.3.1 Hierarchy structure revisited
  • 4.3.2 Data drift
  • 4.4 Copy tools for Data Lake stores
  • 4.4.1 Data Explorer
  • 4.4.2 ADLCopy tool
  • 4.4.3 Azure Storage Explorer tool
  • 4.5 Exercises
  • 4.5.1 Exercise 1
  • 4.5.2 Exercise 2
  • Summary
  • 5 Message handling with Event Hubs
  • 5.1 How does an Event Hub work?
  • 5.2 Collecting data in Azure
  • 5.3 Create an Event Hubs namespace
  • 5.3.1 Using Azure PowerShell
  • 5.3.2 Throughput units
  • 5.3.3 Event Hub geo-disaster recovery
  • 5.3.4 Failover with geo-disaster recovery
  • 5.4 Creating an Event Hub
  • 5.4.1 Using Azure portal
  • 5.4.2 Using Azure PowerShell
  • 5.4.3 Shared access policy
  • 5.5 Event Hub partitions
  • 5.5.1 Multiple consumers
  • 5.5.2 Why specify a partition?
  • 5.5.3 Why not specify a partition?
  • 5.5.4 Event Hubs message journal
  • 5.5.5 Partitions and throughput units
  • 5.6 Configuring Capture
  • 5.6.1 File name formats
  • 5.6.2 Secure access for Capture
  • 5.6.3 Enabling Capture
  • 5.6.4 The importance of time
  • 5.7 Securing access to Event Hubs
  • 5.7.1 Shared Access Signature policies
  • 5.7.2 Writing to Event Hubs
  • 5.8 Exercises
  • 5.8.1 Exercise 1
  • 5.8.2 Exercise 2
  • 5.8.3 Exercise 3
  • Summary
  • 6 Real-time queries with Azure Stream Analytics
  • 6.1 Creating a Stream Analytics service
  • 6.1.1 Elements of a Stream Analytics job
  • 6.1.2 Create an ASA job using the Azure portal
  • 6.1.3 Create an ASA job using Azure PowerShell
  • 6.2 Configuring inputs and outputs
  • 6.2.1 Event Hub job input
  • 6.2.2 ASA job outputs
  • 6.3 Creating a job query
  • 6.3.1 Starting the ASA job
  • 6.3.2 Failure to start
  • 6.3.3 Output exceptions
  • 6.4 Writing job queries
  • 6.4.1 Window functions
  • 6.4.2 Machine learning functions
  • 6.5 Managing performance
  • 6.5.1 Streaming units
  • 6.5.2 Event ordering
  • 6.6 Exercises
  • 6.6.1 Exercise 1
  • 6.6.2 Exercise 2
  • Summary
  • 7 Batch queries with Azure Data Lake Analytics
  • 7.1 U-SQL language
  • 7.1.1 Extractors
  • 7.1.2 Outputters
  • 7.1.3 File selectors
  • 7.1.4 Expressions
  • 7.2 U-SQL jobs
  • 7.2.1 Selecting the biometric data files
  • 7.2.2 Schema extraction
  • 7.2.3 Aggregation
  • 7.2.4 Writing files
  • 7.3 Creating a Data Lake Analytics service
  • 7.3.1 Using Azure portal
  • 7.3.2 Using Azure PowerShell
  • 7.4 Submitting jobs to ADLA
  • 7.4.1 Using Azure portal
  • 7.4.2 Using Azure PowerShell
  • 7.5 Efficient U-SQL job executions
  • 7.5.1 Monitoring a U-SQL job
  • 7.5.2 Analytics units
  • 7.5.3 Vertexes
  • 7.5.4 Scaling the job execution
  • 7.6 Using Blob Storage
  • 7.6.1 Constructing Blob file selectors
  • 7.6.2 Adding a new data source
  • 7.6.3 Filtering rowsets
  • 7.7 Exercises
  • 7.7.1 Exercise 1
  • 7.7.2 Exercise 2
  • Summary
  • 8 U-SQL for complex analytics
  • 8.1 Data Lake Analytics Catalog
  • 8.1.1 Simplifying U-SQL queries
  • 8.1.2 Simplifying data access
  • 8.1.3 Loading data for reuse
  • 8.2 Window functions
  • 8.3 Local C# functions
  • 8.4 Exercises
  • 8.4.1 Exercise 1
  • 8.4.2 Exercise 2
  • Summary
  • 9 Integrating with Azure Data Lake Analytics
  • 9.1 Processing unstructured data
  • 9.1.1 Azure Cognitive Services
  • 9.1.2 Managing assemblies in the Data Lake
  • 9.1.3 Image data extraction with Advanced Analytics
  • 9.2 Reading different file types
  • 9.2.1 Adding custom libraries with a Catalog
  • 9.2.2 Creating a catalog database
  • 9.2.3 Building the U-SQL DataFormats solution
  • 9.2.4 Code folders
  • 9.2.5 Using custom assemblies
  • 9.3 Connecting to remote sources
  • 9.3.1 External databases
  • 9.3.2 Credentials
  • 9.3.3 Data Source
  • 9.3.4 Tables and views
  • 9.4 Exercises
  • 9.4.1 Exercise 1
  • 9.4.2 Exercise 2
  • Summary
  • 10 Service integration with Azure Data Factory
  • 10.1 Creating an Azure Data Factory service
  • 10.2 Secure authentication
  • 10.2.1 Azure Active Directory integration
  • 10.2.2 Azure Key Vault
  • 10.3 Copying files with ADF
  • 10.3.1 Creating a Files storage container
  • 10.3.2 Adding secrets to AKV
  • 10.3.3 Creating a Files storage linkedservice
  • 10.3.4 Creating an ADLS linkedservice
  • 10.3.5 Creating a pipeline and activity
  • 10.3.6 Creating a scheduled trigger
  • 10.4 Running an ADLA job
  • 10.4.1 Creating an ADLA linkedservice
  • 10.4.2 Creating a pipeline and activity
  • 10.5 Exercises
  • 10.5.1 Exercise 1
  • 10.5.2 Exercise 2
  • Summary
  • 11 Managed SQL with Azure SQL Database
  • 11.1 Creating an Azure SQL Database
  • 11.1.1 Create a SQL Server and SQLDB
  • 11.2 Securing SQLDB
  • 11.3 Availability and recovery
  • 11.3.1 Restoring and moving SQLDB
  • 11.3.2 Database safeguards
  • 11.3.3 Creating alerts for SQLDB
  • 11.4 Optimizing costs for SQLDB
  • 11.4.1 Pricing structure
  • 11.4.2 Scaling SQLDB
  • 11.4.3 Serverless
  • 11.4.4 Elastic Pools
  • 11.5 Exercises
  • 11.5.1 Exercise 1
  • 11.5.2 Exercise 2
  • 11.5.3 Exercise 3
  • 11.5.4 Exercise 4
  • Summary
  • 12 Integrating Data Factory with SQL Database
  • 12.1 Before you begin
  • 12.2 Importing data with external data sources
  • 12.2.1 Creating a database scoped credential
  • 12.2.2 Creating an external data source
  • 12.2.3 Creating an external table
  • 12.2.4 Importing Blob files
  • 12.3 Importing file data with ADF
  • 12.3.1 Authenticating between ADF and SQLDB
  • 12.3.2 Creating SQL Database linkedservice
  • 12.3.3 Creating datasets
  • 12.3.4 Creating a copy activity and pipeline
  • 12.4 Exercises
  • 12.4.1 Exercise 1
  • 12.4.2 Exercise 2
  • 12.4.3 Exercise 3
  • Summary
  • 13 Where to go next
  • 13.1 Data catalog
  • 13.1.1 Data Catalog as a service
  • 13.1.2 Data locations
  • 13.1.3 Data definitions
  • 13.1.4 Data frequency
  • 13.1.5 Business drivers
  • 13.2 Version control and backups
  • 13.2.1 Blob Storage
  • 13.2.2 Data Lake Storage
  • 13.2.3 Stream Analytics
  • 13.2.4 Data Lake Analytics
  • 13.2.5 Data Factory configuration files
  • 13.2.6 SQL Database
  • 13.3 Microsoft certifications
  • 13.4 Signing off
  • Summary
  • appendix A. Setting up Azure services through PowerShell
  • A.1 Setting up Azure PowerShell
  • A.2 Create a subscription
  • A.3 Azure naming conventions
  • A.4 Setting up common Azure resources using PowerShell
  • A.4.1 Creating a new resource group
  • A.4.2 Creating a new Azure Active Directory user
  • A.4.3 Creating a new Azure Active Directory group
  • A.5 Setting up Azure services using PowerShell
  • A.5.1 Creating a new Storage account
  • A.5.2 Creating a new Data Lake store
  • A.5.3 Create new Event Hub
  • A.5.4 Create new Stream Analytics job
  • A.5.5 Create new Data Lake Analytics account
  • A.5.6 Create new SQL Server and Database
  • A.5.7 Create a new Data Factory service
  • A.5.8 Creating a new App registration
  • A.5.9 Creating a new key vault
  • A.5.10 Create new SQL Server and Database with lookup data
  • appendix B. Configuring the Jonestown Sluggers analytics system
  • B.1 Solution design
  • B.1.1 Hot path
  • B.1.2 Cold path
  • B.2 Naming convention