Data Analytics in the AWS Cloud: Building a Data Platform for BI and Predictive Analytics on AWS
Other Authors: | |
---|---|
Format: | E-book |
Language: | English |
Published: | Hoboken, New Jersey : John Wiley & Sons, Inc., [2023] |
Subjects: | |
View at Biblioteca Universitat Ramon Llull: | https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009752726206719 |
Table of Contents:
- Cover
- Title Page
- Copyright Page
- About the Author
- About the Technical Editor
- Acknowledgments
- Contents at a Glance
- Contents
- Introduction
- What Is a Data Lake?
- When You Do Not Need a Data Lake
- When Do You Need Analytics?
- When Do You Need a Data Lake for Analytics?
- How About an Analytics Team?
- The Data Platform
- The End of the Beginning
- Chapter 1 AWS Data Lakes and Analytics Technology Overview
- Why AWS?
- What Does a Data Lake Look Like in AWS?
- Analytics on AWS
- Skills Required to Build and Maintain an AWS Analytics Pipeline
- Chapter 2 The Path to Analytics: Setting Up a Data and Analytics Team
- The Data Vision
- Support
- DA Team Roles
- Early Stage Roles
- Team Lead
- Data Architect
- Data Engineer
- Data Analyst
- Maturity Stage Roles
- Data Scientist
- Cloud Engineer
- Business Intelligence (BI) Developer
- Machine Learning Engineer
- Business Analyst
- Niche Roles
- Analytics Flow at a Process Level
- Workflow Methodology
- The DA Team Mantra: "Automate Everything"
- Analytics Models in the Wild: Centralized, Distributed, Center of Excellence
- Centralized
- Distributed
- Center of Excellence
- Summary
- Chapter 3 Working on AWS
- Accessing AWS
- Everything Is a Resource
- S3: An Important Exception
- IAM: Policies, Roles, and Users
- Policies
- Identity-Based Policies
- Resource-Based Policies
- Roles
- Users and User Groups
- Summarizing IAM
- Working with the Web Console
- The AWS Command-Line Interface
- Installing AWS CLI
- Linux Installation
- macOS Installation
- Windows Installation
- Configuring AWS CLI
- A Note on Region
- Setting Individual Parameters
- Using Profiles and Configuration Files
- Final Notes on Configuration
- Using the AWS CLI
- Using Skeletons and File Inputs
- Cleaning Up!
- Infrastructure-as-Code: CloudFormation and Terraform
- CloudFormation
- CloudFormation Stacks
- CloudFormation Template Anatomy
- CloudFormation Changesets
- Getting Stack Information
- Cleaning Up Again
- CloudFormation Conclusions
- Terraform
- Coding Style
- Modularity
- Limitations
- Terraform vs. CloudFormation
- Infrastructure-as-Code: CDK, Pulumi, Cloudcraft, and Other Solutions
- AWS CDK
- Pulumi
- Cloudcraft
- Infrastructure Management Conclusions
- Chapter 4 Serverless Computing and Data Engineering
- Serverless vs. Fully Managed
- AWS Serverless Technologies
- AWS Lambda
- Pricing Model
- Laser Focus on Code
- The Lambda Paradigm Shift
- Virtually Infinite Scalability
- Geographical Distribution
- A Lambda Hello World
- Lambda Configuration
- Runtime
- Container-Based Lambdas
- Architectures
- Memory
- Networking
- Execution Role
- Environment Variables
- AWS EventBridge
- AWS Fargate
- AWS DynamoDB
- AWS SNS
- Amazon SQS
- AWS CloudWatch
- Amazon QuickSight
- AWS Step Functions
- Amazon API Gateway
- Amazon Cognito
- AWS Serverless Application Model (SAM)
- Ephemeral Infrastructure
- AWS SAM Installation
- Configuration
- Creating Your First AWS SAM Project
- Application Structure
- SAM Resource Types
- SAM Lambda Template
- !! Recursive Lambda Invocation !!
- Function Metadata
- Outputs
- Implicitly Generated Resources
- Other Template Sections
- Lambda Code
- Building Your First SAM Application
- Testing the AWS SAM Application Locally
- Deployment
- Cleaning Up
- Summary
- Chapter 5 Data Ingestion
- AWS Data Lake Architecture
- Serverless Data Lake Architecture Structure
- Ingestion
- Storage and Processing
- Cataloging, Governance, and Search
- Security and Monitoring
- Consumption
- Sample Processing Architecture: Cataloging Images into DynamoDB
- Use Case Description
- SAM Application Creation
- S3-Triggered Lambda
- Adding DynamoDB
- Lambda Execution Context
- Inserting into DynamoDB
- Cleaning Up
- Serverless Ingestion
- AWS Fargate
- AWS Lambda
- Example Architecture: Fargate-Based Periodic Batch Import
- The Basic Importer
- ECS CLI
- AWS Copilot CLI
- Clean Up
- AWS Kinesis Ingestion
- Example Architecture: Two-Pronged Delivery
- Fully Managed Ingestion with AppFlow
- Operational Data Ingestion with Database Migration Service
- DMS Concepts
- DMS Instance
- DMS Endpoints
- DMS Tasks
- Summary of the Workflow
- Common Use of DMS
- Example Architecture: DMS to S3
- DMS Instance
- DMS Endpoints
- DMS Task
- Summary
- Chapter 6 Processing Data
- Phases of Data Preparation
- What Is ETL? Why Should I Care?
- ETL Job vs. Streaming Job
- Overview of ETL in AWS
- ETL with AWS Glue
- ETL with Lambda Functions
- ETL with Hadoop/EMR
- Other Ways to Perform ETL
- ETL Job Design Concepts
- Source Identification
- Destination Identification
- Mappings
- Validation
- Filter
- Join, Denormalization, Relationalization
- AWS Glue for ETL
- Really, It's Just Spark
- Visual
- Spark Script Editor
- Python Shell Script Editor
- Jupyter Notebook
- Connectors
- Creating Connections
- Creating Connections with the Web Console
- Creating Connections with the AWS CLI
- Creating ETL Jobs with AWS Glue Visual Editor
- ETL Example: Format Switch from Raw (JSON) to Cleaned (Parquet)
- Job Bookmarks
- Transformations
- Apply Mapping
- Filter
- Other Available Transforms
- Run the Edited Job
- Visual Editor with Source and Target Conclusions
- Creating ETL Jobs with AWS Glue Visual Editor (without Source and Target)
- Creating ETL Jobs with the Spark Script Editor
- Developing ETL Jobs with AWS Glue Notebooks
- What Is a Notebook?
- Notebook Structure
- Step 1: Load Code into a DynamicFrame
- Step 2: Apply Field Mapping
- Step 3: Apply the Filter
- Step 4: Write to S3 in Parquet Format
- Example: Joining and Denormalizing Data from Two S3 Locations
- Conclusions for Manually Authored Jobs with Notebooks
- Creating ETL Jobs with AWS Glue Interactive Sessions
- It's Magic
- Development Workflow
- Streaming Jobs
- Differences with a Standard ETL Job
- Streaming Sources
- Example: Process Kinesis Streams with a Streaming Job
- Streaming ETL Jobs Conclusions
- Summary
- Chapter 7 Cataloging, Governance, and Search
- Cataloging with AWS Glue
- AWS Glue and the AWS Glue Data Catalog
- Glue Databases and Tables
- Databases
- The Idea of Schema-on-Read
- Tables
- Create Table Manually
- Creating a Table from an Existing Schema
- Creating a Table with a Crawler
- Summary on Databases and Tables
- Crawlers
- Updating or Not Updating?
- Running the Crawler
- Creating a Crawler from the AWS CLI
- Retrieving Table Information from the CLI
- Classifiers
- Classifier Example
- Crawlers and Classifiers Summary
- Search with Amazon Athena: The Heart of Analytics in AWS
- A Bit of History
- Interface Overview
- Creating Tables Manually
- Athena Data Types
- Complex Types
- Running a Query
- Connecting with JDBC and ODBC
- Query Stats
- Recent Queries and Saved Queries
- The Power of Partitions
- Athena Pricing Model
- Automatic Naming
- Athena Query Output
- Athena Peculiarities (SQL and Not)
- Computed Fields Gotcha and WITH Statement Workaround
- Lowercase!
- Query Explain
- Deduplicating Records
- Working with JSON, Flattening, and Unnesting
- Athena Views
- CREATE TABLE AS SELECT (CTAS)
- Saving Queries and Reusing Saved Queries
- Running Parameterized Queries
- Athena Federated Queries
- Athena Lambda Connectors
- Note on Connection Errors
- Performing Federated Queries
- Creating a View from a Federated Query
- Governing: Athena Workgroups, Lake Formation, and More
- Athena Workgroups
- Fine-Grained Athena Access with IAM
- Recap of Athena-Based Governance
- AWS Lake Formation
- Registering a Location in Lake Formation
- Creating a Database in Lake Formation
- Assigning Permissions in Lake Formation
- LF-Tags and Permissions in Lake Formation
- Data Filters
- Governance Conclusions
- Summary
- Chapter 8 Data Consumption: BI, Visualization, and Reporting
- QuickSight
- Signing Up for QuickSight
- Standard Plan
- Enterprise Plan
- Users and User Groups
- Managing Users and Groups
- Managing QuickSight
- Users and Groups
- Your Subscriptions
- SPICE Capacity
- Account Settings
- Security and Permissions
- VPC Connections
- Mobile Settings
- Domains and Embedding
- Single Sign-On
- Data Sources and Datasets
- Creating an Athena Data Source
- Creating Other Data Sources
- Creating a Data Source from the AWS CLI
- Creating a Dataset from a Table
- Creating a Dataset from a SQL Query
- Duplicating Datasets
- Note on Creating Datasets
- QuickSight Favorites, Recent, and Folders
- SPICE
- Manage SPICE Capacity
- Refresh Schedule
- QuickSight Data Editor
- QuickSight Data Types
- Change Data Types
- Calculated Fields
- Joining Data
- Excluding Fields
- Filtering Data
- Removing Data
- Geospatial Hierarchies and Adding Fields to Hierarchies
- Unsupported Format Dates
- Visualizing Data: QuickSight Analysis
- Adding a Title and a Description to Your Analysis
- Renaming the Sheet
- Your First Visual with AutoGraph
- Field Wells
- Visual Types
- Saving and Autosaving
- A First Example: Pie Chart
- Renaming a Visual
- Filtering Data
- Adding Drill-Downs
- Parameters
- Actions
- Insights
- ML-Powered Insights
- Sharing an Analysis
- Dashboards