The Machine Learning Solutions Architect Handbook: Practical Strategies and Best Practices on the ML Lifecycle, System Design, MLOps, and Generative AI
Design, build, and secure scalable machine learning (ML) systems to solve real-world business problems with Python and AWS. Purchase of the print or Kindle book includes a free PDF eBook. Key features: solve large-scale ML challenges in the cloud with several open-source and AWS tools and frameworks; ap...
Field | Value
---|---
Other authors | |
Format | eBook
Language | English
Published | Birmingham, England: Packt Publishing, [2024]
Edition | Second edition
Subjects | |
View at Universitat Ramon Llull Library | https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009811836606719
Table of Contents:
- Cover
- Copyright
- Contributors
- Table of Contents
- Preface
- Chapter 1: Navigating the ML Life Cycle with ML Solutions Architecture
- ML versus traditional software
- ML life cycle
- Business problem understanding and ML problem framing
- Data understanding and data preparation
- Model training and evaluation
- Model deployment
- Model monitoring
- Business metric tracking
- ML challenges
- ML solutions architecture
- Business understanding and ML transformation
- Identification and verification of ML techniques
- System architecture design and implementation
- ML platform workflow automation
- Security and compliance
- Summary
- Chapter 2: Exploring ML Business Use Cases
- ML use cases in financial services
- Capital market front office
- Sales trading and research
- Investment banking
- Wealth management
- Capital market back office operations
- Net Asset Value review
- Post-trade settlement failure prediction
- Risk management and fraud
- Anti-money laundering
- Trade surveillance
- Credit risk
- Insurance
- Insurance underwriting
- Insurance claim management
- ML use cases in media and entertainment
- Content development and production
- Content management and discovery
- Content distribution and customer engagement
- ML use cases in healthcare and life sciences
- Medical imaging analysis
- Drug discovery
- Healthcare data management
- ML use cases in manufacturing
- Engineering and product design
- Manufacturing operations - product quality and yield
- Manufacturing operations - machine maintenance
- ML use cases in retail
- Product search and discovery
- Targeted marketing
- Sentiment analysis
- Product demand forecasting
- ML use cases in the automotive industry
- Autonomous vehicles
- Perception and localization
- Decision and planning
- Control
- Advanced driver assistance systems (ADAS)
- Summary
- Chapter 3: Exploring ML Algorithms
- Technical requirements
- How machines learn
- Overview of ML algorithms
- Considerations for choosing ML algorithms
- Algorithms for classification and regression problems
- Linear regression algorithms
- Logistic regression algorithms
- Decision tree algorithms
- Random forest algorithm
- Gradient boosting machine and XGBoost algorithms
- K-nearest neighbor algorithm
- Multi-layer perceptron (MLP) networks
- Algorithms for clustering
- Algorithms for time series analysis
- ARIMA algorithm
- DeepAR algorithm
- Algorithms for recommendation
- Collaborative filtering algorithm
- Multi-armed bandit/contextual bandit algorithm
- Algorithms for computer vision problems
- Convolutional neural networks
- ResNet
- Algorithms for natural language processing (NLP) problems
- Word2Vec
- BERT
- Generative AI algorithms
- Generative adversarial network
- Generative pre-trained transformer (GPT)
- Large Language Model
- Diffusion model
- Hands-on exercise
- Problem statement
- Dataset description
- Setting up a Jupyter Notebook environment
- Running the exercise
- Summary
- Chapter 4: Data Management for ML
- Technical requirements
- Data management considerations for ML
- Data management architecture for ML
- Data storage and management
- AWS Lake Formation
- Data ingestion
- Kinesis Firehose
- AWS Glue
- AWS Lambda
- Data cataloging
- AWS Glue Data Catalog
- Custom data catalog solution
- Data processing
- ML data versioning
- S3 partitions
- Versioned S3 buckets
- Purpose-built data version tools
- ML feature stores
- Data serving for client consumption
- Consumption via API
- Consumption via data copy
- Special databases for ML
- Vector databases
- Graph databases
- Data pipelines
- Authentication and authorization
- Data governance
- Data lineage
- Other data governance measures
- Hands-on exercise - data management for ML
- Creating a data lake using Lake Formation
- Creating a data ingestion pipeline
- Creating a Glue Data Catalog
- Discovering and querying data in the data lake
- Creating an AWS Glue ETL job to process data for ML
- Building a data pipeline using Glue workflows
- Summary
- Chapter 5: Exploring Open-Source ML Libraries
- Technical requirements
- Core features of open-source ML libraries
- Understanding the scikit-learn ML library
- Installing scikit-learn
- Core components of scikit-learn
- Understanding the Apache Spark ML library
- Installing Spark ML
- Core components of the Spark ML library
- Understanding the TensorFlow deep learning library
- Installing TensorFlow
- Core components of TensorFlow
- Hands-on exercise - training a TensorFlow model
- Understanding the PyTorch deep learning library
- Installing PyTorch
- Core components of PyTorch
- Hands-on exercise - building and training a PyTorch model
- How to choose between TensorFlow and PyTorch
- Summary
- Chapter 6: Kubernetes Container Orchestration Infrastructure Management
- Technical requirements
- Introduction to containers
- Overview of Kubernetes and its core concepts
- Namespaces
- Pods
- Deployment
- Kubernetes Job
- Kubernetes custom resources and operators
- Services
- Networking on Kubernetes
- Security and access management
- API authentication and authorization
- Hands-on - creating a Kubernetes infrastructure on AWS
- Problem statement
- Lab instruction
- Summary
- Chapter 7: Open-Source ML Platforms
- Core components of an ML platform
- Open-source technologies for building ML platforms
- Implementing a data science environment
- Building a model training environment
- Registering models with a model registry
- Serving models using model serving services
- The Gunicorn and Flask inference engine
- The TensorFlow Serving framework
- The TorchServe serving framework
- KFServing framework
- Seldon Core
- Triton Inference Server
- Monitoring models in production
- Managing ML features
- Automating ML pipeline workflows
- Apache Airflow
- Kubeflow Pipelines
- Designing an end-to-end ML platform
- ML platform-based strategy
- ML component-based strategy
- Summary
- Chapter 8: Building a Data Science Environment Using AWS ML Services
- Technical requirements
- SageMaker overview
- Data science environment architecture using SageMaker
- Onboarding SageMaker users
- Launching Studio applications
- Preparing data
- Preparing data interactively with SageMaker Data Wrangler
- Preparing data at scale interactively
- Processing data as separate jobs
- Creating, storing, and sharing features
- Training ML models
- Tuning ML models
- Deploying ML models for testing
- Best practices for building a data science environment
- Hands-on exercise - building a data science environment using AWS services
- Problem statement
- Dataset description
- Lab instructions
- Setting up SageMaker Studio
- Launching a JupyterLab notebook
- Training the BERT model in the Jupyter notebook
- Training the BERT model with the SageMaker Training service
- Deploying the model
- Building ML models with SageMaker Canvas
- Summary
- Chapter 9: Designing an Enterprise ML Architecture with AWS ML Services
- Technical requirements
- Key considerations for ML platforms
- The personas of ML platforms and their requirements
- ML platform builders
- Platform users and operators
- Common workflow of an ML initiative
- Platform requirements for the different personas
- Key requirements for an enterprise ML platform
- Enterprise ML architecture pattern overview
- Model training environment
- Model training engine using SageMaker
- Automation support
- Model training life cycle management
- Model hosting environment
- Inference engines
- Authentication and security control
- Monitoring and logging
- Adopting MLOps for ML workflows
- Components of the MLOps architecture
- Monitoring and logging
- Model training monitoring
- Model endpoint monitoring
- ML pipeline monitoring
- Service provisioning management
- Best practices in building and operating an ML platform
- ML platform project execution best practices
- ML platform design and implementation best practices
- Platform use and operations best practices
- Summary
- Chapter 10: Advanced ML Engineering
- Technical requirements
- Training large-scale models with distributed training
- Distributed model training using data parallelism
- Parameter server overview
- AllReduce overview
- Distributed model training using model parallelism
- Naïve model parallelism overview
- Tensor parallelism/tensor slicing overview
- Implementing model-parallel training
- Achieving low-latency model inference
- How model inference works and opportunities for optimization
- Hardware acceleration
- Central processing units (CPUs)
- Graphics processing units (GPUs)
- Application-specific integrated circuit
- Model optimization
- Quantization
- Pruning (also known as sparsity)
- Graph and operator optimization
- Graph optimization
- Operator optimization
- Model compilers
- TensorFlow XLA
- PyTorch Glow
- Apache TVM
- Amazon SageMaker Neo
- Inference engine optimization
- Inference batching
- Enabling parallel serving sessions
- Picking a communication protocol
- Inference in large language models
- Text Generation Inference (TGI)
- DeepSpeed-Inference
- FasterTransformer
- Hands-on lab - running distributed model training with PyTorch