Hands-On Automated Machine Learning: A Beginner's Guide to Building Automated Machine Learning Systems Using AutoML and Python
Automate data and model pipelines for faster machine learning applications.

About This Book:
- Build automated modules for different machine learning components
- Understand each component of a machine learning pipeline in depth
- Learn to use different open source AutoML and feature engineering platforms
- W...
Other authors: | |
---|---|
Format: | eBook |
Language: | English |
Published: | Birmingham ; Mumbai : Packt Publishing, 2018 |
Edition: | First edition |
Subjects: | |
View at Biblioteca Universitat Ramon Llull: | https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009631490806719 |
Table of Contents:
- Cover
- Copyright and Credits
- Packt Upsell
- Contributors
- Table of Contents
- Preface
- Chapter 1: Introduction to AutoML
- Scope of machine learning
- What is AutoML?
- Why use AutoML and how does it help?
- When do you automate ML?
- What will you learn?
- Core components of AutoML systems
- Automated feature preprocessing
- Automated algorithm selection
- Hyperparameter optimization
- Building prototype subsystems for each component
- Putting it all together as an end-to-end AutoML system
- Overview of AutoML libraries
- Featuretools
- Auto-sklearn
- MLBox
- TPOT
- Summary
- Chapter 2: Introduction to Machine Learning Using Python
- Technical requirements
- Machine learning
- Machine learning process
- Supervised learning
- Unsupervised learning
- Linear regression
- What is linear regression?
- Working of OLS regression
- Assumptions of OLS
- Where is linear regression used?
- By which method can linear regression be implemented?
- Important evaluation metrics - regression algorithms
- Logistic regression
- What is logistic regression?
- Where is logistic regression used?
- By which method can logistic regression be implemented?
- Important evaluation metrics - classification algorithms
- Decision trees
- What are decision trees?
- Where are decision trees used?
- By which method can decision trees be implemented?
- Support Vector Machines
- What is SVM?
- Where is SVM used?
- By which method can SVM be implemented?
- k-Nearest Neighbors
- What is k-Nearest Neighbors?
- Where is KNN used?
- By which method can KNN be implemented?
- Ensemble methods
- What are ensemble models?
- Bagging
- Boosting
- Stacking/blending
- Comparing the results of classifiers
- Cross-validation
- Clustering
- What is clustering?
- Where is clustering used?
- By which method can clustering be implemented?
- Hierarchical clustering
- Partitioning clustering (KMeans)
- Summary
- Chapter 3: Data Preprocessing
- Technical requirements
- Data transformation
- Numerical data transformation
- Scaling
- Missing values
- Outliers
- Detecting and treating univariate outliers
- Inter-quartile range
- Filtering values
- Winsorizing
- Trimming
- Detecting and treating multivariate outliers
- Binning
- Log and power transformations
- Categorical data transformation
- Encoding
- Missing values for categorical data transformation
- Text preprocessing
- Feature selection
- Excluding features with low variance
- Univariate feature selection
- Recursive feature elimination
- Feature selection using random forest
- Feature selection using dimensionality reduction
- Principal Component Analysis
- Feature generation
- Summary
- Chapter 4: Automated Algorithm Selection
- Technical requirements
- Computational complexity
- Big O notation
- Differences in training and scoring time
- Simple measure of training and scoring time
- Code profiling in Python
- Visualizing performance statistics
- Implementing k-NN from scratch
- Profiling your Python script line by line
- Linearity versus non-linearity
- Drawing decision boundaries
- Decision boundary of logistic regression
- The decision boundary of random forest
- Commonly used machine learning algorithms
- Necessary feature transformations
- Supervised ML
- Default configuration of auto-sklearn
- Finding the best ML pipeline for product line prediction
- Finding the best machine learning pipeline for network anomaly detection
- Unsupervised AutoML
- Commonly used clustering algorithms
- Creating sample datasets with sklearn
- K-means algorithm in action
- The DBSCAN algorithm in action
- Agglomerative clustering algorithm in action
- Simple automation of unsupervised learning
- Visualizing high-dimensional datasets
- Principal Component Analysis in action
- t-SNE in action
- Adding simple components together to improve the pipeline
- Summary
- Chapter 5: Hyperparameter Optimization
- Technical requirements
- Hyperparameters
- Warm start
- Bayesian-based hyperparameter tuning
- An example system
- Summary
- Chapter 6: Creating AutoML Pipelines
- Technical requirements
- An introduction to machine learning pipelines
- A simple pipeline
- FunctionTransformer
- A complex pipeline
- Summary
- Chapter 7: Dive into Deep Learning
- Technical requirements
- Overview of neural networks
- Neuron
- Activation functions
- The step function
- The sigmoid function
- The ReLU function
- The tanh function
- A feed-forward neural network using Keras
- Autoencoders
- Convolutional Neural Networks
- Why CNN?
- What is convolution?
- What are filters?
- The convolution layer
- The ReLU layer
- The pooling layer
- The fully connected layer
- Summary
- Chapter 8: Critical Aspects of ML and Data Science Projects
- Machine learning as a search
- Trade-offs in machine learning
- Engagement model for a typical data science project
- The phases of an engagement model
- Business understanding
- Data understanding
- Data preparation
- Modeling
- Evaluation
- Deployment
- Summary
- Other Books You May Enjoy
- Index