Python Machine Learning by Example: Unlock Machine Learning Best Practices with Real-World Use Cases
The fourth edition of Python Machine Learning by Example is a comprehensive guide for beginners and experienced machine learning practitioners who want to learn more advanced techniques, such as multimodal modeling. Written by an experienced machine learning author and ex-Google machine learning engineer...
Other Authors:
Format: eBook
Language: English
Published: Birmingham, England : Packt Publishing, [2024]
Edition: Fourth edition
Series: Expert insight
Subjects:
View in the Universitat Ramon Llull Library: https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009841739206719
Table of Contents:
- Cover
- Copyright
- Contributors
- Table of Contents
- Preface
- Chapter 1: Getting Started with Machine Learning and Python
- An introduction to machine learning
- Understanding why we need machine learning
- Differentiating between machine learning and automation
- Machine learning applications
- Knowing the prerequisites
- Getting started with three types of machine learning
- A brief history of the development of machine learning algorithms
- Digging into the core of machine learning
- Generalizing with data
- Overfitting, underfitting, and the bias-variance trade-off
- Overfitting
- Underfitting
- The bias-variance trade-off
- Avoiding overfitting with cross-validation
- Avoiding overfitting with regularization
- Avoiding overfitting with feature selection and dimensionality reduction
- Data preprocessing and feature engineering
- Preprocessing and exploration
- Dealing with missing values
- Label encoding
- One-hot encoding
- Dense embedding
- Scaling
- Feature engineering
- Polynomial transformation
- Binning
- Combining models
- Voting and averaging
- Bagging
- Boosting
- Stacking
- Installing software and setting up
- Setting up Python and environments
- Installing the main Python packages
- NumPy
- SciPy
- pandas
- scikit-learn
- TensorFlow
- PyTorch
- Summary
- Exercises
- Chapter 2: Building a Movie Recommendation Engine with Naïve Bayes
- Getting started with classification
- Binary classification
- Multiclass classification
- Multi-label classification
- Exploring Naïve Bayes
- Bayes' theorem by example
- The mechanics of Naïve Bayes
- Implementing Naïve Bayes
- Implementing Naïve Bayes from scratch
- Implementing Naïve Bayes with scikit-learn
- Building a movie recommender with Naïve Bayes
- Preparing the data
- Training a Naïve Bayes model
- Evaluating classification performance
- Tuning models with cross-validation
- Summary
- Exercises
- References
- Chapter 3: Predicting Online Ad Click-Through with Tree-Based Algorithms
- A brief overview of ad click-through prediction
- Getting started with two types of data - numerical and categorical
- Exploring a decision tree from the root to the leaves
- Constructing a decision tree
- The metrics for measuring a split
- Gini impurity
- Information gain
- Implementing a decision tree from scratch
- Implementing a decision tree with scikit-learn
- Predicting ad click-through with a decision tree
- Ensembling decision trees - random forests
- Ensembling decision trees - gradient-boosted trees
- Summary
- Exercises
- Chapter 4: Predicting Online Ad Click-Through with Logistic Regression
- Converting categorical features to numerical - one-hot encoding and ordinal encoding
- Classifying data with logistic regression
- Getting started with the logistic function
- Jumping from the logistic function to logistic regression
- Training a logistic regression model
- Training a logistic regression model using gradient descent
- Predicting ad click-through with logistic regression using gradient descent
- Training a logistic regression model using stochastic gradient descent (SGD)
- Training a logistic regression model with regularization
- Feature selection using L1 regularization
- Feature selection using random forest
- Training on large datasets with online learning
- Handling multiclass classification
- Implementing logistic regression using TensorFlow
- Summary
- Exercises
- Chapter 5: Predicting Stock Prices with Regression Algorithms
- What is regression?
- Mining stock price data
- A brief overview of the stock market and stock prices
- Getting started with feature engineering
- Acquiring data and generating features
- Estimating with linear regression
- How does linear regression work?
- Implementing linear regression from scratch
- Implementing linear regression with scikit-learn
- Implementing linear regression with TensorFlow
- Estimating with decision tree regression
- Transitioning from classification trees to regression trees
- Implementing decision tree regression
- Implementing a regression forest
- Evaluating regression performance
- Predicting stock prices with the three regression algorithms
- Summary
- Exercises
- Chapter 6: Predicting Stock Prices with Artificial Neural Networks
- Demystifying neural networks
- Starting with a single-layer neural network
- Layers in neural networks
- Activation functions
- Backpropagation
- Adding more layers to a neural network: DL
- Building neural networks
- Implementing neural networks from scratch
- Implementing neural networks with scikit-learn
- Implementing neural networks with TensorFlow
- Implementing neural networks with PyTorch
- Picking the right activation functions
- Preventing overfitting in neural networks
- Dropout
- Early stopping
- Predicting stock prices with neural networks
- Training a simple neural network
- Fine-tuning the neural network
- Summary
- Exercises
- Chapter 7: Mining the 20 Newsgroups Dataset with Text Analysis Techniques
- How computers understand language - NLP
- What is NLP?
- The history of NLP
- NLP applications
- Touring popular NLP libraries and picking up NLP basics
- Installing famous NLP libraries
- Corpora
- Tokenization
- PoS tagging
- NER
- Stemming and lemmatization
- Semantics and topic modeling
- Getting the newsgroups data
- Exploring the newsgroups data
- Thinking about features for text data
- Counting the occurrence of each word token
- Text preprocessing
- Dropping stop words
- Reducing inflectional and derivational forms of words
- Visualizing the newsgroups data with t-SNE
- What is dimensionality reduction?
- t-SNE for dimensionality reduction
- Representing words with dense vectors - word embedding
- Building embedding models using shallow neural networks
- Utilizing pre-trained embedding models
- Summary
- Exercises
- Chapter 8: Discovering Underlying Topics in the Newsgroups Dataset with Clustering and Topic Modeling
- Learning without guidance - unsupervised learning
- Getting started with k-means clustering
- How does k-means clustering work?
- Implementing k-means from scratch
- Implementing k-means with scikit-learn
- Choosing the value of k
- Clustering newsgroups dataset
- Clustering newsgroups data using k-means
- Describing the clusters using GPT
- Discovering underlying topics in newsgroups
- Topic modeling using NMF
- Topic modeling using LDA
- Summary
- Exercises
- Chapter 9: Recognizing Faces with Support Vector Machine
- Finding the separating boundary with SVM
- Scenario 1 - identifying a separating hyperplane
- Scenario 2 - determining the optimal hyperplane
- Scenario 3 - handling outliers
- Implementing SVM
- Scenario 4 - dealing with more than two classes
- One-vs-rest
- One-vs-one
- Multiclass cases in scikit-learn
- Scenario 5 - solving linearly non-separable problems with kernels
- Choosing between linear and RBF kernels
- Classifying face images with SVM
- Exploring the face image dataset
- Building an SVM-based image classifier
- Boosting image classification performance with PCA
- Estimating with support vector regression
- Implementing SVR
- Summary
- Exercises
- Chapter 10: Machine Learning Best Practices
- Machine learning solution workflow
- Best practices in the data preparation stage
- Best practice 1 - Completely understanding the project goal
- Best practice 2 - Collecting all fields that are relevant
- Best practice 3 - Maintaining the consistency and normalization of field values
- Best practice 4 - Dealing with missing data
- Best practice 5 - Storing large-scale data
- Best practices in the training set generation stage
- Best practice 6 - Identifying categorical features with numerical values
- Best practice 7 - Deciding whether to encode categorical features
- Best practice 8 - Deciding whether to select features and, if so, how to do so
- Best practice 9 - Deciding whether to reduce dimensionality and, if so, how to do so
- Best practice 10 - Deciding whether to rescale features
- Best practice 11 - Performing feature engineering with domain expertise
- Best practice 12 - Performing feature engineering without domain expertise
- Binarization and discretization
- Interaction
- Polynomial transformation
- Best practice 13 - Documenting how each feature is generated
- Best practice 14 - Extracting features from text data
- tf and tf-idf
- Word embedding
- Word2Vec embedding
- Best practices in the model training, evaluation, and selection stage
- Best practice 15 - Choosing the right algorithm(s) to start with
- Naïve Bayes
- Logistic regression
- SVM
- Random forest (or decision tree)
- Neural networks
- Best practice 16 - Reducing overfitting
- Best practice 17 - Diagnosing overfitting and underfitting
- Best practice 18 - Modeling on large-scale datasets
- Best practices in the deployment and monitoring stage
- Best practice 19 - Saving, loading, and reusing models
- Saving and restoring models using pickle
- Saving and restoring models in TensorFlow
- Saving and restoring models in PyTorch
- Best practice 20 - Monitoring model performance
- Best practice 21 - Updating models regularly
- Summary