Python Machine Learning By Example: Unlock Machine Learning Best Practices with Real-World Use Cases

The fourth edition of Python Machine Learning By Example is a comprehensive guide for beginners and experienced machine learning practitioners who want to learn more advanced techniques, such as multimodal modeling. Written by experienced machine learning author and ex-Google machine learning engineer Yuxi Liu.


Bibliographic Details
Other Authors: Liu, Yuxi (author)
Format: eBook
Language: English
Published: Birmingham, England: Packt Publishing, [2024]
Edition: Fourth edition
Series: Expert insight
View in Biblioteca Universitat Ramon Llull: https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009841739206719
Table of Contents:
  • Cover
  • Copyright
  • Contributors
  • Table of Contents
  • Preface
  • Chapter 1: Getting Started with Machine Learning and Python
  • An introduction to machine learning
  • Understanding why we need machine learning
  • Differentiating between machine learning and automation
  • Machine learning applications
  • Knowing the prerequisites
  • Getting started with three types of machine learning
  • A brief history of the development of machine learning algorithms
  • Digging into the core of machine learning
  • Generalizing with data
  • Overfitting, underfitting, and the bias-variance trade-off
  • Overfitting
  • Underfitting
  • The bias-variance trade-off
  • Avoiding overfitting with cross-validation
  • Avoiding overfitting with regularization
  • Avoiding overfitting with feature selection and dimensionality reduction
  • Data preprocessing and feature engineering
  • Preprocessing and exploration
  • Dealing with missing values
  • Label encoding
  • One-hot encoding
  • Dense embedding
  • Scaling
  • Feature engineering
  • Polynomial transformation
  • Binning
  • Combining models
  • Voting and averaging
  • Bagging
  • Boosting
  • Stacking
  • Installing software and setting up
  • Setting up Python and environments
  • Installing the main Python packages
  • NumPy
  • SciPy
  • pandas
  • scikit-learn
  • TensorFlow
  • PyTorch
  • Summary
  • Exercises
  • Chapter 2: Building a Movie Recommendation Engine with Naïve Bayes
  • Getting started with classification
  • Binary classification
  • Multiclass classification
  • Multi-label classification
  • Exploring Naïve Bayes
  • Bayes' theorem by example
  • The mechanics of Naïve Bayes
  • Implementing Naïve Bayes
  • Implementing Naïve Bayes from scratch
  • Implementing Naïve Bayes with scikit-learn
  • Building a movie recommender with Naïve Bayes
  • Preparing the data
  • Training a Naïve Bayes model
  • Evaluating classification performance
  • Tuning models with cross-validation
  • Summary
  • Exercises
  • References
  • Chapter 3: Predicting Online Ad Click-Through with Tree-Based Algorithms
  • A brief overview of ad click-through prediction
  • Getting started with two types of data - numerical and categorical
  • Exploring a decision tree from the root to the leaves
  • Constructing a decision tree
  • The metrics for measuring a split
  • Gini Impurity
  • Information gain
  • Implementing a decision tree from scratch
  • Implementing a decision tree with scikit-learn
  • Predicting ad click-through with a decision tree
  • Ensembling decision trees - random forests
  • Ensembling decision trees - gradient-boosted trees
  • Summary
  • Exercises
  • Chapter 4: Predicting Online Ad Click-Through with Logistic Regression
  • Converting categorical features to numerical - one-hot encoding and ordinal encoding
  • Classifying data with logistic regression
  • Getting started with the logistic function
  • Jumping from the logistic function to logistic regression
  • Training a logistic regression model
  • Training a logistic regression model using gradient descent
  • Predicting ad click-through with logistic regression using gradient descent
  • Training a logistic regression model using stochastic gradient descent (SGD)
  • Training a logistic regression model with regularization
  • Feature selection using L1 regularization
  • Feature selection using random forest
  • Training on large datasets with online learning
  • Handling multiclass classification
  • Implementing logistic regression using TensorFlow
  • Summary
  • Exercises
  • Chapter 5: Predicting Stock Prices with Regression Algorithms
  • What is regression?
  • Mining stock price data
  • A brief overview of the stock market and stock prices
  • Getting started with feature engineering
  • Acquiring data and generating features
  • Estimating with linear regression
  • How does linear regression work?
  • Implementing linear regression from scratch
  • Implementing linear regression with scikit-learn
  • Implementing linear regression with TensorFlow
  • Estimating with decision tree regression
  • Transitioning from classification trees to regression trees
  • Implementing decision tree regression
  • Implementing a regression forest
  • Evaluating regression performance
  • Predicting stock prices with the three regression algorithms
  • Summary
  • Exercises
  • Chapter 6: Predicting Stock Prices with Artificial Neural Networks
  • Demystifying neural networks
  • Starting with a single-layer neural network
  • Layers in neural networks
  • Activation functions
  • Backpropagation
  • Adding more layers to a neural network: DL
  • Building neural networks
  • Implementing neural networks from scratch
  • Implementing neural networks with scikit-learn
  • Implementing neural networks with TensorFlow
  • Implementing neural networks with PyTorch
  • Picking the right activation functions
  • Preventing overfitting in neural networks
  • Dropout
  • Early stopping
  • Predicting stock prices with neural networks
  • Training a simple neural network
  • Fine-tuning the neural network
  • Summary
  • Exercises
  • Chapter 7: Mining the 20 Newsgroups Dataset with Text Analysis Techniques
  • How computers understand language - NLP
  • What is NLP?
  • The history of NLP
  • NLP applications
  • Touring popular NLP libraries and picking up NLP basics
  • Installing famous NLP libraries
  • Corpora
  • Tokenization
  • PoS tagging
  • NER
  • Stemming and lemmatization
  • Semantics and topic modeling
  • Getting the newsgroups data
  • Exploring the newsgroups data
  • Thinking about features for text data
  • Counting the occurrence of each word token
  • Text preprocessing
  • Dropping stop words
  • Reducing inflectional and derivational forms of words
  • Visualizing the newsgroups data with t-SNE
  • What is dimensionality reduction?
  • t-SNE for dimensionality reduction
  • Representing words with dense vectors - word embedding
  • Building embedding models using shallow neural networks
  • Utilizing pre-trained embedding models
  • Summary
  • Exercises
  • Chapter 8: Discovering Underlying Topics in the Newsgroups Dataset with Clustering and Topic Modeling
  • Learning without guidance - unsupervised learning
  • Getting started with k-means clustering
  • How does k-means clustering work?
  • Implementing k-means from scratch
  • Implementing k-means with scikit-learn
  • Choosing the value of k
  • Clustering newsgroups dataset
  • Clustering newsgroups data using k-means
  • Describing the clusters using GPT
  • Discovering underlying topics in newsgroups
  • Topic modeling using NMF
  • Topic modeling using LDA
  • Summary
  • Exercises
  • Chapter 9: Recognizing Faces with Support Vector Machine
  • Finding the separating boundary with SVM
  • Scenario 1 - identifying a separating hyperplane
  • Scenario 2 - determining the optimal hyperplane
  • Scenario 3 - handling outliers
  • Implementing SVM
  • Scenario 4 - dealing with more than two classes
  • One-vs-rest
  • One-vs-one
  • Multiclass cases in scikit-learn
  • Scenario 5 - solving linearly non-separable problems with kernels
  • Choosing between linear and RBF kernels
  • Classifying face images with SVM
  • Exploring the face image dataset
  • Building an SVM-based image classifier
  • Boosting image classification performance with PCA
  • Estimating with support vector regression
  • Implementing SVR
  • Summary
  • Exercises
  • Chapter 10: Machine Learning Best Practices
  • Machine learning solution workflow
  • Best practices in the data preparation stage
  • Best practice 1 - Completely understanding the project goal
  • Best practice 2 - Collecting all fields that are relevant
  • Best practice 3 - Maintaining the consistency and normalization of field values
  • Best practice 4 - Dealing with missing data
  • Best practice 5 - Storing large-scale data
  • Best practices in the training set generation stage
  • Best practice 6 - Identifying categorical features with numerical values
  • Best practice 7 - Deciding whether to encode categorical features
  • Best practice 8 - Deciding whether to select features and, if so, how to do so
  • Best practice 9 - Deciding whether to reduce dimensionality and, if so, how to do so
  • Best practice 10 - Deciding whether to rescale features
  • Best practice 11 - Performing feature engineering with domain expertise
  • Best practice 12 - Performing feature engineering without domain expertise
  • Binarization and discretization
  • Interaction
  • Polynomial transformation
  • Best practice 13 - Documenting how each feature is generated
  • Best practice 14 - Extracting features from text data
  • tf and tf-idf
  • Word embedding
  • Word2Vec embedding
  • Best practices in the model training, evaluation, and selection stage
  • Best practice 15 - Choosing the right algorithm(s) to start with
  • Naïve Bayes
  • Logistic regression
  • SVM
  • Random forest (or decision tree)
  • Neural networks
  • Best practice 16 - Reducing overfitting
  • Best practice 17 - Diagnosing overfitting and underfitting
  • Best practice 18 - Modeling on large-scale datasets
  • Best practices in the deployment and monitoring stage
  • Best practice 19 - Saving, loading, and reusing models
  • Saving and restoring models using pickle
  • Saving and restoring models in TensorFlow
  • Saving and restoring models in PyTorch
  • Best practice 20 - Monitoring model performance
  • Best practice 21 - Updating models regularly
  • Summary