Machine Learning for Imbalanced Data: Tackle Imbalanced Datasets Using Machine Learning and Deep Learning Techniques

As machine learning practitioners, we often encounter imbalanced datasets, in which one class has considerably fewer instances than the other. Many machine learning algorithms assume a rough balance between the majority and minority classes, which leads to suboptimal performance on imbalanced data. This compr...
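To make the premise concrete, here is a minimal sketch using the imbalanced-learn library that the book introduces in Chapter 1. It is an illustration under stated assumptions, not an excerpt from the book: the 99:1 synthetic class ratio, the logistic regression baseline, and all parameter choices below are hypothetical.

```python
# Minimal sketch (assumes scikit-learn and imbalanced-learn are installed).
# The 99:1 class ratio and the choice of model are illustrative only.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import SMOTE

# Synthetic binary dataset in which the minority class is ~1% of samples.
X, y = make_classification(
    n_samples=10_000, n_features=20, weights=[0.99, 0.01], random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42
)

# Baseline: trained on the raw imbalanced data, the model can score high
# accuracy while recalling few minority-class instances.
baseline = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(classification_report(y_test, baseline.predict(X_test), digits=3))

# Oversample the minority class with SMOTE (training split only), retrain,
# and compare minority-class precision/recall against the baseline.
X_res, y_res = SMOTE(random_state=42).fit_resample(X_train, y_train)
resampled = LogisticRegression(max_iter=1000).fit(X_res, y_res)
print(classification_report(y_test, resampled.predict(X_test), digits=3))
```

Note that the resampling is applied to the training split only, so the held-out test set keeps the original class distribution; evaluating on resampled data would overstate performance.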


Bibliographic Details
Other Authors: Abhishek, Kumar (author); Abdelaziz, Mounir (author)
Format: Electronic book
Language: English
Published: Birmingham, England : Packt Publishing Ltd, [2023]
Edition: First edition
View at Biblioteca Universitat Ramon Llull: https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009827938806719
Table of Contents:
  • Cover
  • Copyright
  • Contributors
  • Table of Contents
  • Preface
  • Chapter 1: Introduction to Data Imbalance in Machine Learning
  • Technical requirements
  • Introduction to imbalanced datasets
  • Machine learning 101
  • What happens during model training?
  • Types of datasets and splits
  • Cross-validation
  • Common evaluation metrics
  • Confusion matrix
  • ROC
  • Precision-Recall curve
  • Relation between the ROC curve and PR curve
  • Challenges and considerations when dealing with imbalanced data
  • When can we have an imbalance in datasets?
  • Why can imbalanced data be a challenge?
  • When not to worry about data imbalance
  • Introduction to the imbalanced-learn library
  • General rules to follow
  • Summary
  • Questions
  • References
  • Chapter 2: Oversampling Methods
  • Technical requirements
  • What is oversampling?
  • Random oversampling
  • Problems with random oversampling
  • SMOTE
  • How SMOTE works
  • Problems with SMOTE
  • SMOTE variants
  • Borderline-SMOTE
  • ADASYN
  • How ADASYN works
  • Categorical features and SMOTE variants (SMOTE-NC and SMOTEN)
  • Model performance comparison of various oversampling methods
  • Guidance for using various oversampling techniques
  • When to avoid oversampling
  • Oversampling in multi-class classification
  • Summary
  • Exercises
  • References
  • Chapter 3: Undersampling Methods
  • Technical requirements
  • Introducing undersampling
  • When to avoid undersampling the majority class
  • Fixed versus cleaning undersampling
  • Undersampling approaches
  • Removing examples uniformly
  • Random undersampling
  • ClusterCentroids
  • Strategies for removing noisy observations
  • ENN, RENN, and AllKNN
  • Tomek links
  • Neighborhood Cleaning Rule
  • Instance hardness threshold
  • Strategies for removing easy observations
  • Condensed Nearest Neighbors
  • One-sided selection
  • Combining undersampling and oversampling
  • Model performance comparison
  • Summary
  • Exercises
  • References
  • Chapter 4: Ensemble Methods
  • Technical requirements
  • Bagging techniques for imbalanced data
  • UnderBagging
  • OverBagging
  • SMOTEBagging
  • Comparative performance of bagging methods
  • Boosting techniques for imbalanced data
  • AdaBoost
  • RUSBoost, SMOTEBoost, and RAMOBoost
  • Ensemble of ensembles
  • EasyEnsemble
  • Comparative performance of boosting methods
  • Model performance comparison
  • Summary
  • Questions
  • References
  • Chapter 5: Cost-Sensitive Learning
  • Technical requirements
  • The concept of Cost-Sensitive Learning
  • Costs and cost functions
  • Types of cost-sensitive learning
  • Difference between CSL and resampling
  • Problems with rebalancing techniques
  • Understanding costs in practice
  • Cost-Sensitive Learning for logistic regression
  • Cost-Sensitive Learning for decision trees
  • Cost-Sensitive Learning using scikit-learn and XGBoost models
  • MetaCost - making any classification model cost-sensitive
  • Threshold adjustment
  • Methods for threshold tuning
  • Summary
  • Questions
  • References
  • Chapter 6: Data Imbalance in Deep Learning
  • Technical requirements
  • A brief introduction to deep learning
  • Neural networks
  • Perceptron
  • Activation functions
  • Layers
  • Feedforward neural networks
  • Training neural networks
  • The effect of the learning rate on data imbalance
  • Image processing using Convolutional Neural Networks
  • Text analysis using Natural Language Processing
  • Data imbalance in deep learning
  • The impact of data imbalance on deep learning models
  • Overview of deep learning techniques to handle data imbalance
  • Multi-label classification
  • Summary
  • Questions
  • References
  • Chapter 7: Data-Level Deep Learning Methods
  • Technical requirements
  • Preparing the data
  • Creating the training loop
  • Sampling techniques for deep learning models
  • Random oversampling
  • Dynamic sampling
  • Data augmentation techniques for vision
  • Data-level techniques for text classification
  • Dataset and baseline model
  • Document-level augmentation
  • Character and word-level augmentation
  • Discussion of other data-level deep learning methods and their key ideas
  • Two-phase learning
  • Expansive Over-Sampling
  • Using generative models for oversampling
  • DeepSMOTE
  • Neural style transfer
  • Summary
  • Questions
  • References
  • Chapter 8: Algorithm-Level Deep Learning Techniques
  • Technical requirements
  • Motivation for algorithm-level techniques
  • Weighting techniques
  • Using PyTorch's weight parameter
  • Handling textual data
  • Deferred re-weighting - a minor variant of the class weighting technique
  • Explicit loss function modification
  • Focal loss
  • Class-balanced loss
  • Class-dependent temperature loss
  • Class-wise difficulty-balanced loss
  • Discussing other algorithm-based techniques
  • Regularization techniques
  • Siamese networks
  • Deeper neural networks
  • Threshold adjustment
  • Summary
  • Questions
  • References
  • Chapter 9: Hybrid Deep Learning Methods
  • Technical requirements
  • Using graph machine learning for imbalanced data
  • Understanding graphs
  • Graph machine learning
  • Dealing with imbalanced data
  • Case study - the performance of XGBoost, MLP, and a GCN on an imbalanced dataset
  • Hard example mining
  • Online Hard Example Mining
  • Minority class incremental rectification
  • Utilizing the hard sample mining technique in minority class incremental rectification
  • Summary
  • Questions
  • References
  • Chapter 10: Model Calibration
  • Technical requirements
  • Introduction to model calibration
  • Why bother with model calibration
  • Models with and without well-calibrated probabilities
  • Calibration curves (reliability plots)
  • Brier score
  • Expected Calibration Error
  • The influence of data balancing techniques on model calibration
  • Plotting calibration curves for a model trained on a real-world dataset
  • Model calibration techniques
  • The calibration of model scores to account for sampling
  • Platt's scaling
  • Isotonic regression
  • Choosing between Platt's scaling and isotonic regression
  • Temperature scaling
  • Label smoothing
  • The impact of calibration on a model's performance
  • Summary
  • Questions
  • References
  • Appendix: Machine Learning Pipeline in Production
  • Machine learning training pipeline
  • Inferencing (online or batch)
  • Assessments
  • Chapter 1 - Introduction to Data Imbalance in Machine Learning
  • Chapter 2 - Oversampling Methods
  • Chapter 3 - Undersampling Methods
  • Chapter 4 - Ensemble Methods
  • Chapter 5 - Cost-Sensitive Learning
  • Chapter 6 - Data Imbalance in Deep Learning
  • Chapter 7 - Data-Level Deep Learning Methods
  • Chapter 8 - Algorithm-Level Deep Learning Techniques
  • Chapter 9 - Hybrid Deep Learning Methods
  • Chapter 10 - Model Calibration
  • Index
  • Other Books You May Enjoy