Human-in-the-loop machine learning: active learning and annotation for human-centered AI
Format: E-book
Language: English
Published: Shelter Island, New York : Manning Publications, [2021]
View at Biblioteca Universitat Ramon Llull: https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009634718706719
Table of Contents:
- Intro
- inside front cover
- Human-in-the-Loop Machine Learning
- Copyright
- brief contents
- contents
- front matter
- foreword
- preface
- acknowledgments
- about this book
- Who should read this book
- How this book is organized: A road map
- About the code
- liveBook discussion forum
- Other online resources
- about the author
- Part 1 First steps
- 1 Introduction to human-in-the-loop machine learning
- 1.1 The basic principles of human-in-the-loop machine learning
- 1.2 Introducing annotation
- 1.2.1 Simple and more complicated annotation strategies
- 1.2.2 Plugging the gap in data science knowledge
- 1.2.3 Quality human annotation: Why is it hard?
- 1.3 Introducing active learning: Improving the speed and reducing the cost of training data
- 1.3.1 Three broad active learning sampling strategies: Uncertainty, diversity, and random
- 1.3.2 What is a random selection of evaluation data?
- 1.3.3 When to use active learning
- 1.4 Machine learning and human-computer interaction
- 1.4.1 User interfaces: How do you create training data?
- 1.4.2 Priming: What can influence human perception?
- 1.4.3 The pros and cons of creating labels by evaluating machine learning predictions
- 1.4.4 Basic principles for designing annotation interfaces
- 1.5 Machine-learning-assisted humans vs. human-assisted machine learning
- 1.6 Transfer learning to kick-start your models
- 1.6.1 Transfer learning in computer vision
- 1.6.2 Transfer learning in NLP
- 1.7 What to expect in this text
- Summary
- 2 Getting started with human-in-the-loop machine learning
- 2.1 Beyond hacktive learning: Your first active learning algorithm
- 2.2 The architecture of your first system
- 2.3 Interpreting model predictions and data to support active learning
- 2.3.1 Confidence ranking
- 2.3.2 Identifying outliers
- 2.3.3 What to expect as you iterate
- 2.4 Building an interface to get human labels
- 2.4.1 A simple interface for labeling text
- 2.4.2 Managing machine learning data
- 2.5 Deploying your first human-in-the-loop machine learning system
- 2.5.1 Always get your evaluation data first
- 2.5.2 Every data point gets a chance
- 2.5.3 Select the right strategies for your data
- 2.5.4 Retrain the model and iterate
- Summary
- Part 2 Active learning
- 3 Uncertainty sampling
- 3.1 Interpreting uncertainty in a machine learning model
- 3.1.1 Why look for uncertainty in your model?
- 3.1.2 Softmax and probability distributions
- 3.1.3 Interpreting the success of active learning
- 3.2 Algorithms for uncertainty sampling
- 3.2.1 Least confidence sampling
- 3.2.2 Margin of confidence sampling
- 3.2.3 Ratio sampling
- 3.2.4 Entropy (classification entropy)
- 3.2.5 A deep dive on entropy
- 3.3 Identifying when different types of models are confused
- 3.3.1 Uncertainty sampling with logistic regression and MaxEnt models
- 3.3.2 Uncertainty sampling with SVMs
- 3.3.3 Uncertainty sampling with Bayesian models
- 3.3.4 Uncertainty sampling with decision trees and random forests
- 3.4 Measuring uncertainty across multiple predictions
- 3.4.1 Uncertainty sampling with ensemble models
- 3.4.2 Query by Committee and dropouts
- 3.4.3 The difference between aleatoric and epistemic uncertainty
- 3.4.4 Multilabeled and continuous value classification
- 3.5 Selecting the right number of items for human review
- 3.5.1 Budget-constrained uncertainty sampling
- 3.5.2 Time-constrained uncertainty sampling
- 3.5.3 When do I stop if I'm not time- or budget-constrained?
- 3.6 Evaluating the success of active learning
- 3.6.1 Do I need new test data?
- 3.6.2 Do I need new validation data?
- 3.7 Uncertainty sampling cheat sheet
- 3.8 Further reading
- 3.8.1 Further reading for least confidence sampling
- 3.8.2 Further reading for margin of confidence sampling
- 3.8.3 Further reading for ratio of confidence sampling
- 3.8.4 Further reading for entropy-based sampling
- 3.8.5 Further reading for other machine learning models
- 3.8.6 Further reading for ensemble-based uncertainty sampling
- Summary
- 4 Diversity sampling
- 4.1 Knowing what you don't know: Identifying gaps in your model's knowledge
- 4.1.1 Example data for diversity sampling
- 4.1.2 Interpreting neural models for diversity sampling
- 4.1.3 Getting information from hidden layers in PyTorch
- 4.2 Model-based outlier sampling
- 4.2.1 Use validation data to rank activations
- 4.2.2 Which layers should I use to calculate model-based outliers?
- 4.2.3 The limitations of model-based outliers
- 4.3 Cluster-based sampling
- 4.3.1 Cluster members, centroids, and outliers
- 4.3.2 Any clustering algorithm in the universe
- 4.3.3 K-means clustering with cosine similarity
- 4.3.4 Reduced feature dimensions via embeddings or PCA
- 4.3.5 Other clustering algorithms
- 4.4 Representative sampling
- 4.4.1 Representative sampling is rarely used in isolation
- 4.4.2 Simple representative sampling
- 4.4.3 Adaptive representative sampling
- 4.5 Sampling for real-world diversity
- 4.5.1 Common problems in training data diversity
- 4.5.2 Stratified sampling to ensure diversity of demographics
- 4.5.3 Represented and representative: Which matters?
- 4.5.4 Per-demographic accuracy
- 4.5.5 Limitations of sampling for real-world diversity
- 4.6 Diversity sampling with different types of models
- 4.6.1 Model-based outliers with different types of models
- 4.6.2 Clustering with different types of models
- 4.6.3 Representative sampling with different types of models
- 4.6.4 Sampling for real-world diversity with different types of models
- 4.7 Diversity sampling cheat sheet
- 4.8 Further reading
- 4.8.1 Further reading for model-based outliers
- 4.8.2 Further reading for cluster-based sampling
- 4.8.3 Further reading for representative sampling
- 4.8.4 Further reading for sampling for real-world diversity
- Summary
- 5 Advanced active learning
- 5.1 Combining uncertainty sampling and diversity sampling
- 5.1.1 Least confidence sampling with cluster-based sampling
- 5.1.2 Uncertainty sampling with model-based outliers
- 5.1.3 Uncertainty sampling with model-based outliers and clustering
- 5.1.4 Representative sampling with cluster-based sampling
- 5.1.5 Sampling from the highest-entropy cluster
- 5.1.6 Other combinations of active learning strategies
- 5.1.7 Combining active learning scores
- 5.1.8 Expected error reduction sampling
- 5.2 Active transfer learning for uncertainty sampling
- 5.2.1 Making your model predict its own errors
- 5.2.2 Implementing active transfer learning
- 5.2.3 Active transfer learning with more layers
- 5.2.4 The pros and cons of active transfer learning
- 5.3 Applying active transfer learning to representative sampling
- 5.3.1 Making your model predict what it doesn't know
- 5.3.2 Active transfer learning for adaptive representative sampling
- 5.3.3 The pros and cons of active transfer learning for representative sampling
- 5.4 Active transfer learning for adaptive sampling
- 5.4.1 Making uncertainty sampling adaptive by predicting uncertainty
- 5.4.2 The pros and cons of ATLAS
- 5.5 Advanced active learning cheat sheets
- 5.6 Further reading for active transfer learning
- Summary
- 6 Applying active learning to different machine learning tasks
- 6.1 Applying active learning to object detection
- 6.1.1 Accuracy for object detection: Label confidence and localization
- 6.1.2 Uncertainty sampling for label confidence and localization in object detection
- 6.1.3 Diversity sampling for label confidence and localization in object detection
- 6.1.4 Active transfer learning for object detection
- 6.1.5 Setting a low object detection threshold to avoid perpetuating bias
- 6.1.6 Creating training data samples for representative sampling that are similar to your predictions
- 6.1.7 Sampling for image-level diversity in object detection
- 6.1.8 Considering tighter masks when using polygons
- 6.2 Applying active learning to semantic segmentation
- 6.2.1 Accuracy for semantic segmentation
- 6.2.2 Uncertainty sampling for semantic segmentation
- 6.2.3 Diversity sampling for semantic segmentation
- 6.2.4 Active transfer learning for semantic segmentation
- 6.2.5 Sampling for image-level diversity in semantic segmentation
- 6.3 Applying active learning to sequence labeling
- 6.3.1 Accuracy for sequence labeling
- 6.3.2 Uncertainty sampling for sequence labeling
- 6.3.3 Diversity sampling for sequence labeling
- 6.3.4 Active transfer learning for sequence labeling
- 6.3.5 Stratified sampling by confidence and tokens
- 6.3.6 Creating training data samples for representative sampling that are similar to your predictions
- 6.3.7 Full-sequence labeling
- 6.3.8 Sampling for document-level diversity in sequence labeling
- 6.4 Applying active learning to language generation
- 6.4.1 Calculating accuracy for language generation systems
- 6.4.2 Uncertainty sampling for language generation
- 6.4.3 Diversity sampling for language generation
- 6.4.4 Active transfer learning for language generation
- 6.5 Applying active learning to other machine learning tasks
- 6.5.1 Active learning for information retrieval
- 6.5.2 Active learning for video
- 6.5.3 Active learning for speech