Human-in-the-loop machine learning: active learning and annotation for human-centered AI

Bibliographic Details
Other Authors: Monarch, Robert (author); Manning, Christopher D. (writer of foreword)
Format: Electronic book
Language: English
Published: Shelter Island, New York : Manning Publications, [2021]
Subjects:
View at Biblioteca Universitat Ramon Llull: https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009634718706719
Table of Contents:
  • Intro
  • inside front cover
  • Human-in-the-Loop Machine Learning
  • Copyright
  • brief contents
  • contents
  • front matter
  • foreword
  • preface
  • acknowledgments
  • about this book
  • Who should read this book
  • How this book is organized: A road map
  • About the code
  • liveBook discussion forum
  • Other online resources
  • about the author
  • Part 1 First steps
  • 1 Introduction to human-in-the-loop machine learning
  • 1.1 The basic principles of human-in-the-loop machine learning
  • 1.2 Introducing annotation
  • 1.2.1 Simple and more complicated annotation strategies
  • 1.2.2 Plugging the gap in data science knowledge
  • 1.2.3 Quality human annotation: Why is it hard?
  • 1.3 Introducing active learning: Improving the speed and reducing the cost of training data
  • 1.3.1 Three broad active learning sampling strategies: Uncertainty, diversity, and random
  • 1.3.2 What is a random selection of evaluation data?
  • 1.3.3 When to use active learning
  • 1.4 Machine learning and human-computer interaction
  • 1.4.1 User interfaces: How do you create training data?
  • 1.4.2 Priming: What can influence human perception?
  • 1.4.3 The pros and cons of creating labels by evaluating machine learning predictions
  • 1.4.4 Basic principles for designing annotation interfaces
  • 1.5 Machine-learning-assisted humans vs. human-assisted machine learning
  • 1.6 Transfer learning to kick-start your models
  • 1.6.1 Transfer learning in computer vision
  • 1.6.2 Transfer learning in NLP
  • 1.7 What to expect in this text
  • Summary
  • 2 Getting started with human-in-the-loop machine learning
  • 2.1 Beyond hacktive learning: Your first active learning algorithm
  • 2.2 The architecture of your first system
  • 2.3 Interpreting model predictions and data to support active learning
  • 2.3.1 Confidence ranking
  • 2.3.2 Identifying outliers
  • 2.3.3 What to expect as you iterate
  • 2.4 Building an interface to get human labels
  • 2.4.1 A simple interface for labeling text
  • 2.4.2 Managing machine learning data
  • 2.5 Deploying your first human-in-the-loop machine learning system
  • 2.5.1 Always get your evaluation data first
  • 2.5.2 Every data point gets a chance
  • 2.5.3 Select the right strategies for your data
  • 2.5.4 Retrain the model and iterate
  • Summary
  • Part 2 Active learning
  • 3 Uncertainty sampling
  • 3.1 Interpreting uncertainty in a machine learning model
  • 3.1.1 Why look for uncertainty in your model?
  • 3.1.2 Softmax and probability distributions
  • 3.1.3 Interpreting the success of active learning
  • 3.2 Algorithms for uncertainty sampling
  • 3.2.1 Least confidence sampling
  • 3.2.2 Margin of confidence sampling
  • 3.2.3 Ratio sampling
  • 3.2.4 Entropy (classification entropy)
  • 3.2.5 A deep dive on entropy
  • 3.3 Identifying when different types of models are confused
  • 3.3.1 Uncertainty sampling with logistic regression and MaxEnt models
  • 3.3.2 Uncertainty sampling with SVMs
  • 3.3.3 Uncertainty sampling with Bayesian models
  • 3.3.4 Uncertainty sampling with decision trees and random forests
  • 3.4 Measuring uncertainty across multiple predictions
  • 3.4.1 Uncertainty sampling with ensemble models
  • 3.4.2 Query by Committee and dropouts
  • 3.4.3 The difference between aleatoric and epistemic uncertainty
  • 3.4.4 Multilabeled and continuous value classification
  • 3.5 Selecting the right number of items for human review
  • 3.5.1 Budget-constrained uncertainty sampling
  • 3.5.2 Time-constrained uncertainty sampling
  • 3.5.3 When do I stop if I'm not time- or budget-constrained?
  • 3.6 Evaluating the success of active learning
  • 3.6.1 Do I need new test data?
  • 3.6.2 Do I need new validation data?
  • 3.7 Uncertainty sampling cheat sheet
  • 3.8 Further reading
  • 3.8.1 Further reading for least confidence sampling
  • 3.8.2 Further reading for margin of confidence sampling
  • 3.8.3 Further reading for ratio of confidence sampling
  • 3.8.4 Further reading for entropy-based sampling
  • 3.8.5 Further reading for other machine learning models
  • 3.8.6 Further reading for ensemble-based uncertainty sampling
  • Summary
  • 4 Diversity sampling
  • 4.1 Knowing what you don't know: Identifying gaps in your model's knowledge
  • 4.1.1 Example data for diversity sampling
  • 4.1.2 Interpreting neural models for diversity sampling
  • 4.1.3 Getting information from hidden layers in PyTorch
  • 4.2 Model-based outlier sampling
  • 4.2.1 Use validation data to rank activations
  • 4.2.2 Which layers should I use to calculate model-based outliers?
  • 4.2.3 The limitations of model-based outliers
  • 4.3 Cluster-based sampling
  • 4.3.1 Cluster members, centroids, and outliers
  • 4.3.2 Any clustering algorithm in the universe
  • 4.3.3 K-means clustering with cosine similarity
  • 4.3.4 Reduced feature dimensions via embeddings or PCA
  • 4.3.5 Other clustering algorithms
  • 4.4 Representative sampling
  • 4.4.1 Representative sampling is rarely used in isolation
  • 4.4.2 Simple representative sampling
  • 4.4.3 Adaptive representative sampling
  • 4.5 Sampling for real-world diversity
  • 4.5.1 Common problems in training data diversity
  • 4.5.2 Stratified sampling to ensure diversity of demographics
  • 4.5.3 Represented and representative: Which matters?
  • 4.5.4 Per-demographic accuracy
  • 4.5.5 Limitations of sampling for real-world diversity
  • 4.6 Diversity sampling with different types of models
  • 4.6.1 Model-based outliers with different types of models
  • 4.6.2 Clustering with different types of models
  • 4.6.3 Representative sampling with different types of models
  • 4.6.4 Sampling for real-world diversity with different types of models
  • 4.7 Diversity sampling cheat sheet
  • 4.8 Further reading
  • 4.8.1 Further reading for model-based outliers
  • 4.8.2 Further reading for cluster-based sampling
  • 4.8.3 Further reading for representative sampling
  • 4.8.4 Further reading for sampling for real-world diversity
  • Summary
  • 5 Advanced active learning
  • 5.1 Combining uncertainty sampling and diversity sampling
  • 5.1.1 Least confidence sampling with cluster-based sampling
  • 5.1.2 Uncertainty sampling with model-based outliers
  • 5.1.3 Uncertainty sampling with model-based outliers and clustering
  • 5.1.4 Representative sampling with cluster-based sampling
  • 5.1.5 Sampling from the highest-entropy cluster
  • 5.1.6 Other combinations of active learning strategies
  • 5.1.7 Combining active learning scores
  • 5.1.8 Expected error reduction sampling
  • 5.2 Active transfer learning for uncertainty sampling
  • 5.2.1 Making your model predict its own errors
  • 5.2.2 Implementing active transfer learning
  • 5.2.3 Active transfer learning with more layers
  • 5.2.4 The pros and cons of active transfer learning
  • 5.3 Applying active transfer learning to representative sampling
  • 5.3.1 Making your model predict what it doesn't know
  • 5.3.2 Active transfer learning for adaptive representative sampling
  • 5.3.3 The pros and cons of active transfer learning for representative sampling
  • 5.4 Active transfer learning for adaptive sampling
  • 5.4.1 Making uncertainty sampling adaptive by predicting uncertainty
  • 5.4.2 The pros and cons of ATLAS
  • 5.5 Advanced active learning cheat sheets
  • 5.6 Further reading for active transfer learning
  • Summary
  • 6 Applying active learning to different machine learning tasks
  • 6.1 Applying active learning to object detection
  • 6.1.1 Accuracy for object detection: Label confidence and localization
  • 6.1.2 Uncertainty sampling for label confidence and localization in object detection
  • 6.1.3 Diversity sampling for label confidence and localization in object detection
  • 6.1.4 Active transfer learning for object detection
  • 6.1.5 Setting a low object detection threshold to avoid perpetuating bias
  • 6.1.6 Creating training data samples for representative sampling that are similar to your predictions
  • 6.1.7 Sampling for image-level diversity in object detection
  • 6.1.8 Considering tighter masks when using polygons
  • 6.2 Applying active learning to semantic segmentation
  • 6.2.1 Accuracy for semantic segmentation
  • 6.2.2 Uncertainty sampling for semantic segmentation
  • 6.2.3 Diversity sampling for semantic segmentation
  • 6.2.4 Active transfer learning for semantic segmentation
  • 6.2.5 Sampling for image-level diversity in semantic segmentation
  • 6.3 Applying active learning to sequence labeling
  • 6.3.1 Accuracy for sequence labeling
  • 6.3.2 Uncertainty sampling for sequence labeling
  • 6.3.3 Diversity sampling for sequence labeling
  • 6.3.4 Active transfer learning for sequence labeling
  • 6.3.5 Stratified sampling by confidence and tokens
  • 6.3.6 Create training data samples for representative sampling that are similar to your predictions
  • 6.3.7 Full-sequence labeling
  • 6.3.8 Sampling for document-level diversity in sequence labeling
  • 6.4 Applying active learning to language generation
  • 6.4.1 Calculating accuracy for language generation systems
  • 6.4.2 Uncertainty sampling for language generation
  • 6.4.3 Diversity sampling for language generation
  • 6.4.4 Active transfer learning for language generation
  • 6.5 Applying active learning to other machine learning tasks
  • 6.5.1 Active learning for information retrieval
  • 6.5.2 Active learning for video
  • 6.5.3 Active learning for speech