Transformers for Natural Language Processing: Build, train, and fine-tune deep neural network architectures for NLP with Python, PyTorch, TensorFlow, BERT, and GPT-3
Transformers are a game-changer for natural language understanding (NLU) and have become one of the pillars of artificial intelligence. Transformers for Natural Language Processing, 2nd Edition, investigates deep learning for machine translation, speech-to-text, text-to-speech, language modeling, question answering, and more.
Other Authors:
Format: E-book
Language: English
Published: Birmingham : Packt Publishing, Limited, [2022]
Edition: 2nd ed.
Series: Expert insight.
Subjects:
View at Biblioteca Universitat Ramon Llull: https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009655514606719
Table of Contents:
- Intro
- Copyright
- Foreword
- Contributors
- Table of Contents
- Preface
- Chapter 1: What are Transformers?
- The ecosystem of transformers
- Industry 4.0
- Foundation models
- Is programming becoming a sub-domain of NLP?
- The future of artificial intelligence specialists
- Optimizing NLP models with transformers
- The background of transformers
- What resources should we use?
- The rise of Transformer 4.0 seamless APIs
- Choosing ready-to-use API-driven libraries
- Choosing a Transformer Model
- The role of Industry 4.0 artificial intelligence specialists
- Summary
- Questions
- References
- Chapter 2: Getting Started with the Architecture of the Transformer Model
- The rise of the Transformer: Attention is All You Need
- The encoder stack
- Input embedding
- Positional encoding
- Sublayer 1: Multi-head attention
- Sublayer 2: Feedforward network
- The decoder stack
- Output embedding and position encoding
- The attention layers
- The FFN sublayer, the post-LN, and the linear layer
- Training and performance
- Transformer models in Hugging Face
- Summary
- Questions
- References
- Chapter 3: Fine-Tuning BERT Models
- The architecture of BERT
- The encoder stack
- Preparing the pretraining input environment
- Pretraining and fine-tuning a BERT model
- Fine-tuning BERT
- Hardware constraints
- Installing the Hugging Face PyTorch interface for BERT
- Importing the modules
- Specifying CUDA as the device for torch
- Loading the dataset
- Creating sentences, label lists, and adding BERT tokens
- Activating the BERT tokenizer
- Processing the data
- Creating attention masks
- Splitting the data into training and validation sets
- Converting all the data into torch tensors
- Selecting a batch size and creating an iterator
- BERT model configuration
- Loading the Hugging Face BERT uncased base model
- Optimizer grouped parameters
- The hyperparameters for the training loop
- The training loop
- Training evaluation
- Predicting and evaluating using the holdout dataset
- Evaluating using the Matthews Correlation Coefficient
- The scores of individual batches
- Matthews evaluation for the whole dataset
- Summary
- Questions
- References
- Chapter 4: Pretraining a RoBERTa Model from Scratch
- Training a tokenizer and pretraining a transformer
- Building KantaiBERT from scratch
- Step 1: Loading the dataset
- Step 2: Installing Hugging Face transformers
- Step 3: Training a tokenizer
- Step 4: Saving the files to disk
- Step 5: Loading the trained tokenizer files
- Step 6: Checking resource constraints: GPU and CUDA
- Step 7: Defining the configuration of the model
- Step 8: Reloading the tokenizer in transformers
- Step 9: Initializing a model from scratch
- Exploring the parameters
- Step 10: Building the dataset
- Step 11: Defining a data collator
- Step 12: Initializing the trainer
- Step 13: Pretraining the model
- Step 14: Saving the final model (+tokenizer + config) to disk
- Step 15: Language modeling with FillMaskPipeline
- Next steps
- Summary
- Questions
- References
- Chapter 5: Downstream NLP Tasks with Transformers
- Transduction and the inductive inheritance of transformers
- The human intelligence stack
- The machine intelligence stack
- Transformer performances versus Human Baselines
- Evaluating models with metrics
- Accuracy score
- F1-score
- Matthews Correlation Coefficient (MCC)
- Benchmark tasks and datasets
- From GLUE to SuperGLUE
- Introducing higher Human Baselines standards
- The SuperGLUE evaluation process
- Defining the SuperGLUE benchmark tasks
- BoolQ
- Commitment Bank (CB)
- Multi-Sentence Reading Comprehension (MultiRC)
- Reading Comprehension with Commonsense Reasoning Dataset (ReCoRD)
- Recognizing Textual Entailment (RTE)
- Words in Context (WiC)
- The Winograd schema challenge (WSC)
- Running downstream tasks
- The Corpus of Linguistic Acceptability (CoLA)
- Stanford Sentiment TreeBank (SST-2)
- Microsoft Research Paraphrase Corpus (MRPC)
- Winograd schemas
- Summary
- Questions
- References
- Chapter 6: Machine Translation with the Transformer
- Defining machine translation
- Human transductions and translations
- Machine transductions and translations
- Preprocessing a WMT dataset
- Preprocessing the raw data
- Finalizing the preprocessing of the datasets
- Evaluating machine translation with BLEU
- Geometric evaluations
- Applying a smoothing technique
- Chencherry smoothing
- Translation with Google Translate
- Translations with Trax
- Installing Trax
- Creating the original Transformer model
- Initializing the model using pretrained weights
- Tokenizing a sentence
- Decoding from the Transformer
- De-tokenizing and displaying the translation
- Summary
- Questions
- References
- Chapter 7: The Rise of Suprahuman Transformers with GPT-3 Engines
- Suprahuman NLP with GPT-3 transformer models
- The architecture of OpenAI GPT transformer models
- The rise of billion-parameter transformer models
- The increasing size of transformer models
- Context size and maximum path length
- From fine-tuning to zero-shot models
- Stacking decoder layers
- GPT-3 engines
- Generic text completion with GPT-2
- Step 9: Interacting with GPT-2
- Training a custom GPT-2 language model
- Step 12: Interactive context and completion examples
- Running OpenAI GPT-3 tasks
- Running NLP tasks online
- Getting started with GPT-3 engines
- Running our first NLP task with GPT-3
- NLP tasks and examples
- Comparing the output of GPT-2 and GPT-3
- Fine-tuning GPT-3
- Preparing the data
- Step 1: Installing OpenAI
- Step 2: Entering the API key
- Step 3: Activating OpenAI's data preparation module
- Fine-tuning GPT-3
- Step 4: Creating an OS environment
- Step 5: Fine-tuning OpenAI's Ada engine
- Step 6: Interacting with the fine-tuned model
- The role of an Industry 4.0 AI specialist
- Initial conclusions
- Summary
- Questions
- References
- Chapter 8: Applying Transformers to Legal and Financial Documents for AI Text Summarization
- Designing a universal text-to-text model
- The rise of text-to-text transformer models
- A prefix instead of task-specific formats
- The T5 model
- Text summarization with T5
- Hugging Face
- Hugging Face transformer resources
- Initializing the T5-large transformer model
- Getting started with T5
- Exploring the architecture of the T5 model
- Summarizing documents with T5-large
- Creating a summarization function
- A general topic sample
- The Bill of Rights sample
- A corporate law sample
- Summarization with GPT-3
- Summary
- Questions
- References
- Chapter 9: Matching Tokenizers and Datasets
- Matching datasets and tokenizers
- Best practices
- Step 1: Preprocessing
- Step 2: Quality control
- Continuous human quality control
- Word2Vec tokenization
- Case 0: Words in the dataset and the dictionary
- Case 1: Words not in the dataset or the dictionary
- Case 2: Noisy relationships
- Case 3: Words in the text but not in the dictionary
- Case 4: Rare words
- Case 5: Replacing rare words
- Case 6: Entailment
- Standard NLP tasks with specific vocabulary
- Generating unconditional samples with GPT-2
- Generating trained conditional samples
- Controlling tokenized data
- Exploring the scope of GPT-3
- Summary
- Questions
- References
- Chapter 10: Semantic Role Labeling with BERT-Based Transformers
- Getting started with SRL
- Defining semantic role labeling
- Visualizing SRL
- Running a pretrained BERT-based model
- The architecture of the BERT-based model
- Setting up the BERT SRL environment
- SRL experiments with the BERT-based model
- Basic samples
- Sample 1
- Sample 2
- Sample 3
- Difficult samples
- Sample 4
- Sample 5
- Sample 6
- Questioning the scope of SRL
- The limit of predicate analysis
- Redefining SRL
- Summary
- Questions
- References
- Chapter 11: Let Your Data Do the Talking: Story, Questions, and Answers
- Methodology
- Transformers and methods
- Method 0: Trial and error
- Method 1: NER first
- Using NER to find questions
- Location entity questions
- Person entity questions
- Method 2: SRL first
- Question-answering with ELECTRA
- Project management constraints
- Using SRL to find questions
- Next steps
- Exploring Haystack with a RoBERTa model
- Exploring Q&A with a GPT-3 engine
- Summary
- Questions
- References
- Chapter 12: Detecting Customer Emotions to Make Predictions
- Getting started: Sentiment analysis transformers
- The Stanford Sentiment Treebank (SST)
- Sentiment analysis with RoBERTa-large
- Predicting customer behavior with sentiment analysis
- Sentiment analysis with DistilBERT
- Sentiment analysis with Hugging Face's models' list
- DistilBERT for SST
- MiniLM-L12-H384-uncased
- RoBERTa-large-mnli
- BERT-base multilingual model
- Sentiment analysis with GPT-3
- Some Pragmatic I4.0 thinking before we leave
- Investigating with SRL
- Investigating with Hugging Face
- Investigating with the GPT-3 playground
- GPT-3 code
- Summary
- Questions
- References
- Chapter 13: Analyzing Fake News with Transformers
- Emotional reactions to fake news
- Cognitive dissonance triggers emotional reactions