Robust Automatic Speech Recognition: A Bridge to Practical Applications
Robust Automatic Speech Recognition: A Bridge to Practical Applications establishes a solid foundation for automatic speech recognition that is robust against acoustic environmental distortion. It provides a thorough overview of classical and modern noise- and reverberation-robust techniques that have...
Other Authors:
Format: eBook
Language: English
Published: Amsterdam, Netherlands: Academic Press, 2016
Edition: 1st edition
Subjects:
View at Biblioteca Universitat Ramon Llull: https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009629586306719
Table of Contents:
- Front Cover
- Robust Automatic Speech Recognition: A Bridge to Practical Applications
- Copyright
- Contents
- About the Authors
- List of Figures
- List of Tables
- Acronyms
- Notations
- Chapter 1: Introduction
- 1.1 Automatic Speech Recognition
- 1.2 Robustness to Noisy Environments
- 1.3 Existing Surveys in the Area
- 1.4 Book Structure Overview
- References
- Chapter 2: Fundamentals of speech recognition
- 2.1 Introduction: Components of Speech Recognition
- 2.2 Gaussian Mixture Models
- 2.3 Hidden Markov Models and the Variants
- 2.3.1 How to Parameterize an HMM
- 2.3.2 Efficient Likelihood Evaluation for the HMM
- 2.3.3 EM Algorithm to Learn the HMM Parameters
- 2.3.4 How the HMM Represents Temporal Dynamics of Speech
- 2.3.5 GMM-HMMs for Speech Modeling and Recognition
- 2.3.6 Hidden Dynamic Models for Speech Modeling and Recognition
- 2.4 Deep Learning and Deep Neural Networks
- 2.4.1 Introduction
- 2.4.2 A Brief Historical Perspective
- 2.4.3 The Basics of Deep Neural Networks
- 2.4.4 Alternative Deep Learning Architectures
- Deep convolutional neural networks
- Deep recurrent neural networks
- 2.5 Summary
- References
- Chapter 3: Background of robust speech recognition
- 3.1 Standard Evaluation Databases
- 3.2 Modeling Distortions of Speech in Acoustic Environments
- 3.3 Impact of Acoustic Distortion on Gaussian Modeling
- 3.4 Impact of Acoustic Distortion on DNN Modeling
- 3.5 A General Framework for Robust Speech Recognition
- 3.6 Categorizing Robust ASR Techniques: An Overview
- 3.6.1 Compensation in Feature Domain vs. Model Domain
- 3.6.2 Compensation Using Prior Knowledge about Acoustic Distortion
- 3.6.3 Compensation with Explicit vs. Implicit Distortion Modeling
- 3.6.4 Compensation with Deterministic vs. Uncertainty Processing
- 3.6.5 Compensation with Disjoint vs. Joint Model Training
- 3.7 Summary
- References
- Chapter 4: Processing in the feature and model domains
- 4.1 Feature-Space Approaches
- 4.1.1 Noise-Resistant Features
- Auditory-based features
- Temporal processing
- Neural network approaches
- 4.1.2 Feature Moment Normalization
- Cepstral mean normalization
- Cepstral mean and variance normalization
- Histogram equalization
- 4.1.3 Feature Compensation
- Spectral subtraction
- Wiener filtering
- Advanced front-end
- 4.2 Model-Space Approaches
- 4.2.1 General Model Adaptation for GMM
- 4.2.2 General Model Adaptation for DNN
- Low-footprint DNN adaptation
- Adaptation criteria
- 4.2.3 Robustness via Better Modeling
- 4.3 Summary
- References
- Chapter 5: Compensation with prior knowledge
- 5.1 Learning from Stereo Data
- 5.1.1 Empirical Cepstral Compensation
- 5.1.2 SPLICE
- 5.1.3 DNN for Noise Removal Using Stereo Data
- 5.2 Learning from Multi-Environment Data
- 5.2.1 Online Model Combination
- Online model combination for GMM
- Online model combination for DNN
- 5.2.2 Non-Negative Matrix Factorization
- 5.2.3 Variable-Parameter Modeling
- Variable-parameter modeling for GMM
- Variable-component DNN
- 5.3 Summary
- References
- Chapter 6: Explicit distortion modeling
- 6.1 Parallel Model Combination
- 6.2 Vector Taylor Series
- 6.2.1 VTS Model Adaptation
- 6.2.2 Distortion Estimation in VTS
- 6.2.3 VTS Feature Enhancement
- 6.2.4 Improvements over VTS
- 6.2.5 VTS for the DNN-Based Acoustic Model
- 6.3 Sampling-Based Methods
- 6.3.1 Data-Driven PMC
- 6.3.2 Unscented Transform
- 6.3.3 Methods Beyond the Gaussian Assumption
- 6.4 Acoustic Factorization
- 6.4.1 Acoustic Factorization Framework
- 6.4.2 Acoustic Factorization for GMM
- 6.4.3 Acoustic Factorization for DNN
- 6.5 Summary
- References
- Chapter 7: Uncertainty processing
- 7.1 Model-Domain Uncertainty
- 7.2 Feature-Domain Uncertainty
- 7.2.1 Observation Uncertainty
- Uncertainty propagation through multilayer perceptrons
- 7.3 Joint Uncertainty Decoding
- 7.3.1 Front-End JUD
- 7.3.2 Model JUD
- 7.4 Missing-Feature Approaches
- 7.5 Summary
- References
- Chapter 8: Joint model training
- 8.1 Speaker Adaptive and Source Normalization Training
- 8.2 Model Space Noise Adaptive Training
- 8.3 Joint Training for DNN
- 8.3.1 Joint Front-End and DNN Model Training
- 8.3.2 Joint Adaptive Training
- 8.4 Summary
- References
- Chapter 9: Reverberant speech recognition
- 9.1 Introduction
- 9.2 Acoustic Impulse Response
- 9.3 A Model of Reverberated Speech in Different Domains
- 9.4 The Effect of Reverberation on ASR Performance
- 9.5 Linear Filtering Approaches
- 9.6 Magnitude or Power Spectrum Enhancement
- 9.7 Feature Domain Approaches
- 9.7.1 Reverberation Robust Features
- 9.7.2 Feature Normalization
- 9.7.3 Model-Based Feature Enhancement
- 9.7.4 Data-Driven Enhancement
- 9.8 Acoustic Model Domain Approaches
- 9.9 The REVERB Challenge
- 9.10 To Probe Further
- 9.11 Summary
- References
- Chapter 10: Multi-channel processing
- 10.1 Introduction
- 10.2 The Acoustic Beamforming Problem
- 10.3 Fundamentals of Data-Dependent Beamforming
- 10.3.1 Signal Model and Objective Functions
- 10.3.2 Generalized Sidelobe Canceller
- 10.3.3 Relative Transfer Functions
- 10.4 Multi-Channel Speech Recognition
- 10.4.1 ASR on Beamformed Signals
- 10.4.2 Multi-Stream ASR
- 10.5 To Probe Further
- 10.6 Summary
- References
- Chapter 11: Summary and future directions
- 11.1 Robust Methods in the Era of GMM
- 11.2 Robust Methods in the Era of DNN
- 11.3 Multi-Channel Input and Robustness to Reverberation
- 11.4 Epilogue
- References
- Index
- Back Cover