Robust Automatic Speech Recognition: A Bridge to Practical Applications
Robust Automatic Speech Recognition: A Bridge to Practical Applications establishes a solid foundation for automatic speech recognition that is robust against acoustic environmental distortion. It provides a thorough overview of classical and modern noise- and reverberation-robust techniques that have...
Other Authors:
Format: eBook
Language: English
Published: Amsterdam, Netherlands: Academic Press, 2016
Edition: 1st edition
Subjects:
View at Biblioteca Universitat Ramon Llull: https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009629586306719
Table of Contents:
- Front Cover
- Robust Automatic Speech Recognition: A Bridge to Practical Applications
- Copyright
- Contents
- About the Authors
- List of Figures
- List of Tables
- Acronyms
- Notations
- Chapter 1: Introduction
- 1.1 Automatic Speech Recognition
- 1.2 Robustness to Noisy Environments
- 1.3 Existing Surveys in the Area
- 1.4 Book Structure Overview
- References
- Chapter 2: Fundamentals of speech recognition
- 2.1 Introduction: Components of Speech Recognition
- 2.2 Gaussian Mixture Models
- 2.3 Hidden Markov Models and the Variants
- 2.3.1 How to Parameterize an HMM
- 2.3.2 Efficient Likelihood Evaluation for the HMM
- 2.3.3 EM Algorithm to Learn the HMM Parameters
- 2.3.4 How the HMM Represents Temporal Dynamics of Speech
- 2.3.5 GMM-HMMs for Speech Modeling and Recognition
- 2.3.6 Hidden Dynamic Models for Speech Modeling and Recognition
- 2.4 Deep Learning and Deep Neural Networks
- 2.4.1 Introduction
- 2.4.2 A Brief Historical Perspective
- 2.4.3 The Basics of Deep Neural Networks
- 2.4.4 Alternative Deep Learning Architectures
- Deep convolutional neural networks
- Deep recurrent neural networks
- 2.5 Summary
- References
- Chapter 3: Background of robust speech recognition
- 3.1 Standard Evaluation Databases
- 3.2 Modeling Distortions of Speech in Acoustic Environments
- 3.3 Impact of Acoustic Distortion on Gaussian Modeling
- 3.4 Impact of Acoustic Distortion on DNN Modeling
- 3.5 A General Framework for Robust Speech Recognition
- 3.6 Categorizing Robust ASR Techniques: An Overview
- 3.6.1 Compensation in Feature Domain vs. Model Domain
- 3.6.2 Compensation Using Prior Knowledge about Acoustic Distortion
- 3.6.3 Compensation with Explicit vs. Implicit Distortion Modeling
- 3.6.4 Compensation with Deterministic vs. Uncertainty Processing
- 3.6.5 Compensation with Disjoint vs. Joint Model Training
- 3.7 Summary
- References
- Chapter 4: Processing in the feature and model domains
- 4.1 Feature-Space Approaches
- 4.1.1 Noise-Resistant Features
- Auditory-based features
- Temporal processing
- Neural network approaches
- 4.1.2 Feature Moment Normalization
- Cepstral mean normalization
- Cepstral mean and variance normalization
- Histogram equalization
- 4.1.3 Feature Compensation
- Spectral subtraction
- Wiener filtering
- Advanced front-end
- 4.2 Model-Space Approaches
- 4.2.1 General Model Adaptation for GMM
- 4.2.2 General Model Adaptation for DNN
- Low-footprint DNN adaptation
- Adaptation criteria
- 4.2.3 Robustness via Better Modeling
- 4.3 Summary
- References
- Chapter 5: Compensation with prior knowledge
- 5.1 Learning from Stereo Data
- 5.1.1 Empirical Cepstral Compensation
- 5.1.2 SPLICE
- 5.1.3 DNN for Noise Removal Using Stereo Data
- 5.2 Learning from Multi-Environment Data
- 5.2.1 Online Model Combination
- Online model combination for GMM
- Online model combination for DNN
- 5.2.2 Non-Negative Matrix Factorization
- 5.2.3 Variable-Parameter Modeling
- Variable-parameter modeling for GMM
- Variable-component DNN
- 5.3 Summary
- References
- Chapter 6: Explicit distortion modeling
- 6.1 Parallel Model Combination
- 6.2 Vector Taylor Series
- 6.2.1 VTS Model Adaptation
- 6.2.2 Distortion Estimation in VTS
- 6.2.3 VTS Feature Enhancement
- 6.2.4 Improvements over VTS
- 6.2.5 VTS for the DNN-Based Acoustic Model
- 6.3 Sampling-Based Methods
- 6.3.1 Data-Driven PMC
- 6.3.2 Unscented Transform
- 6.3.3 Methods Beyond the Gaussian Assumption
- 6.4 Acoustic Factorization
- 6.4.1 Acoustic Factorization Framework
- 6.4.2 Acoustic Factorization for GMM
- 6.4.3 Acoustic Factorization for DNN
- 6.5 Summary
- References
- Chapter 7: Uncertainty processing
- 7.1 Model-Domain Uncertainty
- 7.2 Feature-Domain Uncertainty
- 7.2.1 Observation Uncertainty
- Uncertainty propagation through multilayer perceptrons
- 7.3 Joint Uncertainty Decoding
- 7.3.1 Front-End JUD
- 7.3.2 Model JUD
- 7.4 Missing-Feature Approaches
- 7.5 Summary
- References
- Chapter 8: Joint model training
- 8.1 Speaker Adaptive and Source Normalization Training
- 8.2 Model Space Noise Adaptive Training
- 8.3 Joint Training for DNN
- 8.3.1 Joint Front-End and DNN Model Training
- 8.3.2 Joint Adaptive Training
- 8.4 Summary
- References
- Chapter 9: Reverberant speech recognition
- 9.1 Introduction
- 9.2 Acoustic Impulse Response
- 9.3 A Model of Reverberated Speech in Different Domains
- 9.4 The Effect of Reverberation on ASR Performance
- 9.5 Linear Filtering Approaches
- 9.6 Magnitude or Power Spectrum Enhancement
- 9.7 Feature Domain Approaches
- 9.7.1 Reverberation Robust Features
- 9.7.2 Feature Normalization
- 9.7.3 Model-Based Feature Enhancement
- 9.7.4 Data-Driven Enhancement
- 9.8 Acoustic Model Domain Approaches
- 9.9 The REVERB Challenge
- 9.10 To Probe Further
- 9.11 Summary
- References
- Chapter 10: Multi-channel processing
- 10.1 Introduction
- 10.2 The Acoustic Beamforming Problem
- 10.3 Fundamentals of Data-Dependent Beamforming
- 10.3.1 Signal Model and Objective Functions
- 10.3.2 Generalized Sidelobe Canceller
- 10.3.3 Relative Transfer Functions
- 10.4 Multi-Channel Speech Recognition
- 10.4.1 ASR on Beamformed Signals
- 10.4.2 Multi-Stream ASR
- 10.5 To Probe Further
- 10.6 Summary
- References
- Chapter 11: Summary and future directions
- 11.1 Robust Methods in the Era of GMM
- 11.2 Robust Methods in the Era of DNN
- 11.3 Multi-Channel Input and Robustness to Reverberation
- 11.4 Epilogue
- References
- Index
- Back Cover