Robust Automatic Speech Recognition: A Bridge to Practical Applications

Robust Automatic Speech Recognition: A Bridge to Practical Applications establishes a solid foundation for automatic speech recognition that is robust against acoustic environmental distortion. It provides a thorough overview of classical and modern noise- and reverberation-robust techniques that hav...


Bibliographic Details
Other Authors: Li, Jinyu, author
Format: E-book
Language: English
Published: Amsterdam, Netherlands : Academic Press, 2016.
Edition: 1st edition
View in Universitat Ramon Llull Library: https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009629586306719
Table of Contents:
  • Front Cover
  • Robust Automatic Speech Recognition: A Bridge to Practical Applications
  • Copyright
  • Contents
  • About the Authors
  • List of Figures
  • List of Tables
  • Acronyms
  • Notations
  • Chapter 1: Introduction
  • 1.1 Automatic Speech Recognition
  • 1.2 Robustness to Noisy Environments
  • 1.3 Existing Surveys in the Area
  • 1.4 Book Structure Overview
  • References
  • Chapter 2: Fundamentals of speech recognition
  • 2.1 Introduction: Components of Speech Recognition
  • 2.2 Gaussian Mixture Models
  • 2.3 Hidden Markov Models and the Variants
  • 2.3.1 How to Parameterize an HMM
  • 2.3.2 Efficient Likelihood Evaluation for the HMM
  • 2.3.3 EM Algorithm to Learn the HMM Parameters
  • 2.3.4 How the HMM Represents Temporal Dynamics of Speech
  • 2.3.5 GMM-HMMs for Speech Modeling and Recognition
  • 2.3.6 Hidden Dynamic Models for Speech Modeling and Recognition
  • 2.4 Deep Learning and Deep Neural Networks
  • 2.4.1 Introduction
  • 2.4.2 A Brief Historical Perspective
  • 2.4.3 The Basics of Deep Neural Networks
  • 2.4.4 Alternative Deep Learning Architectures
  • Deep convolutional neural networks
  • Deep recurrent neural networks
  • 2.5 Summary
  • References
  • Chapter 3: Background of robust speech recognition
  • 3.1 Standard Evaluation Databases
  • 3.2 Modeling Distortions of Speech in Acoustic Environments
  • 3.3 Impact of Acoustic Distortion on Gaussian Modeling
  • 3.4 Impact of Acoustic Distortion on DNN Modeling
  • 3.5 A General Framework for Robust Speech Recognition
  • 3.6 Categorizing Robust ASR Techniques: An Overview
  • 3.6.1 Compensation in Feature Domain vs. Model Domain
  • 3.6.2 Compensation Using Prior Knowledge about Acoustic Distortion
  • 3.6.3 Compensation with Explicit vs. Implicit Distortion Modeling
  • 3.6.4 Compensation with Deterministic vs. Uncertainty Processing
  • 3.6.5 Compensation with Disjoint vs. Joint Model Training
  • 3.7 Summary
  • References
  • Chapter 4: Processing in the feature and model domains
  • 4.1 Feature-Space Approaches
  • 4.1.1 Noise-Resistant Features
  • Auditory-based features
  • Temporal processing
  • Neural network approaches
  • 4.1.2 Feature Moment Normalization
  • Cepstral mean normalization
  • Cepstral mean and variance normalization
  • Histogram equalization
  • 4.1.3 Feature Compensation
  • Spectral subtraction
  • Wiener filtering
  • Advanced front-end
  • 4.2 Model-Space Approaches
  • 4.2.1 General Model Adaptation for GMM
  • 4.2.2 General Model Adaptation for DNN
  • Low-footprint DNN adaptation
  • Adaptation criteria
  • 4.2.3 Robustness via Better Modeling
  • 4.3 Summary
  • References
  • Chapter 5: Compensation with prior knowledge
  • 5.1 Learning from Stereo Data
  • 5.1.1 Empirical Cepstral Compensation
  • 5.1.2 SPLICE
  • 5.1.3 DNN for Noise Removal Using Stereo Data
  • 5.2 Learning from Multi-Environment Data
  • 5.2.1 Online Model Combination
  • Online model combination for GMM
  • Online model combination for DNN
  • 5.2.2 Non-Negative Matrix Factorization
  • 5.2.3 Variable-Parameter Modeling
  • Variable-parameter modeling for GMM
  • Variable-component DNN
  • 5.3 Summary
  • References
  • Chapter 6: Explicit distortion modeling
  • 6.1 Parallel Model Combination
  • 6.2 Vector Taylor Series
  • 6.2.1 VTS Model Adaptation
  • 6.2.2 Distortion Estimation in VTS
  • 6.2.3 VTS Feature Enhancement
  • 6.2.4 Improvements over VTS
  • 6.2.5 VTS for the DNN-Based Acoustic Model
  • 6.3 Sampling-Based Methods
  • 6.3.1 Data-Driven PMC
  • 6.3.2 Unscented Transform
  • 6.3.3 Methods Beyond the Gaussian Assumption
  • 6.4 Acoustic Factorization
  • 6.4.1 Acoustic Factorization Framework
  • 6.4.2 Acoustic Factorization for GMM
  • 6.4.3 Acoustic Factorization for DNN
  • 6.5 Summary
  • References
  • Chapter 7: Uncertainty processing
  • 7.1 Model-Domain Uncertainty
  • 7.2 Feature-Domain Uncertainty
  • 7.2.1 Observation Uncertainty
  • Uncertainty propagation through multilayer perceptrons
  • 7.3 Joint Uncertainty Decoding
  • 7.3.1 Front-End JUD
  • 7.3.2 Model JUD
  • 7.4 Missing-Feature Approaches
  • 7.5 Summary
  • References
  • Chapter 8: Joint model training
  • 8.1 Speaker Adaptive and Source Normalization Training
  • 8.2 Model Space Noise Adaptive Training
  • 8.3 Joint Training for DNN
  • 8.3.1 Joint Front-End and DNN Model Training
  • 8.3.2 Joint Adaptive Training
  • 8.4 Summary
  • References
  • Chapter 9: Reverberant speech recognition
  • 9.1 Introduction
  • 9.2 Acoustic Impulse Response
  • 9.3 A Model of Reverberated Speech in Different Domains
  • 9.4 The Effect of Reverberation on ASR Performance
  • 9.5 Linear Filtering Approaches
  • 9.6 Magnitude or Power Spectrum Enhancement
  • 9.7 Feature Domain Approaches
  • 9.7.1 Reverberation Robust Features
  • 9.7.2 Feature Normalization
  • 9.7.3 Model-Based Feature Enhancement
  • 9.7.4 Data-Driven Enhancement
  • 9.8 Acoustic Model Domain Approaches
  • 9.9 The REVERB Challenge
  • 9.10 To Probe Further
  • 9.11 Summary
  • References
  • Chapter 10: Multi-channel processing
  • 10.1 Introduction
  • 10.2 The Acoustic Beamforming Problem
  • 10.3 Fundamentals of Data-Dependent Beamforming
  • 10.3.1 Signal Model and Objective Functions
  • 10.3.2 Generalized Sidelobe Canceller
  • 10.3.3 Relative Transfer Functions
  • 10.4 Multi-Channel Speech Recognition
  • 10.4.1 ASR on Beamformed Signals
  • 10.4.2 Multi-Stream ASR
  • 10.5 To Probe Further
  • 10.6 Summary
  • References
  • Chapter 11: Summary and future directions
  • 11.1 Robust Methods in the Era of GMM
  • 11.2 Robust Methods in the Era of DNN
  • 11.3 Multi-Channel Input and Robustness to Reverberation
  • 11.4 Epilogue
  • References
  • Index
  • Back Cover