Intelligent speech signal processing

Intelligent Speech Signal Processing investigates the utilization of speech analytics across several systems and real-world activities, including sharing data analytics, creating collaboration networks between several participants, and implementing video-conferencing in different application areas....

Bibliographic Details
Other Authors: Dey, Nilanjan (author, editor)
Format: Electronic book
Language: English
Published: London, England : Academic Press [2019]
Edition: First edition
View in the Universitat Ramon Llull Library: https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009630433606719
Table of Contents:
  • Front Cover
  • Intelligent Speech Signal Processing
  • Copyright
  • Contents
  • Contributors
  • About the Editor
  • Preface
  • Chapter 1: Speech Processing in Healthcare: Can We Integrate?
  • References
  • Chapter 2: End-to-End Acoustic Modeling Using Convolutional Neural Networks
  • 2.1. Introduction
  • 2.2. Related Work
  • 2.3. Various Architecture of ASR
  • 2.3.1. GMM/DNN
  • 2.3.2. Attention Mechanism
  • 2.3.3. Connectionist Temporal Classification
  • 2.4. Convolutional Neural Networks
  • 2.4.1. Type of Pooling
  • 2.4.1.1. Max Pooling
  • 2.4.1.2. Average Pooling
  • 2.4.1.3. Stochastic Pooling
  • 2.4.1.4. Lp Pooling
  • 2.4.1.5. Mixed Pooling
  • 2.4.1.6. Multiscale Orderless Pooling
  • 2.4.1.7. Spectral Pooling
  • 2.4.2. Types of Nonlinear Functions
  • 2.4.2.1. Sigmoid Neurons
  • 2.4.2.2. Maxout Neurons
  • 2.4.2.3. Rectified Linear Units
  • 2.4.2.4. Parameterized Rectified Linear Units
  • 2.4.2.5. Dropout
  • 2.5. CNN-Based End-to-End Approach
  • 2.6. Experiments and Their Results
  • 2.7. Conclusion
  • References
  • Chapter 3: A Real-Time DSP-Based System for Voice Activity Detection and Background Noise Reduction
  • 3.1. Introduction
  • 3.2. Microchip dsPIC33 Digital Signal Controller
  • 3.2.1. VAD and Noise Suppression Algorithm
  • 3.3. High Pass Filter
  • 3.4. Fast Fourier Transform
  • 3.5. Channel Energy Computation
  • 3.6. Channel SNR Computation
  • 3.7. VAD Decision
  • 3.8. VAD Hangover
  • 3.9. Computation of Scaling Factor
  • 3.10. Scaling of Frequency Channels
  • 3.11. Inverse Fourier Transform
  • 3.12. Application Programming Interface
  • 3.13. Resource Requirements
  • 3.14. Microchip PIC Programmer
  • 3.15. Audio Components
  • 3.16. VAD and Background Noise Reduction Techniques
  • 3.17. Results and Discussion
  • 3.18. Conclusion and Discussion
  • References
  • Further Reading
  • Chapter 4: Disambiguating Conflicting Classification Results in AVSR
  • 4.1. Introduction
  • 4.2. Detection of Conflicting Classes
  • 4.3. Complementary Models for Classification
  • 4.4. Proposed Cascade of Classifiers
  • 4.5. Audio-Visual Databases
  • 4.5.1. AV-CMU Database
  • 4.5.2. AV-UNR Database
  • 4.5.3. AVLetters Database
  • 4.6. Experimental Results
  • 4.6.1. Hidden Markov Models
  • 4.6.2. Random Forest
  • 4.6.3. Support Vector Machine
  • 4.6.4. AdaBoost
  • 4.6.5. Analysis and Comparison
  • 4.7. Conclusions
  • References
  • Chapter 5: A Deep Dive Into Deep Learning Techniques for Solving Spoken Language Identification Problems
  • 5.1. Introduction
  • 5.2. Spoken Language Identification
  • 5.3. Cues for Spoken Language Identification
  • 5.4. Stages in Spoken Language Identification
  • 5.5. Deep Learning
  • 5.6. Artificial and Deep Neural Network
  • 5.7. Comparison of Spoken LID System Implementations with Deep Learning Techniques
  • 5.8. Discussion
  • 5.9. Conclusion
  • References
  • Chapter 6: Voice Activity Detection-Based Home Automation System for People With Special Needs
  • 6.1. Introduction
  • 6.2. Conceptual Design of the System
  • 6.3. System Implementation
  • 6.3.1. Speech Recognition
  • 6.3.2. System Automation
  • 6.3.3. Other Applications
  • 6.3.4. Results and Discussion
  • 6.4. Significance/Contribution
  • 6.5. Conclusion
  • References
  • Further Reading
  • Chapter 7: Speech Summarization for Tamil Language
  • 7.1. Introduction
  • 7.2. Extractive Summarization
  • 7.2.1. Supervised Summarization Methods
  • 7.2.2. Unsupervised Summarization Methods
  • 7.3. Abstractive Summarization
  • 7.3.1. Structured Approach
  • 7.3.1.1. Tree-Based Technique
  • 7.3.1.2. Template-Based Technique
  • 7.3.1.3. Ontology-Based Technique
  • 7.3.1.4. Lead and Body Phrase Technique
  • 7.3.1.5. Rule-Based Technique
  • 7.3.2. Semantic-Based Approach
  • 7.3.2.1. Multimodal Semantic Technique
  • 7.3.2.2. Information Item-Based Technique
  • 7.3.2.3. Semantic Graph-Based Technique
  • 7.4. Need for Speech Summarization
  • 7.5. Issues in the Summarization of a Spoken Document
  • 7.6. Tamil Language
  • 7.6.1. Tamil Unicode
  • 7.7. System Design for Summarization of Speech Data in Tamil Language
  • 7.7.1. Speech Recognition Techniques
  • 7.7.2. Isolated Tamil Speech Recognition
  • 7.7.2.1. Related Work on Tamil Speech Recognition
  • 7.7.3. Features Used for Summarization
  • 7.7.3.1. Acoustic/Prosodic Feature
  • 7.7.3.2. Lexical Feature
  • 7.7.3.3. Part-of-Speech Tagging
  • 7.7.3.4. Stop Word Removal
  • 7.7.3.5. Structural Feature
  • 7.7.3.6. Discourse Feature
  • 7.7.3.7. Relevance Feature
  • 7.7.4. Related Work on Tamil Text Summarization
  • 7.8. Evaluation Metrics
  • 7.8.1. ROUGE
  • 7.8.1.1. ROUGE-n
  • 7.8.1.2. ROUGE-L
  • 7.8.1.3. ROUGE-SU
  • 7.8.2. Precision, Recall, and F-Measure
  • 7.8.3. Word Error Rate
  • 7.8.4. Word Recognition Rate
  • 7.9. Speech Corpora for Tamil Language
  • 7.10. Conclusion
  • References
  • Further Reading
  • Chapter 8: Classifying Recurrent Dynamics on Emotional Speech Signals
  • 8.1. Introduction
  • 8.2. Data Collection and Processing
  • 8.3. Research Methodology
  • 8.3.1. Phase Space Reconstruction
  • 8.3.2. Recurrence Plot Analysis
  • 8.4. Numerical Experiments and Results
  • 8.4.1. Noise-Free Environment
  • 8.4.2. Noisy Environment
  • 8.5. Conclusion
  • References
  • Chapter 9: Intelligent Speech Processing in the Time-Frequency Domain
  • 9.1. Wavelet Packet Decomposition
  • 9.1.1. Spectral Analysis
  • 9.1.2. Pitch Detection
  • 9.2. Empirical Mode Decomposition
  • 9.2.1. EMD of Synthetic and Speech Signal
  • 9.2.2. Spectrum of IMFs
  • 9.2.3. Estimation of Pitch
  • 9.3. Variational Mode Decomposition
  • 9.3.1. Voiced and Unvoiced Detection of Speech
  • 9.3.2. Estimation of Pitch Period
  • 9.4. Synchrosqueezing Wavelet Transform: EMD Like a Tool
  • 9.4.1. Reconstructions of Speech Signal Using Synchrosqueezed Transform
  • 9.4.2. Advantage of Synchrosqueezed Wavelet Transforms
  • 9.5. Applications of the Decomposition Technique
  • 9.5.1. Feature Extraction
  • 9.5.2. Clinical Diagnosis and Pathological Speech Processing
  • 9.5.3. Automatic Speech Recognition
  • 9.5.4. Other Application Area
  • 9.6. Conclusion
  • References
  • Further Reading
  • Chapter 10: A Framework for Artificially Intelligent Customized Voice Response System Design using Speech Synthesis Marku ...
  • 10.1. Introduction
  • 10.2. Literature Survey
  • 10.3. AWS IoT
  • 10.4. Amazon Voice Service (AVS)
  • 10.5. AWS Lambda
  • 10.6. Message Queuing Telemetry Transport (MQTT)
  • 10.7. Proposed Architecture
  • 10.8. Conclusion
  • References
  • Index
  • Back Cover