Advanced methods and deep learning in computer vision
Advanced Methods and Deep Learning in Computer Vision presents advanced computer vision methods, emphasizing machine and deep learning techniques that have emerged during the past 5-10 years. The book provides clear explanations of principles and algorithms supported with applications. Topics...
Other Authors:
Format: E-book
Language: English
Published: London, UK : Elsevier Science & Technology, [2021]
Series: Computer vision and pattern recognition series
Subjects:
View at Biblioteca Universitat Ramon Llull: https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009835425906719
Table of Contents:
- Front Cover
- Advanced Methods and Deep Learning in Computer Vision
- Copyright
- Contents
- List of contributors
- About the editors
- Preface
- 1 The dramatically changing face of computer vision
- 1.1 Introduction - computer vision and its origins
- 1.2 Part A - Understanding low-level image processing operators
- 1.2.1 The basics of edge detection
- 1.2.2 The Canny operator
- 1.2.3 Line segment detection
- 1.2.4 Optimizing detection sensitivity
- 1.2.5 Dealing with variations in the background intensity
- 1.2.6 A theory combining the matched filter and zero-mean constructs
- 1.2.7 Mask design-other considerations
- 1.2.8 Corner detection
- 1.2.9 The Harris 'interest point' operator
- 1.3 Part B - 2-D object location and recognition
- 1.3.1 The centroidal profile approach to shape analysis
- 1.3.2 Hough-based schemes for object detection
- 1.3.3 Application of the Hough transform to line detection
- 1.3.4 Using RANSAC for line detection
- 1.3.5 A graph-theoretic approach to object location
- 1.3.6 Using the generalized Hough transform (GHT) to save computation
- 1.3.7 Part-based approaches
- 1.4 Part C - 3-D object location and the importance of invariance
- 1.4.1 Introduction to 3-D vision
- 1.4.2 Pose ambiguities under perspective projection
- 1.4.3 Invariants as an aid to 3-D recognition
- 1.4.4 Cross ratios: the 'ratio of ratios' concept
- 1.4.5 Invariants for noncollinear points
- 1.4.6 Vanishing point detection
- 1.4.7 More on vanishing points
- 1.4.8 Summary: the value of invariants
- 1.4.9 Image transformations for camera calibration
- 1.4.10 Camera calibration
- 1.4.11 Intrinsic and extrinsic parameters
- 1.4.12 Multiple view vision
- 1.4.13 Generalized epipolar geometry
- 1.4.14 The essential matrix
- 1.4.15 The fundamental matrix
- 1.4.16 Properties of the essential and fundamental matrices
- 1.4.17 Estimating the fundamental matrix
- 1.4.18 Improved methods of triangulation
- 1.4.19 The achievements and limitations of multiple view vision
- 1.5 Part D - Tracking moving objects
- 1.5.1 Tracking - the basic concept
- 1.5.2 Alternatives to background subtraction
- 1.6 Part E - Texture analysis
- 1.6.1 Introduction
- 1.6.2 Basic approaches to texture analysis
- 1.6.3 Laws' texture energy approach
- 1.6.4 Ade's eigenfilter approach
- 1.6.5 Appraisal of the Laws and Ade approaches
- 1.6.6 More recent developments
- 1.7 Part F - From artificial neural networks to deep learning methods
- 1.7.1 Introduction: how ANNs metamorphosed into CNNs
- 1.7.2 Parameters for defining CNN architectures
- 1.7.3 Krizhevsky et al.'s AlexNet architecture
- 1.7.4 Simonyan and Zisserman's VGGNet architecture
- 1.7.5 Noh et al.'s DeconvNet architecture
- 1.7.6 Badrinarayanan et al.'s SegNet architecture
- 1.7.7 Application of deep learning to object tracking
- 1.7.8 Application of deep learning to texture classification
- 1.7.9 Texture analysis in the world of deep learning
- 1.8 Part G - Summary
- Acknowledgments
- References
- Biographies
- 2 Advanced methods for robust object detection
- 2.1 Introduction
- 2.2 Preliminaries
- 2.3 R-CNN
- 2.3.1 System design
- 2.3.2 Training
- 2.4 SPP-Net
- 2.5 Fast R-CNN
- 2.5.1 Architecture
- 2.5.2 RoI pooling
- 2.5.3 Multitask loss
- 2.5.4 Finetuning strategy
- 2.6 Faster R-CNN
- 2.6.1 Architecture
- 2.6.2 Region proposal networks
- 2.7 Cascade R-CNN
- 2.7.1 Architecture
- 2.7.2 Cascaded bounding box regression
- 2.7.3 Cascaded detection
- 2.8 Multiscale feature representation
- 2.8.1 MS-CNN
- 2.8.1.1 Architecture
- 2.8.2 FPN
- 2.8.2.1 Architecture
- Bottom-up pathway
- Top-down pathway and lateral connections
- 2.9 YOLO
- 2.10 SSD
- 2.10.1 Architecture
- 2.10.2 Training
- 2.11 RetinaNet
- 2.11.1 Focal loss
- 2.12 Detection performances
- 2.13 Conclusion
- References
- Biographies
- 3 Learning with limited supervision
- 3.1 Introduction
- 3.2 Context-aware active learning
- 3.2.1 Active learning
- 3.2.2 Context in active learning
- 3.2.3 Framework for context-aware active learning
- 3.2.4 Applications
- 3.3 Weakly supervised event localization
- 3.3.1 Network architecture
- 3.3.2 k-max multiple instance learning
- 3.3.3 Coactivity similarity
- 3.3.4 Applications
- 3.4 Domain adaptation of semantic segmentation using weak labels
- 3.4.1 Weak labels for category classification
- 3.4.2 Weak labels for feature alignment
- 3.4.3 Network optimization
- 3.4.4 Acquiring weak labels
- 3.4.5 Applications
- 3.4.6 Output space visualization
- 3.5 Weakly-supervised reinforcement learning for dynamical tasks
- 3.5.1 Learning subgoal prediction
- 3.5.2 Supervised pretraining
- 3.5.3 Applications
- 3.6 Conclusions
- Acknowledgments
- References
- Biographies
- 4 Efficient methods for deep learning
- 4.1 Model compression
- 4.1.1 Parameter pruning
- 4.1.2 Low-rank factorization
- 4.1.3 Quantization
- 4.1.4 Knowledge distillation
- 4.1.5 Automated model compression
- 4.2 Efficient neural network architectures
- 4.2.1 Standard convolution layer
- 4.2.2 Efficient convolution layers
- 4.2.3 Manually designed efficient CNN models
- 4.2.4 Neural architecture search
- 4.2.5 Hardware-aware neural architecture search
- 4.3 Conclusion
- References
- 5 Deep conditional image generation
- 5.1 Introduction
- 5.2 Visual pattern learning: a brief review
- 5.3 Classical generative models
- 5.4 Deep generative models
- 5.5 Deep conditional image generation
- 5.6 Disentanglement for controllable synthesis
- 5.6.1 Disentangle visual content and style
- 5.6.2 Disentangle structure and style
- 5.6.3 Disentangle identity and attributes
- 5.7 Conclusion and discussions
- References
- 6 Deep face recognition using full and partial face images
- 6.1 Introduction
- 6.1.1 Deep learning models
- 6.1.1.1 The structure of a CNN
- 6.1.1.2 Methods of training CNNs
- 6.1.1.3 Datasets for deep face recognition experimentation
- 6.2 Components of deep face recognition
- 6.2.1 An example of a trained CNN model for face recognition
- 6.2.1.1 Feature extraction
- 6.2.1.2 Feature classification
- 6.3 Face recognition using full face images
- 6.3.1 Similarity matching using the FaceNet model
- 6.4 Deep face recognition using partial face data
- 6.5 Specific model training for full and partial faces
- 6.5.1 Suggested architecture of the model
- 6.5.2 Training phase
- 6.6 Discussion and conclusions
- References
- Biographies
- 7 Unsupervised domain adaptation using shallow and deep representations
- 7.1 Introduction
- 7.2 Unsupervised domain adaptation using manifolds
- 7.2.1 Unsupervised domain adaptation using product manifolds
- 7.3 Unsupervised domain adaptation using dictionaries
- 7.3.1 Generalized domain adaptive dictionary learning
- 7.3.2 Joint hierarchical domain adaptation and feature learning
- 7.3.3 Incremental dictionary learning for unsupervised domain adaptation
- 7.4 Unsupervised domain adaptation using deep networks
- 7.4.1 Discriminative approaches for domain adaptation
- 7.4.2 Generative approaches for domain adaptation
- 7.5 Summary
- References
- Biographies
- 8 Domain adaptation and continual learning in semantic segmentation
- 8.1 Introduction
- 8.1.1 Problem formulation
- 8.2 Unsupervised domain adaptation
- 8.2.1 Domain adaptation problem formulation
- 8.2.2 Adaptation focus
- 8.2.2.1 Input level adaptation
- 8.2.2.2 Feature level adaptation
- 8.2.2.3 Output level adaptation
- 8.2.3 Unsupervised domain adaptation techniques
- 8.2.3.1 Domain adversarial adaptation
- 8.2.3.2 Generative-based adaptation
- 8.2.3.3 Classifier discrepancy
- 8.2.3.4 Self-supervised learning
- Self-training
- Entropy minimization
- 8.2.3.5 Multitasking
- 8.3 Continual learning
- 8.3.1 Continual learning problem formulation
- 8.3.2 Continual learning setups in semantic segmentation
- 8.3.3 Incremental learning techniques
- 8.3.3.1 Knowledge distillation
- 8.3.3.2 Parameter freezing
- 8.3.3.3 Geometrical feature-level regularization
- 8.3.3.4 New directions
- 8.4 Conclusion
- Acknowledgment
- References
- Biographies
- 9 Visual tracking
- 9.1 Introduction
- 9.1.1 Problem definition
- 9.1.2 Challenges in tracking
- 9.1.3 Motivation of the setting
- 9.1.4 Historical development
- 9.2 Template-based methods
- 9.2.1 The basics
- 9.2.2 Performance measures
- 9.2.3 Normalized cross correlation
- 9.2.4 Phase-only matched filter
- 9.3 Online-learning-based methods
- 9.3.1 The MOSSE filter
- 9.3.2 Discriminative correlation filters
- 9.3.3 Suitable features for DCFs
- 9.3.4 Scale space tracking
- 9.3.5 Spatial and temporal weighting
- 9.4 Deep learning-based methods
- 9.4.1 Deep features in DCFs
- 9.4.2 Adaptive deep features
- 9.4.3 End-to-end learning DCFs
- 9.5 The transition from tracking to segmentation
- 9.5.1 Video object segmentation
- 9.5.2 A generative VOS method
- 9.5.3 A discriminative VOS method
- 9.6 Conclusions
- Acknowledgment
- References
- Biographies
- 10 Long-term deep object tracking
- 10.1 Introduction
- 10.1.1 Challenges in video object tracking
- 10.1.1.1 Visual challenges in tracking
- 10.1.1.2 Learning challenges in tracking
- 10.1.1.3 Engineering challenges in tracking
- 10.2 Short-term visual object tracking
- 10.2.1 Shallow trackers
- 10.2.2 Deep trackers
- 10.2.2.1 Correlation filter-based tracking