Graph Based Multimedia Analysis
Graph Based Multimedia Analysis applies concepts from graph theory to the problems of analyzing overabundant video data. Video data can be quite diverse: exocentric (captured by a standard camera) or egocentric (captured by a wearable device like Google Glass); of various durations (ranging from a f...
Other Authors:
Format: Electronic book
Language: English
Published: Cambridge, MA : Morgan Kaufmann, [2025]
Edition: First edition
Subjects:
View at Universitat Ramon Llull Library: https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009842233906719
Table of Contents:
- Front Cover
- Graph Based Multimedia Analysis
- Copyright
- Contents
- List of figures
- List of tables
- Biography
- Foreword
- Preface
- 1 Introduction
- 1.1 Motivation
- 1.2 Chapter organization
- 1.3 Basics of multimedia
- 1.4 Preliminaries of a video
- 1.5 Multimedia problems
- 1.6 Graph based solutions
- 1.7 Other solution models
- 1.8 Organization of the book
- References
- 2 Theoretical foundations
- 2.1 Motivation
- 2.2 Organization
- 2.3 Graph basics
- 2.4 Delaunay graph
- 2.5 Bipartite graph
- 2.6 Minimum spanning tree
- 2.7 Optimum path forest
- 2.8 Random walks on a graph
- 2.9 Knapsack problems
- 2.10 Elementary game theory
- References
- 3 Exocentric video summarization
- 3.1 Motivation
- 3.2 Chapter organization
- 3.3 Related works
- 3.3.1 Related works for exocentric video summarization
- 3.3.2 Related works for scalable exocentric video summarization
- 3.4 Method I: Delaunay graph based solutions for exocentric video summarization
- 3.4.1 Method IA: Constrained Delaunay graph clustering based summary
- 3.4.1.1 Video frame presampling
- 3.4.1.2 Feature extraction
- 3.4.1.3 Elimination of redundant frames
- 3.4.1.4 Delaunay graph based constrained clustering
- 3.4.1.5 Key frame extraction
- 3.4.2 Method IB: Delaunay graph based summary with user customization
- 3.4.3 Method IC: Delaunay graph based summary in enhanced feature space
- 3.5 Method II: A graph modularity based clustering for exocentric video summarization
- 3.5.1 Compressed domain feature extraction
- 3.5.2 Multi-feature fusion
- 3.5.3 Graph modularity based clustering
- 3.5.4 Key frame extraction
- 3.6 Scalable exocentric video summarization with skeleton graph and random walk
- 3.6.1 Extraction of skeleton graph
- 3.6.2 Clustering of skeleton graph via MST
- 3.6.3 Label propagation with random walks
- 3.6.4 Key frame selection and ranking
- 3.7 Time-complexity analysis
- 3.7.1 Complexity analysis of an exocentric video summarization algorithm
- 3.7.2 Complexity analysis of the scalable exocentric video summarization algorithm
- 3.8 Experimental test bed
- 3.8.1 Dataset(s)
- 3.8.2 Performance measures
- 3.8.2.1 Objective measures
- 3.8.2.2 Subjective measures
- 3.9 Results of Delaunay graph based exocentric video summarization methods
- 3.9.1 Results of constrained Delaunay graph clustering based summary
- 3.9.1.1 Performance analysis with information theoretic presampling
- 3.9.1.2 Performance analysis with deviation ratio constraint
- 3.9.1.3 Performance comparison with state-of-the-art methods
- 3.9.1.4 Performance comparison with K-means clustering
- 3.9.1.5 Clustering performance analysis
- 3.9.1.6 Tuning of the parameters
- 3.9.1.7 Key frame visualization
- 3.9.2 Results of Delaunay graph based summary with user customization
- 3.9.3 Results of Delaunay graph based summary in enhanced feature space
- 3.9.3.1 Performance analysis with semantic features
- 3.9.3.2 Performance analysis with CCA
- 3.10 Results of graph modularity based solution
- 3.11 Results of scalable exocentric video summarization
- 3.11.1 Objective evaluations
- 3.11.2 Subjective evaluations
- 3.11.3 Comparison of execution times
- 3.12 Summary
- 3.12.1 Summary of exocentric video summarization
- 3.12.2 Summary of scalable exocentric video summarization
- References
- 4 Multi-view exocentric video summarization
- 4.1 Motivation
- 4.2 Chapter organization
- 4.3 Related work
- 4.4 Proposed method
- 4.4.1 Video preprocessing
- 4.4.1.1 Shot detection and representation
- 4.4.1.2 Feature extraction
- 4.4.2 Unimportant shot elimination using Gaussian entropy
- 4.4.3 Multi-view correlation using bipartite matching
- 4.4.4 Shot clustering by OPF
- 4.5 Time-complexity analysis
- 4.6 Experimental results
- 4.6.1 Datasets
- 4.6.2 Performance measures
- 4.6.3 Ablation study
- 4.6.4 Comparison with mono-view methods
- 4.6.5 Comparison with multi-view methods
- 4.7 Summary
- References
- 5 Egocentric video summarization
- 5.1 Motivation
- 5.2 Chapter organization
- 5.3 Related work
- 5.4 Proposed methods
- 5.4.1 Method I: Egocentric video summarization with different graph representations
- 5.4.1.1 Graph based shot boundary detection
- 5.4.1.2 Graph based representative frame selection
- 5.4.1.3 Graph based center-surround model
- 5.4.1.4 Graph based feature extraction
- 5.4.1.5 Construction of the VSG
- 5.4.1.6 MST based clustering with a new measure of edge inadmissibility
- 5.4.2 Method II: Egocentric video summarization with deep features and optimal clustering
- 5.4.2.1 Feature extraction using deep learning
- 5.4.2.2 Set of number of clusters
- 5.4.2.3 CSMIK K-means
- 5.4.2.3.1 Center-surround model
- 5.4.2.3.2 Integer knapsack formulation
- 5.4.2.3.3 CSMIK K-means
- 5.5 Time-complexity analysis
- 5.5.1 Method-I: Time-complexity analysis
- 5.5.2 Method-II: Time-complexity analysis
- 5.6 Experimental results
- 5.6.1 Datasets
- 5.6.2 Performance measures
- 5.6.3 Tuning of the parameters
- 5.6.3.1 Method-I: Tuning of the parameters
- 5.6.3.2 Method-II: Tuning of the parameters
- 5.6.4 Method-I: Experimental results and analysis
- 5.6.4.1 Ablation studies
- 5.6.4.2 Cluster validation
- 5.6.4.3 Results on SumMe dataset
- 5.6.4.4 Results on TvSum50 dataset
- 5.6.5 Method-II: Experimental results and analysis
- 5.6.5.1 Ablation studies
- 5.6.5.2 Cluster validation
- 5.6.5.3 Results on SumMe dataset
- 5.6.5.4 Results on TvSum50 dataset
- 5.6.5.5 Results on ADL dataset
- 5.6.5.6 Results on Base jumping from CoSum dataset
- 5.6.5.7 Comparison with human performance
- 5.6.5.8 Test of statistical significance
- 5.6.5.9 Execution times
- 5.6.5.10 Keyframe visualization
- 5.7 Summary
- References
- 6 Egocentric video cosummarization
- 6.1 Motivation
- 6.2 Chapter organization
- 6.3 Related work
- 6.4 Proposed methods
- 6.4.1 Shot segmentation
- 6.4.2 Center-surround model
- 6.4.3 Method I: Egocentric video cosummarization with bipartite graph matching and game theory
- 6.4.3.1 A game-theoretic model of visual similarity
- 6.4.3.2 Shot correspondence using bipartite graph matching
- 6.4.4 Method II: Egocentric video cosummarization with random walks on a constrained graph and transfer learning
- 6.4.4.1 Feature extraction using transfer learning
- 6.4.4.2 A video representation graph
- 6.4.4.3 Must-link and cannot-link constraints
- 6.4.4.4 Must-link constrained modified graph
- 6.4.4.5 Shot clustering by random walk with label refinement
- 6.5 Time-complexity analysis
- 6.5.1 Method-I: Time-complexity analysis
- 6.5.2 Method-II: Time-complexity analysis
- 6.6 Experimental results
- 6.6.1 Datasets
- 6.6.2 Performance measures
- 6.6.3 Tuning of the parameters
- 6.6.4 Method-I: Experimental results and analysis
- 6.6.4.1 Ablation study
- 6.6.4.2 Comparisons with other approaches
- 6.6.4.3 Test of statistical significance
- 6.6.4.4 Execution times
- 6.6.5 Method-II: Experimental results and analysis
- 6.6.5.1 Implementation details
- 6.6.5.2 Ablation studies
- 6.6.5.3 Results on short duration videos
- 6.6.5.4 Results on long duration videos
- 6.6.5.5 Comparison with human performance
- 6.6.5.6 Test of statistical significance
- 6.6.5.7 Execution times
- 6.7 Summary
- References
- 7 Action recognition in egocentric video
- 7.1 Motivation
- 7.2 Chapter organization
- 7.3 Related work
- 7.4 Proposed method
- 7.4.1 Method I: Action recognition in egocentric video with shallow feature and video similarity graph
- 7.4.1.1 PHOG feature extraction
- 7.4.1.2 Features from the center-surround model
- 7.4.1.3 Construction of the VSG graph
- 7.4.1.4 Random walk on VSG
- 7.4.2 Method II: Action recognition in egocentric video with deep features and video representation graph
- 7.4.2.1 Center-surround model
- 7.4.2.2 Superpixel extraction
- 7.4.2.3 Feature extraction using deep learning
- 7.4.2.4 Video representation graph
- 7.4.2.5 Random walk based action labeling
- 7.4.2.6 Action summary
- 7.5 Time-complexity analysis
- 7.5.1 Method-I: Time-complexity analysis
- 7.5.2 Method-II: Time-complexity analysis
- 7.6 Experimental results
- 7.6.1 Dataset
- 7.6.2 Performance measures
- 7.6.3 Tuning of the parameters
- 7.6.3.1 Method-I: Tuning of the parameters
- 7.6.3.2 Method-II: Tuning of the parameters
- 7.6.4 Method-I: Experimental results and analysis
- 7.6.4.1 Results on ADL dataset
- 7.6.5 Method-II: Experimental results and analysis
- 7.6.5.1 Ablation studies for action recognition
- 7.6.5.2 External comparisons for action recognition
- Results on the ADL dataset:
- Results on the GTEA dataset:
- Results on the EGTEA Gaze+ dataset:
- Results on the EgoGesture dataset:
- Results on the EPIC-Kitchens dataset:
- 7.6.5.3 Action localization
- 7.6.5.4 Comparisons for action summarization
- 7.7 Summary
- References
- 8 Conclusions
- 8.1 Concluding remarks
- 8.2 Future research directions
- References
- A Source codes
- A.1 Organization
- A.2 Source codes - constrained Delaunay graph clustering for exocentric video summarization
- A.3 Source codes - optimum-path forest clustering for multi-view exocentric video summarization
- A.4 Source codes - different graph representations for egocentric video summarization
- A.5 Source codes - deep feature and integer knapsack for egocentric video summarization