Graph Based Multimedia Analysis

Graph Based Multimedia Analysis applies concepts from graph theory to the problems of analyzing overabundant video data. Video data can be quite diverse: exocentric (captured by a standard camera) or egocentric (captured by a wearable device like Google Glass); of various durations (ranging from a f...


Bibliographic Details
Other Authors: Chowdhury, Ananda S. (author); Sahu, Abhimanyu (author)
Format: Electronic book
Language: English
Published: Cambridge, MA : Morgan Kaufmann [2025]
Edition: First edition
Subjects:
View at Biblioteca Universitat Ramon Llull: https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009842233906719
Table of Contents:
  • Front Cover
  • Graph Based Multimedia Analysis
  • Copyright
  • Contents
  • List of figures
  • List of tables
  • Biography
  • Foreword
  • Preface
  • 1 Introduction
  • 1.1 Motivation
  • 1.2 Chapter organization
  • 1.3 Basics of multimedia
  • 1.4 Preliminaries of a video
  • 1.5 Multimedia problems
  • 1.6 Graph based solutions
  • 1.7 Other solution models
  • 1.8 Organization of the book
  • References
  • 2 Theoretical foundations
  • 2.1 Motivation
  • 2.2 Organization
  • 2.3 Graph basics
  • 2.4 Delaunay graph
  • 2.5 Bipartite graph
  • 2.6 Minimum spanning tree
  • 2.7 Optimum path forest
  • 2.8 Random walks on a graph
  • 2.9 Knapsack problems
  • 2.10 Elementary game theory
  • References
  • 3 Exocentric video summarization
  • 3.1 Motivation
  • 3.2 Chapter organization
  • 3.3 Related works
  • 3.3.1 Related works for exocentric video summarization
  • 3.3.2 Related works for scalable exocentric video summarization
  • 3.4 Method I: Delaunay graph based solutions for exocentric video summarization
  • 3.4.1 Method IA: Constrained Delaunay graph clustering based summary
  • 3.4.1.1 Video frame presampling
  • 3.4.1.2 Feature extraction
  • 3.4.1.3 Elimination of redundant frames
  • 3.4.1.4 Delaunay graph based constrained clustering
  • 3.4.1.5 Key frame extraction
  • 3.4.2 Method IB: Delaunay graph based summary with user customization
  • 3.4.3 Method IC: Delaunay graph based summary in enhanced feature space
  • 3.5 Method II: A graph modularity based clustering for exocentric video summarization
  • 3.5.1 Compressed domain feature extraction
  • 3.5.2 Multi-feature fusion
  • 3.5.3 Graph modularity based clustering
  • 3.5.4 Key frame extraction
  • 3.6 Scalable exocentric video summarization with skeleton graph and random walk
  • 3.6.1 Extraction of skeleton graph
  • 3.6.2 Clustering of skeleton graph via MST
  • 3.6.3 Label propagation with random walks
  • 3.6.4 Key frame selection and ranking
  • 3.7 Time-complexity analysis
  • 3.7.1 Complexity analysis of an exocentric video summarization algorithm
  • 3.7.2 Complexity analysis of the scalable exocentric video summarization algorithm
  • 3.8 Experimental test bed
  • 3.8.1 Dataset(s)
  • 3.8.2 Performance measures
  • 3.8.2.1 Objective measures
  • 3.8.2.2 Subjective measures
  • 3.9 Results of Delaunay graph based exocentric video summarization methods
  • 3.9.1 Results of constrained Delaunay graph clustering based summary
  • 3.9.1.1 Performance analysis with information theoretic presampling
  • 3.9.1.2 Performance analysis with deviation ratio constraint
  • 3.9.1.3 Performance comparison with state-of-the-art methods
  • 3.9.1.4 Performance comparison with K-means clustering
  • 3.9.1.5 Clustering performance analysis
  • 3.9.1.6 Tuning of the parameters
  • 3.9.1.7 Key frame visualization
  • 3.9.2 Results of Delaunay graph based summary with user customization
  • 3.9.3 Results of Delaunay graph based summary in enhanced feature space
  • 3.9.3.1 Performance analysis with semantic features
  • 3.9.3.2 Performance analysis with CCA
  • 3.10 Results of graph modularity based solution
  • 3.11 Results of scalable exocentric video summarization
  • 3.11.1 Objective evaluations
  • 3.11.2 Subjective evaluations
  • 3.11.3 Comparison of execution times
  • 3.12 Summary
  • 3.12.1 Summary of exocentric video summarization
  • 3.12.2 Summary of scalable exocentric video summarization
  • References
  • 4 Multi-view exocentric video summarization
  • 4.1 Motivation
  • 4.2 Chapter organization
  • 4.3 Related work
  • 4.4 Proposed method
  • 4.4.1 Video preprocessing
  • 4.4.1.1 Shot detection and representation
  • 4.4.1.2 Feature extraction
  • 4.4.2 Unimportant shot elimination using Gaussian entropy
  • 4.4.3 Multi-view correlation using bipartite matching
  • 4.4.4 Shot clustering by OPF
  • 4.5 Time-complexity analysis
  • 4.6 Experimental results
  • 4.6.1 Datasets
  • 4.6.2 Performance measures
  • 4.6.3 Ablation study
  • 4.6.4 Comparison with mono-view methods
  • 4.6.5 Comparison with multi-view methods
  • 4.7 Summary
  • References
  • 5 Egocentric video summarization
  • 5.1 Motivation
  • 5.2 Chapter organization
  • 5.3 Related work
  • 5.4 Proposed methods
  • 5.4.1 Method I: Egocentric video summarization with different graph representations
  • 5.4.1.1 Graph based shot boundary detection
  • 5.4.1.2 Graph based representative frame selection
  • 5.4.1.3 Graph based center-surround model
  • 5.4.1.4 Graph based feature extraction
  • 5.4.1.5 Construction of the VSG
  • 5.4.1.6 MST based clustering with a new measure of edge inadmissibility
  • 5.4.2 Method II: Egocentric video summarization with deep features and optimal clustering
  • 5.4.2.1 Feature extraction using deep learning
  • 5.4.2.2 Set of number of clusters
  • 5.4.2.3 CSMIK K-means
  • 5.4.2.3.1 Center-surround model
  • 5.4.2.3.2 Integer knapsack formulation
  • 5.4.2.3.3 CSMIK K-means
  • 5.5 Time-complexity analysis
  • 5.5.1 Method-I: Time complexity analysis
  • 5.5.2 Method-II: Time complexity analysis
  • 5.6 Experimental results
  • 5.6.1 Datasets
  • 5.6.2 Performance measures
  • 5.6.3 Tuning of the parameters
  • 5.6.3.1 Method-I: Tuning of the parameters
  • 5.6.3.2 Method-II: Tuning of the parameters
  • 5.6.4 Method-I: Experimental results and analysis
  • 5.6.4.1 Ablation studies
  • 5.6.4.2 Cluster validation
  • 5.6.4.3 Results on SumMe dataset
  • 5.6.4.4 Results on TvSum50 dataset
  • 5.6.5 Method-II: Experimental results and analysis
  • 5.6.5.1 Ablation studies
  • 5.6.5.2 Cluster validation
  • 5.6.5.3 Results on SumMe dataset
  • 5.6.5.4 Results on TvSum50 dataset
  • 5.6.5.5 Results on ADL dataset
  • 5.6.5.6 Results on Base jumping from CoSum dataset
  • 5.6.5.7 Comparison with human performance
  • 5.6.5.8 Test of statistical significance
  • 5.6.5.9 Execution times
  • 5.6.5.10 Keyframe visualization
  • 5.7 Summary
  • References
  • 6 Egocentric video cosummarization
  • 6.1 Motivation
  • 6.2 Chapter organization
  • 6.3 Related work
  • 6.4 Proposed methods
  • 6.4.1 Shot segmentation
  • 6.4.2 Center-surround model
  • 6.4.3 Method I: Egocentric video cosummarization with bipartite graph matching and game theory
  • 6.4.3.1 A game-theoretic model of visual similarity
  • 6.4.3.2 Shot correspondence using bipartite graph matching
  • 6.4.4 Method II: Egocentric video cosummarization with random walks on a constrained graph and transfer learning
  • 6.4.4.1 Feature extraction using transfer learning
  • 6.4.4.2 A video representation graph
  • 6.4.4.3 Must-link and cannot-link constraints
  • 6.4.4.4 Must-link constrained modified graph
  • 6.4.4.5 Shot clustering by random walk with label refinement
  • 6.5 Time-complexity analysis
  • 6.5.1 Method-I: Time-complexity analysis
  • 6.5.2 Method-II: Time-complexity analysis
  • 6.6 Experimental results
  • 6.6.1 Datasets
  • 6.6.2 Performance measures
  • 6.6.3 Tuning of the parameters
  • 6.6.4 Method-I: Experimental results and analysis
  • 6.6.4.1 Ablation study
  • 6.6.4.2 Comparisons with other approaches
  • 6.6.4.3 Test of statistical significance
  • 6.6.4.4 Execution times
  • 6.6.5 Method-II: Experimental results and analysis
  • 6.6.5.1 Implementation details
  • 6.6.5.2 Ablation studies
  • 6.6.5.3 Results on short duration videos
  • 6.6.5.4 Results on long duration videos
  • 6.6.5.5 Comparison with human performance
  • 6.6.5.6 Test of statistical significance
  • 6.6.5.7 Execution times
  • 6.7 Summary
  • References
  • 7 Action recognition in egocentric video
  • 7.1 Motivation
  • 7.2 Chapter organization
  • 7.3 Related work
  • 7.4 Proposed method
  • 7.4.1 Method I: Action recognition in egocentric video with shallow feature and video similarity graph
  • 7.4.1.1 PHOG feature extraction
  • 7.4.1.2 Features from the center-surround model
  • 7.4.1.3 Construction of the VSG graph
  • 7.4.1.4 Random walk on VSG
  • 7.4.2 Method II: Action recognition in egocentric video with deep features and video representation graph
  • 7.4.2.1 Center-surround model
  • 7.4.2.2 Superpixel extraction
  • 7.4.2.3 Feature extraction using deep learning
  • 7.4.2.4 Video representation graph
  • 7.4.2.5 Random walk based action labeling
  • 7.4.2.6 Action summary
  • 7.5 Time-complexity analysis
  • 7.5.1 Method-I: Time-complexity analysis
  • 7.5.2 Method-II: Time-complexity analysis
  • 7.6 Experimental results
  • 7.6.1 Dataset
  • 7.6.2 Performance measures
  • 7.6.3 Tuning of the parameters
  • 7.6.3.1 Method-I: Tuning of the parameters
  • 7.6.3.2 Method-II: Tuning of the parameters
  • 7.6.4 Method-I: Experimental results and analysis
  • 7.6.4.1 Results on ADL dataset
  • 7.6.5 Method-II: Experimental results and analysis
  • 7.6.5.1 Ablation studies for action recognition
  • 7.6.5.2 External comparisons for action recognition
  • Results on the ADL dataset:
  • Results on the GTEA dataset:
  • Results on the EGTEA Gaze+ dataset:
  • Results on the EgoGesture dataset:
  • Results on the EPIC-Kitchens dataset:
  • 7.6.5.3 Action localization
  • 7.6.5.4 Comparisons for action summarization
  • 7.7 Summary
  • References
  • 8 Conclusions
  • 8.1 Concluding remarks
  • 8.2 Future research directions
  • References
  • A Source codes
  • A.1 Organization
  • A.2 Source codes - constrained Delaunay graph clustering for exocentric video summarization
  • A.3 Source codes - optimum-path forest clustering for multi-view exocentric video summarization
  • A.4 Source codes - different graph representations for egocentric video summarization
  • A.5 Source codes - deep feature and integer knapsack for egocentric video summarization