Combining diverse low-level features from multiple modalities has consistently improved performance across a range of video processing tasks, including event detection. In our work, we study graph-based clustering techniques that integrate information from multiple modalities by identifying word clusters spanning the different modalities. We present several methods for identifying word clusters, including word similarity graph partitioning, word-video co-clustering, and Latent Semantic Indexing, and we examine the impact of different metrics for quantifying word co-occurrence. We report experimental results on a dataset of approximately 45,000 videos used in the TRECVID MED 11 evaluations. Our experiments show that multimodal features yield consistent performance gains over individual features. Further, constructing the word similarity graph as a complete graph consistently outperforms partite-graph representations and early-fusion multimodal systems. Finally, we observe additional performance gains from fusing the multimodal features with the individual features.
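To make the graph-partitioning pipeline concrete, the following is a minimal sketch (not the authors' implementation) of one approach the abstract describes: pooling words across modalities per video, weighting a complete word-similarity graph with positive pointwise mutual information as the co-occurrence metric, and partitioning it into word clusters via spectral clustering. The function name `word_clusters`, the binary per-video occurrence encoding, the PPMI weighting, and the spectral partitioner are all assumptions chosen for illustration, not details confirmed by the text.

```python
# A sketch, under stated assumptions, of multimodal word clustering via
# complete-graph partitioning. Not the paper's actual method or code.
import numpy as np
from sklearn.cluster import SpectralClustering


def word_clusters(video_word_lists, n_clusters):
    """video_word_lists: one list of words per video, pooled across modalities
    (e.g., speech transcripts, visual concept labels). Hypothetical input format."""
    vocab = sorted({w for words in video_word_lists for w in words})
    idx = {w: i for i, w in enumerate(vocab)}
    n_videos, n_words = len(video_word_lists), len(vocab)

    # Binary word-video occurrence matrix (one assumed co-occurrence encoding).
    X = np.zeros((n_words, n_videos))
    for j, words in enumerate(video_word_lists):
        for w in set(words):
            X[idx[w], j] = 1.0

    # Complete graph over words: edge weights from positive PMI of
    # co-occurrence within the same video.
    co = X @ X.T                          # pairwise co-occurrence counts
    p_w = X.sum(axis=1) / n_videos        # marginal word probabilities
    p_ww = co / n_videos                  # joint probabilities
    with np.errstate(divide="ignore"):
        pmi = np.log(p_ww / np.outer(p_w, p_w))
    W = np.maximum(pmi, 0.0)              # clip to positive PMI affinities
    np.fill_diagonal(W, 0.0)

    # Partition the weighted graph into word clusters.
    labels = SpectralClustering(n_clusters=n_clusters,
                                affinity="precomputed").fit_predict(W)
    return {w: int(labels[idx[w]]) for w in vocab}
```

A cluster produced this way can mix words originating from different modalities, which is what allows the resulting cluster-level features to behave as multimodal features rather than per-modality ones.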