We propose a fully automatic method for summarizing and indexing unstructured presentation videos based on text extracted from the projected slides. We use changes of text in the slides as a means to segment the video into semantic shots. Unlike previous approaches, our method does not depend on the availability of the electronic source of the slides, but rather extracts and recognizes the text directly...
Advances in medical imaging techniques and devices have resulted in increased use of imaging in monitoring disease progression in patients. However, extracting decision-enabling information from the resulting longitudinal multi-modal image sets poses a challenge. Radiologists often have to manually identify and quantify certain regions of interest in the longitudinal image sets, which bear upon the...
We explore the problem of rapid automatic semantic tagging of video frames of unstructured (unedited) videos. We apply the sort-merge algorithm for feature selection on a large (>1000) heterogeneous feature set for videos showing lectures, to quickly locate low-level image features most predictive for concepts such as "key frame with text" or "key frame with computer source code"...
In this paper, we propose the "democratic classifier", a simple pattern-based classification algorithm that uses very short patterns for classification, and does not rely on the minimum support threshold. Borrowing ideas from democracy, our training phase allows each training instance to vote for an equal number of candidate size-2 patterns. The training instances select patterns by effectively...
We investigate several problems in the annotation of video shots by semantic labels which are implicitly embedded in a semantic hierarchy, leading to analyses and novel methods for refining video ontologies and their ground truth. First, in the large 449 LSCOM semantic concept data set, we show that within the implicit "use ontology", many concept tags are ambiguous as to purposeful activity,...
We investigate the symmetric Kullback-Leibler (KL2) distance in speaker clustering and its unreported effects for differently-sized feature matrices. Speaker data is represented as Mel frequency cepstral coefficient (MFCC) vectors, and features are compared using the KL2 metric to form clusters of speech segments for each speaker. We make two observations with respect to clustering based on KL2: 1...
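As a rough illustration of the KL2 distance mentioned above, the sketch below computes the symmetric Kullback-Leibler divergence, KL(p||q) + KL(q||p), between two speech segments. It assumes each segment is modeled as a single diagonal-covariance Gaussian fitted to its feature vectors; that modeling choice, the variance floor, and the function names `fit_diag_gaussian` and `kl2_diag_gaussian` are illustrative assumptions, not details taken from the abstract.

```python
from statistics import fmean, pvariance


def fit_diag_gaussian(frames, var_floor=1e-6):
    """Fit per-dimension mean and variance to a list of feature vectors.

    `frames` is a list of equal-length vectors (e.g. MFCC frames).
    The variance floor is an assumption added to avoid division by zero.
    """
    dims = list(zip(*frames))  # transpose: one tuple per feature dimension
    means = [fmean(d) for d in dims]
    variances = [max(pvariance(d), var_floor) for d in dims]
    return means, variances


def kl2_diag_gaussian(p, q):
    """Symmetric KL divergence KL(p||q) + KL(q||p) for diagonal Gaussians.

    The log-variance terms of the two directed divergences cancel,
    leaving the closed form summed below.
    """
    (mu_p, var_p), (mu_q, var_q) = p, q
    total = 0.0
    for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q):
        diff2 = (mp - mq) ** 2
        total += 0.5 * (vp / vq + vq / vp - 2.0 + diff2 * (1.0 / vp + 1.0 / vq))
    return total
```

Because the fitted means and variances depend on how many frames each segment contributes, a sketch like this makes it easy to experiment with differently sized feature matrices, which is the effect the abstract sets out to study.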
In this paper, we introduce a new method to estimate a parametric description of the dominant motion in a video sequence, a key task for tackling more complex video analysis problems. To do so, we use motion data provided by the MPEG streams. We propose a method based on imaginary straight line tracking to retrieve the projective transformations that describe the dominant motion...
Data mining algorithms use various Trie and bitmap-based representations to optimize the support (i.e., frequency) counting performance. In this paper, we compare the memory requirements and support counting performance of the FP Tree and the Compressed Patricia Trie against several novel variants of vertical bit vectors. First, borrowing ideas from the VLDB domain, we compress vertical bit vectors using...
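To make the idea of support counting with vertical bit vectors concrete, here is a minimal, uncompressed sketch: each item gets one bitmap with bit `t` set when transaction `t` contains the item, and the support of an itemset is the population count of the bitwise AND of its items' bitmaps. It uses plain Python integers as bitmaps; the function names are hypothetical, and the compression schemes the abstract goes on to describe are not reproduced here.

```python
def build_vertical_bitvectors(transactions):
    """Map each item to an integer bitmap over transaction ids.

    Bit `tid` of vectors[item] is 1 when transaction `tid` contains item.
    """
    vectors = {}
    for tid, items in enumerate(transactions):
        for item in set(items):
            vectors[item] = vectors.get(item, 0) | (1 << tid)
    return vectors


def support(vectors, itemset, n_transactions):
    """Count the transactions that contain every item in `itemset`."""
    acc = (1 << n_transactions) - 1  # start with all transactions selected
    for item in itemset:
        acc &= vectors.get(item, 0)  # unknown items match no transaction
    return bin(acc).count("1")  # population count of the intersection
```

For example, with the four transactions `[["a", "b"], ["a", "c"], ["a", "b", "c"], ["b"]]`, the itemset `{"a", "b"}` is contained in the first and third transactions, so its support is 2.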
In the domain of candidly-captured student presentation videos, we examine and evaluate approaches for multimodal analysis and indexing of audio and video. We apply visual segmentation techniques on unedited video to determine likely changes of topics. Speaker segmentation methods are employed to determine individual student appearances, which are linked to extracted headshots to create a visual speaker...
We introduce a novel and inexpensive approach for the temporal alignment of speech to highly imperfect transcripts from automatic speech recognition (ASR). Transcripts are generated for extended lecture and presentation videos, which in some cases feature more than 30 speakers with different accents, resulting in highly varying transcription qualities. In our approach we detect a subset of phonemes...
We present a three-step post-processing method for increasing the precision of video shot labels in the domain of television news. First, we demonstrate that news shot sequences can be characterized by rhythms of alternation (due to dialogue), repetition (due to persistent background settings), or both. Thus a temporal model is necessarily third-order Markov. Second, we demonstrate that the output...
High dimensionality remains a significant challenge for document clustering. Recent approaches used frequent itemsets and closed frequent itemsets to reduce dimensionality, and to improve the efficiency of hierarchical document clustering. In this paper, we introduce the notion of "closed interesting" itemsets (i.e. closed itemsets with high interestingness). We provide heuristics such as...
We propose that, at the highest level of video understanding, the human needs for meaning and the methodologies to extract it are both universal and generic. One must develop an ontology, then develop analyzers that learn the statistical correlates of that ontology, and finally use the analyzers to tie together common occurrences across individual videos. The first step towards adapting the ontology...
Efficient indexing and retrieval of digital videos are important needs within instructional video databases. Semantic indexing for instructional videos can be achieved by combining the analysis of the instructor's handwriting in the video with domain knowledge taken from course support materials such as the course textbook, syllabus, or slides. We propose such a semantic indexing method, by combining...
The need for personalized summaries of media content has been driven by the recent and anticipated explosive growth in the media world. In this paper, we present a methodology and a supporting user study for generating user profiles and content features that can be used to automatically create personalized summaries of broadcast television content. We determined a mapping from users' personality...