In the context of the Neologos French speech database creation project, a general methodology was defined for the selection of representative speaker recordings. The selection aims at providing good coverage of speaker variability while limiting the number of recorded speakers. This is intended to make the resulting database both better suited to the development of recently proposed multi-model...
This paper investigates single and cross-show diarization based on an unsupervised i-vector framework, on French TV and radio corpora. This framework uses speaker clustering as a way to automatically select data from unlabeled corpora to train i-vector PLDA models. The performance of supervised and unsupervised models is compared. The experimental results on two distinct test corpora (one TV, one...
This paper addresses the task of assigning a title to topic segments automatically extracted from TV Broadcast News video recordings. We propose to associate a topic segment with the title of a newspaper article collected on the web at the same date. The task implies pairing newspaper articles and topic segments by maximising a given similarity measure. This approach raises several issues, such as...
In this work, we investigate how speaker-based information and lexical-based information can be fused efficiently for topic segmentation of spoken contents. While in recent work, we have proposed an early fusion scheme, so as to jointly model speaker and lexical distribution, we propose here a co-segmentation framework, between segmentations performed in the speaker space and in the lexical space...
Our goal is to automatically identify people in TV news and debates without any predefined dictionary of people. In this paper, we focus on the problem of person identification beyond face authentication in order to improve the identification results and not only where the face is detectable. We propose to use automatic scene analysis as features for people identification. We exploit two features:...
Our goal is to automatically identify faces in TV broadcast without a pre-defined dictionary of identities. Most methods are based on identity detection (from OCR and ASR) and require a propagation strategy based on visual clustering. In TV content, people appear with many variations making the clustering difficult. In this case, speaker clustering can be a reliable link for face clustering. We propose...
Term weighting is an important task in many applications, such as information retrieval, extraction of significant words or automatic summarization. It translates the capacity of a term to discriminate a document within a collection, or a part of a document within a whole document. This paper deals with term weighting strategies in the context of lexical cohesion based topic segmentation. The aim...
Our goal is to automatically identify faces in TV content without a pre-defined dictionary of identities. Most methods are based on identity detection (from OCR and ASR) and require a propagation strategy based on visual clustering. In TV content, people appear with many variations, making the clustering very difficult. In this case, identifying speakers can be a reliable link to identify faces. In...
The overlapping speech detection systems developed by Orange and LIMSI for the ETAPE evaluation campaign on French broadcast news and debates are described. Using either cepstral features or a multi-pitch analysis, an F1-measure for overlapping speech detection of up to 59.2% is reported on the TV data of the ETAPE evaluation set, where 6.7% of the speech was measured as overlapping, ranging from 1.2%...
Politician speaker turn detection in TV Broadcast News shows is addressed in this paper. After a first role labeling pass of speaker turns among anchor, reporter and other, turns labeled as other are submitted to a politician speech detection process. The proposed approach combines acoustical and lexical cues as well as contextual information, and does not use any specific politician model (person-independent)...
This paper addresses the issue of error region detection and characterization in LVCSR transcriptions. It is a well-known phenomenon that errors are not independent and tend to co-occur in automatic transcriptions. We are interested in automatically detecting these so-called error regions. Additionally, in the context of information extraction in TVBN shows, being able to automatically characterize...
Our goal is to structure TV-content by person, allowing a user to navigate through the sequences of the same person. To let a user browse through the content without restriction on the people within it, this structuring has to be done without any pre-defined dictionary of people. To this end, most methods propose to index people independently by the audio and visual information, and associate the indexes...
Speaker role recognition in TV Broadcast News shows is addressed in this paper with a particular focus on speaker turn role labeling. A mixed approach combining speaker clustering and analysis of Automatic Speech Recognition output is proposed for assigning speaker turns a role among: anchor, reporter and other. 86% classification accuracy is obtained for automatically segmented speaker turns on a...
Our objective is to index talking faces in a TV-Context: build a description of TV-content, in terms of talking people, without any pre-defined dictionary of identities. In TV-content, because of multi-face shots and non-speaking face shots, it is difficult to determine which face is speaking. In this work, a method is proposed which clusters people independently by the audio and by the visual information...
In the context of the Neologos French speech database creation project (funded by the French Ministry of Research in the framework of the Technolangue program), a general methodology was defined for the selection of representative speaker recordings. The selection aims at providing a good coverage in terms of speaker variability while limiting the number of recorded...
Speaker representation by location in a reference space is a new technique for speaker recognition and adaptation. It consists in representing a speaker relatively rather than absolutely, by comparing him to a set of well-trained speakers. The main motivation is to obtain a compact model of every speaker that gives performance similar to that of the state-of-the-art GMM-UBM. Thus, instead of...
This paper addresses the task of voicemail message retrieval according to a target speaker defined by a given voicemail message. The core metric used for speaker modeling is GMM-based and the paper focuses on the algorithms used to sort the voicemail messages. Various algorithms are studied and compared. An algorithm that sorts messages according to their inclusion rank into a cluster built on the...