This paper is focused on the task of detecting words of interest in an audio scene (a room, a lab or a workshop) or in a continually recorded stream of speech, music and other sounds. The solution of this task is important in many applications, e.g. for command control in houses for handicapped persons, for automating
emotive-keywords. The happy speech synthesized by the proposed method, when assessed subjectively, yields a mean opinion score of 2.53 out of a possible 3. The synthetic speech is also assessed objectively using a GMM-based emotion recognition system, and all the tested sentences are recognized to be happy.
This paper presents an initial study using n-best recognition hypotheses for two tasks, extractive summarization and keyword extraction. We extend the approach used on 1-best output to n-best hypotheses: MMR (maximum marginal relevance) for summarization and TFIDF (term frequency, inverse document frequency) weighting for
In this work, a template-based search approach is adopted for the Keyword Search (KWS) problem on two low-resource languages (Turkish and Swahili). In low-resource languages, Large Vocabulary Continuous Speech Recognition (LVCSR) systems may perform poorly in KWS tasks, especially on out-of-vocabulary
implement the pronunciation conversion of English keywords to Chinese automatically. The efficiency of the proposed method was demonstrated on a KWS task using a mixed-language database.
The task of keyword spotting is to detect a set of keywords in the input continuous speech. The main goal of this work is to develop an improved Mandarin keyword spotting (KWS) system for conversational telephone speech (CTS). In this paper, we propose an efficient online-garbage model based KWS system, which
Audio mining is a speaker-independent speech processing technique related to data mining. Keyword spotting plays an important role in audio mining. Keyword spotting is the retrieval of all instances of a given keyword in spoken utterances. It is well suited to data mining tasks that process large amounts of speech
Keyword spotting is the task of identifying the occurrences of certain desired keywords in an arbitrary speech signal. Keyword spotting has many applications, one of which is telephone routing. In particular, we consider a big company which receives thousands of telephone calls daily. We are interested in the
The aim of the spoken term detection task is to find occurrences of user-entered keywords in an archive of audio recordings. The techniques typically used are vocabulary-independent, using only the acoustic information available. In this scenario, however, we rely exclusively on the acoustic model
This paper compares sub-word based methods for the spoken term detection (STD) task and for phone recognition. Sub-word units are needed to search for out-of-vocabulary words. We compared words, phones and multigrams. The maximal length and pruning of multigrams were investigated first. Then two
This study analyzes the effect of stress in human and automatic stressed speech processing tasks for speech collected from non-professional speakers. The database of 33 keywords is collected under five stress conditions, namely, neutral, angry, happy, sad and Lombard from fifteen speakers. The first study is to
Automatic speaker recognition is one of the difficult tasks in the field of computer speech and speaker recognition. Speaker recognition is a biometric process of automatically recognizing who is speaking on the basis of speaker-dependent features of the speech signal. Currently, a speaker recognition system is an
In many keyword spotting systems, the word posterior probability is an elementary quantity. In theory, the posterior of a keyword match denotes the probability of the match being correct. However, posteriors estimated on lattices, in particular phoneme lattices, are often off by orders of magnitude. This paper