Evaluating the accuracy of HMM-based and SVM-based spotters in detecting keywords and recognizing the true place of keyword occurrence shows that the HMM-based spotter detects the place of occurrence more precisely than the SVM-based spotter. On the other hand, the SVM-based spotter performs much better in detecting
related keywords as representative vectors for different sentiments, we use these vectors as the sentiment classifier for the testing set. We achieved results that are not only comparable to traditional methods like Naïve Bayes and SVM, but also outperform Latent Dirichlet Allocation, TF-IDF and its variant. It also
One commonly used approach for language recognition is to convert the input speech into a sequence of tokens such as words or phones and then to use these token sequences to determine the target language. The language classification is typically performed by extracting N-gram statistics from the token sequences and then using an N-gram language model or support vector machine (SVM) to perform the...
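As a rough illustration of the N-gram statistics step described in the abstract above, the following minimal sketch counts N-grams over a token sequence and normalizes them into relative frequencies. The phone sequence and the function names are hypothetical, and the relative-frequency vector is only one plausible input to a language model or SVM, not the method of the paper.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count contiguous n-grams (as tuples) in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_relative_freqs(tokens, n):
    """Normalize n-gram counts so they sum to 1 (empty dict if no n-grams)."""
    counts = ngram_counts(tokens, n)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gram: count / total for gram, count in counts.items()}


# Hypothetical phone tokens, as a tokenizer front end might emit them.
phones = ["k", "ae", "t", "k", "ae", "b"]
bigrams = ngram_counts(phones, 2)
bigram_freqs = ngram_relative_freqs(phones, 2)
```

In practice one such vector would be built per utterance over a fixed N-gram vocabulary, so that all utterances share the same feature dimensions.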
This paper proposes a method for keyword spotting in offline Chinese handwritten documents using a statistical model. Given a text query word, the method measures the similarity between the query word and every candidate word in the document by combining a character classifier and four classifiers characterizing the
This paper presents a text query-based method for keyword spotting from online Chinese handwritten documents. The similarity between a text word and handwriting is obtained by combining the character similarity scores given by a character classifier. To overcome the ambiguity of character segmentation, multiple
topic analysis of LDA for feature selection and compare it with the classical feature selection metrics in text categorization. For the experiments, we use SVM as the classifier and tf*idf for weighting the terms. We observed that, in almost all metrics, information gain performs best at all keyword numbers while
This paper proposes an unsupervised two-stage approach to automatically extract keywords from spoken documents. In the first stage, for each candidate term we compute a topic coherence and term significance measure (TCS) based on probabilistic latent semantic analysis (PLSA) models. In the second stage, we take the
paper, we propose an automatic keyword extraction system and a Thai website categorization system that can automatically update the dictionary and categorize websites in Thai. The dictionary is a collection of vectors created by the automatic keyword extraction system. The results in terms of accuracy show that our
In this paper, a new method of Chinese prosodic word tagging is presented. This method consists of a rule-based algorithm named "keyword anchor" and a statistical algorithm based on a hidden Markov model (HMM). For the keyword anchor algorithm, an anchor of the prosodic word is defined to help the system find the whole
Word posterior probability has been widely used as the confidence estimation of automatic speech recognition (ASR) systems and has been proved to be quite effective in related applications such as keyword search. However, word posterior probability tends to overestimate the true probability of a hypothesis, as it is
This paper mainly discusses a speech keyword recognition system for streaming audio media. With the help of the Microsoft Windows Media Format SDK (WMFSDK), a powerful front-end interface module is designed to extract the audio stream from different streaming media and convert it to the audio format
This paper presents a keyword extraction method for web pages based on a domain thesaurus. The method extracts keywords from web pages based on traditional statistical features, such as frequency and location, and it also evaluates the weight of candidate keywords by combining these features with their relations in the domain thesaurus. This
To bridge the semantic gap between low-level visual features and high-level semantic concepts, this paper puts forward a novel feedback mechanism which is based on both instance and keyword features. In the offline part, a keyword space model is first constructed and updated using manifold ranking annotation; in online
In this paper we present our current work on automatic speaker recognition using keyword-conditioned phone N-gram modeling. We propose the use of contextual information around keywords in modeling a speaker's pronunciation characteristics at a phonetic level. Our approach is to add time margins around keywords when
Keyword extraction is an automated process that collects a set of terms giving an overview of a document. Each term is defined by how well the keyword identifies the core information of a particular document. When analyzing a huge number of documents to find relevant information, keyword extraction will be the key
This paper proposes a new approach for keyword spotting, which is based on large margin and kernel methods rather than on HMMs. Unlike previous approaches, the proposed method employs a discriminative learning procedure, in which the learning phase aims at achieving a high area under the ROC curve, as this quantity is
models for categories specified simply by their names. We show that multiple-instance learning enables the recovery of robust category models from images returned by keyword-based search engines. By incorporating constraints that reflect the expected sparsity of true positive examples into a large-margin objective function
The main objective of this work is to classify Hindi stories into three genres: fable, folk-tale and legend. In this paper, we propose a framework for story classification using keyword and Part-of-speech (POS) based features. Keyword-based features like Term Frequency (TF) and Term Frequency Inverse Document
Scientific documents are unstructured data consisting of natural language and are hard for scientists to read and manage. Keywords are very helpful for scientists to search for related documents and quickly learn about their contents. In this paper we investigate a kind of data preprocessing technique used in SVM
Today's world is increasingly threatened by Human Immunodeficiency Virus Type 1 (HIV-1) due to its pervasive and deadly nature. The virus replicates by exploiting a complex interaction network of HIV-1 and human proteins and destroys human immunity, gradually leading to AIDS. Anti-HIV drugs are designed to utilize the information on viral-host protein-protein interactions (PPIs), so that...
Financed by the National Centre for Research and Development under grant No. SP/I/1/77065/10 by the strategic scientific research and experimental development program:
SYNAT - “Interdisciplinary System for Interactive Scientific and Scientific-Technical Information”.