Evaluating the accuracy of HMM-based and SVM-based spotters in detecting keywords and recognizing the true place of keyword occurrence shows that the HMM-based spotter detects the place of occurrence more precisely than the SVM-based spotter. On the other hand, the SVM-based spotter performs much better in detecting
related keywords as representative vectors for different sentiments, we use these vectors as the sentiment classifier for the testing set. We achieved results that are not only comparable to traditional methods like Naïve Bayes and SVM, but also outperform Latent Dirichlet Allocation, TF-IDF and its variant. It also
This paper proposes a method for keyword spotting in offline Chinese handwritten documents using a statistical model. On a text query word, the method measures the similarity between the query word and every candidate word in the document by combining a character classifier and four classifiers characterizing the
This paper presents a text query-based method for keyword spotting from online Chinese handwritten documents. The similarity between a text word and handwriting is obtained by combining the character similarity scores given by a character classifier. To overcome the ambiguity of character segmentation, multiple
topic analysis of LDA for feature selection and compare it with the classical feature selection metrics in text categorization. For the experiments, we use SVM as the classifier and tf*idf for term weighting. We observed that, in almost all metrics, information gain performs best at all keyword numbers while
To bridge the semantic gap between low-level visual features and high-level semantic concepts, this paper puts forward a novel feedback mechanism based on both instance and keyword features. In the offline part, a keyword space model is first constructed and updated using manifold ranking annotation; in the online
Keyword extraction is an automated process that collects a set of terms giving an overview of a document. A term is defined by how well the keyword identifies the core information of a particular document. When analyzing a huge number of documents to find relevant information, keyword extraction will be the key
models for categories specified simply by their names. We show that multiple-instance learning enables the recovery of robust category models from images returned by keyword-based search engines. By incorporating constraints that reflect the expected sparsity of true positive examples into a large-margin objective function
Scientific documents are unstructured natural-language data and are hard for scientists to read and manage. Keywords help scientists search for related documents and grasp their contents quickly. In this paper we investigate a kind of data preprocessing technique used in SVM
classification research on Vietnamese is still limited. Using a Vietnamese news corpus, we propose some methods to solve Vietnamese news classification problems. By employing Bag of Words (BoW) with keyword extraction and neural network approaches, we trained a machine learning model that could achieve an average of
In view of the traditional feature extraction method based on binary programs, this paper presents a method for feature extraction from Java source code. The method uses the Keywords Correlation Distance to compute the correlation between key codes such as API calls, Android permissions, the common parameters, and the
approach involves the detection and use of self-defining features that are available within the data. We take into account two emotionally rich features: a) emoticons and b) lists of emotionally intense keywords. These features are evaluated on data coming from a popular forum, using various classifiers and feature vectors
very large when a dense grid is used where the histograms are computed and combined for many different points. The current dominating solution to this problem is to use a clustering method to create a visual codebook that is exploited by an appearance based descriptor to create a histogram of visual keywords present in an
metrics used in text categorization by using local and global policies. For the experiments, we use three datasets which vary in size, complexity and skewness. We use SVM as the classifier and tf-idf for term weighting. We observed that, in almost all metrics, the local policy performs better when the number of keywords is
In this paper, we propose a novel multi-label image annotation method for image retrieval based on annotated keywords. For multi-label image annotation, a bi-coded genetic algorithm is employed to select optimal feature subsets and corresponding optimal weights for each one-vs-one SVM classifier. After an unlabelled image
, pattern recognition) to detect such critical documents. To address difficult or ambiguous instances, we supplement the text classifier with an automated keyword search. That is, we extract, in an automated fashion, discriminative terms (i.e., keywords) from the training set and match them against documents during the
part of a trending discussion topic by the presence of a tagged keyword. Relying solely on this keyword, however, may be inadequate for identifying all the discussion associated with a trend. Our research demonstrates that machine learning techniques can be used to identify the top trend a tweet belongs to with up to 85
This paper presents a novel method for deriving patterns for classification of speech sounds. In contrast to conventional methods that attempt to capture time-frequency patterns as represented by spectral envelopes or peaks, our method captures patterns of high-energy tracks, or seams, of maximum “whiteness” across frequency in spectrograms. Our hypothesis is that these seams could potentially carry...
Image classification is an important research topic due to its potential impact on both image processing and understanding. However, the inherent ambiguity of image-keyword mapping makes this task a challenge. From the perspective of machine learning, the image classification task fits the multi-instance
This paper presents a strategy to classify tweet sentiment using Naive Bayes techniques, based on trainers' perception, into three categories: positive, negative or neutral. Fifty tweets containing the keywords ‘Malaysia’ and ‘Maybank’ were selected from Twitter for perception training. In this study, there were 27 trainers