This paper proposes a method that can reliably monitor the adoption of existing technology using term frequency-inverse document frequency (TF-IDF) and K-means clustering on cited patents. TF-IDF and K-means clustering can extract patent information when the number of patents is sufficiently large. When the number of patents is too small for TF-IDF and K-means clustering to be reliable, the method...
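A minimal pure-Python sketch of the generic TF-IDF plus K-means pipeline named above, assuming each patent has already been reduced to a list of tokens. The toy patent titles, the farthest-first initialization and the function names are illustrative assumptions, not the paper's actual procedure or data.

```python
import math
from collections import Counter, defaultdict


def tfidf_vectors(docs):
    """Turn tokenized documents into L2-normalized sparse TF-IDF dicts."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vec = {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        vectors.append({t: v / norm for t, v in vec.items()})
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two sparse vectors."""
    return sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in set(a) | set(b))


def centroid(vectors):
    """Component-wise mean of a non-empty list of sparse vectors."""
    total = defaultdict(float)
    for v in vectors:
        for k, x in v.items():
            total[k] += x
    return {k: x / len(vectors) for k, x in total.items()}


def kmeans(vectors, k, iters=20):
    """Plain K-means with deterministic farthest-first initialization (an assumption)."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        # Next seed: the vector farthest from every centroid chosen so far.
        centroids.append(max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids)))
    labels = []
    for _ in range(iters):
        # Assignment step: each vector joins its nearest centroid.
        labels = [min(range(k), key=lambda j: sq_dist(v, centroids[j])) for v in vectors]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = centroid(members)
    return labels


# Hypothetical toy "patents": two battery-related, two vision-related.
patents = [
    "lithium battery electrode cell".split(),
    "lithium battery anode cell".split(),
    "neural network image recognition".split(),
    "convolutional network image recognition".split(),
]
print(kmeans(tfidf_vectors(patents), k=2))  # → [0, 0, 1, 1]
```

Terms that occur in every document get an IDF of zero, so only the discriminative vocabulary drives the clustering; this is also why the approach degrades when the patent set is small.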
This paper presents a new technique for preparing word templates to improve the performance of keyword spotting based on dynamic time warping (DTW). The proposed technique selects one reference template from a small set of examples and, in contrast to existing model-based approaches, does not require extensive training
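The snippet does not say how the single reference template is chosen. One common heuristic, assumed here purely for illustration, is to pick the medoid: the example with the smallest total DTW distance to all the others. Frames are simplified to scalars; real keyword spotters compare feature vectors (for example MFCCs) per frame. `dtw_distance` and `select_template` are hypothetical names.

```python
import math


def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping with absolute frame cost."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def select_template(examples):
    """Return the medoid example: minimal summed DTW distance to all examples."""
    totals = [sum(dtw_distance(x, y) for y in examples) for x in examples]
    return examples[totals.index(min(totals))]


# Hypothetical 1-D feature tracks for three utterances of one keyword.
examples = [[0, 1, 2], [0, 1, 1, 2], [0, 5, 2]]
print(select_template(examples))  # → [0, 1, 2]
```

Because DTW absorbs differences in speaking rate, `[0, 1, 2]` and `[0, 1, 1, 2]` are at distance 0, while the outlier `[0, 5, 2]` is never selected.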
images automatically. Cluster IDs are adopted to index the characters. A Dream of Red Mansions, a famous classical work of Chinese literature containing nearly one million characters, is used to evaluate the performance of Chinese keyword spotting. Experimental results confirm the effectiveness of knowledge-based clustering and
The Internet is becoming an increasingly important platform for everyday life and work. Keyword extraction is expected to help people quickly find hot spots on the web, since the keywords of a document carry important information about its content. In this paper, we propose to use text clustering
This research is concerned with table-based KNN as an approach to the keyword extraction task. The keyword extraction task is viewed as an instance of word classification, and it was found that encoding words into tables improved word categorization performance. In this research, words are encoded into
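The table encoding itself is cut off in the snippet, so this sketch only illustrates the surrounding idea: keyword extraction framed as word classification with a K-nearest-neighbour vote. The three per-word features (normalized term frequency, relative position of first occurrence, appears-in-title flag) and the training rows are hypothetical stand-ins, not the paper's table representation.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """train: list of (feature_tuple, label). Majority label of the k nearest rows."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical features: (normalized tf, relative first position, in_title flag).
train = [
    ((0.9, 0.05, 1), "keyword"),
    ((0.8, 0.10, 1), "keyword"),
    ((0.7, 0.20, 0), "keyword"),
    ((0.1, 0.90, 0), "other"),
    ((0.2, 0.70, 0), "other"),
    ((0.05, 0.50, 0), "other"),
]
print(knn_predict(train, (0.85, 0.1, 1)))  # → keyword
print(knn_predict(train, (0.1, 0.8, 0)))   # → other
```

Using an odd `k` with two labels avoids voting ties; any word encoding can be plugged in as long as it yields fixed-length numeric tuples.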
Keywords are indexed automatically for large-scale categorization corpora. Keywords indexed in more than 20 documents are selected as seed words, which overcomes the subjectivity of selecting seed words for clustering; at the same time, clustering is limited to particular category corpora and keywords indexed feature
associated with an image. In our approach, we divide images into small tiles and create visual keywords using a high-dimensional clustering algorithm. These visual keywords play the same role as text keywords. One challenge of this approach is identifying an appropriate size for visual keywords. In this paper, we report our
clustering genes is done in two steps. First, keywords corresponding to all genes of interest were extracted automatically from a subset of the MEDLINE database using TF-IDF and Z-scores. In the second step, the classic K-means algorithm was used to group the genes into clusters based on these keyword features.
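How the Z-scores were combined with TF-IDF is not stated here. A minimal sketch, assuming a gene-by-keyword count table is already available: standardize each keyword's counts across genes and keep, per gene, the keywords whose Z-score reaches a threshold. The threshold of 1.0, the toy counts and the function names are assumptions; the resulting keyword features would then be fed to any standard K-means implementation.

```python
import statistics


def keyword_zscores(counts):
    """counts: {gene: {keyword: frequency}} -> {gene: {keyword: z-score}}."""
    genes = list(counts)
    keywords = sorted({kw for c in counts.values() for kw in c})
    z = {g: {} for g in genes}
    for kw in keywords:
        column = [counts[g].get(kw, 0) for g in genes]
        mu = statistics.mean(column)
        sd = statistics.pstdev(column)
        for g, x in zip(genes, column):
            # A keyword with identical counts for every gene carries no signal.
            z[g][kw] = (x - mu) / sd if sd > 0 else 0.0
    return z


def select_keywords(z, threshold=1.0):
    """Keep, for each gene, the keywords that are unusually frequent for it."""
    return {g: sorted(kw for kw, v in zk.items() if v >= threshold) for g, zk in z.items()}


# Hypothetical keyword counts mined from abstracts mentioning each gene.
counts = {
    "g1": {"kinase": 9, "cell": 3},
    "g2": {"kinase": 1, "cell": 3},
    "g3": {"receptor": 8, "cell": 3},
}
print(select_keywords(keyword_zscores(counts)))
# → {'g1': ['kinase'], 'g2': [], 'g3': ['receptor']}
```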
appearance characteristics, so-called visual features. This paper proposes a method, called the VF-Clustering algorithm, that clusters scientific documents based on visual features. Five kinds of visual features of documents are defined: body, abstract, subtitle, keyword and title. The idea of crossover and
In document categorization methods that use similarity measures based on word vectors, it is important to determine the keywords that characterize each document. However, conventional methods select keywords based on their frequency and/or a particular importance index such as tf-idf. In this paper, we propose a method to characterize each document by using temporal clusters of technical term usages...
that are more similar are considered to be entries of a dictionary associated with the initial keyword used for the query. Moreover, the corresponding regions are parts of the visual lexicon describing the keyword. Also, an already existing lexicon may be iteratively updated by new features that may not match the existing
factorization with concept-based features is significantly lower than the error with standard keyword-based features. Qualitative evaluations also suggest that concept-based features yield more coherent, distinctive and interesting story forms compared to those produced by using standard keyword-based features.
called the Associated Keyword Space (ASKS), which is effective for noisy data, and projected the clustering result from a three-dimensional (3D) sphere onto a two-dimensional (2D) spherical surface for 2D visualization. One main issue affecting the performance of the ASKS algorithm is the construction of the affinity matrix. We use semantic
livelihoods, how to deal with its negative impacts, and which mitigation or adaptation policies to support. A line of related work has used bag-of-words and word-level features to detect frames automatically in text. Such works face limitations, since standard keyword-based features may not generalize well to accommodate surface
keyword-, ontology- and information-retrieval-based methods. Problems with these approaches include a shortage of high-quality ontologies and a loss of semantic information. In addition, there has been little fine-grained improvement in existing approaches to service clustering. In this paper, we present a new approach to
title, keyword and link text information to represent the website. Heterogeneous classifiers are then built on these different features. We propose a principled ensemble classification algorithm to combine the predictions of the different phishing detection classifiers. A hierarchical clustering technique has been
The main aim of this paper is to design a scheme that identifies a species from its genome sequence. Feature descriptors for a genome sequence are identified using the MapReduce framework. Each feature descriptor is a three-letter keyword generated from the nucleotide bases A, T, C and G. Genome sequences of related species are
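With four bases there are 4³ = 64 possible three-letter keywords. The sketch below simulates the map and reduce phases in a single process: the map step emits `((sequence_id, triplet), 1)` for every overlapping window, and the reduce step sums the counts into a 64-dimensional vector per sequence. Overlapping windows, skipping windows that contain symbols such as `N`, and all function names are assumptions; a real deployment would run the two phases as a Hadoop or similar MapReduce job.

```python
from collections import defaultdict
from itertools import product

ALPHABET = "ACGT"
# All 64 three-letter keywords, in a fixed order for the feature vector.
TRIPLETS = ["".join(p) for p in product(ALPHABET, repeat=3)]


def map_phase(sequence_id, sequence):
    """Emit ((sequence_id, triplet), 1) for each overlapping 3-base window."""
    seq = sequence.upper()
    for i in range(len(seq) - 2):
        triplet = seq[i:i + 3]
        if set(triplet) <= set(ALPHABET):  # skip windows with N or other symbols
            yield (sequence_id, triplet), 1


def reduce_phase(pairs):
    """Sum the emitted counts per (sequence_id, triplet) key."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return totals


def feature_vectors(sequences):
    """sequences: {sequence_id: DNA string} -> {sequence_id: 64 triplet counts}."""
    pairs = (p for sid, seq in sequences.items() for p in map_phase(sid, seq))
    totals = reduce_phase(pairs)
    return {sid: [totals.get((sid, t), 0) for t in TRIPLETS] for sid in sequences}


vec = feature_vectors({"s1": "ATGATG"})["s1"]
print(vec[TRIPLETS.index("ATG")], sum(vec))  # → 2 4
```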
collections was to use keyword metadata or simply to browse. Nowadays, content-based image retrieval (CBIR) is a common way to help the system retrieve related images. When users are not satisfied with their query results, relevance feedback (RF) retrieval is one solution to this problem. The user needs
integrating both low-level visual features and high-level textual keywords. Unfortunately, manual image annotation is a tedious process and may not be feasible for large image databases. To overcome this limitation, several approaches that annotate images in a semi-supervised or unsupervised way have emerged. In this paper
Time-sync comments offer a new way of extracting online video tags. However, time-sync comments contain a lot of noise because users comment on diverse topics, which makes accurate and fast video tag extraction challenging. In this paper, we propose an unsupervised video tag extraction algorithm named Semantic Weight-Inverse Document Frequency (SW-IDF). SW-IDF first generates corresponding...
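The snippet is cut off before it defines the semantic weight, so this is not SW-IDF itself. It is a minimal sketch of the "weight × IDF" scoring shape the name suggests, assuming each video's comment stream is treated as one document for IDF. The `weight` parameter is a hypothetical hook where a semantic weight would go; by default it is raw term frequency, which reduces the score to ordinary TF-IDF.

```python
import math
from collections import Counter


def extract_tags(video_comments, video_id, top_n=3, weight=None):
    """video_comments: {video_id: [tokenized comment, ...]}; returns top-scoring terms."""
    n_videos = len(video_comments)
    # Document frequency over videos: in how many videos' comments a term appears.
    df = Counter()
    for comments in video_comments.values():
        df.update({t for c in comments for t in c})
    tf = Counter(t for c in video_comments[video_id] for t in c)
    weight = weight or (lambda term: tf[term])  # placeholder for a semantic weight
    scores = {t: weight(t) * math.log(n_videos / df[t]) for t in tf}
    # Highest score first; ties broken alphabetically for reproducibility.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_n]


# Hypothetical tokenized time-sync comments for three videos.
video_comments = {
    "v1": [["goal", "lol"], ["goal", "messi"], ["lol"]],
    "v2": [["lol", "cat"], ["cat"]],
    "v3": [["lol", "piano"]],
}
print(extract_tags(video_comments, "v1", top_n=2))  # → ['goal', 'messi']
```

Noise words such as "lol" appear under every video, so their IDF is zero and they drop out of the tag list regardless of how often they are posted.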