Research on searching pages that use asynchronous communication aims to solve the new problems that the adoption of asynchronous communication technology creates for search engines. At present, full-text search engine crawlers mostly adopt algorithms based on hyperlink analysis. The crawler searches only the contents of the HTML page and ignores the code in the script region. But it is...
The Fuzzy Kernel C-Means (FKCM) algorithm can improve accuracy significantly compared with classical Fuzzy C-Means algorithms for data that are nonlinearly separable, high-dimensional, or clustered with overlaps in the input space. Despite these advantages, several issues limit its real-world application, such as local optima, outliers, the need to assign the c parameter in advance, and slow convergence...
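As a rough illustration of the kernel idea behind this family of methods, the sketch below computes fuzzy memberships of one point using a kernel-induced distance. It is not the paper's algorithm: the Gaussian kernel, the distance 2(1 - K(x, v)), the fuzzifier m = 2, and the function names are all assumptions chosen for the example.

```python
import math

def gaussian_kernel(x, v, sigma=1.0):
    """Gaussian (RBF) kernel between two points (assumed kernel choice)."""
    sq = sum((a - b) ** 2 for a, b in zip(x, v))
    return math.exp(-sq / (2.0 * sigma ** 2))

def kernel_dist2(x, v, sigma=1.0):
    """Squared feature-space distance; for a Gaussian kernel K(x, x) = 1,
    so K(x,x) - 2K(x,v) + K(v,v) reduces to 2 * (1 - K(x, v))."""
    return 2.0 * (1.0 - gaussian_kernel(x, v, sigma))

def memberships(x, centers, m=2.0, sigma=1.0):
    """Fuzzy C-Means membership update with the kernel-induced distance.

    u_i is proportional to (d_i^2) ** (-1 / (m - 1)), normalised to sum to 1.
    """
    d2 = [kernel_dist2(x, v, sigma) for v in centers]
    # A point that coincides with a center belongs fully to that center.
    for k, d in enumerate(d2):
        if d == 0.0:
            return [1.0 if i == k else 0.0 for i in range(len(centers))]
    inv = [d ** (-1.0 / (m - 1.0)) for d in d2]
    total = sum(inv)
    return [w / total for w in inv]

u = memberships([0.2, 0.1], centers=[[0.0, 0.0], [3.0, 3.0]])
```

The point near the first center receives the larger membership, and the memberships always sum to 1. The number of centers still has to be supplied, which is the limitation the abstract mentions for the c parameter.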
In this paper we present an improved method for hierarchical clustering of Gaussian mixture components, derived from the Hierarchical Gaussian Mixture Expectation Maximization (HGMEM) algorithm. Like HGMEM, it efficiently reduces a large mixture of Gaussians into a smaller mixture while still preserving the component structure of the original model. Compared with the HGMEM algorithm, it takes covariance...
Automatic image annotation techniques have been proposed to overcome the so-called semantic gap between low-level image features and high-level concepts in content-based image retrieval systems. Owing to the limitations of these techniques, current state-of-the-art automatic image annotation models still produce some concepts that are irrelevant to image semantics, which is an obstacle to getting high-quality image retrieval...
Topic Detection and Tracking refers to automatic techniques for locating topically related materials in streams of data. Story link detection, one of its core tasks, determines whether two stories are about the same topic. Many representation models have been used in story link detection so far, but few of them are specific to stories. This paper proposes an event model based on the characters of...
A first step required for video indexing and retrieval of visual data is temporal segmentation, that is, finding the locations of camera-shot transitions, which can be either abrupt or gradual. We adopt an SVM technique to decide whether a shot transition exists within a given video sequence. An active learning strategy is used to accelerate the training of the SVM classifiers. We also introduce...
This paper presents a Multiple Combined Ranker (MCR) approach for answering definitional questions. In general, our MCR approach first extracts as much knowledge related to the question target as possible, then uses this knowledge to pick appropriate answers. The knowledge includes both online definitions and related terms (RT). In our system, extraction of related terms is different from traditional...
This paper aims to solve the problems of generating natural language route descriptions in Chinese way-finding systems, on the basis of datasets from geographical information systems and natural language generation technology. The techniques for deriving important information, e.g. paths, roads, directions and landmarks, from geographical information systems are discussed in detail. Through examples we...
Hierarchical clustering algorithms are efficient in reducing the bytes needed to describe the original information while preserving its structure. Information Bottleneck (IB) theory is a hierarchical clustering framework derived from information theory. The Agglomerative Information Bottleneck (AIB) algorithm is a suboptimal agglomerative clustering procedure designed for optimizing...
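For orientation, the greedy merging step of agglomerative IB is commonly formulated as follows: merging clusters i and j costs (p_i + p_j) times the weighted Jensen-Shannon divergence between their conditional distributions p(y|i) and p(y|j), and the cheapest pair is merged first. The sketch below follows that common formulation only; it is not the variant this paper proposes, and the function names and toy data are illustrative assumptions.

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence in bits; terms with p = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def merge_cost(w_i, p_i, w_j, p_j):
    """Information lost by merging two clusters with prior weights w_i, w_j
    (assumed positive) and conditionals p_i, p_j:
    (w_i + w_j) * weighted Jensen-Shannon divergence."""
    w = w_i + w_j
    a, b = w_i / w, w_j / w
    mix = [a * x + b * y for x, y in zip(p_i, p_j)]
    return w * (a * kl(p_i, mix) + b * kl(p_j, mix))

def aib(weights, conds, k):
    """Greedily merge the cheapest pair until k clusters remain."""
    clusters = [[i] for i in range(len(weights))]
    weights = list(weights)
    conds = [list(c) for c in conds]
    while len(clusters) > k:
        _, i, j = min(
            (merge_cost(weights[i], conds[i], weights[j], conds[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        w = weights[i] + weights[j]
        # The merged conditional is the weight-averaged mixture of the two.
        conds[i] = [(weights[i] * x + weights[j] * y) / w
                    for x, y in zip(conds[i], conds[j])]
        weights[i] = w
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j], weights[j], conds[j]
    return clusters

# Four items with equal prior; items 0 and 1 (and 2 and 3) predict y similarly.
result = aib([0.25] * 4,
             [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]],
             k=2)
```

Each merge decision is locally optimal only, which is why the procedure as a whole is described as suboptimal.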
In this paper, we describe an opinion analysis system that uses domain-specific lexical knowledge for Korean economic news. We tested our hypothesis that such domain-specific knowledge helps enhance the performance of statistically based approaches and obtained a promising result.
In this paper, we present a new algorithm for reconstructing large phylogenetic...
Distributed Web crawlers have recently received more and more attention from researchers. A fully decentralized crawler without a centralized managing server seems to be an interesting architectural paradigm for realizing large-scale information collecting systems because of its scalability, failure resilience and increased autonomy of nodes. This paper provides a novel fully distributed Web crawler system which...
In recent years much research has been done on data-driven dependency parsing, and performance has increased steadily. Dependency grammar has an important inherent characteristic: the nodes closer to the root usually contribute more to audiences than the others. However, previous research ignores this and considers every node in a dependency structure to play the same role...
A center choice method based on sub-graph division is presented. After the similarity matrix is constructed, disconnected graphs are built with the text nodes as vertices and are then analyzed. With this method, the number of clustering centers and the centers themselves can be determined automatically within the allowable error range. Noisy data can be eliminated effectively...
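One plausible reading of sub-graph division is sketched below: threshold the similarity matrix, treat each connected component as a cluster, and pick one center per component. The threshold value, the rule "the node with the highest total similarity inside its component becomes the center", and all names are assumptions made for illustration, not the paper's stated procedure.

```python
from collections import deque

def components(sim, threshold):
    """Connected components of the graph that links nodes u, v
    whenever sim[u][v] >= threshold (found by breadth-first search)."""
    n = len(sim)
    seen = [False] * n
    comps = []
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        queue = deque([start])
        comp = []
        while queue:
            u = queue.popleft()
            comp.append(u)
            for v in range(n):
                if not seen[v] and v != u and sim[u][v] >= threshold:
                    seen[v] = True
                    queue.append(v)
        comps.append(sorted(comp))
    return comps

def choose_centers(sim, threshold):
    """One center per component: the node most similar to the rest of
    its component (assumed selection rule)."""
    centers = []
    for comp in components(sim, threshold):
        best = max(comp, key=lambda u: sum(sim[u][v] for v in comp if v != u))
        centers.append(best)
    return centers

# Toy similarity matrix: texts 0-2 are mutually similar, as are texts 3-4.
sim = [
    [1.0, 0.9, 0.8, 0.1, 0.0],
    [0.9, 1.0, 0.7, 0.0, 0.1],
    [0.8, 0.7, 1.0, 0.1, 0.0],
    [0.1, 0.0, 0.1, 1.0, 0.9],
    [0.0, 0.1, 0.0, 0.9, 1.0],
]
groups = components(sim, threshold=0.5)
centers = choose_centers(sim, threshold=0.5)
```

Under this reading the number of clusters is simply the number of components, so it does not have to be fixed in advance.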
In this paper, we propose a model for evaluating the quality of general user-created documents. The model is based on a supervised classification approach, in which the output scores are taken as the quality of a given document. In order to utilize both textual and non-textual attributes of documents, we incorporated a number of objectively measurable, real-valued features selected according to predefined criteria...
With network information growing day by day, people engaged in commerce are calling for a commerce-oriented search engine. The primary step in building such a search engine is to get commercial information efficiently from the Internet. This paper introduces a method for filtering commerce-oriented information from the Internet. With this method, the Spider decides the passing orientation by judging...
In order to utilize news articles from multiple news sites, it helps to understand the characteristics of each news site. In this paper, the concept of contrast set mining is applied to analyze the characteristic differences between each news site and all others. The News Site Contrast (NSContrast) system is also proposed based on this mining technique. This system is applied to a news article...
We present a music recommendation system that incorporates both collaborative filtering and mood-based recommendations. The benefits of incorporating mood-based recommendations over both content/genre-based and collaborative filtering-based recommendation are illustrated by means of a real-world user evaluation in which 54 users took part over one month.
Many news pages with high freshness requirements are published on the Internet every day. They should be downloaded immediately by instant crawlers; otherwise, they soon become outdated. In the past, instant crawlers only downloaded pages from a manually generated list of news websites. Bandwidth is wasted in downloading non-news pages because news websites do not publish news pages exclusively...
We present an alignment-based approach to the semi-supervised extraction of relations with more than two arguments. We concentrate on improving not only the precision of the extracted results but also the coverage of the method. Our relation extraction method is based on an alignment-based pattern matching approach, which makes the method more flexible. In addition, we extract all relationships...