To discover White Dwarf + Main Sequence (WDMS) binaries in massive spectral data sets, this paper discusses Isometric Feature Mapping (Isomap), an unsupervised learning algorithm for Nonlinear Dimensionality Reduction (NLDR). The applicability of Isomap to Sloan Digital Sky Survey Data Release 10 (SDSS-DR10) is confirmed. Furthermore, Particle Swarm Optimization (PSO) is implemented to...
There are a large number of accessible deep Web sites on the Internet. However, even an identical entity can have different representation formats on different Web sites, so entity identification plays a crucial role in deep Web data mining. This paper proposes an entity identification method in the field of Chinese books. First, using improved Jaccard coefficients to calculate similarity of text attributes...
To precisely retrieve information about Chinese persons on the web, and especially to distinguish between namesakes, this paper proposes a clustering algorithm based on a latent semantic model. By building a central word library of person attributes, it establishes for every document a latent semantic model of the sentence-word matrix based on central distance, central segment, document length, etc. It clusters...
How to improve the accuracy of concept similarity computation remains an issue that needs further study. This paper proposes a new approach that combines concept similarity with users' individual preferences. Personal coefficients trained by an artificial neural network are employed to adjust the initial concept similarities to obtain the actual concept similarities. The experimental results...
This paper presents a new method for mining the hottest topics on Chinese Web pages, based on an improved k-means partitioning algorithm. The dictionary applied to word segmentation is reduced by deleting words that are useless for clustering, and a dictionary tree is created for word segmentation, which improves the speed of word segmentation. Correspondence between...
Current approaches for generating wrappers for web page extraction require a huge amount of labeled training pages to obtain satisfactory results. On the other hand, the quality of data extracted by fully automatic methods is not reliable. In this paper, we propose a novel method to facilitate wrapper generation by combining wrapper induction and page analysis approaches. In addition...
A number of recent works have proposed using data mining and machine learning techniques to classify traffic flows based on statistical flow characteristics. Most of these classifiers work offline, since full-flow statistics are not available until a flow is finished. Therefore, it is usually too late to take actions for online deployment. In this paper, we propose a simple and effective technique...