This paper studies text categorization in which Jensen-Shannon divergence is used to compute text similarity, and compares its classification accuracy and running time with those of the traditional cosine similarity algorithm. Experiments show that, on the same test materials, the Jensen-Shannon divergence algorithm achieves better results.
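To make the comparison concrete, the following is a minimal sketch of the two measures applied to term-frequency distributions. It is not the paper's implementation: the function names, the whitespace tokenization, and the sample sentences are assumptions chosen for illustration. Note that Jensen-Shannon divergence is a distance (0 for identical distributions, 1 for disjoint ones when base-2 logarithms are used), whereas cosine similarity is a similarity (1 for identical directions).

```python
import math
from collections import Counter


def term_distribution(tokens, vocabulary):
    """Relative term frequencies over a shared vocabulary (hypothetical helper).

    Assumes at least one token of the document appears in the vocabulary.
    """
    counts = Counter(tokens)
    total = sum(counts[term] for term in vocabulary)
    return [counts[term] / total for term in vocabulary]


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; zero-probability terms of p are skipped."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_divergence(p, q):
    """Symmetric divergence: average KL of p and q against their midpoint m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def cosine_similarity(p, q):
    """Cosine of the angle between two non-zero frequency vectors."""
    dot = sum(pi * qi for pi, qi in zip(p, q))
    norm_p = math.sqrt(sum(pi * pi for pi in p))
    norm_q = math.sqrt(sum(qi * qi for qi in q))
    return dot / (norm_p * norm_q)


# Illustrative documents (assumed data, not from the paper).
doc_a = "the cat sat on the mat".split()
doc_b = "the dog sat on the log".split()

vocabulary = sorted(set(doc_a) | set(doc_b))
p = term_distribution(doc_a, vocabulary)
q = term_distribution(doc_b, vocabulary)

print("JSD:", jensen_shannon_divergence(p, q))
print("cosine:", cosine_similarity(p, q))
```

In a classifier built along these lines, a test document would be assigned to the category whose term distribution has the smallest divergence (or, for cosine, the largest similarity). How the paper builds category distributions and handles smoothing is not stated in the abstract.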
Because of homonyms, abbreviations, and similar causes, name ambiguity is widespread on the Web and in electronic documents. For example, when heterogeneous literature databases are integrated, differing name conventions can cause different authors to be treated as the same author, and vice versa. Name ambiguity therefore leaves the data unreliable, even dirty, and lowers the precision of information retrieval. In this paper,...
The dominance of the Internet in our lives has permanently changed how marketers conduct marketing and measure its performance. Traditional measurement methods fall short because they are neither timely nor effective. In this study, we propose using Web data, in the form of a quantitative metric, to assess the market impact of brands. The metric consists of three independent dimensions, covering measures...
Because heterogeneous literature databases differ in data format, lack certain properties, and contain imprecise records, duplicate records arise when such databases are integrated. Duplicate records lower the efficiency of information retrieval. In this paper, we propose an approach named length filtering and dynamic weighting (LFDW) for cleansing duplicate records. There are three steps...
Current methods for computing the similarity of Chinese sentences, and their shortcomings, are analyzed first. Based on the characteristics of Chinese question sentences, general chunks and special chunks of a Chinese question are defined, and a chunk-based method for computing the similarity of Chinese questions is then proposed. In this method, the semantic similarity of words is computed on the basis...