Sentence similarity computation is a research topic in natural language processing and plays an important role in example-based machine translation, information retrieval, text mining and other fields. Sentence similarity algorithms differ in their applicability across environments. This paper reviewed and analyzed five kinds of sentence similarity algorithms, and tested...
Many ranking algorithms are used in Web information retrieval. However, current algorithms have some problems: they either rely on different formulas to calculate the similarity between documents and a query, or train on large amounts of data to obtain such a formula. This process is very complex, and sometimes...
Text Categorization (TC) is an important component in many information organization and information management tasks. In many TC applications, the case-base grows at a fast rate, which makes the case retrieval process inefficient. Case-base maintenance learning via the GC (Generalization Capability) algorithm, which reduces the number of cases passed to the KNN algorithm, can improve efficiency...
Pattern matching is an important task, which is widely used in many fields, such as information retrieval and bioinformatics. Recently, a much more flexible pattern matching problem with wildcards has been proposed. Chen et al. introduced local constraints, global constraints and the one-off condition into the task of pattern matching, and the most representative algorithm SAIL was designed. However,...
This paper introduces a new description-centric algorithm for web document clustering based on Memetic Algorithms with Niching Methods, Term-Document Matrix and Bayesian Information Criterion. The algorithm defines the number of clusters automatically. The Memetic Algorithm provides a combined global and local strategy for a search in the solution space and the Niching methods to promote diversity...
Data-fusion techniques have been investigated by many researchers and have been used in implementing several information retrieval systems. Introducing new or improved data-fusion algorithms is an active research area. We propose a framework for the analysis and improvement of data-fusion algorithms; this framework is intended to be, first, a supportive tool for researchers...
This paper first transforms the process of Chinese question answering into agent coalition formation, and then obtains the solution using a combination of a genetic algorithm and an ant colony algorithm. The idea and routine of the algorithm are given. The coding scheme, selection scheme, crossover operator, mutation operator and other components of a genetic algorithm suitable for Chinese question answering agent...
With the rapid growth of network information, using search engines to find information has become an integral part of everyday life. In recent years, research has focused on search engine optimization technologies, which are used to publish business information to search engines quickly and to maintain higher rankings. The present paper analyzes the impact of receiving and recording...
In distance education, image retrieval and video retrieval are the most prominent retrieval tasks. This paper presents a multi-modal multimedia information retrieval system oriented to distance education, and makes a preliminary attempt at applying a support vector machine (SVM) relevance feedback algorithm to image classification.
Concept extraction promises to improve the performance of term-based text mining, which has high complexity. The first phase of concept extraction is to detect the terms whose frequency is notable enough to represent the documents. Grouping these terms is an important step on the way to concepts. Transition from terms to concepts; by clustering the terms according to similarities...
Traditional PageRank (PR) takes into account only the Web link structure; when distributing rank scores it treats all links equally, which results in topic drift. In this paper, a latent semantic model (LSM) is used to calculate the similarity between Web pages, and the LSMPageRank (LPR) algorithm is introduced. In this algorithm, the value of a parent page is distributed to the child on the basis...
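For orientation, the baseline behaviour criticized in the abstract above (every outgoing link receives an equal share of the parent's score) can be sketched as follows. This is a minimal illustration of standard PageRank only, not the LPR algorithm, whose weighting rule is cut off in the listing. The function name `pagerank`, the damping factor 0.85, the fixed iteration count and the toy graph are all assumptions made for this example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Plain PageRank by power iteration (illustrative sketch).

    links: dict mapping a page to the list of pages it links to.
    Every outgoing link gets an equal share of the parent's score,
    which is the "treats all links equally" behaviour of the baseline.
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start from a uniform distribution.
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Teleportation term shared by all pages.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Equal split of the parent's score among its links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no links spreads its score evenly.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical three-page graph used only for demonstration.
toy_links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
toy_ranks = pagerank(toy_links)
```

A similarity-weighted variant would replace the equal `share` with a per-link weight; how LPR derives that weight is not recoverable from the truncated abstract.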
Mutual information algorithms have been used for the identification of gene-gene interactions in gene expression data. These methods have been hindered by a high false-positive rate. We explored the use of free-text abstracts as an additional source of information for assessing the biological relevance of predicted gene interactions. Our results suggest that the performance of a mutual information...
Document classification has received extensive attention in the past few decades due to its wide applications in many fields. To deal with this problem efficiently, a novel document classification algorithm based on information bottleneck (IB) and the least-squares version of SVM (LS-SVM) is proposed in this paper. Extensive experimental results on a real-world document corpus show that the proposed algorithm...
This paper presents a novel approach for automatic text categorization. Mainstream research on rule-based classifiers regards a document as a container of terms, and generates rules by using the term distribution in documents. Generally speaking, there must exist some kind of semantic relevance between a term and a paragraph in a document. We call it Meaningful Inner Link Objects (MILO), which...
With the rapid development of Web2.0 communities, many researchers from the fields of data mining and information retrieval have been attracted by the concept of folksonomy. Finding the semantic correlation of tags is a pressing requirement for Web2.0 applications. However, no existing algorithm tackles this task very well. This paper proposes a core-tag oriented clustering method to handle the task...
The PageRank algorithm evaluates the importance of web pages by link analysis, and in recent years many techniques have been proposed to improve the traditional PageRank algorithm against the biases of link spamming. Modified algorithms should address not only correctness but also efficiency. This paper proposes an associated PageRank algorithm for search engines to...
The Internet is becoming a platform for spreading public opinion. It is important to grasp Internet public opinion in time and to understand its trends correctly. Text classification plays a fundamental role in a number of information management and retrieval tasks. But Web-page classification is much more difficult than pure-text classification due to a large variety of noisy information...
In view of the poor information retrieval performance caused by large amounts of disordered and semi-structured web information, a heuristic web information retrieval method that combines query intention classification and topic-specific retrieval is proposed. In this method, a web retrieval model based on a scheme of one pretreatment and two retrievals is presented, and it discusses its designing...
Meaningful and useful return information is extraordinarily important for information retrieval and XML keyword search. In this work, based on an analysis of the structure of XML documents, we propose an algorithm to classify returned matched nodes, and we present a formal analysis of LCA (lowest common ancestor) node ranking and LCA subtree refining to obtain precise return information. Experimental studies show...
With the growth of information on the Internet, Web mining has become a focus of information retrieval. Using a certain similarity metric, Web clustering groups similar Web documents. But classical clustering algorithms search the solution space aimlessly and lack semantic features. In this paper, the probabilistic latent semantic indexing (PLSI) models which using word segmentation,...