The goal of Information Extraction is to automatically generate structured pieces of information from the relevant information contained in text documents. Machine Learning techniques have been applied to reduce the cost of Information Extraction system adaptation. However, elements of human supervision strongly bias the learning process. Unsupervised learning approaches can avoid these biases. In...
Graphical text representation methods such as Conceptual Graphs (CGs) attempt to capture the structure and semantics of documents. As such, they are the preferred text representation approach for a wide range of problems, namely in natural language processing, information retrieval and text mining. In a number of these applications, it is necessary to measure the dissimilarity (or similarity) between...
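The abstract above is truncated, so the paper's actual measure is not shown. As a minimal sketch of what a dissimilarity between two conceptual graphs can look like, the following computes a Jaccard-style distance over the labels of concept and relation nodes; this is a common baseline illustration, not the measure proposed in the paper:

```python
def cg_dissimilarity(g1, g2):
    """Jaccard-style dissimilarity between two conceptual graphs,
    represented here (for illustration only) as dicts with 'concepts'
    and 'relations' label sets. Returns 0.0 for identical label sets,
    1.0 for disjoint ones."""
    n1 = set(g1["concepts"]) | set(g1["relations"])
    n2 = set(g2["concepts"]) | set(g2["relations"])
    union = n1 | n2
    if not union:
        return 0.0
    return 1.0 - len(n1 & n2) / len(union)
```

Real CG comparison also has to account for graph structure (which relations connect which concepts), which a label-set overlap like this ignores.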
This paper proposes the use of local context as an approach to semantic information retrieval. In our model, rather than trying to formalize the contents of the documents among which the search is done (e.g. by formally annotating them), we try to automatically build a representation of the context in which the search is done. We consider that search is always done as part of an activity, and that the search...
Named entity relations are a foundation of semantic networks, ontologies and the semantic Web, and are widely used in information retrieval and machine translation, as well as in automatic question answering systems. Relation feature selection and extraction are two key issues. Location features possess excellent computability and operability, while semantic features have strong intelligibility...
A significant amount of text-based knowledge is created in collaborative Web-based environments in the context of education. To utilize all this information efficiently, there is a need to provide users with easy access to the information they are interested in. To achieve this, various methods of information retrieval (IR) can be used. We analyze six different algorithms for seeking...
When searching for information, a first and crucial step is query formulation. However, it is a difficult task for users. This paper introduces a new approach to help users formulate their queries. It proposes to draw on past search experiences to assist users when formulating queries. The user can incrementally construct his query by visualizing how other users have carried out searches with the terms...
Source code search is an important activity for programmers working on a change task to a software system. We are at the early stages of a research program that is aiming to answer three research questions: (1) How effectively can programmers express (using today's tools) the information they are seeking? (2) How effectively can programmers determine which of the matches returned from their searches...
Current component retrieval tools offer few services to facilitate a relevant search for these components. They are intended for experts who already have a good knowledge of the component catalogues. However, they remain limited for inexperienced users. The crucial problem is to retrieve components according to the user's requirements. In this context, our contribution aims at analyzing...
Exploring the metadata associated with documents in the semantic Web is a way to increase the precision of information retrieval systems. The systems established so far have failed to fully overcome the limitations of keyword-based search. Such systems are built from variations of classic models that represent information by keywords and operate on statistical correlations. This work proposes...
In order to evaluate information retrieval algorithms it is imperative to use a dataset as a test database. However, access to such datasets is often difficult and expensive, since building them is a time-consuming and costly task. This paper presents a collaborative approach to dataset creation that uses a data quality evaluation technique based on fuzzy theory, to assist users in selecting suitable...
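The abstract above is truncated and does not show the fuzzy evaluation technique itself. As a hedged sketch of the general idea, assuming quality dimensions (e.g. completeness, consistency) are scored in [0, 1], one could map each dimension through a fuzzy membership function and aggregate the membership degrees; the thresholds and weights below are illustrative assumptions, not values from the paper:

```python
def ramp(x, lo, hi):
    """Membership degree in the fuzzy set 'acceptable quality':
    0 below lo, 1 above hi, linear in between."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)

def dataset_quality(metrics, weights, lo=0.4, hi=0.9):
    """Weighted mean of per-dimension fuzzy membership degrees.
    `metrics` and `weights` are dicts keyed by dimension name."""
    total = sum(weights.values())
    return sum(weights[k] * ramp(v, lo, hi) for k, v in metrics.items()) / total
```

A score near 1.0 would then suggest the dataset is suitable for use as a test database; the aggregation operator (weighted mean here) is one of several choices fuzzy approaches allow.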
This paper presents an automatic term extraction method based on a Markov process. The method aims to extract multi-word domain terms from English corpora. The paper first proves that the term extraction process is a Markov chain, and then gives the steps of the Markov-based method. To evaluate our method, we use a corpus related to computer science obtained by Web crawlers, and extract domain...
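The abstract is truncated, so the paper's exact scoring is not shown. A minimal sketch of the underlying idea, assuming a first-order Markov chain over words: score a multi-word candidate by its initial-word probability times the product of bigram transition probabilities estimated from the corpus. The corpus, function names and thresholds here are illustrative assumptions:

```python
from collections import Counter

def train(corpus_tokens):
    """Estimate unigram counts and bigram counts from a token list."""
    unigrams = Counter(corpus_tokens)
    bigrams = Counter(zip(corpus_tokens, corpus_tokens[1:]))
    return unigrams, bigrams

def term_score(candidate, unigrams, bigrams, total):
    """Score a multi-word candidate as a first-order Markov chain:
    P(w1) * prod_i P(w_i | w_{i-1}). Higher scores suggest the word
    sequence behaves like a fixed multi-word term in the corpus."""
    words = candidate.split()
    score = unigrams[words[0]] / total
    for prev, cur in zip(words, words[1:]):
        if unigrams[prev] == 0:
            return 0.0
        score *= bigrams[(prev, cur)] / unigrams[prev]
    return score
```

In this toy model, candidates whose words almost always follow one another (high transition probability) outrank incidental word pairs; a real extractor would add POS filtering and frequency thresholds.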
Most existing corpus-based relation extraction techniques focus on predefined relations. In this paper, a clustering-based method is presented for domain-relevant relation extraction, including both relation type discovery and relation instance extraction. Given two raw corpora, one in the general domain and one in an application domain, domain-specific verbs connecting different instances are extracted...
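The abstract is truncated before it describes how domain-specific verbs are identified. One common heuristic for this kind of corpus comparison (a sketch, not necessarily the paper's measure) is a relative-frequency ratio: keep verbs that are markedly more frequent in the domain corpus than in the general corpus. The counts, smoothing and threshold below are illustrative assumptions:

```python
def domain_verbs(domain_counts, general_counts, min_ratio=2.0):
    """Return verbs whose relative frequency in the domain corpus is
    at least `min_ratio` times their (add-one smoothed) relative
    frequency in the general corpus."""
    d_total = sum(domain_counts.values())
    g_total = sum(general_counts.values())
    kept = []
    for verb, count in domain_counts.items():
        d_rel = count / d_total
        # add-one smoothing so verbs absent from the general corpus
        # do not cause division by zero
        g_rel = (general_counts.get(verb, 0) + 1) / (g_total + len(general_counts))
        if d_rel / g_rel >= min_ratio:
            kept.append(verb)
    return sorted(kept)
```

Verbs surviving this filter would then serve as candidate relation indicators to be clustered into relation types.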
There are many brief semi-structured messages on the Internet. A semantic matching method based on the contexts of concepts is proposed. Each word of the space vectors representing brief messages is extended with synonyms in a controlled process according to information gain scores. Patterns of messages are extracted with a naive Bayesian classifier. The similarity between a query and a message...
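The controlled synonym-expansion step described above can be sketched as follows. This is an illustration only: the mini-thesaurus, the information-gain values and the use of set overlap instead of full vector-space cosine are all assumptions for the sake of a short example, not the paper's actual resources or similarity measure:

```python
# Illustrative mini-thesaurus; a real system would use a lexical
# resource such as WordNet.
SYNONYMS = {"buy": {"purchase"}, "car": {"auto"}}

def expand(tokens, info_gain, threshold=0.5):
    """Controlled expansion: add synonyms only for tokens whose
    information gain score passes the threshold, so low-value words
    do not inflate the representation."""
    out = set(tokens)
    for t in tokens:
        if info_gain.get(t, 0.0) >= threshold:
            out |= SYNONYMS.get(t, set())
    return out

def jaccard(a, b):
    """Set-overlap similarity between two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```

After expansion, a query phrased with synonyms ("purchase", "auto") can match a message that used the original words ("buy", "car"), whereas an unexpanded overlap would be zero.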