Knowledge mining has emerged as a rapidly growing interdisciplinary field that brings together databases, statistics, machine learning and related areas in order to extract valuable information and knowledge from large volumes of data. In this paper we present the key findings of the NEMIS Conference on “Knowledge Mining”.
In this paper we look at a way of combining two or more different classification methods for text categorization. The specific methods we have been experimenting with in our group include the Support Vector Machine, k-nearest neighbours (kNN), the kNN model-based approach (kNNM), and the Rocchio method. We then describe our method for combining the classifiers. A previous study suggested that the combination...
Many text retrieval strategies are based on Latent Semantic Indexing (LSI) and its variations, which consider different weighting systems for words and documents. Correspondence Analysis and LSI share the same basic algebraic tool, i.e. the Singular Value Decomposition and its generalisation, which relies on a different way of measuring the importance of each element, both in determining and representing...
In order to delineate the state of the art of the main text mining (TM) applications, a two-step strategy was pursued: first, some of the main European and Italian companies offering TM solutions were contacted in order to collect information on the characteristics of the applications; second, a detailed web search was carried out to collect further information about users or developers and applications...
Summary: The extraction of meaningful information from large collections of data is a fundamental issue in science. To this end, clustering algorithms are typically employed to identify groups (clusters) of similar objects. A critical issue for any clustering algorithm is the determination of the number of clusters present in a dataset. In this contribution we present a clustering algorithm that in...
Drawing on recent ethnographic research on happiness carried out in eight European countries in 2003/4, Future Concept Lab will illustrate how interactive digital material can be used to analyse qualitative and quantitative data in a participatory and creative manner. Our talk will focus on the added value of presenting data in an interactive and flexible way by using...
There has been an increasing interest both from the Information Retrieval community and the Data Mining community in investigating possible advantages of using Word Sense Disambiguation (WSD) for enhancing semantic information in the Information Retrieval and Data Mining process. Although contradictory results have been reported, there are strong indications that the use of WSD can contribute to the...
A roadmap is typically a time-based plan that defines the present state, the state we want to reach and the way to achieve it. This includes identification of exact goals and the development of different routes for achieving them. In addition, it provides guidance to focus on the critical issues that are needed in order to meet these objectives. The roadmap of NEMIS aims at preparing the ground for...
Up to 80% of electronic data is textual, and the most valuable information is often encoded in pages that are neither structured nor classified. Documents are, and will be, written in various native languages, yet they are relevant even to non-native speakers. Nowadays everyone experiences mounting frustration when trying to find the information of interest, wading through thousands...
This paper is intended to show how an information extraction system can be reused to produce RDF schemas for the semantic web [1]. We demonstrate that systems of this kind must respect operational constraints, such as the requirement that the information produced be highly relevant (high precision, possibly at the cost of poor recall). The production of explicit structured data on the web will lead to a better relevance...
The paper presents a platform that facilitates the use of tools for collecting domain-specific web pages as well as for extracting information from them. It also supports the configuration of such tools for new domains and languages. The platform provides a user-friendly interface through which the user can specify the domain-specific resources (ontology, lexica, corpora for the training and testing...
In the framework of the JuriSent case study, carried out within the European NEMIS thematic network, we analyze how text mining techniques can improve the consultation of textual jurisprudence databases. We mainly focus on correspondence analysis (CA) techniques, but also provide some insights on similar visualization techniques, such as self-organizing maps (Kohonen maps), and review...
It has been demonstrated that Technology Watch (TW) and Competitive Intelligence (CI) are important tools for the development of R&D activities and the enhancement of competitiveness in enterprises. TW activities can detect opportunities and threats at an early stage and provide the information needed to decide on and carry out the appropriate strategies. The basis of TW is the process of search,...
The paper deals with the new challenges and roles of metadata and metainformation in text/data mining for statistics. In the first part, the paper presents some basic characteristics of contemporary statistical information systems from the point of view of their need for metadata and data/text mining. As is well known, modern statistical systems are...
There has been a tremendous increase in the number of actors in the statistical arena (producers, distributors, and users) due to the new options offered by web technology. These actors are not sufficiently informed about the technological progress made in the field of text mining and the ways in which they can benefit from it. The NEMIS project, and especially its Working Group 5, aims to identify...
In this contribution, we present the StatSearch prototype, a search engine that enables enhanced access to domain-specific data available on the Web. The StatSearch engine offers a hybrid search interface combining query-based search with automated navigation through a tree-like hierarchical structure. The goal of such an interface is to allow a more natural and intuitive control over the information...
This paper presents the overall process and the main conclusions of a comparison study of text mining tools carried out within the framework of the NEMIS project. The basic stages of the comparison process are described, together with the specified evaluation criteria. The main conclusions of the study form the last chapter of the paper.
Three industrial applications of text mining, each requiring a different methodology, will be presented. The first application used a classification approach in order to filter documents relevant to personal profiles from an underlying document collection. The second application combines cluster analysis with statistical trend analysis in order to detect emerging issues in manufacturing. In the third application...
This paper concerns itself with the analysis of event data with text mining tools. The methodological approaches to event data analysis are presented, and an analysis is performed using SPAD Software and SAS Text Miner. Finally, some conclusions are drawn concerning the use of text mining tools for event data analysis.