In this paper we present our experience with the design, development, and pilot deployments of a data-driven, machine-learning-based application maintenance solution. We implemented a proof of concept to address a spectrum of interrelated problems encountered in application maintenance projects, including duplicate incident ticket identification, assignee recommendation, theme mining, and mapping of incidents...
Multi-pattern string matching with a large set of patterns is a key issue in many text retrieval applications. Its applications include filtering undesirable URLs, finding quotes in the texts of famous holy books, extracting specific patterns from DNA sequences, antivirus scanning, intrusion detection, and even music retrieval. As the size of corpora and the number...
The emotion tendency of a sentiment word is divided into two types: static emotion tendency and dynamic emotion tendency. A basic semantic lexicon records the static emotion tendency, but in a real context the dynamic emotion tendency can differ from the static one. The paper proposes a method based on a degree lexicon, a negative lexicon, and the dependency relationships among sentence elements. The experimental...
One of the most serious problems identified in conventional knowledge management (KM) is the slow and ineffective acquisition of knowledge. To resolve this problem, knowledge must be acquired autonomously according to its context of use, by applying keyword extraction techniques from machine-learning-based text mining. Once the topic of the given knowledge can be identified...
Query difficulty prediction aims to identify, in advance, how reliably an information retrieval system will perform when faced with a particular user request. Predicting the difficulty level of a query is an interesting and important issue in Information Retrieval (IR) and remains an open research problem. To illustrate the importance of query difficulty prediction, we present an example. Information...
In this paper, we propose a novel post-processing approach for on-line handwriting recognition. Unlike existing methods based on linguistic knowledge, we use domain-specific knowledge to improve recognition performance. Our system recognizes doctors' handwriting, which often poses great challenges in readability, and then enhances the quality of recognized text by analyzing...
Exploring the evolution of social contexts with time can provide unique insights into human social dynamics. Several social contexts and relationships can be mined from unstructured text articles that describe social phenomena. In contrast to structured graphs of social networks, named entity recognition is a task that attempts to classify elements in unstructured textual items into predefined categories,...
Much has been documented in the literature on sentiment analysis and document summarisation. Much of this applies to long structured text in the form of documents and blog posts. With a shift in social media towards short commentary (e.g., Facebook status updates and Twitter tweets), the difference in comment structure may affect the accuracy of sentiment analysis techniques. From our VoiceYourView...
Extracting acronyms and their expansions from plain text is an important problem in text mining. Previous research shows that the problem can be solved with machine learning approaches, by converting acronym extraction into a binary classification problem. We investigate the classification problem and find that the classes are highly unbalanced (the positive instances are very rare compared...
In this paper, we present a new approach that incorporates semantic information from a document, in the form of a Hierarchical Document Signature (HDS), to measure semantic similarity between sentences. Because natural language expressions vary widely, it is essential to exploit the semantic properties of a document to accurately identify semantically similar sentences since sentences conveying...
Open Source Software (OSS) mailing lists are used by developers to discuss software engineering tasks performed in the project. In recent years, researchers have been conducting linguistic analyses of mailing lists to understand the intricacies of OSS development. An unpublished approach to this is to use NeuroLinguistic Theory (NT). NT postulates the use of a Preferred Representational cognitive...
Providing ontologies for automatic trend detection enhances the quality of trend predictions. However, in the case of dynamic and fuzzy expert knowledge, like the knowledge used in trend detection, it is difficult to formalize knowledge unambiguously and in a static way. In this paper we report on our experiences in modeling and formalizing a trend ontology for automatic knowledge-based trend detection...
This paper evaluates CONSPECT, a service that analyses states in a learner's conceptual development. It combines two technologies - Latent Semantic Analysis to analyse text and Network Analysis (NA) to provide visualisations - into a technique called Meaningful Interaction Analysis (MIA). CONSPECT was designed to help both online learners and their tutors monitor their conceptual development. This...
Building on the analysis of product reviews, an unsupervised method for categorizing product features is proposed. Morphemes, the smallest meaningful linguistic units, are used instead of words to measure the intra-relationship among product features. Opinion words around product features, rather than full context information, are chosen to represent the inter-relationship among product features. The...
Question answering helps people find the knowledge they are looking for. Previous studies focus mainly on factoid question answering, which serves the need to answer factual questions. With the rapidly increasing scale of user-generated content on the Web, people are more interested in opinion questions that can reflect others' opinions. In this paper, we propose a framework...
Word sense disambiguation has been an open issue in text mining and natural language processing for some time. Automatic acquisition of all distinct senses of polysemous words is still a major problem in computer science. This paper discusses an approach to generate related words for an input word in some context. The context is used to filter the related words for their distinct sense...
In this article, we present a novel statistical representation method for knowledge extraction from a corpus of short texts. We then introduce the contrast parameter, which can be adjusted to target different conceptual levels in text mining and knowledge extraction. The method is based on second-order co-occurrence vectors whose efficiency for representing meaning has been established...
Events give important information about the behavior of a system in a summarized form. In the past, events have played an important role in breaking the functional requirements of the system in the "event partitioning approach". Our previous work has shown that events can be a starting point in object-oriented analysis of requirements. Every event triggers a use case in the system, hence should...
This paper reports experiments on topic extraction in Chinese documents using a feature set enriched with Word Sense Disambiguation (WSD) as semantic information. The results of these experiments suggest that incorporating WSD information into Chinese topic extraction tasks may yield improvements over models which do not use WSD information.
The goal of Information Extraction is to automatically generate structured pieces of information from the relevant information contained in text documents. Machine Learning techniques have been applied to reduce the cost of Information Extraction system adaptation. However, elements of human supervision strongly bias the learning process. Unsupervised learning approaches can avoid these biases. In...