A genetic algorithm is a computational technique that searches for an optimal solution through natural selection and crossover, the basic steps of every evolutionary algorithm. The present work focuses on an application of a genetic algorithm, the strength Pareto evolutionary algorithm (SPEA), to the selection of features from crime datasets. The proposed work extracts crime...
Crime against women in India has become a prominent topic of discussion in recent years, and the issue has been brought to the foreground because of the increasing trend in crimes committed against women. Most of the crimes get reported, and a massive dataset is generated every year. Analysing the crime reports can help law enforcement take preventive measures for reducing...
Feature dimensionality has always been one of the key challenges in text mining, as it increases complexity when mining high-dimensional documents. High dimensionality introduces sparseness and noise, and increases computational and space complexity. Dimensionality reduction is usually addressed by implementing either feature reduction or feature selection techniques. In this work, the problem...
In this knowledge era, a plethora of textual information, usually semistructured or unstructured, is growing rapidly and is collected and stored in various databases. Discovering knowledge from these databases is not simple. Thus, an automatic feature selection approach is necessary for processing this unstructured data. The feature selection approach focuses on processing...
The rapid increase in the number of text documents available on the Internet has created pressure to use effective cleaning techniques. Cleaning techniques are needed for converting these documents to structured documents. Text cleaning techniques are one of the key mechanisms in typical text mining application frameworks. In this paper, we explore the role of text cleaning in the 20 newsgroups dataset,...
Sentiment analysis emerged as an important computational domain for gaining insights from snippets of text as social media gained popularity. Text mining has long been a fundamental data analytics technique for sentiment analysis. One of the popular preprocessing approaches in text mining is transforming text strings to word vectors, which form a high-dimensional sparse matrix. This sparse matrix poses...
With the development of social media applications, short text mining is becoming more and more important. Due to the sparseness of short text data, both the feature correlation information (word co-occurrence) and data contiguity information (context information) are less reliable, thus most existing text mining methods which are designed to address regular text data are less efficient in short text...
We review several feature selection methods (Recursive Feature Elimination, Select K Best, and Random Forests) as elements of a processing chain for feature selection in a text mining task. The text mining task is a multi-label classification problem of label assignment, where the labels are metadata usually applied to published scientific papers by expert curators. In the formulation of this classification task,...
The paper presents the findings of an industry-based study on the utility of text categorization. The purpose of the study is to explore a new approach to evaluating the service quality of customer complaint handling. The industrial research setting is a large Chinese insurance company. The text categorization methodologies used in this research include natural language processing and machine learning...
In this paper we report our work on a multiobjective optimization (MOO) based feature selection approach for event extraction in biomedical texts. Event extraction deals with the detection and classification of expressions that represent complex biological phenomena involving genes and proteins. We perform feature selection within the framework of a robust machine learning algorithm, namely Conditional...
As one type of financial fraud, financial statement fraud has not only led to huge losses for individual investors and financial institutions, but has also affected the overall stability of the whole industry. This paper used financial and textual features extracted from annually submitted 10-K filings and combined data and text mining techniques for the detection of financial statement fraud. When the...
Due to the complexity of software systems, defects are inevitable. Understanding the types of defects could help developers adopt measures in current and future software releases. In practice, developers often categorize defects into various types. One common categorization is based on the fault triggers of defects. A fault trigger is a set of conditions that activate a defect (i.e., a fault) and propagate...
Predicting the severity of bugs has been found in past research to improve triaging and the bug resolution process. For this reason, many classification/prediction approaches have emerged over the years to provide automated reasoning over severity classes. In this paper, we use text mining together with bi-grams and feature selection to improve the classification of bugs into severe/non-severe classes...
It is a big challenge to develop effective methods that can discover high-quality and useful features in text documents. Most existing information retrieval and text mining methods focus on term-based approaches that often suffer from the problems of term variation and noise. This paper illustrates an innovative approach that discovers relevant knowledge to precisely describe text features for retrieving...
Configuration bugs are one of the dominant causes of software failures. Previous studies show that a configuration bug could cause huge financial losses in a software system. The importance of configuration bugs has attracted various research studies, e.g., to detect, diagnose, and fix configuration bugs. Given a bug report, an approach that can identify whether the bug is a configuration bug could...
As the number of speech and video documents on the Internet increases and portable devices proliferate, speech summarization becomes increasingly essential. Relevant research in this domain has typically focused on broadcasts and news; however, the automatic summarization methods used in the past may not apply to other speech domains (e.g., speech in lectures). Therefore, this study explores the lecture...
This paper presents a new application of data mining techniques, particularly text mining, to analyze educational questions asked by teachers in classrooms. More specifically, it reports on the performance of four machine learning techniques and four feature selection approaches on the classification of teachers' questions into the different cognitive levels identified in Bloom's taxonomy. In doing so,...
In software maintenance, severity prediction on defect reports is an emerging issue attracting research attention due to the considerable triaging cost. In past research, several text mining approaches have been proposed to predict severity using advanced learning models. Although these approaches demonstrate the effectiveness of predicting severity, they do not discuss the problem...
MapReduce is a software framework introduced by Google in 2004 to support distributed computing on large datasets on clusters of computers. The term contribution (TC) algorithm is a relatively new algorithm in text mining to select features for clustering. In this paper, we design and implement a parallel term contribution (PTC) algorithm based on the MapReduce model. By experiment, we come to the conclusion...
In many emergency incidents, multiple reports and information sources are often used to help intelligence and security personnel to understand the situation during a short time period. Proper categorization and analysis of this information could enhance the efficiency of handling this large amount of potentially conflicting information, thus contributing to saving lives. The study of categorization...
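Several of the abstracts above describe filter-style feature selection for text classification, for example Select K Best, or bi-grams combined with feature selection for severe/non-severe bug reports. The following is a minimal sketch of that general idea and not the method of any listed paper: it builds unigram and bi-gram presence features, scores each one with a chi-square statistic against a binary label, and keeps the top k. The function names, the whitespace tokenizer, and the toy bug reports are all assumptions made for illustration.

```python
from collections import Counter


def tokens_with_bigrams(text):
    """Return the set of unigrams and bi-grams in a text (naive whitespace tokenizer)."""
    words = text.lower().split()
    bigrams = [f"{a}_{b}" for a, b in zip(words, words[1:])]
    return set(words + bigrams)


def chi2_score(n11, n10, n01, n00):
    """Chi-square statistic for a 2x2 term/class contingency table.

    n11: positive docs containing the term, n10: negative docs containing it,
    n01: positive docs without it,          n00: negative docs without it.
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n01) * (n10 + n00) * (n11 + n10) * (n01 + n00)
    if denom == 0:
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def select_k_best(docs, labels, k):
    """Rank features by chi-square against binary labels and return the top k."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    pos_df, neg_df = Counter(), Counter()  # document frequencies per class
    for doc, is_pos in zip(docs, labels):
        (pos_df if is_pos else neg_df).update(tokens_with_bigrams(doc))
    scores = {}
    for term in set(pos_df) | set(neg_df):
        n11, n10 = pos_df[term], neg_df[term]
        scores[term] = chi2_score(n11, n10, n_pos - n11, n_neg - n10)
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Hypothetical toy data: True marks a "severe" report.
docs = [
    "app crashes on startup",
    "crashes on save data loss",
    "typo in help text",
    "minor typo in menu label",
]
labels = [True, True, False, False]
selected = select_k_best(docs, labels, 6)
print(selected)
```

On a corpus this small the scores only separate terms that occur in every document of one class from the rest; on real data a minimum document frequency threshold would usually be applied before scoring.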