Decision tree induction is a commonly used classification algorithm. One important problem is how to handle records with unknown values in both training and testing data. Many approaches have been proposed to address the impact of unknown training values on prediction accuracy. However, very few techniques address the problem in testing data. In our earlier work, we discussed...
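C4.5 uses one long-standing way to classify a test record that has an unknown attribute value. At the node that tests the missing attribute, the classifier follows every branch and combines the leaf class distributions, weighting each branch by the share of training records that went down it. The sketch below illustrates only that technique. It is not the authors' method from the truncated abstract, and the tuple/dict tree representation is a hypothetical choice made for illustration.

```python
from collections import Counter

# Hypothetical tree representation (illustration only):
#   leaf          -> dict of class counts, e.g. {"yes": 4}
#   internal node -> (attribute, {value: subtree}, {value: training_weight})


def leaf_distribution(counts):
    """Turn raw class counts at a leaf into class probabilities."""
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}


def classify(node, record):
    """Return a class-probability dict; None (or an unseen value) means unknown."""
    if isinstance(node, dict):  # leaf
        return leaf_distribution(node)
    attribute, children, weights = node
    value = record.get(attribute)
    if value is not None and value in children:
        return classify(children[value], record)
    # Unknown value: explore every branch, weighting each one by the
    # fraction of training records that followed it (C4.5-style).
    total = sum(weights.values())
    combined = Counter()
    for branch_value, subtree in children.items():
        share = weights[branch_value] / total
        for cls, p in classify(subtree, record).items():
            combined[cls] += share * p
    return dict(combined)


tree = ("outlook",
        {"sunny": {"no": 3, "yes": 1}, "rain": {"yes": 4}},
        {"sunny": 4, "rain": 4})
print(classify(tree, {"outlook": None}))  # {'no': 0.375, 'yes': 0.625}
```

Returning a distribution rather than a single label keeps the uncertainty introduced by the missing value visible. A caller can still take the most probable class.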
Because many users of social networking services post what they observe around them, events around the world can be tracked in real time by extracting these real-world observations. In particular, observations covering miscellaneous areas of interest are posted to Twitter as short text messages. Our goal is to extract such observations to better understand the current...
Data pre-processing for machine learning methods is a key step in the knowledge discovery process. Depending on the nature of the data, pre-processing may take up most of the data analysis time. Correctly prepared data is essential for precise and reliable analysis results. This paper analyses the influence of initial data pre-processing on attack detection accuracy using Decision Trees, Naïve...
This study explores the application of artificial intelligence to the causal relationship between the mining production index and electricity load. The data used are the total mining production index and the total electricity consumption of the mining sector in South Africa, sampled monthly from January 1985 to December 2011. Optimally pruned and basic extreme learning machines were used to develop...
Traditional data stream classification techniques assume that the data stream is generated by a single non-stationary process. In contrast, a recently introduced problem setting, referred to as Multistream Classification, involves two independent non-stationary data-generating processes. One of them is the source stream, which continuously generates labeled data instances. The other one is the...
The decision tree (DT) is one of the most popular supervised machine learning algorithms and among the easiest to understand. However, finding an optimal decision tree for a given dataset is harder, and the use of multiple performance metrics further complicates selecting the most appropriate DT.
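One way to frame selection under several metrics is to keep only the Pareto-optimal candidates: the trees that no other tree matches or beats on every metric while strictly beating them on at least one. The sketch below shows that framing only and is not the approach of the work described above. The tree names and metric values are made up for illustration, and every metric is assumed to be higher-is-better.

```python
from typing import Dict, List

Scores = Dict[str, float]


def dominates(a: Scores, b: Scores) -> bool:
    """True if a is >= b on every metric and > b on at least one.

    Assumes both dicts share the same metric names and that higher is better.
    """
    return all(a[m] >= b[m] for m in a) and any(a[m] > b[m] for m in a)


def pareto_front(candidates: Dict[str, Scores]) -> List[str]:
    """Names of candidates not dominated by any other candidate."""
    return [
        name
        for name, scores in candidates.items()
        if not any(
            dominates(other, scores)
            for other_name, other in candidates.items()
            if other_name != name
        )
    ]


# Hypothetical candidate trees with illustrative (not measured) scores.
trees = {
    "depth3": {"accuracy": 0.81, "recall": 0.70},
    "depth5": {"accuracy": 0.85, "recall": 0.68},
    "depth8": {"accuracy": 0.84, "recall": 0.66},
}
print(pareto_front(trees))  # ['depth3', 'depth5']
```

The front usually contains more than one tree. A final choice still needs a preference among metrics, such as a weighting or a minimum threshold on one of them.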
Mapping API elements plays a significant role in software development, especially in code migration. Defining the migration manually is tedious and error-prone, while recent approaches to automatically mining API mappings can only discover mappings between APIs with textually similar names. This leads to low accuracy in existing migration tools. We propose an approach to automatically...
The Support Vector Machine (SVM) is well known in machine learning and artificial intelligence for its high performance in data classification, regression and forecasting. For large-scale datasets, an incremental training algorithm is usually applied to balance training cost against accuracy in SVM applications. This paper presents an improved incremental training approach for large...
Multivariate time series (MTS) arise in many applications. Because of various interference factors, missing data in MTS is inevitable. To address this problem, a filling method based on the least squares support vector machine (LSSVM) is proposed. First, series similar to the series containing missing data are searched for, and the results are used as the training set. Second, to make use of the correlation...
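The first step described above, searching for series similar to the incomplete one, can be sketched as a nearest-neighbour search over the observed positions. This minimal illustration rests on assumptions that the abstract does not state. Series are taken to be equal-length lists of floats, with None marking missing values, and similarity is measured by Euclidean distance. The authors train an LSSVM regressor on the similar series; here a plain average of the k nearest series stands in for it. The function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def distance_on_observed(target: Sequence[Optional[float]],
                         candidate: Sequence[float]) -> float:
    """Euclidean distance computed only over positions observed in target."""
    return math.sqrt(sum((t - c) ** 2
                         for t, c in zip(target, candidate)
                         if t is not None))


def fill_missing(target: Sequence[Optional[float]],
                 library: Sequence[Sequence[float]],
                 k: int = 2) -> List[float]:
    """Fill None entries of target from the k most similar complete series.

    Assumption: a k-NN average replaces the LSSVM regressor from the abstract.
    """
    neighbours = sorted(library,
                        key=lambda s: distance_on_observed(target, s))[:k]
    filled = []
    for i, value in enumerate(target):
        if value is None:
            filled.append(sum(s[i] for s in neighbours) / len(neighbours))
        else:
            filled.append(value)
    return filled


library = [[1.0, 2.0, 3.0], [1.1, 2.2, 2.9], [5.0, 9.0, 7.0]]
print(fill_missing([1.0, None, 3.0], library, k=2))
```

In the method the abstract describes, the neighbours returned by the search would become the LSSVM training set. The averaging line is the part to replace.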
Mining authors' opinions expressed in citations of scientific papers is an important feature of scientific publications. Opinion mining aims to determine the defiance of a topic with respect to the overall polarity of a document. The main engine that drives opinion mining is the processing of subjective information. A sentence-based dataset of over 785 citations was collected...
With the increasing number of online comments, mining product reviews has become an emerging area of research whose fundamental work focuses on feature extraction. Previous studies mainly focus on extracting explicit features and often ignore implicit features, which are not stated clearly but contain information necessary for analyzing comments. In fact, in our study we find a lot of implicit...
Data mining involves various methodologies, and data classification is one of the most useful of them. It not only eases the machine learning process but also provides a foundation for the process to function properly. In some cases, important or unidentified data is missed during classification. The process of mining is highly affected...
Online shopping is one of the most convenient ways to shop in this new era of technology. People frequently buy products online and post reviews of the products they have used. Users express their viewpoints as tweets or as product reviews posted on e-commerce sites. These reviews play a significant role in deciding how well the products have been placed in people's...
Random sampling can enhance classification performance by selecting many representative samples for inclusion in the training dataset. The representative samples usually include those located at the border of each class or cluster. In this paper, a new sampling algorithm is proposed that forces the training sample to include the border points between classes. Considering a point...
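The abstract is truncated before it defines a border point. The sketch below therefore assumes a common k-nearest-neighbour definition: a point lies on the border if any of its k nearest neighbours belongs to a different class. The helper names, the brute-force neighbour search, and the uniform draw for the remaining points are all illustrative assumptions, not the paper's algorithm.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def border_points(X: Sequence[Point], y: Sequence[int], k: int = 3) -> List[int]:
    """Indices of points whose k nearest neighbours include another class.

    Brute-force O(n^2) search; ties are broken by index order (sorted is stable).
    """
    border = []
    for i, xi in enumerate(X):
        neighbours = sorted((j for j in range(len(X)) if j != i),
                            key=lambda j: math.dist(xi, X[j]))[:k]
        if any(y[j] != y[i] for j in neighbours):
            border.append(i)
    return border


def border_aware_sample(X: Sequence[Point], y: Sequence[int],
                        n_random: int, k: int = 3, seed: int = 0) -> List[int]:
    """All border points plus up to n_random other points drawn uniformly."""
    border = border_points(X, y, k)
    border_set = set(border)
    rest = [i for i in range(len(X)) if i not in border_set]
    rng = random.Random(seed)
    extra = rng.sample(rest, min(n_random, len(rest)))
    return sorted(border + extra)


X = [(0.0,), (1.0,), (2.0,), (2.5,), (3.5,), (4.5,)]
y = [0, 0, 0, 1, 1, 1]
print(border_points(X, y, k=1))  # [2, 3]
```

Forcing the border indices into the sample guarantees that the training set covers the class boundary, whereas a purely random draw might miss it.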
Training a bottleneck feature (BNF) extractor with multilingual data has become common in low-resource keyword search. In a low-resource application, the amount of transcribed target-language data is limited, while multilingual data is usually plentiful. In this paper, we investigated two methods to train efficient multilingual BNF extractors for low-resource keyword search. One method is to use...
Hyperspectral images (HSIs) provide hundreds of narrow spectral bands for land covers and can thus provide more powerful discriminative information for land-cover classification. However, HSIs suffer from the curse of dimensionality, so dimension reduction and feature extraction are essential for HSI applications. In this paper, we propose an unsupervised feature extraction...
In this paper we focus on the characterization of singing styles in world music. We develop a set of contour features capturing pitch structure and melodic embellishments. Using these features we train a binary classifier to distinguish vocal from non-vocal contours and learn a dictionary of singing style elements. Each contour is mapped to the dictionary elements and each recording is summarized...
We propose to use a feature representation obtained by pairwise learning in a low-resource language for query-by-example spoken term detection (QbE-STD). We assume that word pairs identified by humans are available in the low-resource target language. The word pairs are parameterized by a multi-lingual bottleneck feature (BNF) extractor that is trained using transcribed data in high-resource languages...
Occlusion handling is one of the most challenging issues in pedestrian detection, and no satisfactory solution has been found yet. Using human body parts has been considered a reasonable way to overcome this issue. In this paper, we propose a new approach to this problem, based on fusing mid-level body part mining with a Convolutional Neural Network (CNN), named...
Studying marine plankton is critical to assessing the health of the world's oceans. To sample these important populations, oceanographers are increasingly using specially engineered in situ digital imaging systems that produce very large data sets. Most automated annotation efforts have considered data from individual systems in isolation. This is predicated on the assumption that the images from...