In practice, many real-world datasets are imbalanced: one of the two classes dominates the data. Such datasets are generally difficult for machine learning algorithms to classify, because the skewed class distribution strongly affects training. To address this difficulty, many undersampling and oversampling methods have been proposed...
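As a baseline for what "undersampling" and "oversampling" mean, here is a minimal sketch of the two simplest variants: random oversampling (duplicating minority rows) and random undersampling (discarding majority rows). This is not any specific method from the abstract; the function names and the fixed seed are illustrative assumptions.

```python
import random
from collections import Counter

def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        idx = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - n):
            i = rng.choice(idx)
            X_out.append(X[i])
            y_out.append(label)
    return X_out, y_out

def random_undersample(X, y, seed=0):
    """Keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    keep = []
    for label in counts:
        idx = [i for i, lab in enumerate(y) if lab == label]
        keep.extend(rng.sample(idx, target))
    keep.sort()  # preserve the original row order
    return [X[i] for i in keep], [y[i] for i in keep]
```

Oversampling keeps all information but risks overfitting to duplicated rows; undersampling avoids that at the cost of discarding majority-class data.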
Data mining algorithms are used to analyze data and discover useful information in it. This paper presents an experiment that applies Combinatorial Testing (CT) to five data mining algorithms implemented in WEKA, an open-source data mining package. For each algorithm, we first run the algorithm on 51 datasets to study the impact of different datasets on test coverage. We select one dataset...
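Combinatorial testing usually means generating a test suite that covers every t-way combination of parameter values. The sketch below shows 2-way (pairwise) generation with a simple greedy strategy over the full Cartesian product; it is not the tool used in the paper, it only scales to small parameter spaces, and the parameter names in the usage example are hypothetical.

```python
from itertools import combinations, product

def pairwise_tests(params):
    """Greedy pairwise test generation.
    params: dict mapping parameter name -> list of candidate values."""
    names = list(params)

    def pairs_of(test):
        # All 2-way (parameter, value) interactions exercised by one test.
        return {((a, test[a]), (b, test[b])) for a, b in combinations(names, 2)}

    uncovered = {
        ((a, va), (b, vb))
        for a, b in combinations(names, 2)
        for va in params[a]
        for vb in params[b]
    }
    # Enumerate every full configuration (fine for small spaces only).
    candidates = [dict(zip(names, vals)) for vals in product(*(params[n] for n in names))]
    tests = []
    while uncovered:
        # Pick the configuration covering the most still-uncovered pairs.
        best = max(candidates, key=lambda t: len(pairs_of(t) & uncovered))
        tests.append(best)
        uncovered -= pairs_of(best)
    return tests

# Hypothetical algorithm options, e.g. for one classifier run.
params = {"classifier": ["J48", "NaiveBayes"], "folds": [5, 10], "normalize": [True, False]}
tests = pairwise_tests(params)
```

For three two-valued parameters, exhaustive testing needs 8 runs, while this greedy pass covers every pair of values in 4.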
The difficulty level of a question is relative to that of the other questions in a test and to the test takers, so manually assigned difficulty-level tags may be inaccurate. Instead, these tags need to be inferred from historical data on student performance in a test. The e-Yantra Robotics Competition (eYRC) is an annual competition with around 5,000 teams (20,000 students) registering in...
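One simple way to infer difficulty from historical performance is the classical test theory difficulty index: the fraction of test takers who answered a question correctly. The sketch below computes it and maps it to tags; this is a baseline, not the paper's method, and the 0.3/0.7 thresholds are illustrative assumptions.

```python
def difficulty_index(responses):
    """responses: dict question_id -> list of 0/1 scores (1 = correct).
    Returns the fraction correct per question; lower means harder."""
    return {q: sum(s) / len(s) for q, s in responses.items() if s}

def difficulty_tags(p_values, hard_below=0.3, easy_above=0.7):
    """Map difficulty indices to tags using hypothetical cut-offs."""
    tags = {}
    for q, p in p_values.items():
        if p < hard_below:
            tags[q] = "hard"
        elif p > easy_above:
            tags[q] = "easy"
        else:
            tags[q] = "medium"
    return tags
```

Because the index depends on who took the test, the same question can be tagged differently for different cohorts, which is the relativity the paragraph describes.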
Online shopping is now a common way for people to shop. Most shopping sites provide a rating mechanism, which makes it possible to predict from rating information which products a customer will buy next; this makes recommender systems important for online shopping. The success of an online shopping site can depend heavily on the quality of its recommender system...
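A common way to turn rating information into predictions is item-based collaborative filtering: a user's rating for an unseen item is estimated from their ratings on similar items. The sketch below uses plain cosine similarity over co-rating users; it is a generic illustration, not the method of the paper, and the `{item: {user: rating}}` layout is an assumption.

```python
import math

def cosine(a, b):
    """Cosine similarity of two items' ratings, over users who rated both."""
    common = a.keys() & b.keys()
    if not common:
        return 0.0
    dot = sum(a[u] * b[u] for u in common)
    na = math.sqrt(sum(a[u] ** 2 for u in common))
    nb = math.sqrt(sum(b[u] ** 2 for u in common))
    return dot / (na * nb) if na and nb else 0.0

def predict(ratings, user, item):
    """Predict `user`'s rating of `item` as a similarity-weighted average of
    the user's ratings on other items. ratings: {item: {user: rating}}."""
    num = den = 0.0
    for other, users in ratings.items():
        if other == item or user not in users:
            continue
        sim = cosine(ratings[item], users)
        num += sim * users[user]
        den += abs(sim)
    return num / den if den else None
```

Items with no predicted rating (`None`) have no overlap with anything the user rated; a site would then fall back to, for example, popularity.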
2D-to-3D conversion is an important task for narrowing the current gap between the number of 3D displays and the amount of available 3D content. Here, we present an automatic 2D-to-3D image conversion approach based on machine learning principles. Stemming from the hypothesis that images with similar structure are likely to have similar 3D structure, the depth of a query color image is estimated using a color plus...
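The hypothesis that similar images have similar depth suggests a nearest-neighbour scheme: retrieve the database images whose descriptors are closest to the query and fuse their depth maps. The sketch below assumes a color-histogram descriptor, same-size depth maps, and pixel-wise median fusion; all three are illustrative choices, not details taken from the paper.

```python
import numpy as np

def color_histogram(image, bins=8):
    """Global descriptor: concatenated per-channel histograms of an
    (H, W, 3) uint8 image, normalized to sum to 1."""
    hists = [np.histogram(image[..., c], bins=bins, range=(0, 256))[0] for c in range(3)]
    h = np.concatenate(hists).astype(float)
    return h / h.sum()

def estimate_depth(query_desc, db_descs, db_depths, k=3):
    """query_desc: (d,) descriptor of the query image.
    db_descs: (n, d) descriptors of database images.
    db_depths: (n, H, W) depth maps aligned with db_descs."""
    dists = np.linalg.norm(db_descs - query_desc, axis=1)
    nearest = np.argsort(dists)[:k]
    # Pixel-wise median fuses the k retrieved depth maps robustly.
    return np.median(db_depths[nearest], axis=0)
```

The median, rather than the mean, keeps one badly matched neighbour from dragging the whole estimate.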
In this paper, we propose a noise detection system based on similarities between instances. Given a dataset whose instances belong to multiple classes, a noisy instance is a mislabeled record. The similarity between differently labeled instances is determined by computing the distances between them using several standard metrics. In order to ensure that this approach is computational...
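A minimal sketch of similarity-based noise detection, assuming a k-nearest-neighbour rule: an instance is suspected to be mislabeled when most of its nearest neighbours carry a different label. Combining metrics by requiring agreement across all of them is an assumption made here for illustration; the paper's actual combination rule is not stated in the excerpt.

```python
import math
from collections import Counter

def euclidean(a, b):
    return math.dist(a, b)

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def flag_noise(X, y, k=3, metric=euclidean):
    """Flag instance i when a majority of its k nearest neighbours
    (under `metric`) carry a label different from y[i]."""
    flagged = []
    for i, xi in enumerate(X):
        neighbours = sorted((metric(xi, xj), j) for j, xj in enumerate(X) if j != i)[:k]
        label, count = Counter(y[j] for _, j in neighbours).most_common(1)[0]
        if label != y[i] and count > k // 2:
            flagged.append(i)
    return flagged

def consensus_noise(X, y, metrics, k=3):
    """Indices flagged under every metric (a conservative combination)."""
    flagged = [set(flag_noise(X, y, k, m)) for m in metrics]
    return sorted(set.intersection(*flagged))
```

The pairwise distance loop is quadratic in the number of instances, which is why computational cost becomes a concern on large datasets.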
Zebrafish larvae have become a popular model organism for investigating genetic and environmental factors that affect behavior. However, analyzing complex behaviors across a large array of larvae is difficult. In this paper, we present a new application of machine learning techniques in bioinformatics to automatically detect and investigate the locomotor activities of zebrafish larvae...
Network traffic classification is an essential component of network management and security systems. To address the limitations of traditional port-based and payload-based methods, recent studies have focused on alternative approaches. One promising direction is applying machine learning techniques to classify traffic flows based on packet-level and flow-level statistics. In particular, previous...
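To make "classify traffic flows based on packet and flow level statistics" concrete, here is a minimal sketch that summarizes each flow by a few statistics and assigns the class with the nearest centroid. The feature set, the nearest-centroid classifier, and the class names are illustrative assumptions; in practice features would also be scaled before measuring distances.

```python
import math
import statistics

def flow_features(sizes, times):
    """Flow-level statistics for one flow: packet count, mean and std of
    packet size (bytes), and mean inter-arrival time (seconds)."""
    gaps = [b - a for a, b in zip(times, times[1:])] or [0.0]
    return [len(sizes), statistics.mean(sizes), statistics.pstdev(sizes), statistics.mean(gaps)]

def train_centroids(features, labels):
    """Per-class mean of each feature column."""
    groups = {}
    for f, lab in zip(features, labels):
        groups.setdefault(lab, []).append(f)
    return {lab: [statistics.mean(col) for col in zip(*fs)] for lab, fs in groups.items()}

def classify(centroids, f):
    """Assign the class whose centroid is nearest in Euclidean distance."""
    return min(centroids, key=lambda lab: math.dist(centroids[lab], f))
```

No payload bytes or port numbers are inspected, which is what lets this family of methods work on encrypted traffic or non-standard ports.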
This paper proposes a constrained clustering method based on a graph-cut problem formulated as SDP (semidefinite programming). Compared with conventional spectral clustering methods, our SDP approach has the advantage that constraints can be incorporated conveniently. The algorithm starts from a single cluster containing the complete dataset and repeatedly selects the largest cluster, which it then divides into...
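The top-down loop described above (start with one cluster, repeatedly split the largest) can be sketched as follows. Solving the SDP is beyond a short example, so this sketch substitutes spectral bisection with the Fiedler vector of a Gaussian-affinity graph Laplacian as the splitting step, and it ignores constraints; the median bandwidth heuristic is also an assumption.

```python
import numpy as np

def bisect(X):
    """Split points in two by the sign of the (centered) Fiedler vector.
    Stand-in for the paper's SDP graph cut, not the same method."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    positive = d2[d2 > 0]
    sigma2 = np.median(positive) if positive.size else 1.0
    W = np.exp(-d2 / sigma2)          # Gaussian affinities (complete graph)
    L = np.diag(W.sum(axis=1)) - W    # unnormalized graph Laplacian
    _, vecs = np.linalg.eigh(L)       # eigenvalues in ascending order
    fiedler = vecs[:, 1] - vecs[:, 1].mean()
    return fiedler >= 0

def divisive_clustering(X, n_clusters):
    clusters = [np.arange(len(X))]    # one cluster holding every point
    while len(clusters) < n_clusters:
        clusters.sort(key=len)
        largest = clusters.pop()      # always split the largest cluster
        if len(largest) < 2:
            clusters.append(largest)
            break
        mask = bisect(X[largest])
        if mask.all() or not mask.any():  # degenerate split: stop
            clusters.append(largest)
            break
        clusters += [largest[mask], largest[~mask]]
    return clusters
```

Any two-way cut can be plugged into `bisect`; the divisive driver only needs a boolean membership mask.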
Feature selection is an effective technique for reducing the high dimensionality of data that is prevalent in many application domains, such as text categorization and bioinformatics. It can bring learning algorithms many advantages, such as improved efficiency and reduced overfitting. Much effort has been devoted to this field, and various feature selection methods have been developed...
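As a concrete example of the simplest family of feature selection methods, a filter method scores each feature independently and keeps the best ones. The sketch below ranks features by absolute Pearson correlation with the label; this choice of score is an illustrative assumption, not a method from the paper.

```python
import math

def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def select_top_k(X, y, k):
    """Rank features by |correlation with the label| and return the
    indices of the top k, in ascending column order."""
    columns = list(zip(*X))
    scores = [abs(pearson(col, y)) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])
```

Filter scores ignore interactions between features, which is the usual trade-off against slower wrapper methods that retrain a model per candidate subset.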
In recent years, applications have emerged in which several dissimilarities and data sources provide complementary information about the problem. Metric learning algorithms are therefore needed that integrate all this information to better reflect what is similar for the user and the problem at hand. In this paper, we propose a semi-supervised algorithm to learn a linear combination...
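A minimal sketch, assuming the combination is taken over the available dissimilarity matrices: the combined matrix is a non-negative weighted sum, and semi-supervised side information (must-link and cannot-link pairs) is used to score candidate weights. The scoring function and the grid search over two sources are hypothetical stand-ins for the paper's learning algorithm.

```python
import numpy as np

def combine(dissimilarities, weights):
    """Non-negative linear combination D = sum_k w_k D_k of n x n matrices."""
    w = np.asarray(weights, dtype=float)
    if np.any(w < 0):
        raise ValueError("weights must be non-negative")
    w = w / w.sum()
    return np.tensordot(w, np.stack(dissimilarities), axes=1)

def constraint_score(D, must_link, cannot_link):
    """Mean cannot-link dissimilarity minus mean must-link dissimilarity;
    higher means the combination agrees better with the side information."""
    ml = np.mean([D[i, j] for i, j in must_link])
    cl = np.mean([D[i, j] for i, j in cannot_link])
    return cl - ml

def best_weights(dissimilarities, must_link, cannot_link, steps=11):
    """Hypothetical coarse grid search over weights for two sources."""
    grid = [(a, 1 - a) for a in np.linspace(0, 1, steps)]
    return max(grid, key=lambda w: constraint_score(
        combine(dissimilarities, w), must_link, cannot_link))
```

Only a few labeled pairs are needed to steer the weights, which is what makes the setting semi-supervised.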
In this paper, we apply topic detection and tracking (TDT) technology to a vertical search engine in the financial domain. Search results are grouped into topics on a per-stock basis, and the topics are then shown to users in chronological order. As a result, users can easily learn about the important events related to a stock. Moreover, the causes and effects of these events can also be traced easily...
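The presentation step described above (group results by stock, then by topic, then order in time) can be sketched as below. Topic assignment, the actual TDT step, is assumed to have happened already; the record keys and the example ticker are hypothetical, and dates are ISO strings so that they sort chronologically.

```python
from collections import defaultdict

def organize(results):
    """results: list of dicts with hypothetical keys 'stock', 'topic',
    'date' (ISO 'YYYY-MM-DD'), and 'title'.
    Returns {stock: [(topic, [titles in date order]), ...]} with topics
    ordered by the date of their earliest article."""
    by_stock = defaultdict(lambda: defaultdict(list))
    for r in results:
        by_stock[r["stock"]][r["topic"]].append(r)
    timeline = {}
    for stock, topics in by_stock.items():
        ordered = sorted(topics.items(), key=lambda kv: min(d["date"] for d in kv[1]))
        timeline[stock] = [
            (topic, [d["title"] for d in sorted(docs, key=lambda d: d["date"])])
            for topic, docs in ordered
        ]
    return timeline
```

Ordering topics by their first article puts an event's likely causes ahead of its later effects in the listing.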
In the last decade, there has been growing interest in distance function learning for semi-supervised clustering. In addition to earlier methods that learn Mahalanobis metrics (or, equivalently, linear transformations), some nonlinear metric learning methods have recently been introduced. However, these methods either allow only a limited choice of distance metrics, which limits their flexibility...
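The equivalence mentioned above follows from writing the Mahalanobis matrix as $M = L^\top L$: then $(x-y)^\top M (x-y) = \lVert L(x-y) \rVert^2$, so learning $M$ is the same as learning a linear map $L$ followed by Euclidean distance. The short numerical check below illustrates this; the random $L$ is only for demonstration.

```python
import numpy as np

def mahalanobis(x, y, M):
    """Distance sqrt((x - y)^T M (x - y)) for a positive semi-definite M."""
    d = x - y
    return float(np.sqrt(d @ M @ d))

rng = np.random.default_rng(0)
L = rng.normal(size=(3, 3))
M = L.T @ L  # positive semi-definite by construction
x, y = rng.normal(size=3), rng.normal(size=3)

# Same value as the Euclidean distance between the transformed points.
euclid_after_L = float(np.linalg.norm(L @ x - L @ y))
```

Because any such metric is Euclidean distance in a linearly transformed space, Mahalanobis learning cannot capture nonlinear structure, which motivates the nonlinear methods the paragraph mentions.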