Lately, multi-label classification (MLC) problems have drawn a lot of attention in a wide range of fields, including medicine, the web, and entertainment. The scale and diversity of MLC problems are much larger than those of single-label classification problems. In particular, we have to face all possible combinations of labels. To solve MLC problems more efficiently, we focus on three kinds of locality hidden in...
In clustering applications, multiple views of the data are often available. Although clustering could be done within each view independently, exploiting information across views promises to improve clustering accuracy. A common assumption in the field of multi-view learning is that the clustering results from multiple views should be consistent with a latent clustering. However, the potential...
Designing an appropriate similarity measure together with a distribution function for omics data is a critical objective, and this paper defines that problem. Data mining integrates methodical techniques for handling the explosion of huge amounts of data, from which useful and innovative knowledge can be obtained. Present and future omics technologies allow researchers to work with highly dimensional omics data. This paper...
Data sets are the backbone of the data mining and knowledge engineering fields. The class imbalance problem exists in many real-time data sets. In this paper we investigate the existing approaches to the class imbalance problem in the context of classification and ordinal classification. In particular, this investigation extends the study of issues in ordinal classification with respect to the data set and...
In this article, a new strategy for single-image super-resolution is proposed. A selective sparse coding strategy based on patch sharpness is assumed to be invariant for patch resolution. This sharpness criterion is used at the training stage to classify image patches into different clusters. It is suggested that the use of coupled dictionary learning with a mapping function can improve the representation...
Multi-label learning, where each instance is assigned to multiple categories simultaneously, is a prevalent problem in data analysis. Previous approaches typically learn from multi-label data by employing the original feature space in the discrimination process of all class labels. However, this traditional strategy might be suboptimal, as the original feature space contains irrelevant or redundant...
Domain adaptation aims to transfer knowledge across domains that follow dissimilar distributions, where the target domain has inadequate labelled samples. When knowledge is transferred from abundant irrelevant sources, negative transfer may occur, resulting in poor classification of test samples. Deep learning research illustrates the semantic clustering as well as the transferability of deep convolutional...
Distributed machine learning is becoming increasingly popular for large-scale data mining on large-scale clusters. To mitigate the interference of straggler machines, recent distributed machine learning systems support flexible model consistency, which allows a worker to compute model updates using a stale local model without waiting for the newest model, while limiting the asynchronous step in a certain...
The use of machine learning algorithms in time-series data analysis is crucial to effective decision making in today's dynamic and competitive environment. One data type of growing interest is electricity consumer load profile (LP) data. Owing to advances in the smart grid, an immense amount of LP data has become available to policymakers, with the potential to improve the electricity sector. Due to the...
In this paper, we present a dynamic clustering algorithm that efficiently deals with data streams and achieves several important properties which are not generally found together in the same algorithm. The dynamic clustering algorithm operates online in two different time-scale stages, a fast distance-based stage that generates micro-clusters and a density-based stage that groups the micro-clusters...
In this paper, we present Bengali word embeddings and their application in the classification of news documents. Word embeddings are multi-dimensional vectors that can be created by exploiting the linguistic context of the words in a large corpus. To generate the embeddings, we collected Bengali news documents from the last five years from the major daily newspapers. Word embeddings are generated using the...
To retrain an existing multilayer perceptron (MLP) on-line using newly observed data, it is necessary to incorporate the new information while preserving the performance of the network. This is known as the “plasticity-stability” problem. For this purpose, we proposed an algorithm for on-line training with guide data (OLTA-GD). OLTA-GD is good for implementation in portable/wearable computing devices...
Self-Organizing Maps (SOMs) are unsupervised neural networks that build data models. Neuron labeling attaches descriptive textual labels to the neurons making up a SOM, and is an important component of SOM-based exploratory data analysis (EDA) and data mining (DM). Several neuron labeling approaches tend to leave some neurons unlabeled. The interaction between unlabeled neurons and SOM model accuracy...
Clustering, the process of grouping unlabelled data, is an important task in data analysis. It is regarded as one of the most difficult tasks due to the large search space that must be explored. Feature selection is commonly used to reduce the size of a search space, and evolutionary computation (EC) is a group of techniques which are known to give good solutions to difficult problems such as clustering...
User-generated content on online social media (OSM) has several data mining applications, such as extracting useful information during disaster events. Since popular / important content is often re-posted by multiple people on OSM, identifying duplicate content is an important first step in many data mining applications. In this work, we develop a methodology to identify near-duplicate images posted...
The extreme learning machine (ELM) is based on single-layer feedforward neural networks (SLFNs) and has become a rapidly developing learning technology. The recently developed multilayer form of ELM, called ML-ELM, which is based on the architecture of deep learning, has become more popular than other traditional classifiers because of its important qualities such as multiple non-linear transformation...
Classification of imbalanced datasets has become one of the most challenging problems in big data mining. Because the number of positive samples is far smaller than the number of negative samples, traditional learning algorithms suffer from low accuracy, poor generalization performance, and other defects. Ensemble construction is an important method for handling this problem. Especially,...
To equip a computer with knowledge of Chinese vehicle license plate segmentation and recognition, the paper puts forward a set of algorithms for license plate segmentation and recognition. The algorithms are divided into four parts: image preprocessing, license plate location, license plate segmentation, and character recognition. The aim of image preprocessing is to quickly and easily...
Decision making is an important component of a speaker verification system. For the conventional GMM-UBM architecture, the decision is usually made based on the log likelihood ratio of the test utterance against the GMM of the claimed speaker and the UBM. This single-score decision is simple but tends to be sensitive to the complex variations in speech signals (e.g. text content, channel, speaking...
How can we recognise the social roles of people, given a completely unlabelled social network? We may train a role classification algorithm on another dataset, but that dataset may have largely different feature values; for instance, the degrees in the other network may be distributed in a completely different way than in the first network. Thus, a way to transfer the features of different...