In this paper, we present an intelligent, state-of-the-art, mobile-based transportation system called SAFAR (Safe and Fast around the Road), which provides Karachi bus commuters with dynamic information about any violent incident that has occurred ahead of their current location on the current bus route. Using named entity recognition techniques, we have trained SAFAR to recognize...
This paper addresses a problem in which we learn a regression model from sets of training data. Each set has only a single label, and only one of the training data points in the set reflects that label. This is particularly the case when the label is attached to a group of data, such as time-series data. The label is attached not to a point of the sequence but rather to particular time...
This article presents our recent study of a lightweight Deep Convolutional Neural Network (DCNN) architecture for document image classification. Here, we concentrated on training a committee of generalized, compact and powerful base DCNNs. A support vector machine (SVM) is used to combine the outputs of the individual DCNNs. The main novelty of the present study is the introduction of supervised layerwise...
Learning effective and efficient classifiers for imbalanced data is one of the ten challenging problems in data mining research. Studying classifiers for imbalanced data is a popular area in machine learning and data mining, and it has great significance in many fields, such as cancer diagnosis, credit card fraud detection and intrusion detection. The study of imbalanced data classification can be divided...
Class imbalance is an issue in many real-world applications because classification algorithms tend to misclassify instances from the class of interest when its training samples are outnumbered by those of other classes. Several variations of the AdaBoost ensemble method have been proposed in the literature to learn from imbalanced data based on re-sampling. However, their loss factor is based on standard...
This paper addresses the problem of object counting, which is to estimate the number of objects of interest from an input observation. We formalize the problem as a posterior inference of the count by introducing a particular type of Gaussian mixture for the input observation, whose mixture indexes correspond to the count. Unlike existing approaches in image analysis, which typically perform explicit...
Visualization helps us to understand single-label and multi-label classification problems. In this paper, we show several standard techniques for simultaneous visualization of samples, features and multi-classes on the basis of linear regression and matrix factorization. An experiment with two real-life multi-label datasets showed that such techniques are effective for understanding how labels are correlated...
In this paper we show that weighted K-Nearest Neighbor, a variation of the classic K-Nearest Neighbor, can be reinterpreted from a classifier combining perspective, specifically as a fixed combiner rule, the sum rule. Subsequently, we experimentally demonstrate that it can be rather beneficial to consider other combining schemes as well. In particular, we focus on trained combiners and illustrate...
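The sum-rule reading of weighted K-Nearest Neighbor described above can be illustrated with a minimal sketch. It assumes inverse-distance weights and Euclidean distance, neither of which is specified in the abstract, and the function name `weighted_knn_predict` is hypothetical. Each of the k neighbors acts as a base classifier that casts a weighted vote for its own label, and the votes are summed per class.

```python
import math
from collections import defaultdict

def weighted_knn_predict(train, query, k=3, eps=1e-9):
    """Predict a label for `query` from `train`, a list of (features, label).

    Assumption: weights are inverse Euclidean distances; `eps` avoids
    division by zero when the query coincides with a training point.
    """
    # Keep the k training points closest to the query.
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]

    # Sum rule: add up each neighbor's weighted vote for its own class.
    scores = defaultdict(float)
    for features, label in neighbors:
        scores[label] += 1.0 / (math.dist(features, query) + eps)

    # The class with the largest summed support wins.
    return max(scores, key=scores.get), dict(scores)

# One close "a" neighbor outweighs two distant "b" neighbors,
# although an unweighted majority vote would pick "b".
train = [((0.1,), "a"), ((1.0,), "b"), ((-1.0,), "b")]
label, scores = weighted_knn_predict(train, (0.0,), k=3)
```

Replacing the summation in the loop with another fixed or trained combiner is the kind of variation the abstract alludes to.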
Twitter has attracted significant interest from the research community in the last few years. Sentiment analysis of tweets is among the hottest topics of research nowadays. State-of-the-art approaches to sentiment analysis present many shortcomings when classifying tweets, in particular when the classification goes beyond binary or ternary classification. Multi-class sentiment analysis has proven...
Predictive coding, once used in only a small fraction of legal and business matters, is now widely deployed to quickly cull through increasingly vast amounts of data and reduce the need for costly and inefficient human document review. Previously, the sole front-end input used to create a predictive model was the exemplar documents (training data) chosen by subject-matter experts. Many predictive...
Many classification tasks target high-level concepts that can be decomposed into a hierarchy of finer-grained sub-concepts. For example, some string entities that are Locations are also Attractions, some Attractions are Museums, etc. Such hierarchies are common in named entity recognition (NER), document classification, and biological sequence analysis. We present a new approach for learning hierarchically...
In ensemble learning, ensemble pruning is a procedure that aims at removing the unnecessary base classifiers and retaining the best subset of the base classifiers. We present a two-step ensemble pruning framework, in which the optimal size of the pruned ensemble is first decided, and then, with the optimal size as input, the optimal ensemble is selected. For the first step to find the optimal ensemble...
Up-to-date maps of installed solar photovoltaic panels are a critical input for policy and financial assessment of solar distributed generation. However, such maps for large areas are not available. With high coverage and low cost, aerial images enable large-scale mapping, but it is highly difficult to automatically identify solar panels in images, as they are small objects with varying appearances...
Deep convolutional networks have achieved successful performance in the data mining field. However, training large networks remains a challenge, as the training data may be insufficient and the model can easily overfit. Hence the training process is usually combined with model regularization. Typical regularizers include weight decay, Dropout, etc. In this paper, we propose a novel regularizer,...
Outlier detection algorithms are often computationally intensive because of their need to score each point in the data. Even simple distance-based algorithms have quadratic complexity. High-dimensional outlier detection algorithms such as subspace methods are often even more computationally intensive because of their need to explore different subspaces of the data. In this paper, we propose an exceedingly...
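The quadratic cost of simple distance-based scoring mentioned above can be seen in a minimal sketch. It assumes one common distance-based score, the distance to the k-th nearest neighbor; this is an illustration of the baseline cost, not the method proposed in the paper, and the function name `knn_outlier_scores` is hypothetical.

```python
import math

def knn_outlier_scores(points, k=2):
    """Score each point by the distance to its k-th nearest neighbor.

    Every point is compared with every other point, so the number of
    distance computations grows quadratically with len(points).
    """
    scores = []
    for i, p in enumerate(points):
        # Distances from p to all other points, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores

# Four clustered points and one isolated point; the isolated point
# receives the largest score.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
scores = knn_outlier_scores(points, k=2)
```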
Handwritten digit recognition is a typical image classification problem. Convolutional neural networks, also known as ConvNets, are powerful classification models for such tasks. As different languages have different styles and shapes of numeral digits, the accuracy rates of the models vary from language to language. However, unsupervised pre-training in such situations has shown...
This paper proposes a hybrid deep learning algorithm, namely, the Deep Boltzmann Functional Link Network (DBFLN), for classification problems. A Deep Boltzmann Machine (DBM) with two layers of Restricted Boltzmann Machines is the generative model that is used to generate stochastic features and input weights for the discriminative model. A discriminative Functional Link Network (FLN) uses these features...
Dendrite morphological neurons are a type of artificial neural network that works with min and max operators instead of algebraic products. These morphological operators allow each dendrite to build a hyper-box in the N-dimensional classification space. In contrast with classical perceptrons, these simple geometrical representations, hyper-boxes, allow the proposal of training methods based on heuristics...
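The hyper-box idea can be sketched with min and max alone. This is a minimal illustration under an assumed formulation (the abstract does not give the exact equations): each dendrite holds a box with lower and upper corners, its response is the smallest signed distance from the input to a box face, and the predicted class is that of the dendrite with the largest response. The names `dendrite_response` and `classify` are hypothetical, and the boxes are fixed by hand rather than trained.

```python
def dendrite_response(x, low, high):
    """Smallest signed margin of x with respect to the box [low, high].

    The value is non-negative exactly when x lies inside the box,
    and it is computed with min operators only (no products).
    """
    return min(min(xi - lo, hi - xi) for xi, lo, hi in zip(x, low, high))

def classify(x, boxes):
    """Return the label of the dendrite (box) with the largest response.

    `boxes` is a list of (low, high, label) tuples.
    """
    best = max(boxes, key=lambda box: dendrite_response(x, box[0], box[1]))
    return best[2]

# Two hand-placed hyper-boxes in a 2-dimensional space.
boxes = [((0, 0), (1, 1), "A"), ((2, 2), (3, 3), "B")]
inside_a = classify((0.5, 0.5), boxes)
inside_b = classify((2.5, 2.9), boxes)
```

Because a box is fully described by its two corners, heuristic training methods can place or resize boxes directly around the training samples.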
A social spammer detection model based on tri-training (SSDTT) is adopted. The main procedure is as follows. First, train three original classifiers with a small amount of labeled data. Then, for each classifier, select as new training data the users on whose labels the other two classifiers agree. Afterwards, repeat these steps until none of the three classifiers is updated. Experimental...
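The tri-training loop described above can be sketched as follows. This is a minimal illustration, not the SSDTT implementation: the base learner is assumed to be a nearest-centroid classifier, the three initial classifiers are assumed to differ by each leaving out a different third of the labeled data, and all function names and the toy feature vectors are hypothetical.

```python
import math

def fit_centroids(samples):
    """Nearest-centroid base learner: mean feature vector per label."""
    sums, counts = {}, {}
    for x, y in samples:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: tuple(v / counts[y] for v in acc) for y, acc in sums.items()}

def predict(centroids, x):
    """Label of the closest centroid."""
    return min(centroids, key=lambda y: math.dist(centroids[y], x))

def tri_train(labeled, unlabeled):
    # Assumption: classifier i is trained without every third labeled example.
    train_sets = [[s for j, s in enumerate(labeled) if j % 3 != i] for i in range(3)]
    models = [fit_centroids(t) for t in train_sets]
    added = [set() for _ in range(3)]
    updated = True
    while updated:  # stop once no classifier receives new training data
        updated = False
        for i in range(3):
            first, second = [models[j] for j in range(3) if j != i]
            for idx, x in enumerate(unlabeled):
                if idx in added[i]:
                    continue
                label = predict(first, x)
                # Accept the pseudo-label only if the other two agree.
                if label == predict(second, x):
                    train_sets[i].append((x, label))
                    added[i].add(idx)
                    updated = True
            models[i] = fit_centroids(train_sets[i])
    return models

def vote(models, x):
    """Majority vote of the three classifiers."""
    preds = [predict(m, x) for m in models]
    return max(set(preds), key=preds.count)

labeled = [((0, 0), "normal"), ((0, 1), "normal"), ((1, 0), "normal"),
           ((5, 5), "spammer"), ((5, 6), "spammer"), ((6, 5), "spammer")]
unlabeled = [(0.5, 0.5), (5.5, 5.5), (1, 1), (6, 6)]
models = tri_train(labeled, unlabeled)
```

The loop terminates because each unlabeled user can be added to a given classifier's training set at most once.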
Botnets represent one of the most destructive cybersecurity threats. Given the evolution of the structures and protocols botnets use, many machine learning approaches have been proposed for botnet analysis and detection. In the literature, intrusion and anomaly detection systems based on unsupervised learning techniques have shown promising performance. In this paper, we investigate the capability of...