Feature selection is the process of selecting a subset of relevant features from the larger set of collected features. As the amount of available data grows with technology, feature selection becomes a more important part of the system-design process. In real-world applications, there are several costs associated with the collection, processing, and storage of data. Given that these costs can vary...
The rapidly increasing availability of healthcare data from multiple heterogeneous sources has spearheaded the adoption of data-driven approaches for improved clinical research, decision making, and patient management. The patient healthcare data are usually longitudinal and can be expressed as medical event sequences, where the events include clinical diagnosis, medications, laboratory reports, etc...
There has been a surge in research interest in learning feature representation of networks in recent times. Researchers, motivated by the recent successes of embeddings in natural language processing and advances in deep learning, have explored various means for network embedding. Network embedding is useful as it can exploit off-the-shelf machine learning algorithms for network mining tasks like...
Deep learning algorithms have recently produced state-of-the-art accuracy in many classification tasks, but this success typically depends on access to many annotated training examples. For domains without such data, an attractive alternative is to train models with light or distant supervision. In this paper, we introduce a deep neural network for the Learning from Label Proportion (LLP) setting,...
An individual's personality determines the probable repertoire of their reactions to a particular situation. A social robot is much more effective if it is able to learn and so take into account the properties of the humans around it, including personalities. We investigate how well personality can be estimated based on modest amounts of speech or writing, which a social robot might (over)hear. Such...
Image annotation methods construct a Tag distance matrix, whose entries show the relevancy of tags for each test image. Calculating this matrix more accurately yields better annotation results. The aim of our two methods is to improve the accuracy of the Tag distance matrix using the class information already available in most datasets. If the class information is not available, extracting important...
Many real-world datasets suffer from the problem of missing values. Imputation, which replaces missing values with plausible values, is a major method for classification with data containing missing values. However, powerful imputation methods, including multiple imputation, are usually computationally intensive when estimating missing values in unseen incomplete instances. Rule-based classification algorithms...
Maritime traffic prediction is critical for ocean transportation safety management. In this paper, we propose a novel knowledge-assisted methodology for maritime traffic forecasting based on a vessel’s waterway pattern and motion behavior. The vessel’s waterway pattern is extracted through a proposed lattice-based DBSCAN algorithm that significantly reduces the problem scale, and its motion behavior...
Objective: Medical data mining is a research hotspot, but medical data often contain missing values, which complicates medical data analysis. This work evaluates the performance of several imputation methods. Methods: In this paper, we first simulate the missing data set by completely deleting some data from the complete data set, and use the Euclidean distance KNN, the correlation...
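The Euclidean-distance KNN imputation mentioned in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `knn_impute`, the toy data, the choice of k, and the rule of averaging over the k nearest complete rows are all assumptions made for the example.

```python
import math

def knn_impute(rows, k=2):
    """Fill None entries with the column mean over the k nearest complete rows.

    Hypothetical helper: assumes at least one row has no missing values.
    """
    complete = [r for r in rows if None not in r]
    filled = []
    for row in rows:
        if None not in row:
            filled.append(list(row))
            continue
        # Columns that are observed in the incomplete row.
        observed = [i for i, v in enumerate(row) if v is not None]

        def dist(other):
            # Euclidean distance computed on the observed columns only.
            return math.sqrt(sum((row[i] - other[i]) ** 2 for i in observed))

        neighbours = sorted(complete, key=dist)[:k]
        filled.append([
            v if v is not None else sum(n[i] for n in neighbours) / len(neighbours)
            for i, v in enumerate(row)
        ])
    return filled

# Toy data (invented): the last row is missing its third value.
data = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, 3.2],
    [9.0, 9.0, 9.0],
    [1.0, 2.0, None],
]
result = knn_impute(data, k=2)
```

In this toy case the two nearest complete rows are the first two, so the missing value is filled with the mean of their third entries; the distant third row does not contribute.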
The Unified Parkinson's Disease Rating Scale (UPDRS) is the most widely employed scale for tracking Parkinson's disease (PD) symptom progression. However, the conventional way to obtain UPDRS scores, based mainly on physical examinations of clinic patients performed by trained medical staff, is inconvenient and expensive. Hence, in this study, we try to explore...
Reversible data hiding is an emerging technology that uses the redundancy of the carrier (typically digital images) to embed secret information and ensure the reversibility of the carrier and hidden information. In recent years, a number of reversible data hiding algorithms based on prediction error expansion have been developed. In prediction error expansion, prediction on the center pixel is...
In data classification mining, the decision tree method is a key algorithm. The ID3 (Iterative Dichotomiser 3) algorithm, presented by Quinlan, is a well-known decision tree algorithm, but it has shortcomings such as the high computational complexity of the information entropy expression, the multi-value bias problem in the process of selecting an optimal attribute, large scales, etc. In order...
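The entropy computation and the multi-value bias named in the abstract above can be illustrated with a small sketch. It shows the standard information gain criterion associated with ID3, not the improved method the paper proposes; the function names and the toy data are assumptions made for the example.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting on `attribute` (a key of each row dict)."""
    total = len(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute], []).append(label)
    # Weighted entropy that remains after the split.
    remainder = sum(len(part) / total * entropy(part) for part in partitions.values())
    return entropy(labels) - remainder

# Toy data (invented): "id" takes a distinct value in every row.
rows = [
    {"id": 1, "windy": "yes"},
    {"id": 2, "windy": "yes"},
    {"id": 3, "windy": "no"},
    {"id": 4, "windy": "no"},
]
labels = ["stay", "go", "go", "go"]

gain_windy = information_gain(rows, labels, "windy")
gain_id = information_gain(rows, labels, "id")
```

Here the `id` attribute splits the data into single-row partitions, so its gain equals the full label entropy and exceeds the gain of `windy`, even though `id` carries no predictive value. That preference for many-valued attributes is the bias the abstract refers to.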
This paper aims to build a data mining model to predict the performance of candidate teachers who apply for employment in high schools in the Gaza Strip. We apply three classification algorithms to our dataset: Decision Tree, Naïve Bayes, and KNN. Our dataset contains 8000 teacher records collected from the Ministry of Education in the Gaza Strip. Although there are a lot of researchers...
The paper examines the behavior of Decision Tree (DT) algorithms on a big database with many cases and many attributes: Forest Covertype (FC) from the UCI Knowledge Discovery in Databases Archive. The classification experiments considered 22 splitting criteria and two pruning methods, whose performance is presented in terms of classification error rate on test data, data...
One of the major causes of death in the world is Heart Failure. This disease directly affects the heart's pumping function. Because of this perturbation, nutrients and oxygen are not well circulated and distributed. The New York Heart Association has classified this disease into four different classes based on patient symptoms. In this paper, we use a data mining technique, more precisely a sequential...
The latest video compression standards, such as the H.264/AVC and the High Efficiency Video Coding (HEVC), provide fast Motion Estimation (ME) algorithms in their reference software aiming at complexity reduction. Test Zone Search (TZS) is the state-of-the-art fast ME algorithm, currently deployed in the reference HEVC encoder due to its great coding efficiency. However, ME is still one of the main...
How to reduce the computation time and how to improve the quality of the clustering result are the two major research issues. Although several efficient and effective clustering algorithms have been presented, none of them is perfect. As such, an effective clustering algorithm, which is based on the prediction of searching information to determine the search directions at later iterations and employs...
Accurate short-term traffic flow prediction can provide timely and accurate traffic condition information, which can help travelers make travel decisions and mitigate traffic jams. Deep learning (DL) provides a new paradigm for the analysis of big data generated by daily urban traffic. In this paper, we propose a novel end-to-end deep learning architecture which consists of two modules. We combine...
DNA microarray data are high-dimensional data that enable researchers to analyze the expression of many genes in a single reaction quickly and efficiently. Characteristics such as small sample size, class imbalance, and data complexity make them difficult to classify. Feature selection is a process that automatically selects features that are most relevant to the predictive...
Sequential pattern mining is a data mining technique that aims to extract and analyze frequent subsequences from sequences of events or items with time constraints. Sequence data mining was introduced in 1995 with the well-known Apriori algorithm. The algorithm studied the transactions through time, in order to extract frequent patterns from the sequences of products related to a customer. Later, this...