In this paper we introduce three main features extracted from Moodle logs to be used as a possible means of predicting future student grades. We discuss the statistical analysis of these features and show that they cannot be applied in isolation to model our data. We then apply them as a whole and use principal component analysis to derive a decision tree based on the features. With the derived tree we...
This paper studies the problem of human activity recognition. Traditionally, the data collected by the accelerometer are preprocessed with a fixed time window, and features for the human activity recognition model are extracted in this framework. However, some human activities are quasi-periodic, which means that classification accuracy can be improved if an adaptive time window is adopted instead. As human...
In the area of recommender systems, the user-based collaborative filtering algorithm has been extensively studied and discussed. In the traditional approach, a target user's preference for an item is predicted from the integrated preference of the user's neighbors for that item, ignoring the structure of these neighbors. That is, these neighbors form two distinct groups: some neighbors may...
In this paper we present a novel technique called iDMI that imputes missing values of a data set by combining a decision tree (DT) algorithm and an expectation-maximization imputation (EMI) algorithm. We first divide a data set into horizontal segments by applying a DT algorithm such as C4.5, and then apply an EMI algorithm to each segment in order to impute the missing values belonging to that segment. If...
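The two-stage idea described in this abstract (tree-based segmentation followed by per-segment EM imputation) can be illustrated with a minimal sketch. The threshold split standing in for actual C4.5 leaves, the two-column data, and the bivariate-Gaussian EM are all simplifying assumptions, not the actual iDMI procedure:

```python
import numpy as np

def em_impute(segment, n_iter=20):
    """Impute NaNs in a two-column segment via EM for a bivariate Gaussian:
    the E-step fills each missing entry with its conditional expectation
    given the observed column; the M-step re-estimates mean and covariance."""
    X = segment.copy()
    miss = np.isnan(X)
    # initialize missing entries with the observed column means
    col_means = np.nanmean(X, axis=0)
    for j in range(X.shape[1]):
        X[miss[:, j], j] = col_means[j]
    for _ in range(n_iter):
        mu = X.mean(axis=0)
        cov = np.cov(X, rowvar=False)
        for i, j in zip(*np.where(miss)):
            k = 1 - j  # the other (observed) column
            X[i, j] = mu[j] + cov[j, k] / cov[k, k] * (X[i, k] - mu[k])
    return X

# toy data: segment boundaries would come from C4.5 leaves in iDMI; here we
# split on a hypothetical threshold of the first attribute instead
rng = np.random.default_rng(0)
data = rng.multivariate_normal([0, 0], [[1, 0.8], [0.8, 1]], size=200)
data[rng.random(200) < 0.1, 1] = np.nan  # ~10% missing in column 2
segments = [data[data[:, 0] < 0], data[data[:, 0] >= 0]]
imputed = np.vstack([em_impute(s) for s in segments])
```

Segmenting first means each EM model only has to fit the locally homogeneous rows of one tree leaf, which is the intuition behind combining the two algorithms.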
The High Efficiency Video Coding (HEVC) standard provides a large improvement in terms of compression efficiency in comparison to its predecessors, mainly due to the introduction of new coding tools and more flexible data structures. However, since many more options are tested in a Rate-Distortion (R-D) optimization scheme, this improvement is accompanied by a significant increase in the encoding...
A decision tree is a modeling technique for solving classification and prediction problems. The study uses data collected on students and their proficiency as input parameters to classify and predict outcomes for students who will take the exams in the future. The study establishes a correlation between cases as a relationship between different phenomena represented by...
With the rapid increase in scope, coverage and volume of geographic datasets, knowledge discovery from spatial data has drawn a lot of research interest over the last few decades. Traditional analytical techniques cannot easily discover the new, implicit patterns and relationships that are hidden in geographic datasets. The aim of this work is to evaluate the performance of traditional and spatial data...
Identifying review manipulation has become one of the hot research issues in e-commerce, because more and more customers make their purchase decisions based on personal comments from virtual communities and e-business websites. Customers consider these personal reviews more reliable than existing internet advertisements. Consequently, some enterprises attempt to create fake personal comments...
This research proposes a manufacturing data analysis framework in the form of computing blocks. The aim is to identify the parameters/attributes that affect the production yield as well as the root cause of the manufacturing problems. The framework is designed to be flexible and exploit the cloud as a computing platform. The manufacturing data are obtained from the database of the production lines...
Defects in any software must be handled properly, and the number of defects directly reflects the quality of the software. In recent years, researchers have applied data mining and machine learning methods to predicting software defects. However, in these studies, directly adopting machine learning models may not be precise enough. Optimizing the machine learning models...
Intrusion is one of the most publicized threats to security. In recent years, intrusion detection has emerged as an important technique for network security. Data mining techniques have been applied as a new approach to intrusion detection. The quality of the feature selection methods is one of the important factors that affect the effectiveness of an Intrusion Detection System (IDS). This paper evaluates...
This paper examines the correlation between changes in the value of the Croatian equity index CROBEX and changes in the share prices of Croatian open investment funds by applying data mining techniques. Apart from the type of fund, consideration is given to how much the funds invest in Croatian equities, i.e. equities listed on the Zagreb Stock Exchange. The problem was examined from two aspects,...
The paper describes ongoing research into data mining techniques for modelling seasonal climate effects on grapevine phenology, which determines the composition of grape berries and in turn, in addition to winemaker experience and talent, the quality of the wine vintage. A brief introduction to the literature in this problem domain is followed by a discussion of conventional statistical...
There is no general consensus on which classifier performance metrics should be preferred over others. While some studies investigate a handful of such metrics in a comparative fashion, an evaluation of the specific relationships among a large set of commonly used performance metrics is much needed in the data mining and machine learning community. This study provides a unique insight into the...
In this paper, a new decision tree construction algorithm (MIDT) is proposed. MIDT (Multiple Informative Decision Tree) uses principal component analysis to integrate information gain, sample distribution information and the correlation coefficient as the basis for selecting splitting attributes. This method can overcome the disadvantage of the ID3 decision tree construction method, which uses information...
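A PCA-combined splitting criterion in the spirit of this abstract might be sketched as follows. The three per-attribute measures (information gain, a sample-distribution balance score, and an absolute correlation coefficient), the toy decision table, and the sign convention are illustrative assumptions, not MIDT's actual definitions:

```python
import numpy as np

def entropy(labels):
    _, c = np.unique(labels, return_counts=True)
    p = c / c.sum()
    return -(p * np.log2(p)).sum()

def info_gain(x, y):
    g = entropy(y)
    for v, c in zip(*np.unique(x, return_counts=True)):
        g -= c / len(x) * entropy(y[x == v])
    return g

def balance(x):
    # normalized entropy of the value counts: 1 = samples spread evenly
    _, c = np.unique(x, return_counts=True)
    if len(c) < 2:
        return 0.0
    p = c / c.sum()
    return -(p * np.log2(p)).sum() / np.log2(len(c))

def abs_corr(x, y):
    # absolute Pearson correlation between integer-coded attribute and class
    xc = np.unique(x, return_inverse=True)[1]
    yc = np.unique(y, return_inverse=True)[1]
    return abs(np.corrcoef(xc, yc)[0, 1])

# toy decision table: attribute 0 matches the class exactly
X = np.array([[0, 1, 0], [0, 1, 1], [1, 0, 0], [1, 0, 1],
              [0, 0, 0], [1, 1, 1], [0, 1, 0], [1, 0, 1]])
y = np.array([0, 0, 1, 1, 0, 1, 0, 1])

criteria = np.array([[info_gain(X[:, j], y), balance(X[:, j]), abs_corr(X[:, j], y)]
                     for j in range(X.shape[1])])
# PCA: project standardized criteria onto the first principal component
Z = (criteria - criteria.mean(0)) / (criteria.std(0) + 1e-12)
w = np.linalg.eigh(np.cov(Z, rowvar=False))[1][:, -1]
scores = Z @ w * np.sign(w.sum())  # fix the PC's sign so larger = better
best = int(np.argmax(scores))     # splitting attribute with the top score
```

The point of the projection is that no single measure dominates the split choice; the first component captures the direction along which the three measures jointly vary most.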
Parametric models such as linear regression can contribute valuable, interpretable descriptions of simple structure in data. However, occasionally such simple structure does not extend across an entire database and may be confined more locally to subsets of the data. Nonparametric regression normally involves local averaging. In this study, a local averaging estimator is coupled with a machine...
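Local averaging as mentioned above can be illustrated with a Nadaraya-Watson kernel regression sketch; the Gaussian kernel, the bandwidth, and the sine-shaped toy data are assumptions for illustration, not the estimator actually used in the study:

```python
import numpy as np

def nadaraya_watson(x_train, y_train, x_query, h=0.3):
    """Local averaging: each prediction is a kernel-weighted mean of the
    training responses near the query point (Gaussian kernel, bandwidth h)."""
    w = np.exp(-0.5 * ((x_query[:, None] - x_train[None, :]) / h) ** 2)
    return (w * y_train).sum(axis=1) / w.sum(axis=1)

# toy data where a single global linear fit would miss the local structure
rng = np.random.default_rng(2)
x = np.sort(rng.uniform(0, 2 * np.pi, 200))
y = np.sin(x) + rng.normal(0, 0.1, 200)
pred = nadaraya_watson(x, y, np.array([np.pi / 2, 3 * np.pi / 2]))
```

Unlike the global linear fit, the estimate at each query point is driven only by nearby observations, which is exactly the locality the abstract argues parametric models can miss.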
Change information is an important kind of signal in remote sensing images, and detecting it automatically has become an important field in the intelligent interpretation of remote sensing imagery. Guided by the theory of neighboring correlation and decision trees, and based on an object-oriented technique, we propose a method that can detect change information in two kinds of high-resolution remote sensing images that...
A plethora of defect prediction models has been proposed and empirically evaluated, often using standard classification performance measures. In this paper, we explore defect prediction models for a large, multi-release software system from the telecommunications domain. A history of roughly 3 years is analyzed to extract process and static code metrics that are used to build several defect prediction...
K-means is a widely used clustering algorithm in data mining. In the traditional algorithm, each feature is treated equally and makes the same contribution to K-means. In fact, redundant and irrelevant features may disturb the clustering result. This paper proposes an improved K-means algorithm based on a fuzzy feature selection strategy. The method is based on measuring a 'feature importance factor'...
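One simple way to realize feature-weighted K-means is to scale each feature by its importance weight before running standard Lloyd iterations. The hand-picked weights below stand in for the paper's 'feature importance factors' and are purely hypothetical, as is the toy data with one noisy feature:

```python
import numpy as np

def weighted_kmeans(X, w, k, n_iter=50, seed=0):
    """Lloyd iterations on features scaled by their weights: a minimal
    stand-in for weighting each feature's contribution to the distance."""
    rng = np.random.default_rng(seed)
    Xw = X * w  # scale features by importance weights
    centers = Xw[rng.choice(len(Xw), k, replace=False)]
    for _ in range(n_iter):
        d = ((Xw[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)                 # assign to nearest center
        for j in range(k):                   # recompute cluster centers
            if (labels == j).any():
                centers[j] = Xw[labels == j].mean(0)
    return labels

# toy data: two informative features plus one noisy feature that would
# otherwise blur the two clusters
rng = np.random.default_rng(1)
a = rng.normal([0, 0, 0], [0.3, 0.3, 5.0], (50, 3))
b = rng.normal([3, 3, 0], [0.3, 0.3, 5.0], (50, 3))
X = np.vstack([a, b])
w = np.array([1.0, 1.0, 0.1])  # hypothetical importance factors
labels = weighted_kmeans(X, w, k=2)
```

Down-weighting the noisy axis shrinks its contribution to the squared distance, which is the mechanism by which irrelevant features stop disturbing the clustering.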
Data mining techniques can extract desired knowledge from existing databases and ease the knowledge acquisition bottleneck in fault diagnosis. A new knowledge acquisition method combining decision trees and rough set theory is thus proposed for fault diagnosis in this paper. Based on reduction by rough set theory, the decision tree extracts diagnostic knowledge from the reduced decision tables in the form...
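The reduction-then-extraction pipeline can be sketched as follows. The greedy attribute removal, the consistency check, the toy decision table, and the dictionary-of-rules output are illustrative assumptions; the paper's actual rough-set reduction and tree induction are more involved:

```python
import numpy as np

def consistent(X, y, attrs):
    """True if rows identical on `attrs` always share the same decision."""
    seen = {}
    for row, d in zip(X[:, attrs], y):
        if seen.setdefault(tuple(row), d) != d:
            return False
    return True

def reduct(X, y):
    """Greedy rough-set-style reduction: drop each attribute whose removal
    keeps the decision table consistent."""
    attrs = list(range(X.shape[1]))
    for a in range(X.shape[1]):
        trial = [b for b in attrs if b != a]
        if trial and consistent(X, y, trial):
            attrs = trial
    return attrs

# toy fault-diagnosis decision table: 3 condition attributes, 1 decision
X = np.array([[0, 0, 1],
              [0, 1, 0],
              [1, 0, 1],
              [1, 1, 0]])
y = np.array([0, 1, 1, 1])

r = reduct(X, y)
# diagnostic rules in the form (reduced attribute values) -> decision
rules = {tuple(row): int(d) for row, d in zip(X[:, r], y)}
```

Reducing the table first means the subsequent rule extraction only has to cover the attributes that actually discriminate between fault classes, which is the stated motivation for combining the two techniques.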