The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
For the quality of the wine big data identification technology, the introduction of data mining classification algorithm, effectively according to the content of several impact compounds in wine level identification;Are introduced including the Logistic regression and BP neural network and SVM classification algorithm, in view of the three algorithms identify the modeling analysis of wine quality...
We investigate the problem of churn detection and prediction using sequential cellular network data. We introduce a cleaning and preprocessing of the dataset that makes it suitable for the analysis. We draw a comparison of the churn prediction results from the-state-of-the-art algorithms such as the Gradient Boosting Trees, Random Forests, basic Long Short-Term Memory (LSTM) and Support Vector Machines...
Real-time crash prediction models are playing a key role in transportation information system. Support vector machine (SVM), a classification learning algorithm, was introduced to evaluate real-time crash risk. The size of traffic dataset is always large with a high accumulating speed. By applying a warm start strategy, an incremental learning algorithm is introduced to update the original model....
Now a days people are enjoying the world of data because size and amount of the data has tremendously increased which acts like an invitation to Big data. But some of the classifier techniques like Support Vector Machine (SVM) is not able to handle the huge amount of data due to it's excessive memory requirement and unreasonable complexity in algorithm tough it is one of the most popularly used classifier...
Now a days people are enjoying the world of data because size and amount of the data has tremendously increased which acts like an invitation to Big data. But some of the classifier techniques like Support Vector Machine (SVM) is not able to handle the huge amount of data due to it's excessive memory requirement and unreasonable complexity in algorithm tough it is one of the most popularly used classifier...
Support vector machines (SVMs) with the $\ell _{1}$ -penalty became a standard tool in the analysis of high-dimensional classification problems with sparsity constraints in many applications, including bioinformatics and signal processing. We give non-asymptotic results on the performance of $\ell _{1}$ -SVM in identification of sparse classifiers. We show that an $N$ -dimensional $s$ -sparse...
The recent computing trend is producing tons of data every minutes where the amount of imbalanced data is quite high as far as real life data sets are concerned. In practical aspects of data mining, the imbalanced data set is prone to misguide a data mining model. However, data set needs pre-processing before mining. This work focuses on some practical data mining techniques and produces a valid evaluation...
To retrain an existing multilayer perceptron (MLP) on-line using newly observed data, it is necessary to incorporate the new information while preserving the performance of the network. This is known as the “plasticitystability” problem. For this purpose, we proposed an algorithm for on-line training with guide data (OLTA-GD). OLTA-GD is good for implementation in portable/wearable computing devices...
This paper proposes an approach using MapReduce-based Rocchio relevance feedback algorithm, which improved the traditional Rocchio algorithm in the MapReduce paradigm, to resolve the problem of massive information filtering. Traditional text classification algorithms have vital impact on information filtering.
In least squares support vector machine (LS-SVM), nonlinear function estimation is done by solving a linear set of equations instead of solving a quadratic programming problem, and a non-sparse solution is obtained yet. Several sparse algorithms have been developed to obtain reduced support vectors to improve the generalization performance of LS-SVM. In this paper, we proposed a sparse method based...
With the development of Internet, shopping on the internet is becoming more and more popular. In the meantime, the massive commodity make the experience of online shopping bad. In order to solve this problem, commodity recommendation system is applied into e-commerce platform and the experience of online shopping has been promoted. Essentially, commodity recommendation is a kind of behavior prediction...
The advances of network technology and mobile communication technology are making eHealth possible. In eHealth systems, physiological data and relevant context-aware data are acquired continuously and in real time. At the same time, such large-scale data results in huge challenges in the aspect of real-time big data processing since eHealth data appears in the form of data stream. Therefore, we propose...
In this paper, we focus on the problem of how to design a methodology which can improve the prediction accuracy as well as speed up prediction process for stock market prediction. As market news and stock prices are commonly believed as two important market data sources, we present the design of our stock price prediction model based on those two data sources concurrently. Firstly, in order to get...
For traditional data mining tasks, algorithms are commonly selected by manual effort. However, it is a challenge for any practitioner to select the most appropriate algorithm from hundreds of candidates. To address this issue, we have proposed a novel model for supporting automatic selection on data mining algorithms. The model incorporates the extracted characteristics of data sets and the dynamically...
We present Rough-Fuzzy Support Vector Domain Description (RFSVDD), a novel data description algorithm that provides a rough-fuzzy characterization of a data set and shows its potential for outlier detection. Its resulting data structure is characterized by two components: a crisp lower-approximation and a fuzzy boundary. While the lower-approximation consists of those data points that lie inside the...
Until now, designing a reliable image segmentation algorithm is still an open problem. Research related to this matter is still underway, but in one occasion we may be faced with the problem for selection image segmentation algorithms that will we use? To get the solution of this problem we need a good technical evaluation of image segmentation algorithms. With the technique, it is expected we can...
How to classify the data sets with vast information amount and large distribution fluctuation, which is always the research hotspot. This paper puts forward an improved SVM incremental learning algorithm by comparing the different incremental learning methods of SVM algorithm. In the algorithm, whether to violate the KTT conditions is regarded as an important basis for incremental data set. And the...
Although the perceptron algorithm has been considered a simple supervised learning algorithm, it has the advantage of learning from the training data set one at a time. This makes it more suitable for online learning tasks and new families of kernelized perceptrons have been shown to be effective in handling streaming data. However, the amount of memory required for storing the online model which...
The study aimed at analyzing the keywords of the Macau Special Administrative Region's 2012 and 2013 annual policy addresses. The contribution of the study included the following two points. First, the study used the text mining method in order to explore the content of policy address. Second., the study applied the SVM (Support Vector Machine) and random forests classification analysis to explore...
Traditional classification algorithms are difficult in dealing with imbalance data. This paper proposes a classification algorithm called CascadeBoost, which combines with the advantages of boosting algorithm and cascade model that can learn imbalance data. Cascade model allows the pre-training data to be balanced by gradually reducing the number of the major class; and then the most rich information...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.