Many colleges have accumulated large amounts of student information, such as academic achievement data and consumption records. Based on this information, we attempt to identify student groups from various aspects. Given these groups, we can obtain the characteristics of the students in each one. In this way, the college can gain a better understanding of its students to accomplish reasonable management...
The imbalanced learning problem is becoming pervasive in today's data mining applications. It refers to the uneven distribution of instances among the classes, which makes rare instances difficult to classify. Several undersampling and oversampling methods have been proposed to deal with such imbalance. Many undersampling techniques do not consider the distribution of information...
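To make the imbalance problem concrete, the sketch below shows plain random undersampling, the simplest baseline: it discards majority-class rows at random until every class is the same size. It is not the method proposed in the paper; the function name `random_undersample` and the toy data are illustrative assumptions.

```python
import random
from collections import Counter


def random_undersample(X, y, seed=0):
    """Randomly keep only as many rows of each class as the rarest class has.

    This baseline ignores how the discarded instances are distributed,
    which is the kind of information loss the abstract points out.
    """
    rng = random.Random(seed)
    target = min(Counter(y).values())
    # Group row indices by class label.
    by_class = {}
    for i, label in enumerate(y):
        by_class.setdefault(label, []).append(i)
    keep = []
    for indices in by_class.values():
        keep.extend(rng.sample(indices, target))
    keep.sort()  # preserve the original row order
    return [X[i] for i in keep], [y[i] for i in keep]


X = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.9]]
y = [0, 0, 0, 0, 0, 1]
X_bal, y_bal = random_undersample(X, y)
print(Counter(y_bal))  # Counter({0: 1, 1: 1})
```

Because rows are dropped blindly, informative majority instances (for example, those near the class boundary) can be lost; distribution-aware undersampling methods aim to avoid this.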
Iterative sparse matrix-vector multiplication (ISpMV) is a key operation in many graph-based data mining and machine learning algorithms. With the growth of big data, matrices can be so large (perhaps billion-scale) that SpMV cannot be performed on a single computer. Implementing and optimizing SpMV for large-scale data sets is therefore a challenging issue. In this paper, we used an in-memory...
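As a single-machine illustration of what ISpMV computes, the following sketch stores the matrix in CSR (compressed sparse row) form and multiplies it by a vector repeatedly, as power-iteration-style graph algorithms such as PageRank do. It is not the paper's in-memory distributed implementation; the CSR layout and the function names `csr_spmv` and `iterative_spmv` are assumptions chosen for clarity.

```python
def csr_spmv(indptr, indices, data, x):
    """Compute y = A @ x for a sparse matrix A stored in CSR form.

    Row r's nonzero values are data[indptr[r]:indptr[r + 1]], and their
    column positions are stored at the same offsets of `indices`.
    """
    y = [0.0] * (len(indptr) - 1)
    for r in range(len(y)):
        for k in range(indptr[r], indptr[r + 1]):
            y[r] += data[k] * x[indices[k]]
    return y


def iterative_spmv(indptr, indices, data, x0, iterations):
    """Apply the same sparse matrix repeatedly: x_{t+1} = A x_t."""
    x = list(x0)
    for _ in range(iterations):
        x = csr_spmv(indptr, indices, data, x)
    return x


# 3x3 permutation matrix A = [[0, 0, 1], [1, 0, 0], [0, 1, 0]] in CSR form.
indptr, indices, data = [0, 1, 2, 3], [2, 0, 1], [1.0, 1.0, 1.0]
print(iterative_spmv(indptr, indices, data, [1.0, 0.0, 0.0], 1))  # [0.0, 1.0, 0.0]
```

Only the nonzeros are stored and touched, so each iteration costs time proportional to the number of nonzeros; at billion scale it is the matrix itself that no longer fits on one machine, which is what motivates distributed in-memory approaches.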
Data mining has gained considerable importance in research. It is well suited to analyzing data from any field and producing decision-oriented output. Today, data are generated and stored at high speed, and non-stationary systems play a central role in producing such data. The availability of these data creates opportunities for analysis by researchers. Such data, which are continuous, unbounded,...
Nowadays, computer and telecommunication networks continuously generate large volumes of data and measurements, and these volumes make it difficult to extract meaningful knowledge. This paper presents SaFe-NeC, an innovative methodology for analyzing network traffic that exploits data mining techniques, i.e. clustering and classification algorithms, focusing on self-learning...
Digital data is now produced and consumed faster than ever before, yet it is of no use until useful content is extracted from it. Traditional database management techniques are impractical and inefficient for big data, which is why big data technologies such as Hadoop came into existence. Hadoop is an open-source framework that can be used to process...
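Hadoop's original processing model is MapReduce. The plain-Python sketch below imitates the map, shuffle and reduce phases of a word count inside one process; it does not use Hadoop's API, and the function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: combine all values emitted for one key."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence within one process."""
    pairs = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


print(word_count(["big data big", "data tools"]))  # {'big': 2, 'data': 2, 'tools': 1}
```

In Hadoop, many map and reduce tasks run in parallel across a cluster and the shuffle moves intermediate pairs over the network; here everything runs sequentially, which keeps only the programming model visible.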
Given the amount of information available today, recent progress in data mining research has led to the development of various efficient methods for mining interesting patterns in large databases. Data mining plays a vital role in the knowledge discovery process by analyzing huge amounts of data from various sources and summarizing them into useful information. It is helpful for analyzing large volumes of data in different domains...
Sentiment-based analysis of financial stock data and prediction of future trends is a great challenge in big data research. For unlabelled opinions, classification with an unsupervised learning algorithm leads to classification errors because the data are sparse and high-dimensional. To overcome this problem, sentiment analysis to extract the opinion of each word in the stock data has been...
Recent advances in applying computers across different fields of science have produced huge amounts of data. These data serve as an analysis resource and are key to overcoming many problems. Clustering is a primary data analysis process, and it is also a preprocessing step before other techniques such as classification. Density-based clustering algorithms have advantages such as finding clusters of arbitrary shape and...
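DBSCAN is a well-known density-based clustering algorithm, and the sketch below shows why such methods can find clusters of arbitrary shape: clusters grow outward from core points through dense neighborhoods instead of around a fixed center. The abstract does not say which density-based algorithm the paper builds on, so treat this as a generic, unoptimized O(n^2) illustration; the `eps` and `min_pts` values are illustrative.

```python
import math


def dbscan(points, eps, min_pts):
    """Minimal O(n^2) DBSCAN. Returns one label per point; -1 marks noise."""
    n = len(points)

    def region(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    labels = [None] * n  # None means "not visited yet"
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = region(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise reached from a core point
            if labels[j] is not None:
                continue  # already labelled (including a border point just claimed)
            labels[j] = cluster
            j_region = region(j)
            if len(j_region) >= min_pts:  # j is also a core point: keep expanding
                queue.extend(j_region)
    return labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(points, eps=1.5, min_pts=3))  # [0, 0, 0, 1, 1, 1, -1]
```

Unlike k-means, the number of clusters is not fixed in advance, and isolated points are reported as noise rather than forced into a cluster.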
Outlier detection is an important issue in data mining. Several applications rely on outlier detection, such as intrusion detection, fraud detection, medical and public health data analysis, and image processing. Clustering-based outlier detection algorithms are considered the most important outlier detection approaches. They provide a high detection rate; however, they suffer from high false...
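A common clustering-based outlier score is the distance from each point to its nearest cluster centroid. The sketch below assumes the centroids come from some earlier clustering step and flags points whose score exceeds the mean plus `z` standard deviations; both the scoring rule and the threshold are illustrative assumptions, not the algorithm from the paper, and the helper names are hypothetical.

```python
import math
import statistics


def outlier_scores(points, centroids):
    """Score each point by its distance to the nearest cluster centroid."""
    return [min(math.dist(p, c) for c in centroids) for p in points]


def flag_outliers(points, centroids, z=2.0):
    """Flag points whose score exceeds mean + z * (population) standard deviation."""
    scores = outlier_scores(points, centroids)
    threshold = statistics.fmean(scores) + z * statistics.pstdev(scores)
    return [s > threshold for s in scores]


centroids = [(0.5, 0.5), (10.5, 10.5)]  # assumed output of a prior clustering step
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (30, 30)]
print(flag_outliers(points, centroids))  # only the last point, (30, 30), is flagged
```

The parameter `z` trades detection rate against false alarms: lowering it catches more anomalies but also flags more normal points.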
With the phenomenal increase in digital data, running traditional clustering algorithms on separate servers is inefficient. To deal with this problem, researchers are moving traditional clustering algorithms, more specifically K-means clustering, to distributed environments. In traditional K-means clustering, the problem of instability caused by random initial centers exists...
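The dependence on random initial centers is easiest to see in the standard (Lloyd's) k-means loop, sketched below for a single machine. The function names `assign` and `kmeans` and the choice of k random data points as initial centers are assumptions; the distributed implementation discussed in the paper is not shown.

```python
import math
import random


def assign(points, centers):
    """Assignment step: index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda c: math.dist(p, centers[c])) for p in points]


def kmeans(points, k, seed=None, max_iter=100):
    """Lloyd's k-means with k randomly chosen data points as initial centers.

    Because the initial centers are random, different seeds can converge
    to different clusterings (the instability noted in the abstract).
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(max_iter):
        labels = assign(points, centers)
        # Update step: move each center to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centers.append(centers[c])  # empty cluster keeps its old center
        if new_centers == centers:
            break  # converged: no center moved
        centers = new_centers
    return assign(points, centers), centers


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = kmeans(points, k=2, seed=1)
print(labels, centers)
```

On this well-separated toy data every seed ends in the same partition; on overlapping data, different `seed` values can yield different local optima, which is what better initialization schemes try to fix.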
With the development of interactive digital cable services and the diversification of customer demand, effectively grouping TV programmes by user preference is vital for market segmentation and differentiation. The study summarizes the main principles and characteristics of clustering algorithms and uses the K-means algorithm to group TV programme preferences based on 52392 subscribers...
K-means is one of the most significant clustering algorithms in data mining. It performs well in many cases, especially on massive data sets. However, the K-means clustering result depends largely on the initial centers, which makes it difficult for K-means to reach the global optimum. In this paper, we developed a novel algorithm based on finding density peaks to optimize the initial centers for...
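The sketch below illustrates the general density-peaks idea for choosing initial centers: each point gets a local density `rho` (the number of neighbors closer than a cutoff `d_c`) and a distance `delta` to the nearest denser point, and the points with the largest `rho * delta` are taken as centers. The abstract does not give the paper's exact formulas, so the cutoff rule, the tie-breaking by index and the `rho * delta` ranking are assumptions, and `density_peak_centers` is a hypothetical name.

```python
import math


def density_peak_centers(points, k, d_c):
    """Choose k initial centers with the density-peaks heuristic: points that
    are dense (many neighbors within d_c) and far from any denser point."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # rho: local density = number of other points closer than the cutoff d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c) for i in range(n)]
    # Rank by density (ties broken by index) so "denser" is a strict order.
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta = [0.0] * n
    delta[order[0]] = max(dist[order[0]])  # densest point: its largest distance
    for r in range(1, n):
        i = order[r]
        # delta: distance to the nearest point ranked as denser.
        delta[i] = min(dist[i][j] for j in order[:r])
    gamma = [rho[i] * delta[i] for i in range(n)]
    best = sorted(range(n), key=lambda i: -gamma[i])[:k]
    return [points[i] for i in best]


points = [(0, 0), (0, 1), (1, 0), (0.5, 0.5), (10, 10), (10, 11), (11, 10), (10.5, 10.5)]
print(density_peak_centers(points, k=2, d_c=1.0))  # [(0.5, 0.5), (10.5, 10.5)]
```

The selected points can then be handed to k-means as deterministic initial centers instead of random ones.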
A user enters a query to find desired information. To discover the number of user search goals and represent each goal with some keywords, we first infer the user search goals for a query by clustering feedback sessions. For this, we use the concept of a pseudo-document, which is a revised version of a feedback session. The user search goals are then determined by clustering the pseudo-documents, and it...
Data mining is one of the most exciting fields of research. As data become digitized and systems become connected and integrated, the scope for data generation and analytics has increased exponentially. Today, most systems generate non-stationary data that are huge in size and volume, arrive at high speed and change quickly; such data are called data streams. One of the most...
Opinion mining can extract useful information from users' comments. After this information is clustered and analyzed, users can gain a detailed understanding of a commodity and then decide whether to buy it. In this paper, we first extract evaluation objects and evaluation words and then cluster the evaluation objects. Next, based on the SO-PMI algorithm, we judge the polarity of evaluation...
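SO-PMI scores a word's semantic orientation as its pointwise mutual information (PMI) with positive seed words minus its PMI with negative seed words. The sketch below estimates the probabilities from document-level co-occurrence in a tiny corpus; the seed words, the smoothing constant `eps`, the toy documents and the name `so_pmi` are assumptions, since the paper's corpus and seed lists are not given in the abstract.

```python
import math


def so_pmi(word, documents, pos_seeds, neg_seeds, eps=0.01):
    """SO-PMI(word) = sum of PMI(word, p) over positive seeds
                    - sum of PMI(word, n) over negative seeds.

    PMI(a, b) = log2(P(a, b) / (P(a) * P(b))), with probabilities estimated
    from document-level co-occurrence; eps smooths zero counts.
    """
    docs = [set(d) for d in documents]
    n = len(docs)

    def prob(*words):
        # Fraction of documents containing all the given words (smoothed).
        return (sum(1 for d in docs if all(w in d for w in words)) + eps) / n

    def pmi(a, b):
        return math.log2(prob(a, b) / (prob(a) * prob(b)))

    return sum(pmi(word, s) for s in pos_seeds) - sum(pmi(word, s) for s in neg_seeds)


docs = [["great", "fast", "screen"], ["great", "fast"], ["bad", "slow"], ["bad", "slow", "screen"]]
for w in ["fast", "slow", "screen"]:
    # A positive score suggests positive polarity, a negative score negative polarity.
    print(w, so_pmi(w, docs, pos_seeds=["great"], neg_seeds=["bad"]))
```

Here "fast" scores positive, "slow" negative, and "screen", which co-occurs equally with both seeds, scores zero.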
K-means is the most widely used clustering algorithm because it is fairly straightforward to implement for various problems. As the number of clusters increases, the number of iterations also tends to increase slightly. However, as some studies in the literature indicate, there are still opportunities for improvement. In this study, improved implementations of the k-means algorithm with a centroid...
The most important task in the clustering process is validating the results obtained from clustering algorithms. There are many cluster validation criteria, but the most commonly used approaches are based on internal validity indices. Numerous indices have been suggested over time, but only some of them are widely used. In this paper we have drawn a...
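The silhouette coefficient is one widely used internal validity index: it compares each point's mean distance to its own cluster with its mean distance to the nearest other cluster, using only the data and the labels. It is shown here only as an example of this kind of index; which indices the paper actually compares is not stated in the abstract, and `silhouette` is a hypothetical helper name.

```python
import math


def silhouette(points, labels):
    """Mean silhouette coefficient, an internal validity index in [-1, 1].

    a(i): mean distance from point i to the other members of its cluster.
    b(i): smallest mean distance from point i to the members of another cluster.
    s(i) = (b - a) / max(a, b), and s(i) = 0 if i is alone in its cluster.
    Requires at least two clusters.
    """
    clusters = set(labels)
    scores = []
    for i, p in enumerate(points):
        own = [q for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)
            continue
        a = sum(math.dist(p, q) for q in own) / len(own)
        b = min(
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == c) / labels.count(c)
            for c in clusters
            if c != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


points = [(0, 0), (0, 1), (10, 10), (10, 11)]
print(silhouette(points, [0, 0, 1, 1]))  # close to 1: compact, well-separated clusters
print(silhouette(points, [0, 1, 0, 1]))  # negative: points mixed across clusters
```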
A cluster can be defined as a collection of data objects that are similar to each other and are grouped together, whereas dissimilar data objects are placed in different groups. The process of grouping a set of objects into classes of similar objects is called clustering. In fuzzy c-means clustering, every data point belongs to every cluster with some membership value. Hence, every cluster...
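The membership idea can be made concrete with the standard fuzzy c-means updates: centers are means weighted by u_ij^m, and memberships are u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1)), where d_ij is the distance from point i to center j. The plain-Python sketch below is illustrative only; the fuzzifier `m = 2`, the random initialization, the stopping tolerance and the name `fuzzy_c_means` are assumptions.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Fuzzy c-means: u[i][j] is the membership of point i in cluster j.

    Each row of u sums to 1; m > 1 is the fuzzifier (larger m = softer clusters).
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, normalized so each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        u.append([x / sum(row) for x in row])
    for _ in range(iters):
        # Centers: means of all points weighted by u_ij ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            centers.append(tuple(sum(w[i] * points[i][d] for i in range(n)) / sum(w) for d in range(dim)))
        # Memberships: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)).
        new_u = []
        for p in points:
            dists = [math.dist(p, v) for v in centers]
            if 0.0 in dists:  # point lies exactly on a center: share membership among those centers
                hits = dists.count(0.0)
                new_u.append([1.0 / hits if d == 0.0 else 0.0 for d in dists])
            else:
                new_u.append([1.0 / sum((dj / dk) ** (2 / (m - 1)) for dk in dists) for dj in dists])
        shift = max(abs(a - b) for old, new in zip(u, new_u) for a, b in zip(old, new))
        u = new_u
        if shift < tol:
            break  # memberships have stopped changing
    return centers, u


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, u = fuzzy_c_means(points, c=2)
for p, row in zip(points, u):
    print(p, [round(x, 3) for x in row])
```

Every point keeps a nonzero membership in both clusters, but points near one center receive nearly all of their membership from it, which is the soft counterpart of a k-means assignment.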
Data normalization for use in Artificial Neural Networks often requires extensive statistical analysis. This paper presents an initial investigation of a case study involving credit card fraud detection, in which Cluster Analysis was applied to data normalization. Early results obtained from the use of Artificial Neural Networks and Cluster Analysis for fraud detection have shown that neuronal inputs can...