Large databases may contain critical instances or chunks: small sets of records that carry domain-specific information. These chunks are useful in future decision making because they improve classification accuracy when labeling critical, unlabeled instances by reducing false positives and false negatives. The classification process may be assessed based on efficiency and effectiveness...
Feature selection, or variable reduction, is a fundamental problem in data mining. It refers to the process of identifying the few most important features to which a learning algorithm is applied. The best subset contains the minimum number of dimensions that still retains suitably high classifier accuracy in representing the original features. The objective of the proposed approach is to reduce the number...
Across many domains, big data plays a crucial role owing to recent developments in the digital world. This rapid data growth has prompted the use of clustering algorithms to segment the data into small sets on which the associated processing can be performed. However, dealing with large data remains a challenge, because most of the algorithms are suited only to small data...
Classifying data points in a data stream poses a fundamentally different set of challenges from data mining on static data. While streaming data is often placed into the context of "Big Data" (or more specifically "Fast Data") wherein one-pass algorithms are used, true data streams offer additional hurdles due to their dynamic, evolving, and non-stationary nature. During the...
With the growth of the Internet community, textual data has proven to be the main tool of communication in human-machine and human-human interaction. This communication is constantly evolving towards the goal of making it as human and real as possible. One way of humanizing such interaction is to provide a framework that can recognize the emotions present in the communication or the emotions of the...
Missing data imputation is an important task in cases where it is crucial to use all available data and not discard records with missing values. However, most of the existing algorithms focus on data that are missing at random (MAR) or missing completely at random (MCAR). In this paper, an information decomposition imputation (IDIM) algorithm using a fuzzy membership function is proposed for addressing the missing...
The traditional K-means algorithm is sensitive to the initial centers and treats every dimension of multidimensional data as equally important. It therefore can neither suppress the effects of the differing scales of the dimensions nor properly reflect the influence of each dimension on the clustering. Semi-supervised clustering introduces a small number of sample points, so that it can significantly reduce the number...
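The sensitivity to initial centers mentioned in this abstract can be illustrated with a minimal sketch of plain Lloyd-style K-means. This is not the paper's semi-supervised method; the four data points, the two starting configurations, and the function names (`sq_dist`, `mean_point`, `kmeans`) are all assumptions chosen for illustration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coord) / len(points) for coord in zip(*points))


def kmeans(points, init_centers, max_iter=100):
    """Plain Lloyd iteration; returns (final centers, sum of squared errors)."""
    centers = list(init_centers)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest current center.
        clusters = [[] for _ in centers]
        for p in points:
            idx = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            clusters[idx].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        new_centers = [
            mean_point(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    sse = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, sse


# Hypothetical data: the corners of a wide rectangle, clustered with k = 2.
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]

# Starting on the left and right sides finds the natural split.
good_centers, good_sse = kmeans(points, [(0.0, 0.0), (4.0, 0.0)])

# Starting in the middle converges to a top/bottom split, a poor local optimum.
bad_centers, bad_sse = kmeans(points, [(2.0, 0.0), (2.0, 1.0)])

print(good_sse, bad_sse)
```

Both runs converge, but to different partitions with different errors, which is the behaviour that motivates better seeding or, as in this abstract, extra supervision.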
The widespread adoption of ubiquitous devices has not only facilitated the connection of billions of people, but has also fuelled a culture of sharing rich, high-resolution locations through check-ins. Despite the profusion of GPS- and WiFi-driven location prediction techniques, the sparse and random nature of check-in data generation has given rise to diverse problems, which have prompted the prediction...
Sequential pattern mining is one of the most studied and challenging tasks in data mining. However, the extension of well-known methods from many other classical patterns to sequences is not a trivial task. In this paper we study the notion of δ-freeness for sequences. While this notion has been extensively discussed for itemsets, this work is the first to extend it to sequences. We...
Recent years have witnessed the explosive growth of recommender systems in various exciting application domains such as electronic commerce, social networking, and location-based services. A great many algorithms have been proposed to improve the accuracy of recommendation, but only recently has the long tail problem, arising from inadequate recommendation of niche items, been recognized as a real challenge...
Nearest neighbour search is a core process in many data mining algorithms. Finding reliable closest matches of a query in a high-dimensional space is still a challenging task. This is because the effectiveness of many dissimilarity measures that are based on a geometric model, such as the lp-norm, decreases as the number of dimensions increases. In this paper, we examine how the data distribution can...
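The loss of effectiveness of lp-norm distances in high dimensions can be seen in a small experiment. The sketch below is not taken from the paper: it assumes uniformly random points in the unit hypercube, and the helper names (`lp_distance`, `relative_contrast`) and the parameter values are hypothetical. It measures the relative contrast, (farthest - nearest) / nearest, between a query and a random sample; a small value means the nearest neighbour is barely closer than the farthest point.

```python
import random


def lp_distance(a, b, p=2.0):
    """Minkowski (lp-norm) distance between two equal-length vectors."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def relative_contrast(dim, n_points=200, p=2.0, seed=0):
    """(farthest - nearest) / nearest for one random query against
    n_points random points, all drawn uniformly from [0, 1)^dim."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    query = [rng.random() for _ in range(dim)]
    dists = [
        lp_distance(query, [rng.random() for _ in range(dim)], p)
        for _ in range(n_points)
    ]
    nearest, farthest = min(dists), max(dists)
    return (farthest - nearest) / nearest


low_dim_contrast = relative_contrast(dim=2)
high_dim_contrast = relative_contrast(dim=1000)

# The contrast shrinks sharply as the dimensionality grows.
print(low_dim_contrast, high_dim_contrast)
```

Under these assumptions the contrast in 1000 dimensions is far smaller than in 2 dimensions, which is one common way to quantify why geometric dissimilarity measures become less discriminative as dimensionality increases.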
Most existing topic models focus either on extracting static topic-sentiment conjunctions or on topic-wise evolution over time, leaving out topic-sentiment dynamics and missing the opportunity to provide a more in-depth analysis of textual data. In this paper, we propose an LDA-based topic model for analyzing topic-sentiment evolution over time by modeling time jointly with topics and sentiments. We derive...
Many real-world networks exhibit dynamic changes, such as new nodes and edges and modifications of node content. Because changes are continuously introduced to the network in a streaming fashion, we refer to such dynamic networks as streaming networks. In this paper, we propose a new classification method for streaming networks, namely streaming network node classification (SNOC). For...
User-reported experiences and opinions are used by peers to make decisions about where to go and what to buy. Unfortunately, not all users or opinions are honest. Many opinions are fabricated and may be submitted by automated systems or by people who are recruited by businesses and search engine optimizers to write good reviews. Such reviews and ratings are called spam reviews. These are misleading...
Databases in clinical settings hold a tremendous amount of data about patients and their associated clinical history. Here, data mining plays a vital role in searching for patterns within huge volumes of clinical data that could provide a useful basis of knowledge for efficient and effective decision-making. Classification is a widely used data mining tool employed in healthcare applications to facilitate...
For classification of high-dimensional data, feature selection is the most important step for obtaining an optimal result with respect to the processing power required and the time taken. Feature selection is a method by which the most relevant features are selected from a set containing redundant and irrelevant features, thereby reducing the load on the classification algorithm. This paper proposes...
Stream mining has gained popularity in recent years due to the availability of numerous data streams from sources such as social media and sensor networks. Data mining on such continuous streams poses a variety of challenges, including concept drift and unbounded stream length. Traditional data mining approaches to these problems have difficulty incorporating relational domain knowledge and feature...
Data mining concepts have been extensively used for disease prediction in the medical field. Many Hybrid Prediction Models (HPM) have been proposed and implemented in this area; however, there is always a need for greater accuracy and efficiency. The existing methods take all the features into account when building the classifier model, thus reducing the accuracy and increasing the overall processing...
Some recent studies have suggested that public opinions expressed in social media may be correlated with various social issues. To find out what can actually be discovered in social media data, we need data mining. Data mining approaches that can handle massive amounts of data have recently been referred to as big data algorithms. In this paper, we propose a big data algorithm for handling Twitter data...
For four decades, there has been sincere concern among managers and professionals about meeting teaching-learning objectives in academia. A great deal of time has already been spent revealing patterns in student profiles using predictive modeling methods; however, very little effort has been put into identifying the causative features responsible for varied student performance, followed by decisive...