Many studies have focused on the Affinity Propagation (AP) algorithm for community detection because of its near-linear complexity and because it requires neither an objective function nor a predefined number of clusters. In view of the different influences of common neighbors, we propose an improved Affinity Propagation algorithm which is based on the adjacency matrix and considers self-similarity and similarity...
This paper considers the two-parallel-machine scheduling problem with precedence constraints. One of the machines may not always be available due to machine breakdowns or preventive maintenance during the scheduling period. All execution and communication times between tasks are considered unitary and all unavailability dates are known in advance. The considered task graph (since tasks are related...
Uncertain data clustering is an essential task in data mining research. Many traditional clustering methods have been extended with new similarity measurements to tackle this issue. Unlike certain data clustering, uncertain data clustering focuses more on evaluating the distribution similarity between uncertain data objects. In this paper, based on the KL-divergence and the JS-divergence,...
In this paper, a novel autonomous data-driven clustering approach, called AD_clustering, is presented for live data stream processing. The proposed algorithm is fully unsupervised and based entirely on the data samples and their ensemble properties, in the sense that there is no need for user-predefined or problem-specific assumptions and parameters, which is a problem most of the...
Traditional graph-based sentence ranking approaches model documents as a text graph in which vertices represent sentences and edges represent pairwise similarity relationships between sentences. Such modeling cannot capture the complex group relationships shared among multiple sentences, which can be useful for sentence ranking. In this paper, we propose two different group relationships (sentence-topic...
In a multi-view application, view synthesis is required to synthesize unavailable views using texture and depth information of the given views. In this paper, a compensation algorithm based on template matching for partial recovery of disocclusion regions is proposed. The shape of a template is dynamically adjusted, and a new reference mode which mimics pushing dominoes is also proposed. To find an...
The Hadoop distributed file system (HDFS) is a major distributed file system for commodity clusters and cloud computing. Its extensive scalability and replica-based fault tolerance scheme make it well suited to data-intensive applications. Due to the tremendous growth of data, many computation-centric applications have also become data-intensive. However, they are not optimal on HDFS, which leaves plenty of space...
Considering fuzzy collaborative clustering, in this paper we investigate the ranking problem of factor granules, where a factor granule is composed of the patterns, the factors and the factor-induced information. Since the idea of the TOPSIS method is applied to obtain the final ranking result, a referential factor granule is provided in advance. The collaborative information, i.e., the partition...
Clustering is an unsupervised data mining tool. In bioinformatics, clustering genome sequences is used to group related biological sequences when there is no additional supervision. Sequence clusters are often related to gene/protein families, which can shed some light on determining tertiary structures. Extracting such hidden and valuable structures from a data set of genome sequences can benefit...
This paper uses a hierarchical grouping method (Ward grouping) and a non-hierarchical grouping method (k-means) to analyze the participation levels in activities and interactions in a virtual forum. The data came from a MOOC on the grammatical rules of Brazilian Portuguese. About 5100 participants took the course. It lasted about three months and the activities developed...
Data mining is the technique used to visualize and scrutinize data and derive useful information from it, so that the information can be put to practical use. Clustering is one of the techniques that has been proposed for use in data mining. The notion behind clustering is to assign objects to clusters based on some common characteristics such that object...
Recently, social event recommendation, which recommends a list of upcoming events to a user, has attracted considerable research interest. In this paper, we first construct a heterogeneous graph to express the interactions among different types of entities in an event-based social network. Based on the constructed graph, we propose a novel recommendation algorithm called reverse random walk with restart...
Word classification involves grouping the words in a document into clusters. Clustering data sets is a much-researched problem. In 2005, Nikhil R. Pal, Kuhu Pal, James M. Keller, and James C. Bezdek proposed a possibilistic fuzzy c-means (PFCM) clustering algorithm. The PFCM model gives the membership values and the typicality values, along with the cluster centers. It is a hybrid algorithm of possibilistic...
Investigating insider threat cases is challenging because activities are conducted with legitimate access, which makes it difficult to distinguish malicious activities from normal ones. To assist with identifying non-normal activities, we propose using two types of pattern discovery to identify a person's behavioral patterns in network data. The behavioral patterns serve to deemphasize normal behavior...
In this paper, we propose a new approach to fuzzy data clustering. We present a new algorithm, called TEDA-Cloud, based on the recently introduced TEDA approach to outlier detection. TEDA-Cloud is a statistical method that uses the concepts of typicality and eccentricity to group similar data observations. Instead of the traditional concept of clusters, the data is grouped in the form of granular...
The amount and diversity of data produced and processed have increased dramatically in parallel with improvements in technology. Unfortunately, the data produced usually lack labels, which would make classification and information building easier. This has increased the importance of data clustering for building information. In this work, K-Means, Spectral Clustering and Girvan-Newman...
Hadoop has two components, namely HDFS and MapReduce. Hadoop stores user data based on the space utilization of the datanodes in the cluster rather than on their processing capability. Furthermore, Hadoop runs in a heterogeneous environment, as the datanodes may not all be homogeneous. For these reasons, workload imbalances occur when jobs run in a Hadoop cluster, resulting in poor performance. In...
Graph structures are often used for representing data objects and the links between them in large datasets. Knowledge extraction from these data relies on finding the connected components within these graphs. Given a large graph G = (V, E), where V is the set of vertices and E is the set of edges, the problem is to find the connected components efficiently. The problem of finding the connected components...
A customized bus (CB) system is an emerging mode of public transportation that provides flexible, demand-oriented transit services for city commuters. Existing CB systems face two challenges: 1) collecting travel demands and discovering travel patterns effectively and efficiently, and 2) planning profitable bus lines based on travel patterns. In this paper, we propose a bus line planning framework,...
For multi-homed networks, inter-domain traffic engineering (TE) consists in selecting the best path via the available transit providers, so that transmission quality is improved in the face of network events such as congestion and fail-over. In practice, this choice is based on end-to-end (e2e) measurements toward destination networks. These measurements, especially Round-Trip Time (RTT), are expected...