Investigating host load patterns in computing systems is useful for discovering data features and predicting future host load. Since host load can be regarded as time series data, this paper proposes a pattern discovery framework for host load data by applying time series analysis methods. In the proposed framework, the effective data representation, data segmentation...
Clustering short documents has recently attracted increasing interest. Short documents do not contain enough text to compute similarities accurately with the most widely used technique, the Vector Space Model (VSM). Adding semantics to short-document clustering is one efficient way to solve this problem. However, real-life collections are often composed of very short or long...
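The limitation of the Vector Space Model on short texts mentioned in the abstract above can be illustrated with a minimal sketch. It assumes plain term-frequency vectors and whitespace tokenization; the function name `cosine_similarity` and the sample sentences are hypothetical and do not come from the paper.

```python
import math
from collections import Counter


def cosine_similarity(doc_a, doc_b):
    """Cosine similarity between two texts using raw term-frequency vectors."""
    # Assumption: lowercase whitespace tokenization, no stemming or weighting.
    vec_a = Counter(doc_a.lower().split())
    vec_b = Counter(doc_b.lower().split())
    # Only terms present in both documents contribute to the dot product.
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Two short texts on the same topic that share no terms.
related_no_overlap = cosine_similarity("car engine repair", "automobile motor fix")
# Two short texts that share two of three terms.
partial_overlap = cosine_similarity("car engine repair", "car repair shop")
print(related_no_overlap, partial_overlap)
```

With so few terms per document, a single synonym substitution removes all overlap, so the first pair scores zero even though the texts are related. This is the kind of gap that adding semantic information is meant to close.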
In the age of information explosion, a vast amount of information is now available on the internet. Early warning of breaking events is becoming a popular subject of study. Timeliness is one of the most important factors to be considered in this subject. However, traditional topic detection approaches are generally not very effective for the detection of emerging topics which concentrate all the news...
Cluster analysis is an important and challenging subject in time series data mining. It has important applications in many areas, such as medical images, atmospheric science and finance. Many current clustering techniques still have problems. For example, k-means is a very effective method in finding different shapes and tolerating noise, but its result depends heavily on the suitable...
The ever-increasing volume of video content has created profound challenges for developing efficient video summarization (VS) techniques to access the data. Recent developments in sparse dictionary selection have demonstrated promising results for VS; however, the convex-relaxation-based solution cannot ensure the sparsity of the dictionary directly and it selects keyframes from a local point of view...
By analyzing the characteristics of the vast amount of disaster news on the internet, this paper proposes a new topic detection algorithm based on Group Average Hierarchical Clustering (GAHC), which is suitable for processing big data on the network. The core idea of GAHC is to divide big data into smaller groups and then cluster the groups hierarchically to generate the final topics. During the process of clustering,...
As the number of documents grows explosively in the era of big data, document clustering has proven useful for organizing online document streams into events. However, existing studies on document clustering still suffer from problems of high dimensionality, scalability and accuracy. In this paper, we present a novel association link network (ALN) based document clustering method, which...
MapReduce has been widely used as a Big Data processing platform. As it grows in popularity, its scheduling becomes increasingly important. In particular, since many MapReduce applications require real-time data processing, scheduling real-time applications in MapReduce environments has become a significant problem. In this paper, we create a novel real-time scheduler for MapReduce, which overcomes the deficiencies...
Both image alignment and image clustering have been widely researched in recent years, with numerous applications. These two problems are traditionally studied separately. However, in many real-world applications, both alignment and clustering results are needed. Recent studies have shown that alignment and clustering are two highly coupled problems. Thus, we try to solve the two problems in a unified framework...
This paper presents a fuzzy clustering method for image segmentation based on the Local Binary Pattern (LBP) operator. Semi-supervised learning and fuzzy clustering are introduced in order to overcome the sensitivity to the initial clustering. In addition, the local binary pattern operator is introduced to construct the spatial feature vectors of pixels, which makes full use of the spatial characteristics...
A new feature classification strategy for speaker recognition based on grid-density clustering is presented. According to the concept of density-based and grid-distance-based distribution in the Mel-frequency cepstrum domain, the feature vectors of each speaker were self-adaptively classified into L clusters with less overlap. With these convex and non-interwoven clusters, the Gaussian...
To address the processing of today's unprecedentedly diverse and discrete mass text data, this paper presents a distributed MST (minimum spanning tree) algorithm based on the MapReduce programming model. Using this MST algorithm, a distributed MST text clustering algorithm is designed and implemented. The clustering algorithm is analyzed in three aspects: text feature vector extraction, graph...
Personalized search based on users' preferences has been extensively studied in the field of information retrieval. As a typical representative of Web 2.0, social tagging not only allows users to better describe and manage web resources, but also provides a great opportunity for personalized search research, since it contains abundant public personal information. In this paper we propose a user...
The traditional K-means algorithm is sensitive to the initial points and prone to falling into local optima. To avoid this flaw, an improved GA-based text clustering algorithm, CGHCM, is proposed. The new algorithm is proven effective at avoiding local optima and obtains better clustering results.
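The sensitivity to initial points described in the abstract above can be reproduced with a minimal sketch on one-dimensional data. This is not the CGHCM algorithm, which the abstract does not specify; it only shows plain K-means converging to different solutions from two starting points. The function `kmeans_1d` and the sample data are hypothetical.

```python
def kmeans_1d(points, initial_centers, max_iter=100):
    """Plain K-means (Lloyd's iterations) on 1-D data from fixed initial centers."""
    centers = list(initial_centers)
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        new_centers = [
            sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    # Sum of squared distances from each point to its nearest center.
    sse = sum(min((p - c) ** 2 for c in centers) for p in points)
    return centers, sse


# Three well-separated groups of three points each.
data = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0, 20.0, 21.0, 22.0]

good_centers, good_sse = kmeans_1d(data, [1.0, 11.0, 21.0])
bad_centers, bad_sse = kmeans_1d(data, [0.0, 1.0, 2.0])
print(good_centers, good_sse)
print(bad_centers, bad_sse)
```

When all three initial centers sit in the first group, the run converges to a solution that merges the second and third groups and has a much higher squared error than the run started with one center per group. Global search strategies, such as the genetic algorithm named in the abstract, aim to escape this kind of local optimum.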
We propose a scheme to achieve shorter multicast delay, better energy utilization efficiency and more efficient data transfer for Sensor Grid. Our scheme calculates the space, energy and data weight vectors in one cluster. It then searches for a new vector composed of a linear combination of the three individual ones. We build a game balance equation, use the equal correlation coefficient between...
In a fiber image analysis system, correctly segmenting fibers from fiber micrographs is critical for fiber feature extraction and further identification. In this paper, a GVF snake model whose initial contour is obtained by a contour tracking method based on K-means clustering segmentation is proposed for fiber segmentation. First, the K-means clustering method is used to obtain the initial coarse...
For massive data transmission in a Data Grid supporting radio and wireless, a set of novel hierarchical multicast algorithms is proposed to attain more efficient data transfers. The newly proposed algorithms first form different clusters, then calculate the space weight vector W' and the data quantity weight vector W'' in every cluster. Then the algorithms try to find a new vector W composed...
To attain high data multicast efficiency for the P2P database system, the paper proposes a set of novel multicast algorithms. In contrast with the current algorithms, the new algorithms first divide the group members into different clusters in terms of static delay distance, then find the central node in the clusters, calculate the space weight of every node, search the data quantity of every...