Hierarchical Multi-label Classification (HMC) is a challenging real-world problem that naturally emerges in several areas. This work proposes two new algorithms using a Probabilistic Graphical Model based on Dependency Networks (DN) to solve the HMC problem of classifying gene functions into pre-established class hierarchies. DNs are especially attractive for their capability of using traditional,...
Clustering is an unsupervised classification method in which the objects of an unlabeled data set are grouped on the basis of some similarity measure. Conventional partitional clustering algorithms, e.g., K-Means and K-Medoids, have several disadvantages: the final solution depends on the initial solution, and they easily get stuck in local optima. The nature inspired population based global search...
The paper proposes a modified version of the Differential Evolution (DE) algorithm and an optimization criterion function for extractive text summarization applications. The cosine similarity measure is used to cluster similar sentences based on a proposed criterion function designed for the text summarization problem, and important sentences from each cluster are selected to generate a summary of the...
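As a rough illustration of the cosine similarity measure named in the abstract above, the following sketch compares two sentences as bag-of-words term-frequency vectors. The whitespace tokenization and the function name `cosine_similarity` are assumptions made for this example; the abstract does not say how the paper represents sentences.

```python
import math
from collections import Counter


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between term-frequency vectors of two sentences.

    Assumption: sentences are lowercased and split on whitespace.
    """
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    # Dot product over the terms the two sentences share.
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty sentence is treated as similar to nothing
    return dot / (norm_a * norm_b)
```

Sentences with no words in common score 0, and identical sentences score 1, so a clustering step can group sentences whose pairwise score is high.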
Power consumption in a digital circuit increases significantly during test mode. The paper proposes a novel technique to minimize the peak power by circuit clustering based on distribution of energy among the scan cells. All the clusters are equally compatible with respect to the number of scan cells and total system energy which is equally divided among the clusters. The final energy of the system...
Investigating the pattern of host load in computing systems is very useful for discovering data features and predicting future host load. Since the host load can be regarded as time series data, this paper proposes a pattern discovery framework for host load data by applying time series analysis methods. In the proposed framework, the effective data representation, data segmentation...
Benefiting from its openness, collaboration and real-time features, Micro blog has become one of the most important news communication media in modern society. However, it is also filled with fake news. Without verification, such information can spread rapidly through social networks and result in serious consequences. To evaluate news credibility on Micro blog, we propose a hierarchical propagation...
Spectral Embedding is one of the most effective dimension reduction algorithms in data mining. However, its computational complexity has to be reduced before it can be applied to real-world large-scale data analysis. Much research has focused on developing approximate spectral embeddings, which are more efficient but far less effective. This paper proposes Diverse Power Iteration Embeddings...
A huge number of tweets of many kinds about real-world events are generated every day on Twitter. However, these disorganized messages need to be classified by topic and event, which is one of the challenges in extracting knowledge effectively. To solve the problem, we propose a novel method that combines a clustering algorithm with a label propagation algorithm to detect topics on Twitter. First, we use canopy cluster...
In this paper we propose a framework for customer baseline load (CBL) estimation for demand response in the Smart Grid. The introduction of demand response requires quantifying the amount of demand reduction. This process is called measurement and verification. The proposed framework for CBL estimation is based on the unsupervised learning technique of data mining. Specifically, we leverage both the...
As an important branch of biomedical information extraction, Protein-Protein Interaction extraction (PPIe) from the biomedical literature has been widely researched, and machine learning methods have achieved great success on this task. However, the word features generally adopted in existing methods suffer badly from vocabulary gaps and data sparseness, weakening the classification performance....
Soybean is one of the most important crops for food, feed and bio-energy worldwide. The study of soybean phenotypic variation at different geographical locations can help in understanding soybean domestication, the population structure of soybean, and the conservation of soybean biodiversity. We investigate whether soybean varieties can be identified that differ from other varieties on multiple traits...
A typical load profile (TLP) describes the hourly values of electricity consumption on a daily basis, and is associated with a certain consumer category under specific operating conditions. TLPs can be defined for residential, small industrial, commercial or services consumers, for the warm season and the cold season, and for weekdays and weekends. In this paper, the daily load curves of a residential feeder...
Clustering is "the method of organizing objects into groups whose members are related in some way". A cluster is therefore a collection of objects which are coherent internally, but clearly dissimilar to the objects belonging to other clusters. Document clustering is used in many fields such as data mining and information retrieval. Thus, the main goals of this paper are to identify the...
Text clustering is important in many applications of information retrieval. This paper presents a study of clustering short texts in Bahasa Indonesia using a semantic similarity approach in which a dictionary of synonyms and hyponyms is used to obtain information on word relatedness. We compare sentence similarity calculations based on lexical matching and word similarity. More than 250 sentences are involved...
This paper proposes a new approach for clustering English text documents, based on finding the pairwise correlation of documents in a given set of text documents. The correlation coefficient for each pair of documents is calculated on the basis of the ranks given to the words in the documents. The ranking of the words occurring in a document is computed on the basis of weights of the words calculated...
Wireless sensor networks are becoming an interesting area of research because of their ability to monitor a wide range of geographical applications all over the world. Cluster routing algorithms work efficiently for communication in wireless sensor networks. This paper presents the findings and future of a cluster-based routing algorithm using ART1 neural networks. ART1 is an unsupervised...
A new approach to speaker clustering is presented and discussed in this paper. The main technique consists in grouping all the homogeneous speech segments obtained at the end of the segmentation process, by using the spatial information provided by the stereophonic speech. The proposed system is suitable for debates or multi-conferences in which the speakers are located at fixed positions. The new...
In this paper, we propose a new cluster validity index (CVI) based on geometrical shape. Classic CVIs are based on a combination of separation and compactness measures and may include a measure of overlap between clusters. The proposed CVI combines measures of compactness and overlap using an n-sphere shape. We conducted experiments on several real data sets from the UCI repository and compared the...
The system mainly studies the discovery of mass events from images on the Internet; this paper focuses on quantifying the tag documents of Flickr data. It also implements a single-pass clustering algorithm based on traditional text clustering. Three strategies for the single-pass clustering algorithm are implemented, analyzed, and compared. Different document order in the event of the discovery...
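The single-pass clustering named in the abstract above can be sketched generically as follows. This is one common variant, not the paper's implementation: the abstract does not describe its three strategies, so the choice of the first member as cluster representative, the Jaccard similarity on tag sets, and the names `jaccard` and `single_pass_cluster` are all assumptions made for illustration.

```python
def jaccard(a: frozenset, b: frozenset) -> float:
    """Jaccard similarity between two tag sets (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def single_pass_cluster(docs, similarity, threshold):
    """Assign each document, in input order, to the most similar cluster.

    Assumption: a cluster is represented by its first member. A document
    whose best similarity is below `threshold` starts a new cluster.
    """
    clusters = []
    for doc in docs:
        best_idx, best_sim = None, threshold
        for i, cluster in enumerate(clusters):
            sim = similarity(doc, cluster[0])
            if sim >= best_sim:
                best_idx, best_sim = i, sim
        if best_idx is None:
            clusters.append([doc])  # no cluster is close enough
        else:
            clusters[best_idx].append(doc)
    return clusters


# Hypothetical tag documents, invented for this example.
docs = [
    frozenset({"concert", "rock", "stage"}),
    frozenset({"concert", "rock", "crowd"}),
    frozenset({"protest", "march"}),
    frozenset({"protest", "march", "city"}),
]
clusters = single_pass_cluster(docs, jaccard, threshold=0.5)
```

Because each document is compared only against clusters that already exist, the result can change when the input order changes.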
The tridiagonal system solver is an important kernel in many scientific and engineering applications. Even though quite a few parallel algorithms and implementations have been proposed in recent years, challenges still remain when solving large-scale tridiagonal systems on heterogeneous supercomputers. In this paper, a hierarchical algorithm framework SPIKE2 (pronounced 'SPIKE squared') is proposed to...
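For readers unfamiliar with the kernel mentioned in the abstract above, the sketch below solves a tridiagonal system serially with the classic Thomas algorithm. It is not SPIKE2, which is a parallel hierarchical framework whose details are not given here. The function name `thomas_solve` and the array layout are assumptions: all four inputs have length n, `lower[0]` and `upper[n-1]` are ignored, and no pivoting is performed, so the matrix is assumed to be diagonally dominant or otherwise safe to eliminate without row swaps.

```python
def thomas_solve(lower, diag, upper, rhs):
    """Solve a tridiagonal system A x = rhs with the Thomas algorithm.

    Row i of A is: lower[i] * x[i-1] + diag[i] * x[i] + upper[i] * x[i+1].
    Assumption: no zero pivots arise (e.g. A is diagonally dominant).
    """
    n = len(diag)
    c = [0.0] * n  # modified super-diagonal coefficients
    d = [0.0] * n  # modified right-hand side
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    # Forward sweep: eliminate the sub-diagonal.
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    # Back substitution.
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x
```

The forward sweep and back substitution are each sequential in i, which is one reason parallel solvers for large systems partition the matrix instead of applying this recurrence directly.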