With the rapid advances in digital technology, multimedia documents have become ubiquitous. Analyzing this huge repository of multimedia documents requires organizing the documents efficiently. Multimedia document clustering groups multimedia documents that share common topics. An important step of multimedia document clustering is computing the similarity between multimedia...
Feature selection, as a fundamental component of building robust models, plays an important role in many machine learning and data mining tasks. Since acquiring labeled data is particularly expensive in both time and effort, unsupervised feature selection on unlabeled data has recently gained considerable attention. Without label information, unsupervised feature selection needs alternative criteria...
In crowdsourced software development, routing a task to the right developers is a critical issue that largely affects the productivity and quality of software. In particular, crowdsourced software development platforms (e.g. Topcoder and Kaggle) usually adopt the competition-based crowdsourcing model. Given an incoming task, most existing efforts focus on using the historical data to learn the probability...
Many real-world problems involve the analysis of multi-view, high-dimension, small-sample-size data, such as multi-omics data. Combining multi-view databases is expected to provide greater biological significance. However, multi-view data always contain noise and outlying entries that make the analysis inaccurate and unreliable. Analyzing these data effectively has become an urgent need. We proposed...
The bag of words (BOW) represents a corpus as a matrix whose elements are word frequencies. However, each row in the matrix is a very high-dimensional sparse vector. Dimension reduction (DR) is a popular method to address sparsity and high-dimensionality issues. Among different strategies for developing DR methods, Unsupervised Feature Transformation (UFT) is a popular strategy to map all words on...
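As a minimal sketch of the word-frequency matrix described above (not the method of the cited work), the following Python builds a BOW matrix for a toy corpus. The corpus, the whitespace tokenizer, and the function name `bag_of_words` are illustrative assumptions.

```python
from collections import Counter


def bag_of_words(corpus):
    """Return (vocabulary, matrix) where matrix[i][j] counts word j in document i."""
    # Illustrative assumption: tokenize by lowercasing and splitting on whitespace.
    tokenized = [doc.lower().split() for doc in corpus]
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    column = {word: j for j, word in enumerate(vocabulary)}
    matrix = []
    for tokens in tokenized:
        row = [0] * len(vocabulary)
        for word, count in Counter(tokens).items():
            row[column[word]] = count
        matrix.append(row)
    return vocabulary, matrix


# Hypothetical toy corpus, one string per document.
corpus = ["the cat sat", "the dog sat on the mat", "a bird sang"]
vocabulary, matrix = bag_of_words(corpus)

# Fraction of zero entries, a rough measure of how sparse the rows are.
zeros = sum(row.count(0) for row in matrix)
sparsity = zeros / (len(matrix) * len(vocabulary))
```

Even on this toy corpus more than half of the matrix entries are zero. Each row has one column per vocabulary word, so rows grow longer and sparser as the vocabulary grows, which is the issue dimension reduction addresses.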
Detection of interesting (e.g., coherent or anomalous) clusters has been studied extensively on plain or univariate networks, with various applications. Recently, algorithms have been extended to real-world networks with multiple attributes for each node. In a multi-attributed network, a cluster of nodes is often interesting only for a subset (subspace) of attributes, and this type of clusters...
There has been a surge in research interest in learning feature representation of networks in recent times. Researchers, motivated by the recent successes of embeddings in natural language processing and advances in deep learning, have explored various means for network embedding. Network embedding is useful as it can exploit off-the-shelf machine learning algorithms for network mining tasks like...
Structural variations are a complex collection of mutations, many of which are reported to be associated with complex traits. Recent research reports a rare case of structural variants, complex indels, which may contribute to carcinogenesis. A complex indel often presents multiple inserted nucleotides in a deleted region. Due to the limitations of both data and algorithms, existing approaches could only...
Medicinal plants are becoming increasingly popular across the world for their ability to cure different diseases, including chronic ones. The chemical compositions present in the leaves of those plants are the main contributors to their healing characteristics. The potential of using such plants also depends on the maturity of the medicinal plant in use. The leaves with appropriate maturity can cause better healing...
Partial discharge (PD) in power transformers degrades the dielectric insulation and results in insulation failure and breakdown after a long period. In practice, a partial discharge detector can gather signals from two or more PD sources, which increases the difficulty of pattern recognition and insulation state assessment. Separating multiple PD signals is therefore important for subsequent data processing....
The world is witnessing a remarkable increase in the usage of video surveillance systems. Besides fulfilling an imperative security and safety purpose, these systems also contribute towards operations monitoring, hazard detection and facility management in industry/smart factory settings. Most existing surveillance techniques use hand-crafted features analyzed using standard machine learning pipelines for action...
In the training of the radial basis function network (RBFN), feature selection and classifier design are two tasks commonly addressed in separate processes. The former is related to the number of input nodes, whereas the latter is associated with the design of the hidden layer. Hence, this paper presents an algorithm to train an RBFN based on differential evolution (DE), which simultaneously adjusts...
Web spam is a big problem for search engine users on the World Wide Web. Spam websites use deceptive techniques to achieve high rankings. Although many researchers have presented different approaches to web spam classification and detection, it remains an open issue in computer science. Analyzing and evaluating these websites can be an effective step for discovering and categorizing the features of these websites...
Approximately 50,000 to 60,000 new cases of Parkinson's disease (PD) are diagnosed yearly. Despite being non-lethal, PD shortens the life expectancy of those affected. As such, researchers from different fields of study have put great effort into developing methods aimed at identifying PD in its early stages. This work uses handwriting dynamics data acquired by a series...
The unified Parkinson's disease rating scale (UPDRS) is the most widely employed scale for tracking Parkinson's disease (PD) symptom progression. However, the conventional way of obtaining UPDRS scores, based mainly on physical examinations of clinic patients performed by trained medical staff, is inconvenient and expensive. Hence, in this study, we try to explore...
Big data analytics gives enterprises new opportunities to enhance their management and manufacturing levels. A solution, with a case study, is proposed to accomplish deep-level quality management based on big data analytics. First, the implementation of big data analytics based on industrial process data is illustrated with a case study. Through the analysis and feature extraction of off-line...
We consider the problem of finding consistent matches across multiple images. Current state-of-the-art solutions use constraints on cycles of matches together with convex optimization, leading to computationally intensive iterative algorithms. In this paper, we instead propose a clustering-based formulation: we first rigorously show its equivalence with traditional approaches, and then propose QuickMatch,...
Research efforts have been devoted to extraction and visualization of vortices in an unsteady (turbulent) flow. Characterizing the behaviors of the flow, vortices are identifiable as regions using a vortex detector known as the lambda2-criterion. Isosurface visualization renders vortex regions based on a chosen isovalue. However, it is highly challenging to choose one isovalue suitable for visualizing...
The random Fourier features method has been found very effective in approximating kernel functions. Our previous studies show that, through a mixing mechanism of the feature space formed by random Fourier features and certain linear algorithms, the fuzzy clustering results in the approximated feature space are comparable to, or even exceed, those of classical kernel-based algorithms. To increase the robustness...
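To illustrate how random Fourier features approximate a kernel function, here is a minimal sketch of the standard construction for the Gaussian RBF kernel. It is not the mixing mechanism or the fuzzy clustering of the study above. The names `make_rff_map` and `rbf_kernel`, the value of `gamma`, the number of features, and the sample points are all illustrative assumptions.

```python
import math
import random


def make_rff_map(dim, num_features, gamma, rng):
    """Build z(x) such that z(x) . z(y) approximates exp(-gamma * ||x - y||^2)."""
    # For this kernel, frequencies are drawn from a Gaussian with std sqrt(2 * gamma).
    sigma = math.sqrt(2.0 * gamma)
    weights = [[rng.gauss(0.0, sigma) for _ in range(dim)]
               for _ in range(num_features)]
    # Random phase offsets, uniform on [0, 2*pi).
    offsets = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def feature_map(x):
        return [scale * math.cos(sum(w_k * x_k for w_k, x_k in zip(w, x)) + b)
                for w, b in zip(weights, offsets)]

    return feature_map


def rbf_kernel(x, y, gamma):
    """Exact Gaussian RBF kernel, used here only as the reference value."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


rng = random.Random(0)  # fixed seed so the sketch is reproducible
gamma = 0.5
feature_map = make_rff_map(dim=3, num_features=4000, gamma=gamma, rng=rng)

x, y = [0.2, -0.1, 0.4], [0.5, 0.3, -0.2]
approx = sum(a * b for a, b in zip(feature_map(x), feature_map(y)))
exact = rbf_kernel(x, y, gamma)
```

Because the kernel value becomes an ordinary inner product in the random feature space, linear algorithms can operate on `feature_map(x)` directly. The approximation error shrinks roughly as one over the square root of the number of features.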
In text clustering, selecting distinctive text features is important because of the high dimensionality of the feature space. It is essential to reduce the feature space dimension to increase accuracy and decrease processing time. In this work, we introduce a novel hybrid feature selection model for text clustering. This method measures the term importance based on the correlation coefficient among four...