The bag of words (BOW) model represents a corpus as a matrix whose elements are word frequencies. However, each row in the matrix is a very high-dimensional sparse vector. Dimension reduction (DR) is a popular way to address the sparsity and high-dimensionality issues. Among the different strategies for developing DR methods, Unsupervised Feature Transformation (UFT) is a popular strategy to map all words on...
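As a minimal sketch of the matrix this abstract describes, the snippet below builds a document-by-word count matrix and measures how sparse it is. The whitespace tokenizer and the helper names `bag_of_words` and `sparsity` are illustrative assumptions, not taken from the paper.

```python
from collections import Counter


def bag_of_words(docs):
    """Return (vocab, matrix), where matrix[i][j] counts vocab[j] in docs[i]."""
    # Assumption: lowercase whitespace tokenization; a real pipeline would differ.
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {word: j for j, word in enumerate(vocab)}
    matrix = []
    for toks in tokenized:
        row = [0] * len(vocab)
        for word, count in Counter(toks).items():
            row[index[word]] = count
        matrix.append(row)
    return vocab, matrix


def sparsity(matrix):
    """Fraction of zero entries in the matrix."""
    cells = sum(len(row) for row in matrix)
    zeros = sum(value == 0 for row in matrix for value in row)
    return zeros / cells if cells else 0.0


if __name__ == "__main__":
    vocab, matrix = bag_of_words(["the cat sat", "the dog sat down", "a cat"])
    print(vocab)
    print(matrix)
    print(sparsity(matrix))
```

Each row has one column per vocabulary word, so the row length grows with the corpus vocabulary while most entries stay zero, which is the sparsity problem that dimension reduction targets.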
Accurate network and phase connectivity models are crucial to distribution system analytics, operations and planning. Although network connectivity information is mostly reliable, phase connectivity data is typically missing or erroneous. In this paper, an innovative phase identification algorithm is developed by clustering of voltage time series gathered from smart meters. The feature-based clustering...
In this paper, an image segmentation method is presented to analyze the clusters of a Computed Tomography (CT) image. The target image is divided into small parts called observation screens. Principal Component Analysis (PCA) is used for better representation of the features of the observation screens. The optimal number of components for each observation screen is determined by Horn's Parallel Analysis...
In this paper, we propose a novel method to extract keyframes from motion capture data for people to better visualize and understand the content of the motion. It first applies a Butterworth filter to remove the noise in the motion capture data, then carries out principal component analysis (PCA) to reduce the dimension. By detecting the zero-crossing points of the velocity in the principal components,...
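The zero-crossing step mentioned in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the input is a single, already filtered one-dimensional trajectory (for example one principal component over time), velocity is approximated by a forward difference, and the function names are hypothetical. The Butterworth filtering and PCA steps are not shown.

```python
def velocity(signal, dt=1.0):
    """Forward-difference velocity; entry i spans samples i and i + 1."""
    return [(b - a) / dt for a, b in zip(signal, signal[1:])]


def zero_crossings(values):
    """Indices where the sign differs from the previous non-zero sign."""
    crossings = []
    prev_sign = 0
    for i, value in enumerate(values):
        sign = (value > 0) - (value < 0)
        if sign == 0:
            continue  # exact zeros carry no sign; wait for the next non-zero value
        if prev_sign != 0 and sign != prev_sign:
            crossings.append(i)
        prev_sign = sign
    return crossings


def keyframe_candidates(signal, dt=1.0):
    """Sample indices where the velocity changes sign (local extrema)."""
    # A sign flip at velocity index i means sample i is a turning point.
    return zero_crossings(velocity(signal, dt))


if __name__ == "__main__":
    print(keyframe_candidates([0, 1, 2, 1, 0, 1]))
```

The returned indices mark turning points of the trajectory, which is why velocity zero-crossings are a natural source of keyframe candidates.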
Whether the most important features can be extracted to reduce the feature dimension is crucial to improving the efficiency and performance of the Intrusion Detection System (IDS). In this paper, an intrusion detection feature extraction method based on the complex network theory and the MST algorithm is proposed. The method takes the features of the network connections as nodes of...
Information security has become a very important topic, especially in recent years. Web services are becoming more complex and dynamic. This offers new possibilities for attackers to exploit vulnerabilities by inputting malicious queries or code. However, these attack attempts are often recorded in server logs. Analyzing these logs could be a way to detect intrusions either periodically or in...
Sequential fuzzy co-cluster extraction has been proven to be useful for collaborative filtering tasks by extracting user-item co-clusters, in which promising items are connected to the corresponding users in each co-cluster. Since some popular items can be shared by multiple clusters in collaborative filtering problems, exclusive conditions, which force objects to belong to only one cluster, were...
A new PCA adaptive algorithm is introduced, utilizing a rough fuzzy cluster-based granulation scheme for fault detection and diagnosis purposes. This granulated cluster-based algorithm can be used for segmentation of multivariate time series and initialization of other partitioning clustering methods that need to have good initialization parameters. The proposed algorithm is suitable for mining data...
We present a new network anomaly detection system using a dissimilarity-based one-class support vector machine (DSVMC). We transform the raw data into a dissimilarity space using Dissimilarity Representations (DR). DR describe objects by their dissimilarities to a set of target-class objects. DSVMC are constructed on these DR. We propose a framework of anomaly detection using DSVMC. A new strategy of prototype...
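A dissimilarity representation can be sketched as a matrix of distances from each object to a fixed set of prototypes. The snippet below is a minimal illustration that assumes Euclidean distance and hand-picked prototypes; the abstract does not state which dissimilarity measure or prototype selection strategy the authors use, and the function name is hypothetical.

```python
import math


def dissimilarity_representation(objects, prototypes):
    """Map each object to its vector of distances to the prototypes."""
    # Assumption: Euclidean distance; any dissimilarity measure could be used.
    return [[math.dist(obj, proto) for proto in prototypes] for obj in objects]


if __name__ == "__main__":
    objects = [(0.0, 0.0), (3.0, 4.0)]
    prototypes = [(0.0, 0.0), (3.0, 0.0)]
    print(dissimilarity_representation(objects, prototypes))
```

Each output row has one entry per prototype, so a classifier trained on these rows works in the dissimilarity space rather than in the raw feature space.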
Multivariate time series (MTS) data sets are common in many multimedia, medical, process industry and financial applications such as gesture recognition, video sequence matching, EEG/ECG data analysis or prediction of abnormal situation or trend of stock price. Multivariate time series clustering is an important task in time series data mining. The unique structure of time series makes many traditional...
The popularity of the Internet has caused a massive increase in the number of Web pages. The information explosion has led to a growing challenge for information retrieval systems. Document clustering becomes an important process for helping the information retrieval systems organize this vast amount of data. It is believed that grouping similar documents together into clusters will help the users...
We report an automatic feature discovery method that achieves results comparable to a manually chosen, larger feature set on a document image content extraction problem: the location and segmentation of regions containing handwriting and machine-printed text in document images. This approach is a greedy forward selection algorithm that iteratively constructs one linear feature at a time. The algorithm...
In this paper, we propose an efficient speaker clustering approach based on a locality preserving linear projective mapping in the Gaussian mixture model (GMM) mean supervector space. While the GMM mean supervector has turned out to be an effective representation of speakers, its dimensionality is usually very high. The locality preserving projection (LPP) maps the high-dimensional GMM mean supervector...
In this paper, we try to bring the concept of ensembles into outlier detection. Two outlier mining algorithms are combined: one based on similar coefficient sum and the other based on kernel density. An anomaly detection approach based on a voting mechanism is proposed and applied to intrusion detection. We convert the character features into numerical values by code mapping and use principal components...
Several general-purpose algorithms and techniques have been developed for image segmentation. Since there is no general solution to the image segmentation problem, these techniques often have to be combined with domain knowledge in order to effectively solve an image segmentation problem for a problem domain. This paper presents a comparative study of the basic image segmentation techniques i.e. edge-based,...
The current practice of recognizing spectra manually is, to a large extent, no longer applicable. This work is particularly focused on helping astronomers find celestial objects of interest. In this paper, an efficient hierarchical clustering data mining method based on principal component analysis (PCA) is proposed. Massive stellar spectral data are clustered by improved hierarchical clustering...
In content-based image retrieval (CBIR), similarity measures vary according to the user, and it is difficult to build a retrieval system which reflects the user's similarity measures automatically. Regarding CBIR as consisting of feature extraction, coarse classification and detailed matching stages, this work aims at reflecting the user's similarity measures in coarse classification. After obtaining...
Principal components analysis (PCA) is an important approach to unsupervised dimensionality reduction. However, principal components (PCs) are a set of new variables carrying no clear physical meanings and still require all the original variables. To deal with this problem, the PC dominant feature (PCDF) is defined. Then, feature selection using them is considered and a new algorithm for determining...
We have investigated a technique for recognising faces invariant of facial expressions. We apply multi-linear tensor algebra, which subsumes linear algebra, to analyse and recognise 3D face surfaces. This potent framework possesses a remarkable ability to deal with the shortcomings of principal component analysis in less constrained situations. A set of vector spaces can be used to represent the variation...
Herein we describe a system for gastroscopic image retrieval which is based on the PCA algorithm. First, the colors are quantized and clustered in HSV color space, a spatial partitioning-weighted method is used to calculate each block's main color, and the result is combined with a color correlogram to carry out integrated retrieval. A PCA algorithm is adopted to reduce the dimension, since the dimension of the feature vector, which includes...