Millions of people across the globe use email to communicate, and it is a critical application for many businesses. A considerable amount of unsolicited mail flows into users' mailboxes every day, and over the past decade a major problem has been bulk spam and phishing mail. Besides being wearisome for many email users, such unsolicited spam also puts pressure...
In order to avoid mixing up languages, infants immersed in a multilingual environment have to sort speech into language-homogeneous sets. To study the feasibility of this task, we use speech technology tools (Universal Background Models and i-vectors) in combination with unsupervised clustering to test language separation using speech from several speakers of two languages. We investigate the outcome...
The Nearest Centroid Neighbor (NCN) classifier is a fast and simple supervised classification algorithm. It assumes that each class can be represented by a single cluster, and it uses the class means (centroids) to determine which class a new, unknown sample belongs to. However, the assumption that each class consists of exactly one cluster limits...
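The centroid rule described above can be sketched in a few lines of NumPy. This is a minimal illustration of a plain nearest-centroid classifier with Euclidean distance, not the method proposed in the paper; the function names and toy data are hypothetical.

```python
import numpy as np


def fit_centroids(X, y):
    """Compute one centroid (the class mean) per class label."""
    labels = np.unique(y)
    centroids = np.array([X[y == c].mean(axis=0) for c in labels])
    return labels, centroids


def predict(X_new, labels, centroids):
    """Assign each sample to the class whose centroid is nearest in Euclidean distance."""
    # dists has shape (n_samples, n_classes)
    dists = np.linalg.norm(X_new[:, None, :] - centroids[None, :, :], axis=2)
    return labels[np.argmin(dists, axis=1)]


# Toy data: two classes, each forming a single compact cluster.
X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]])
y = np.array([0, 0, 1, 1])
labels, centroids = fit_centroids(X, y)
print(predict(np.array([[0.5, 0.5], [5.5, 4.0]]), labels, centroids))  # [0 1]
```

If a class instead consisted of two distant clusters, its single mean could fall between them, far from any of its actual samples; this is the one-cluster-per-class limitation the abstract points to.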
A discriminative dictionary learning algorithm is proposed to find sparse signal representations using relative attributes as the available semantic information. In contrast, existing (discriminative) dictionary learning (DDL) approaches mostly utilize binary label information to enhance the discriminative property of the signal reconstruction residual, the sparse coding vectors or both. Compared...
We address the problem of how to design a more effective co-training scheme for multi-view spectral clustering. The conventional co-training procedure treats information from all views equally and often converges to a compromised consensus view that does not fully exploit the multi-view information. We instead propose to learn an augmented view and construct its corresponding affinity matrix...
Inventor name disambiguation is the task of distinguishing each unique inventor from all other inventor records in a patent database. This task is essential for processing person-name queries that seek information about a specific inventor, e.g. a list of all of that inventor's patents. We apply earlier work on author name disambiguation to inventor name disambiguation. A random...
Zero-shot learning (ZSL) aims to classify objects from classes for which no training samples are available. Direct Attribute Prediction (DAP) offers a solution through an attribute space, but it assumes that attributes are independent. To relax this assumption and account for relations among attributes, this paper proposes the Joint Attribute Chain Prediction (JACP) algorithm. It estimates the joint probability of attribute...
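To make the independence assumption concrete, the sketch below scores unseen classes with the DAP rule as it is commonly stated: with a uniform class prior, predict the class z maximising the product over attributes m of p(a_m = a_m^z | x) / p(a_m = a_m^z). This is the DAP baseline the abstract criticises, not JACP; the function name, array shapes, and toy numbers are hypothetical.

```python
import numpy as np


def dap_predict(attr_probs, class_attrs, attr_priors, eps=1e-12):
    """Score unseen classes with the DAP rule, assuming attribute independence.

    attr_probs:  (n_samples, M) estimates of p(a_m = 1 | x) from per-attribute classifiers
    class_attrs: (n_classes, M) binary attribute signatures of the unseen classes
    attr_priors: (M,) prior probabilities p(a_m = 1)
    """
    attr_probs = np.clip(attr_probs, eps, 1.0 - eps)
    attr_priors = np.clip(attr_priors, eps, 1.0 - eps)
    on = class_attrs[None, :, :] == 1  # (1, C, M)
    # p(a_m = a_m^z | x): probability that attribute m takes the value class z requires
    p = np.where(on, attr_probs[:, None, :], 1.0 - attr_probs[:, None, :])  # (N, C, M)
    prior = np.where(class_attrs == 1, attr_priors, 1.0 - attr_priors)  # (C, M)
    # Independence assumption: the class score is a product over attributes,
    # computed as a sum of logs for numerical stability.
    log_score = np.sum(np.log(p) - np.log(prior)[None, :, :], axis=2)  # (N, C)
    return np.argmax(log_score, axis=1)


class_attrs = np.array([[1, 0, 1], [0, 1, 0]])
attr_priors = np.array([0.5, 0.5, 0.5])
attr_probs = np.array([[0.9, 0.2, 0.8], [0.1, 0.7, 0.3]])
print(dap_predict(attr_probs, class_attrs, attr_priors))  # [0 1]
```

Because the score factorises over attributes, correlated attributes (say, "has stripes" and "has fur") each contribute evidence separately; modelling such dependencies is what the abstract says JACP adds.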
Motion information is a key factor for action recognition and has been eagerly pursued for decades. How to effectively learn motion features in Convolutional Networks (ConvNets) remains an open issue. Prevalent ConvNets often take several full frames of video as input at a time, which can be a heavy burden for network training. In this paper, we introduce a novel framework called Tube ConvNets, by...
This paper proposes a multi-class learning (MCL) algorithm for a deep neural network (DNN)-based statistical parametric speech synthesis (SPSS) system. Although the DNN-based SPSS system improves the modeling accuracy of statistical parameters, its synthesized speech is often muffled because the training process only considers the global characteristics of the entire set of training data, but does...
The disadvantages of the bag-of-words (BOW) model for image classification include the large amount of data involved in generating a codebook by clustering, and redundant code words that may affect the classification results. The BOW classification process can be improved by applying Laplace weights to an improved fuzzy C-means algorithm, obtaining a codebook with a greater ability to distinguish between...
The problem of dynamic learning to rank is considered when the training sample is a data stream consisting of a sequence of series of objects, each object characterized by a set of features and by its relative rank within its series. The problem is reduced to preference learning to rank on clusters in the feature space of ranked objects, with an aggregated training dataset formed from the cluster centers and estimates...
Transmission lines are the lifeblood of the electric power system, and their faults directly threaten its safe operation. Effective and accurate fault prediction and fault location analysis for transmission lines therefore have important practical value and economic significance for power system security. To solve the asymmetry of the transmission line fault problem, the paper proposes...
For trajectory model mining, Vector Fields on Manifold is used instead of the Euclidean distance to measure the similarity between trajectories. A multi-scale transform method optimizes the mapping in the Vector Fields on Manifold trajectory distance calculation, and the SOM algorithm is used to train a classification model. This method uses trajectory shape features to measure the similarity...
Clustering algorithms are often used to analyze communication data in network intrusion detection systems. However, network communication data are of mixed type, e.g. numerical and categorical. This paper therefore first puts forward a method for representing the cluster center (prototype) of mixed-type data. Then, respectively, in combination with the continuity characteristic of the numerical attributes...
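For readers unfamiliar with prototypes of mixed-type data, a common baseline (in the style of k-prototypes) takes the mean of each numerical attribute and the mode of each categorical attribute, and measures dissimilarity as squared Euclidean distance plus a weight `gamma` times the number of categorical mismatches. The sketch below shows only that baseline; it is an assumption for illustration, not the representation this paper proposes, and the record fields and values are hypothetical.

```python
from collections import Counter

import numpy as np


def mixed_prototype(num, cat):
    """Prototype of a cluster of mixed-type records (k-prototypes style baseline).

    num: (n, p) array of numerical attributes
    cat: list of n tuples holding the categorical attributes
    """
    centre_num = num.mean(axis=0)  # mean for numerical attributes
    # mode for each categorical attribute (ties go to the value seen first)
    centre_cat = tuple(Counter(column).most_common(1)[0][0] for column in zip(*cat))
    return centre_num, centre_cat


def mixed_distance(x_num, x_cat, proto, gamma=1.0):
    """Squared Euclidean distance on numbers plus gamma times the count of category mismatches."""
    p_num, p_cat = proto
    d_num = float(np.sum((x_num - p_num) ** 2))
    d_cat = sum(a != b for a, b in zip(x_cat, p_cat))
    return d_num + gamma * d_cat


# Hypothetical connection records: (duration, bytes) and (protocol, service).
num = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0]])
cat = [("tcp", "http"), ("udp", "dns"), ("tcp", "http")]
proto = mixed_prototype(num, cat)
print(mixed_distance(np.array([2.0, 300.0]), ("udp", "dns"), proto, gamma=0.5))  # 1.0
```

In practice the numerical attributes would usually be normalised first; otherwise a large-valued field such as byte counts dominates the distance.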
With the increasing size of big data, classifiers usually suffer from intractable computing and storage issues. Moreover, decision boundaries in complex classification problems are usually complicated and circuitous. Modeling too many instances can sometimes cause oversensitivity to noise and degrade learning accuracy. Instance selection offers an effective way to improve classification performance...
Spike sorting is the problem of identifying and clustering neurons' spiking activity from recorded extracellular electrophysiological data. This is important for experimental neuroscience. Existing approaches to this problem consist of three steps: spike detection, feature extraction, and clustering. In our method, we use Fisher-discriminant-based dictionary learning to learn a dictionary, whose...
Data clustering is a widely used data analysis method that groups unlabeled data into clusters of similar items. Classical clustering methods underperform on multi-dimensional datasets such as microarray datasets. Therefore, this paper introduces a novel metaheuristic Gauss-based cuckoo search clustering method to extend the capabilities of traditional clustering methods. The...
Spectral clustering consists of creating, from the spectral elements of a Gaussian affinity matrix, a low-dimensional space in which data are grouped into clusters. The performance of spectral clustering depends mainly on the construction of the similarity function. The most commonly used similarity function is the Gaussian similarity function. However, questions about the separability of clusters...
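The pipeline described above (Gaussian affinity, spectral embedding, clustering in the embedded space) can be sketched as follows. This assumes the normalised variant of Ng, Jordan and Weiss (eigenvectors of D^{-1/2} A D^{-1/2}, rows scaled to unit length) and uses a small hand-written k-means to avoid extra dependencies; it is an illustration of standard spectral clustering, not the paper's proposed similarity function, and the names and data are hypothetical.

```python
import numpy as np


def gaussian_affinity(X, sigma):
    """A_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)), with a zero diagonal."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    A = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)
    return A


def spectral_embedding(X, k, sigma):
    """Rows of the k leading eigenvectors of D^{-1/2} A D^{-1/2}, normalised to unit length."""
    A = gaussian_affinity(X, sigma)
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    M = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    Y = vecs[:, -k:]  # eigenvectors of the k largest eigenvalues
    return Y / np.linalg.norm(Y, axis=1, keepdims=True)


def kmeans(Y, k, iters=100):
    """Plain k-means with deterministic farthest-point initialisation."""
    centers = [Y[0]]
    for _ in range(1, k):
        d = np.min([np.sum((Y - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(iters):
        labels = np.argmin(np.sum((Y[:, None, :] - centers[None, :, :]) ** 2, axis=2), axis=1)
        new_centers = np.array([Y[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def spectral_clustering(X, k, sigma):
    return kmeans(spectral_embedding(X, k, sigma), k)


X = np.array([[0.0, 0.0], [0.0, 0.1], [0.1, 0.0],
              [5.0, 5.0], [5.0, 5.1], [5.1, 5.0]])
print(spectral_clustering(X, k=2, sigma=1.0))  # [0 0 0 1 1 1]
```

The scale `sigma` controls how quickly affinity decays with distance. If it is much larger than the gap between groups, the affinity matrix becomes nearly uniform and the block structure the embedding relies on fades, which is why the choice of similarity function matters so much.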
With the rapid development of the Web and the rapid growth of textual information, effectively organizing and managing this information is a major challenge for information science today. Automatic text classification can effectively organize large numbers of texts and help people retrieve information more efficiently. It has become one of the most important research...
Fault diagnosis is an important procedure for ensuring equipment efficiency and stability. The diagnosis process is essentially a pattern recognition process, and fault samples usually lack fault-type labels. In this case, unsupervised learning methods are more applicable, and kernel clustering is one of the most effective of them. In this paper, a novel electromagnetic particle swarm...