Document similarity analysis is increasingly critical since roughly 80% of big data is unstructured. Accordingly, semantic couplings (relatedness) have been recognized as valuable for capturing the relationships between terms (words or phrases). Existing work focuses more on explicit relatedness and builds models for it. In this paper, we propose a comprehensive semantic similarity measure: Semantic...
Many information tasks involve objects that are explicitly or implicitly connected in a network (or graph), such as webpages connected by hyperlinks or people linked by “friendships” in a social network. Research on link-based classification (LBC) has shown how to leverage these connections to improve classification accuracy. Unfortunately, acquiring a sufficient number of labeled examples to enable...
Extreme classification, where the number of classes is very large, has received considerable attention over the last decade. The usual efficient multi-class classification approaches were not designed to deal with such a large number of classes. A particular issue in large-scale problems concerns the computational complexity of classification: the best multi-class approaches generally have a linear...
Relational models for heterogeneous network data are becoming increasingly important for many real-world applications. However, existing relational learning approaches are not parallel, have scalability issues, and are thus unable to handle large heterogeneous network data. In this paper, we propose Parallel Collective Matrix Factorization (PCMF), which serves as a fast and flexible framework for joint...
This research presents a novel algorithm for detecting human emotion from speech using the speech spectrogram. The proposed algorithm aims to detect emotion using information inside the spectrogram. A neural network was used as the classifier. A new approach to feature extraction based on analysis of the two-dimensional time-frequency representation of a speech signal has been...
Time series shapelets offer an approach to extracting the subsequences most suitable for discriminating between time series belonging to distinct classes. Computational complexity is the major issue with shapelets: the time required to identify interesting subsequences can be intractable for large cases. In fact, all the subsequences of all the time series in the training dataset must be evaluated. In the...
The advent of the Big Data challenge has stimulated research on methods and techniques to deal with the problem of managing data abundance. As a result, effective sense-making of semantically rich and big datasets has received a lot of attention, and new search approaches, such as Exploratory Computing (EC), have seen the light. In this paper we present IQ4EC, a system for data exploration inspired...
Many real-life prediction problems involve predicting a structured output. Multi-target regression is an instance of structured output prediction whose task is to predict multiple target variables. Structured output algorithms are usually demanding in both computation and memory, and hence are not suited to dealing with massive amounts of data. Most of these algorithms can be categorized as local or...
With the growth of information organized in hierarchical databases, it is essential to develop automated approaches for classifying data instances (e.g., documents, proteins and images) into hierarchies. Several classification approaches have been developed that exploit the hierarchical structure prevalent within these underlying databases. One commonly used approach is to train local one-versus-rest...
In this paper, we develop the Data Science Machine, which is able to derive predictive models from raw data automatically. To achieve this automation, we first propose and develop the Deep Feature Synthesis algorithm for automatically generating features for relational datasets. The algorithm follows relationships in the data to a base field, and then sequentially applies mathematical functions along...
An arbitrary m×n Boolean matrix M can be decomposed exactly as M = U○V, where U (resp. V) is an m×k (resp. k×n) Boolean matrix and ○ denotes the Boolean matrix multiplication operator. We first prove an exact formula for the Boolean matrix J such that M = M○J^T holds, where J is maximal in the sense that if any 0 element in J is changed to a 1 then this equality no longer holds. Since minimizing k...
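As a small illustration of the Boolean matrix product ○ used in the abstract above, the following is a minimal sketch in plain Python. The function name `bool_matmul` and the example matrices `U` and `V` are assumptions chosen for illustration; they do not come from the paper, and the sketch does not implement the paper's decomposition method.

```python
def bool_matmul(U, V):
    """Boolean matrix product of U (m x k) and V (k x n).

    Entry (i, j) is the OR over t of (U[i][t] AND V[t][j]),
    so the result stays 0/1 even where ordinary arithmetic would give 2 or more.
    """
    k = len(V)
    n = len(V[0]) if V else 0
    return [
        [int(any(row[t] and V[t][j] for t in range(k))) for j in range(n)]
        for row in U
    ]


# Hypothetical factors with m = 3, k = 2, n = 3.
U = [[1, 0],
     [1, 1],
     [0, 1]]
V = [[1, 1, 0],
     [0, 1, 1]]

M = bool_matmul(U, V)
for row in M:
    print(row)
```

Here the middle entry of `M` is 1, whereas the ordinary integer product would give 2; this is the sense in which M = U○V differs from standard matrix multiplication.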
We present a generic approach to real-time monitoring of Twitter sentiment and show its application to the Bulgarian parliamentary elections in May 2013. Our approach is based on building high-quality sentiment classification models from manually annotated tweets. In particular, we have developed a user-friendly annotation platform, a feature selection procedure based on maximizing prediction...
The selection of classifiers which are profitable is becoming more and more important in real-life situations such as customer churn management campaigns in the telecommunication sector. In previous works, the expected maximum profit (EMP) metric has been proposed, which explicitly takes the cost of offer and the customer lifetime value (CLV) of retained customers into account. It thus permits the...
Patients in rural India express their discomfort using keywords as queries because they lack knowledge of the intended domain. Therefore, unlike in existing query expansion methods, there is no scope for automatic revision of the query using a feedback mechanism. The paper aims at developing a primary-level disease diagnosis system for the patients of rural India by expanding the query using 5-gram...
As huge amounts of data become available in organizations and society, specific data analytics skills and techniques are needed to explore this data and extract from it useful patterns, tendencies, models or other useful knowledge, which could be used to support the decision-making process, to define new strategies or to understand what is happening in a specific field. Only with a deep understanding...
Local pattern mining methods are fragmented along two dimensions: the pattern syntax, and the data types on which they are applicable. Pattern syntaxes considered in the literature include subgroups, n-sets, itemsets, and many more; common data types include binary, categorical, and real-valued. Recent research on pattern mining in relational databases has shown how the aforementioned pattern syntaxes...