Protein function prediction is an active research area in bioinformatics. Protein functions are highly related to their structures. Therefore, effective structure based protein representations are required. Pires et al. [BMC Genomics, 12, S12 (2011)] proposed a cutoff scanning matrix (CSM) method for protein representation that utilizes distance patterns between protein residues and a maximum cutoff...
Nearest neighbour search is a core process in many data mining algorithms. Finding reliable closest matches of a query in a high-dimensional space is still a challenging task. This is because the effectiveness of many dissimilarity measures based on a geometric model, such as the lp-norm, decreases as the number of dimensions increases. In this paper, we examine how the data distribution can...
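The loss of contrast that the abstract above attributes to lp-norm measures can be illustrated with a small experiment. The following is a minimal sketch, not the paper's method: it assumes uniformly random points in the unit hypercube, and the names `lp_distance` and `relative_contrast`, the sample size, and the seed are all hypothetical choices made for illustration.

```python
import random


def lp_distance(a, b, p=2.0):
    """Minkowski (lp-norm) distance between two equal-length vectors."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def relative_contrast(dim, n_points=200, p=2.0, seed=0):
    """(farthest - nearest) / nearest for one random query point.

    Assumption: the points are uniform in [0, 1]^dim. Real data sets
    may behave differently, which is the question the abstract raises.
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = [
        lp_distance(query, [rng.random() for _ in range(dim)], p)
        for _ in range(n_points)
    ]
    nearest, farthest = min(dists), max(dists)
    return (farthest - nearest) / nearest


if __name__ == "__main__":
    # The contrast shrinks as the dimension grows, so the nearest and
    # farthest neighbours become harder to tell apart.
    for dim in (2, 10, 100, 500):
        print(dim, round(relative_contrast(dim), 3))
```

Under these assumptions the printed contrast drops sharply between 2 and 500 dimensions, which is one common way to visualise why geometric dissimilarity measures become less discriminative in high-dimensional spaces.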
We revisit the problem of predicting directional movements of stock prices based on news articles: here our algorithm uses daily articles from The Wall Street Journal to predict the closing stock prices on the same day. We propose a unified latent space model to characterize the "co-movements" between stock prices and news articles. Unlike many existing approaches, our new model is able...
Ranking objects is an essential problem in recommendation systems. Since comparing two objects is the simplest type of query for measuring the relevance of objects, the problem of aggregating pairwise comparisons to obtain a global ranking has been widely studied. In order to learn a ranking model, a training set of queries as well as their correct labels are supplied and a machine learning...
Many real-world networks feature dynamic changes, such as new nodes and edges and modifications of node content. Because changes are continuously introduced to the network in a streaming fashion, we refer to such dynamic networks as streaming networks. In this paper, we propose a new classification method for streaming networks, namely streaming network node classification (SNOC). For...
User-reported experiences and opinions are used by peers to make decisions about where to go and what to buy. Unfortunately, not all users or opinions are honest. Many opinions are fabricated and may be submitted by automated systems or by people who are recruited by businesses and search engine optimizers to write good reviews. Such reviews and ratings are called spam reviews. These are misleading...
Recent surveys show an enormous increase in the number of organizations intending to adopt the cloud, but one of their major obstacles is evaluating the trustworthiness of cloud service candidates. Evaluating cloud service candidates is expensive and time-consuming, especially given the breadth of services available today. To address this, this paper proposes a novel trustworthiness measurement...
Sentence similarity measures play an increasingly important role in text-related research and applications in areas such as text mining, Web page retrieval, and dialogue systems. Existing methods for computing sentence similarity have been adopted from approaches used for long text documents. These methods process sentences in a very high-dimensional space and are consequently inefficient, require...
Fluid mechanics considers two frames of reference for an observer watching a flow field: Eulerian and Lagrangian. The former is the frame of reference traditionally used for flow analysis, and involves extracting particle trajectories based on a vector field. With this work, we explore the opportunities that arise when considering these trajectories from the Lagrangian frame of reference. Specifically,...
Automatic extraction of hyponymy relations between concepts in an ontology is significant for ontology learning and knowledge organization. In this paper, we propose a fusion approach to hyponymy relation extraction in the patent domain, using Relative Decoration Degree (RDEG) to extract high-precision relations, and then Association Rule (AR) to enrich those relations. We use Cilin to extend a word to...
Sentence similarity computation is an important part of question answering systems based on frequently asked questions. The accuracy of existing sentence similarity algorithms needs to be improved, so this paper presents a revised method for computing question similarity. We combine the word order feature with the vector space model (VSM) algorithm. When we use the VSM to compute the question similarity, we propose...
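The idea of combining a vector space model score with a word order feature can be sketched as follows. This is an illustrative assumption, not the paper's algorithm: the abstract is truncated before it states how the two features are combined, so the pairwise order measure, the linear blend, and the weight `alpha=0.8` are all hypothetical, as are the function names.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def cosine_similarity(a_tokens, b_tokens):
    """Cosine similarity of raw term-count vectors (a basic VSM score)."""
    a, b = Counter(a_tokens), Counter(b_tokens)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def word_order_similarity(a_tokens, b_tokens):
    """Fraction of shared-word pairs that keep the same relative order.

    Hypothetical measure: only the first occurrence of each word counts.
    """
    b_words = set(b_tokens)
    # Shared words, listed in the order they first appear in a_tokens.
    shared = [t for t in dict.fromkeys(a_tokens) if t in b_words]
    if len(shared) < 2:
        return 1.0 if shared else 0.0
    pos_b = {t: b_tokens.index(t) for t in shared}
    pairs = list(combinations(shared, 2))
    same = sum(1 for x, y in pairs if pos_b[x] < pos_b[y])
    return same / len(pairs)


def question_similarity(a_tokens, b_tokens, alpha=0.8):
    """Linear blend of the two scores; alpha is an assumed weight."""
    return alpha * cosine_similarity(a_tokens, b_tokens) + (
        1.0 - alpha
    ) * word_order_similarity(a_tokens, b_tokens)


if __name__ == "__main__":
    q1 = "how do i reset my password".split()
    q2 = "password my reset i do how".split()
    print(question_similarity(q1, q1))
    print(question_similarity(q1, q2))
```

The two example questions contain the same words, so a pure bag-of-words VSM cannot separate them; the word order term is what lowers the score for the reversed question.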
Selecting accurate and simple association rules that efficiently cover all data samples is very important in knowledge discovery. There are several measures to assess accuracy and relations in a rule. This poses a challenge for researchers to select effective measures. Combining different measures via multi-objective evolutionary algorithms is an effective method to select suitable association rules...
Recommendation systems suggest useful information to end users. They predict the information demands of online users and offer recommendations to facilitate their navigation. There are many approaches to constructing such systems. Most recommendation systems apply data mining techniques to the web access log or the site database to discover users' access patterns. Afterwards, the recommendation...
The imbalanced data problem in classification is a significant research area and has attracted a lot of attention in recent years. Techniques that rebalance the class distribution, such as over-sampling or under-sampling, are the most common approaches to this problem. This paper presents a new method called Diversity and Separable Metrics in Over-Sampling Technique (DSMOTE) to handle the imbalanced...
Predicting students' academic achievement with high accuracy plays a vital role in many academic disciplines. Most recent studies indicate the important role of data type selection. They also attempt to understand individual students more deeply by analyzing questionnaires designed for a particular purpose. The present study uses free-style comments written by students after each lesson to predict...
In contrast with the UKF, the standard CKF can solve high-dimensional nonlinear filtering problems. However, as the dimension of the nonlinear system increases, the accuracy of the CKF declines and the computational cost increases rapidly. The two-stage Kalman filter can solve this problem, but it only applies to linear systems. This paper proposes a two-stage Cubature Kalman filter (TSCKF) which can solve...
The aim of this paper was to separate EEG recordings into cerebral and noncerebral waves and to compare the statistical properties of chosen components using the coefficient of excess kurtosis. Noncerebral waves, particularly ocular artifacts, should be properly identified, because some of them imitate cerebral potentials. Eye opening and closure, blinks, and eye flutter are similar to the...
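The coefficient of excess kurtosis mentioned in the abstract above can be computed as the fourth central moment divided by the squared variance, minus 3. The following is a minimal sketch using the population (biased) form; the abstract does not say which estimator the authors used, so that choice, the function name `excess_kurtosis`, and the sample values are assumptions for illustration only.

```python
from statistics import fmean


def excess_kurtosis(samples):
    """Population excess kurtosis: m4 / m2**2 - 3.

    m2 and m4 are the second and fourth central moments. A normal
    distribution gives 0; spiky signals give positive values.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    mean = fmean(samples)
    m2 = fmean([(x - mean) ** 2 for x in samples])
    m4 = fmean([(x - mean) ** 4 for x in samples])
    if m2 == 0:
        raise ValueError("variance is zero; kurtosis is undefined")
    return m4 / (m2 * m2) - 3.0


if __name__ == "__main__":
    # Made-up values: a mostly flat trace with two large deflections,
    # loosely resembling a blink, versus a two-level square wave.
    spiky = [0.0] * 8 + [4.0, -4.0]
    flat = [-1.0, 1.0, -1.0, 1.0]
    print(excess_kurtosis(spiky))
    print(excess_kurtosis(flat))
```

In this toy comparison the spiky trace has positive excess kurtosis and the square wave has negative excess kurtosis, which is the kind of contrast a kurtosis-based comparison of components relies on.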
Transmitted as part of numerous Internet services, geolocation data is increasingly bringing hints of people's real-world activities into Internet traffic. This paper focuses on the discovery of key properties that motivate personal activities: locational interests. We propose and design GeoEcho, a mobile traffic analysis system that extracts and analyses a wealth of latitude-longitude geotag...
In mining massive datasets, two of the most important and immediate problems are often sampling and feature selection. Proper sampling and feature selection contribute to reducing the size of the dataset while obtaining satisfactory results in model building. Theoretically, therefore, it is interesting to investigate whether a given dataset possesses a critical feature dimension, or the minimum number...
Understanding the states of learners at a lecture is useful for improving the quality of the lecture. Kinect, a video camera with an infrared sensor, has been widely studied and has proved useful for some kinds of activity recognition. However, learners in a lecture usually do not make large movements. This paper evaluates Kinect for activity recognition of learners. The authors considered...