Advances in high-throughput biotechnology enable the generation of large amounts of biomedical data. The microarray is an increasingly popular approach for the detection of genome-wide gene expression. Microarray data have thus increased significantly in publicly accessible database repositories, which provide valuable big data for scientific research. To deal with the challenge of microarray big...
Representational competence (RC), defined as "the ability to simultaneously process and integrate multiple external representations (MERs) in a domain", is a marker of expertise in science and engineering. However, the cognitive mechanisms underlying this ability, and how it develops in learners, are poorly understood. In this paper, we report a fully controllable interface, designed...
Nearest neighbour search is a core process in many data mining algorithms. Finding reliable closest matches of a query in a high-dimensional space is still a challenging task. This is because the effectiveness of many dissimilarity measures that are based on a geometric model, such as the lp-norm, decreases as the number of dimensions increases. In this paper, we examine how the data distribution can...
We revisit the problem of predicting directional movements of stock prices based on news articles: here our algorithm uses daily articles from The Wall Street Journal to predict the closing stock prices on the same day. We propose a unified latent space model to characterize the "co-movements" between stock prices and news articles. Unlike many existing approaches, our new model is able...
Ranking objects is an essential problem in recommendation systems. Since comparing two objects is the simplest type of query for measuring the relevance of objects, the problem of aggregating pairwise comparisons to obtain a global ranking has been widely studied. In order to learn a ranking model, a training set of queries, together with their correct labels, is supplied and a machine learning...
The current growth of information makes it inevitable that large-scale learning problems will become the norm. In this paper, we propose and analyze a novel Low-density Cut based tree Decomposition method for large-scale SVM problems, called LCD-SVM. The basic idea is divide and conquer: use a decision tree to decompose the data space and train SVMs on the decomposed regions. Specifically,...
Many real-world networks are characterized by dynamic changes, such as new nodes and edges, and modifications of node content. Because changes are continuously introduced to the network in a streaming fashion, we refer to such dynamic networks as streaming networks. In this paper, we propose a new classification method for streaming networks, namely streaming network node classification (SNOC). For...
User-reported experiences and opinions are used by peers to make decisions about where to go and what to buy. Unfortunately, not all users or opinions are honest. Many opinions are fabricated and may be submitted by automated systems or by people who are recruited by businesses and search engine optimizers to write good reviews. Such reviews and ratings are called spam reviews. These are misleading...
This paper presents iNNE (isolation using Nearest Neighbour Ensemble), an efficient nearest neighbour-based anomaly detection method by isolation. iNNE runs significantly faster than existing nearest neighbour-based methods such as Local Outlier Factor, especially on data sets having thousands of dimensions or millions of instances. This is because the proposed method has linear time complexity and...
In the real world, labeled data is quite scarce compared with unlabeled data. Manual annotation is usually expensive and inefficient. The active learning paradigm is used to handle this problem by identifying the most informative instances to annotate. In this paper, we propose a new active learning algorithm based on a nonparallel support vector machine. Numerical experiments show the effective performance...
Since link prediction helps improve our understanding of the structure, functions, and evolution of networks, it has drawn much attention from both the computer science and physics communities. Among the many mainstream algorithms proposed, the common-neighbor-based ones show prominent efficiency but neglect the influence of community structure. Based on the assumption that in the same communities common...
A land cover map that represents the land surface of the earth is based primarily on analysis of remotely sensed images. However, the rate of concordance of existing land cover maps is not high. This lack of concordance results from a difference in classification methods and observation conditions of remotely sensed images. Also, conducting field surveys around the world is unrealistic. Therefore,...
We explore the feasibility of measuring learner engagement and classifying the engagement level based on machine learning applied to data from 2D/3D camera sensors and eye trackers in a 1:1 learning setting. Our results are based on nine pilot sessions held in a local high school, where we recorded features related to student engagement while students consumed educational content. We label the collected data...
Recent surveys show an enormous increase in the number of organizations intending to adopt the cloud, but one of their major obstacles is the trustworthiness evaluation of cloud service candidates. Evaluating cloud service candidates is expensive and time-consuming, especially given the breadth of services available today. To address this, this paper proposes a novel trustworthiness measurement...
A novel eye detection method based on template matching is proposed for glasses-free 3D devices. Before matching, an average eye template is computed from a large number of eye images, and several average templates are spliced into a chessboard template. The eyes are then located by calculating the correlation coefficient between the template and the candidate image. It has been verified that these algorithm...
In wireless localization problems, prior to the implementation of the sensor networks, it is important and valuable to know, given the localization accuracy constraints (i.e. ensuring that the localization error is lower than e m at the confidence level of 1-c): (1) how many location-known sensors (anchors) are needed at least and at most? (2) how to select optimal locations for these anchors from...
The paper presents the Echo State Network (ESN) as a classifier to diagnose abnormalities in mammogram images. Abnormalities in mammograms can be of different types. An efficient system that can handle these abnormalities and reach a correct diagnosis is vital. We experimented with wavelet and Local Energy based Shape Histogram (LESH) features combined with an Echo State Network classifier. The suggested...
Effective machine learning handles large datasets efficiently. One key feature of handling large data is the use of databases such as MySQL. The freeware fuzzy decision tree induction tool, FDT, is a scalable supervised-classification software tool implementing fuzzy decision trees. It is based on an optimized fuzzy ID3 (FID3) algorithm. FDT 2.0 improves upon FDT 1.0 by bridging the gap between data...
In this preliminary research we examine the suitability of hierarchical strategies of multi-class support vector machines for the classification of induced pluripotent stem cell (iPSC) colony images. The iPSC technology offers incredible possibilities for safe and patient-specific drug therapy without any ethical problems. However, growing iPSCs is a sensitive process and abnormalities may occur during...
To make the best use of identification, we need robust and fast algorithms and systems to process the data. Because the palmprint is a reliable and unique characteristic of every person, we extract and use its features based on its geometry, lines and angles. There are countless ways to define measures for the recognition task. To analyze a new point of view, we extracted textural features...