As the ability to store and process massive amounts of user behavioral data increases, new approaches continue to arise for leveraging the wisdom of the crowds to gain insights that were previously very challenging to discover by text mining alone. For example, through collaborative filtering, we can learn previously hidden relationships between items based upon users' interactions with them, and...
A mechanism for identifying bandings in large "zero-one" N-dimensional data sets, using a sampling technique, is presented. The challenge in identifying bandings in data is the large number of potential permutations that need to be considered. To circumvent this, a banding score mechanism is proposed that avoids the need to consider large numbers of permutations. This has been incorporated...
The extreme volume and staggeringly increasing rate inevitably produce unprecedented pressure on any large-scale video sharing and hosting system. Among the efforts to mitigate this pressure, content-based video similarity search is becoming more and more important with the exponential growth of the data size. Though various approaches have been proposed to address this problem, they mainly focus...
We are interested in forecasting a large collection of FMCG demand time series. As the demand for FMCG exists in a hierarchy (from manufacturers to distributors to retailers), the bottom level of the hierarchy may contain thousands or even millions of time series. Producing aggregate-consistent forecasts while utilizing the unique features of each time series thus becomes a technical challenge. To...
The U.S.-China relationship is arguably the most important bilateral relationship in the 21st century. Typically it is measured through opinion polls, for example, by Gallup and Pew Institute. In this paper, we propose a new method to measure U.S.-China relations using data from Twitter, one of the most popular social networks. Compared with traditional opinion polls, our method has two distinctive...
Stock market prediction has attracted attention from both industry and academia [1, 2]. Various machine learning algorithms, such as neural networks, genetic algorithms, and support vector machines, are used to predict stock prices.
Considering the wide usage of databases and their ever-growing size, it is crucial to improve query processing performance. Selecting an appropriate set of indexes for the workload processed by the database system is an important part of physical design and performance tuning. This selection is a non-trivial task, especially considering the possible number of native indexes in modern databases...
Intervals have become prominent in data management as they are the main data structure to represent a number of key data types such as temporal or genomic data. Yet, there exists no solution to compactly store and efficiently query big interval data. In this paper we introduce CINTIA — the Checkpoint INTerval Index Array — an efficient data structure to store and query interval data, which achieves...
To support fast disaster estimation after a large-scale disaster occurs, this paper presents a fast spatio-temporal similarity search method. The method searches a database that stores many scenarios of disaster simulation results, represented as time-series grid data, for scenarios similar to the insufficient observed data sent from sensors. The proposed method efficiently processes spatio-temporal intersection by...
Large and dynamic graphs with streaming updates have been gaining traction recently, along with the need for enabling graph analytics in a commodity cluster instead of a high-performance computing facility. Surprisingly, there is a lack of study on scaling out graph data structures to represent sparse dynamic graphs in a commodity cluster, and even the latest work [1] based upon the most common in-memory...
In the era of data-intensive scientific discovery, data analysis is critical for scientists to identify essential information from the mountains of data generated by large-scale simulations or experiments. A generic operation in scientific data analysis is to combine information from multiple data sets, which are stored in heterogeneous file formats. This operation is typically known as a Join in database...
The Single Instruction Multiple Data (SIMD) architecture of Graphics Processing Units (GPUs) makes them perfect for parallel processing of big data. In this paper, we present the design, implementation, and evaluation of G-Storm, a GPU-enabled parallel system based on Storm, which harnesses the massively parallel computing power of GPUs for high-throughput online stream data processing. G-Storm has...
Recent advancements in high-throughput genome sequencing technologies have accelerated the efficient discovery of novel genomes. De novo assembly is the first and one of the most computationally intensive steps in analyzing such novel genomes. In this work, we addressed the problem of parallelizing de Bruijn graph based de novo genome sequence assembly on distributed memory systems. We proposed...