What-if analysis examines hypothetical scenarios based on historical data. It can therefore provide more meaningful information than classical OLAP (online analytical processing) to users of decision support systems. Big data OLAP systems are generally based on the MapReduce computation model, whose advantage is handling large data sets in batch-processing mode; however...
One of the major challenges in big data processing is the efficiency of the cross join, which is used, for example, in similarity calculation in business intelligence. In this paper we introduce an optimal data distribution algorithm for the distributed cross join, which combines each row from the first table with each row from the second table. The algorithm can reduce the network traffic and guarantee the computation balance of the...
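For reference, a cross join pairs every row of one table with every row of the other, so the output size is the product of the two table sizes. The sketch below illustrates that definition together with a generic block-grid partitioning, in which each worker receives one block of each table. It is not the paper's distribution algorithm (the abstract is truncated before describing it); the helper names `cross_join` and `grid_partition`, the grid shape, and the table contents are all assumptions made for illustration.

```python
from itertools import product

def cross_join(left, right):
    """Return every (left_row, right_row) pair, i.e. the Cartesian product."""
    return list(product(left, right))

def grid_partition(left, right, a, b):
    """Hypothetical a x b worker grid: worker (i, j) receives left block i
    and right block j, so every row pair meets on exactly one worker."""
    left_blocks = [left[i::a] for i in range(a)]
    right_blocks = [right[j::b] for j in range(b)]
    return {(i, j): (left_blocks[i], right_blocks[j])
            for i in range(a) for j in range(b)}

# Hypothetical tables: 6 rows and 4 rows.
left = list(range(6))
right = list("abcd")

workers = grid_partition(left, right, a=3, b=2)

# Each worker joins only its own blocks; the union equals the full cross join.
distributed = [pair
               for left_block, right_block in workers.values()
               for pair in cross_join(left_block, right_block)]

assert sorted(distributed) == sorted(cross_join(left, right))
print(len(distributed))  # 6 * 4 = 24 pairs
```

In this layout each left row is sent to `b` workers and each right row to `a` workers, which is the kind of traffic-versus-balance trade-off a data distribution algorithm has to manage.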
Deduplication is increasingly used to reduce storage cost. In practice, it often causes additional on-disk fragmentation that impairs read performance. To reduce the impact of fragments, the traditional approach of defragmentation, which reallocates files on disk to achieve a contiguous layout, has been widely used in many operating systems. Unfortunately, file defragmentation is highly...
Data encryption is widely used, so it is important to be able to detect encrypted data. We present a method for the detection of encrypted data based on the Support Vector Data Description (SVDD) algorithm. SVDD is a single-class, non-parametric approach for modeling the support of a distribution. Experimental results show that the SVDD can be...
Assessment plays an important role in education. In traditional classrooms, it is difficult to know students' learning status, and instructional feedback is subject to a certain degree of subjectivity and delay. The application of mobile devices in classrooms makes the collection of assessment data possible. Cloud computing has powerful computing resources and can handle the large amounts of data...
As an important resource and productive element, big data permeates all domains, such as e-commerce, traffic management, and smart cities. The capability to aggregate information and then deeply mine and analyze the latent knowledge will bring endless innovative achievements. Therefore, big data mining and deep analytics is becoming one of the research hotspots, and...
Load balance and power proportionality are both important aspects of constructing high-performance and cost-effective distributed storage systems. However, traditional replica placement strategies aimed at load balance usually produce scattered replica layouts that prevent power proportionality, while recent strategies aimed at power proportionality are typically based on uniform replication which...
Book recommendation is an important task in the personalized services and education provided by academic libraries. Many libraries have readers' borrowing records but no rating information on books, so collaborative filtering (CF) algorithms are not directly applicable. To apply CF algorithms to book recommendation, in this paper we construct the ratings...
The superior I/O performance of solid-state storage (e.g., solid-state drives) makes it an attractive replacement for traditional magnetic storage (e.g., hard disk drives). More and more storage systems are starting to integrate solid-state storage into their architecture. To understand the impact of solid-state storage on the performance of Hadoop applications, we consider a hybrid Hadoop storage...
Semantic annotation is currently a key technology for data provenance tracing and a research focus. It provides rich semantic provenance information to meet the tracing needs of different users, which can improve the effectiveness and relevance of data provenance tracing. To meet the needs of data provenance dependency analysis, a model of data provenance semantic...
Large-scale approximate k-nearest neighbors search is an important and useful technique for many multimedia retrieval applications. Most existing search algorithms use centralized indexing approaches and thus cannot meet the need to search large-scale datasets. This paper proposes an efficient and distributed approximate k-nearest neighbors search algorithm over a billion high-dimensional...
In daily life, people carry smartphones everywhere. The sensors in smartphones can provide a great deal of information. Activity recognition by smartphone can be used for healthcare and sports management. People carry smartphones in different positions, such as in a trouser pocket, in the hand, or in a bag. We use the accelerometer embedded in smartphones to classify five activities, such as staying still,...
Cloud computing provides an economically promising paradigm for outsourcing computation. Input/output privacy and verification are becoming the major security concerns, and the efficiency of the client and cloud sides is becoming the main practical concern. Given that existing linear algebra outsourcing schemes based on fully homomorphic or somewhat homomorphic encryption are far from...
The MapReduce processing framework is unaware of the properties of the underlying datasets. For ordered datasets (e.g., time-series data), in which records are already sorted, MapReduce still performs unnecessary sorting operations during its execution. This directly results in a significant increase in execution time, as sorting a large volume of data is time-consuming. In this paper, we propose a...
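To illustrate why re-sorting ordered data is wasted work: when every input split is already sorted, a k-way merge produces the globally ordered stream without a full sort. The sketch below uses Python's standard `heapq.merge` for this. It is only an illustration of the observation, not the method the paper proposes (the abstract is truncated before describing it), and the split contents are hypothetical.

```python
import heapq

# Hypothetical time-series splits, each already sorted by timestamp.
splits = [
    [(1, "a"), (4, "d"), (7, "g")],
    [(2, "b"), (5, "e")],
    [(3, "c"), (6, "f")],
]

# k-way merge of pre-sorted inputs: no full sort is needed.
merged = list(heapq.merge(*splits))

# A full re-sort of all records gives the same order at higher cost.
resorted = sorted(row for split in splits for row in split)

assert merged == resorted
print([timestamp for timestamp, _ in merged])  # [1, 2, 3, 4, 5, 6, 7]
```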
With the development of Internet and Web technology and frequent cross-cultural communication, people need to be able to access and manage information in many different languages. The scheduling of multilingual information resources becomes complex, as multilingual information resources stored in different cloud datacenters are heterogeneous and unevenly distributed. In...
Because it can exploit memory to its fullest potential, memory de-duplication has been widely experimented with in current mainstream virtualization platforms. Much work has been done to improve the efficiency of memory de-duplication, while too little attention has been paid to the security issues it introduces, which prior work has demonstrated. To deal with this security risk and efficiently...
As the most popular information publishing platform, the Web contains a lot of valuable information of interest to users or applications. Although many data extraction techniques have been studied in the last decade, they are still far from meeting the needs of real-world data extraction. On the one hand, most of them cannot support the whole web information extraction process involving three stages:...