The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
There are two popular schools of thought for performing large-scale machine learning that does not fit into memory. One is to run machine learning within a relational database management system, and the other is to push analytical functions into MapReduce. As each approach has its own set of pros and cons, we propose a database-Hadoop hybrid approach to scalable machine learning where batch-learning...
Hyper visors are widely used in cloud environments and their impact on application performance has been a topic of significant research and practical interest. We conducted experimental measurements of several benchmarks using Hadoop MapReduce to evaluate and compare the performance impact of three popular hyper visors: a commercial hyper visor, Xen, and KVM. We found that differences in the workload...
In cloud computing environments, database systems have to serve a large number of tenants instantaneously and handle applications with different load characteristics. To provide a high quality of services, scalable distributed database systems with self-provisioning are required. The number of working nodes is adjusted dynamically based on user demand. Data fragments are reallocated frequently for...
User profiling is the process of collecting information about a user in order to construct their profile. The information in a user profile may include various attributes of a user such as geographical location, academic and professional background, membership in groups, interests, preferences, opinions, etc. Big data techniques enable collecting accurate and rich information for user profiles, in...
Consider two parties who want to compare their strings, e.g., genomes, but do not want to reveal them to each other. We present a system for privacy-preserving matching of strings, which differs from existing systems by providing a deterministic approximation instead of an exact distance. It is efficient (linear complexity), non-interactive and does not involve a third party which makes it particularly...
This paper describes proposed privacy extensions to UML to help software engineers to quickly visualize privacy requirements, and design privacy into big data applications. To adhere to legal requirements and/or best practices, big data applications will need to apply Privacy by Design principles and use privacy services, such as, and not limited to, anonymization, pseudonymization, security, notice...
The volume and complexity of data produced and analyzed in scientific collaborations is growing exponentially. It is important to track scientific data-intensive analysis workflows to provide context and reproducibility as data is transformed in these collaborations. Provenance addresses this need and aids scientists by providing the lineage or history of how data is generated, used and modified....
'Big Data' techniques are often adopted in cross-organization scenarios for integrating multiple data sources to extract statistics or other latent information. Even if these techniques do not require the support of a schema for processing data, a common conceptual model is typically defined to address name resolution. This implies that each local source is tasked of applying a semantic lifting procedure...
Data Analytics has proven its importance in knowledge discovery and decision support in different data and application domains. Big data analytics poses a serious challenge in terms of the necessary hardware and software resources. The cloud technology today offers a promising solution to this challenge by enabling ubiquitous and scalable provisioning of the computing resources. However, there are...
Establishing scalable and cross-enterprise workflow management systems (WfMSs) in the cloud requires the adaptation and extension of existing concepts for process management. This paper proposes a scalable and cross-enterprise WfMS with a multitenancy architecture. Especially, it can activate enactment of workflow processes by cloud collaboration. We do not employ the traditional engine-based WfMSs...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.