The Atmospheric Radiation Measurement (ARM) Climate Research Facility (www.arm.gov) provides atmospheric observations from diverse climatic regimes around the world. Currently, ARM archives over 22 million user-accessible data files, primarily stored in NetCDF format, with a total data volume close to one petabyte. In this paper, we discuss how ARM is currently storing, distributing, cataloging...
We have implemented an updated Hierarchical Triangular Mesh (HTM) as the basis for a unified data model and an indexing scheme for geoscience data to address the variety challenge of Big Earth Data. In the absence of variety, the volume challenge of Big Data is relatively easily addressable with parallel processing. The more important challenge in achieving optimal value with a Big Data solution for...
In this paper we propose a two-stage algorithm for robust K-subspaces recovery. In the first stage, a large number of local candidate subspaces are generated by probabilistic farthest insertion, and then the initial near-optimal K-subspaces are solved by combinatorial selection with a randomized greedy method. In the second stage, the K-subspaces are further refined by assigning each data vector to...
Social media captures the voice of customers at a rapid pace. Consumer perception of a brand is crucial to its success. Current techniques for measuring brand perception, which rely on lengthy surveys of handpicked users in person, by mail, by phone, or online, are time-consuming and increasingly inadequate. A more effective technique to measure brand perception is to interpret customer voice directly from social media...
RDF datasets have increased rapidly over the last few years. In order to process SPARQL queries on these large datasets, much effort has been spent on developing horizontally scalable techniques, which involve data partitioning and parallel query processing. While distribution may provide storage scalability, it may also incur high communication costs for processing queries. In this paper, we present...
Citizens expect that data collected by the government can be released for more diverse uses. However, to achieve the open data dream, sensitive data could potentially be published only after proper privacy-preserving processing. In this paper, we present a scalable privacy-preserving system for open/big data which leverages the K-anonymity algorithm and the Hadoop framework. We use an experiment...
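As a rough illustration of the K-anonymity property named in the abstract above, the following Python sketch checks whether a small table is k-anonymous, i.e. whether every combination of quasi-identifier values is shared by at least k records. It is not the system described in the paper: the function name `is_k_anonymous`, the sample records, and the choice of quasi-identifiers are all hypothetical, and the sketch runs on a single machine rather than on Hadoop.

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every quasi-identifier combination occurs at least k times."""
    # Group records by their tuple of quasi-identifier values and count each group.
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())

# Hypothetical, already-generalized records (age ranges, masked ZIP codes).
records = [
    {"age": "30-39", "zip": "123**", "diagnosis": "flu"},
    {"age": "30-39", "zip": "123**", "diagnosis": "asthma"},
    {"age": "40-49", "zip": "456**", "diagnosis": "flu"},
]

# The third record is alone in its group, so the full table is not 2-anonymous.
full_table_ok = is_k_anonymous(records, ["age", "zip"], 2)
# The first two records share one group of size 2, so that subset is 2-anonymous.
subset_ok = is_k_anonymous(records[:2], ["age", "zip"], 2)
```

In a sketch like this, a failing check would normally prompt further generalization or suppression of the quasi-identifier values before release.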
Globalization and cloud computing have allowed major strides forward in terms of communication possibilities, but they also illuminate how many different resource options and formats exist, access to which would dramatically increase the accuracy and reliability of choices made as a result of computational output. As a result, there is an increasing need for methods resolving levels of data translations...
The Big Data domain is one of the most promising ICT sectors, with substantial expectations both for market growth and for a design shift in the area of data storage management and analytics. However, today, the level of complexity achieved and the lack of standardisation of Big Data management architectures represent a huge barrier to the adoption and execution of analytics, especially for those...
Some of the most valuable business benefits that accompany cloud adoption cannot be exploited without first addressing the new data security challenges posed by the distributed nature of cloud computing. A promising approach for alleviating these risks is to provide a security-by-design framework that will assist cloud application developers in defining appropriate context-driven policies that enhance...
Driven by the trends of BigData and Cloud computing, there is a growing demand for processing and analyzing data that are generated and stored across geo-distributed data centers. However, due to the limited network bandwidth between data centers and the growing data volume spread across different locations, it has become increasingly inefficient to aggregate data and to perform computations at a...
Headquarters Air Force Studies, Analyses, and Assessments (AF/A9) supports Force Structure decisions by integrating analysis at various levels of resolution. The Combat Forces Assessment Model (CFAM) is a mixed integer program incorporating results from higher-resolution models to identify an optimal force mix within Air Force resources. CFAM is a deterministic model, but some input models are stochastic,...
Many research projects in bioinformatics may be viewed as scientific workflows. Biologists often run the same workflow multiple times with different parameters in order to refine their data analysis. These executions generate a large volume of files in different formats, which need to be stored for future evaluations. New database models, like NoSQL systems, could be considered to deal with large...
This paper analyzes the possibility of applying a model that does not explicitly incorporate competition in hotel Revenue Management. Three scenarios are evaluated: (i) each seller understands how its own price affects its own demand but does not directly account for how its competitor's price does, (ii) each seller knows the total market size, and tries to learn how its own price affects demand, while...
Real-time and temporal information services are intrinsic characteristics of vehicular networks, where the timeliness of data dissemination and the maintenance of data quality interplay with each other and influence overall system performance. In this work, we present a system architecture in which multiple roadside units (RSUs) cooperate to provide information services, and the vehicles can...
Data mining algorithms tacitly require access to the data in either centralized or distributed form. Distributed data becomes a big challenge and cannot be handled by a classical analytic tool. Cloud Computing can solve the issues of processing, storing, and analyzing the data at distributed locations within the cloud. However, a significant problem that is preventing free sharing of data is privacy and...
With an increase in the usage of data centers to power content distribution networks (CDN), minimizing the cost of deployment while handling fault-tolerance has become an important research issue. In this work, we demonstrate the importance of cost-aware capacity provisioning in fault-tolerant CDN data centers (that can tolerate failure at a single site). We propose an optimization model that exploits...
Partitioned Global Address Space (PGAS) parallel programming models can provide an efficient mechanism for managing shared data stored across multiple nodes in a distributed memory system. However, these models are traditionally directly addressed and, for applications with loosely-structured or sparse data, determining the location of a given data element within a PGAS can incur significant overheads...
The concept of a workflow is used for modeling many of the data-intensive scientific applications executed on data grids. A workflow is a series of interdependent tasks during which data is processed by different tasks. Scheduling workflows in grids is the process of assigning tasks to appropriate resources with the aim of achieving goals such as reducing workflow completion time while considering...
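To make the notion of a workflow as a set of interdependent tasks concrete, the sketch below models a small workflow as a dependency graph and derives one valid execution order with Python's standard `graphlib` module (Python 3.9+). This is only an ordering step under stated assumptions, not the scheduling method of the paper: the task names are hypothetical, and resource assignment and completion-time goals are not modeled.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each task maps to the set of tasks it depends on.
workflow = {
    "align": {"fetch"},            # align needs the fetched data
    "filter": {"fetch"},           # filter needs the fetched data
    "merge": {"align", "filter"},  # merge needs both intermediate results
}

# static_order() yields every task after all of its dependencies.
order = list(TopologicalSorter(workflow).static_order())

# Position of each task in the computed order, useful for checking constraints.
position = {task: i for i, task in enumerate(order)}
```

Here `fetch` always comes first and `merge` last, while `align` and `filter` are independent of each other and could be dispatched to different resources by a real scheduler.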
The Message Passing Interface (MPI) is the de facto standard for programming large scale parallelism, with up to millions of individual processes. Its dominant paradigm of Single Program Multiple Data (SPMD) programming is different from threaded and multicore parallelism, to an extent that students have a hard time switching models. In contrast to threaded programming, which allows for a view of...
Coupled application workflows composed of applications implemented using task-based models present new coupling and data exchange challenges, due to the asynchronous interaction and coupling behaviors between tasks of the component applications. In this paper, we present an adaptive data placement approach that addresses these challenges by dynamically adjusting to the asynchronous coupling patterns...