Search results

chapter

Distributed database and application architecture for big data solutions

Makoto Misaki, Tomio Tsuda, Shinji Inoue, Shintaro Sato, more

2016 International Symposium on Semiconductor Manufacturing (ISSM) > 1 - 4

In this article, we report on a platform and architecture that make real-time analysis of big data possible, and on an IT infrastructure in which they are optimally combined. We developed a distributed architecture in which data conversion and abnormality determination are multi-blocked. Furthermore, by selecting a distributed storage DB, we succeeded in constructing IT infrastructure capable...

chapter

Comparative analysis of various distributed file systems & performance evaluation using map reduce implementation

Madhavi Vaidya, Shrinivas Deshpande

2016 International Conference on Recent Advances and Innovations in Engineering (ICRAIE) > 1 - 6

There has been great interest in computing experiments that are useful on shared-nothing computers and commodity machines. We need multiple systems running in parallel and working closely together towards the same goal. It has frequently been observed that the distributed execution engine MapReduce handles the primary input-output workload for such...
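The abstract assumes familiarity with the MapReduce model. As a generic illustration only (not the authors' benchmark; the function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are hypothetical), the map, shuffle-by-key and reduce phases can be sketched in a single Python process on the classic word-count task:

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    # Map: emit a (word, 1) pair for every whitespace-separated token.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Shuffle: group intermediate values by key, as the framework does
    # between the map and reduce phases.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Reduce: combine all values that share a key.
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    print(word_count(["the quick fox", "the lazy dog"]))
```

Calling `word_count(["the quick fox", "the lazy dog"])` returns `{'the': 2, 'quick': 1, 'fox': 1, 'lazy': 1, 'dog': 1}`. In a real cluster, map tasks read their input splits from the distributed file system and the shuffle moves intermediate pairs between nodes, which is where file-system I/O behavior enters.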

chapter

Next-gen tools for big scientific data: ARM data center example

Ranjeet Devarakonda, Kyle Dumas, Sheman Beus, Everett Rush, more

2016 IEEE International Conference on Big Data (Big Data) > 3968 - 3970

The Atmospheric Radiation Measurement (ARM) Climate Research Facility (www.arm.gov) provides atmospheric observations from diverse climatic regimes around the world. Currently, ARM archives over 22 million user-accessible data files, primarily stored in NetCDF file format, with total data volumes close to one petabyte. In this paper, we will discuss how ARM is currently storing, distributing, cataloging...

chapter

Exploring Controlled RDF Distribution

Raqueline R.M. Penteado, Rebeca Scroeder, Carmem S. Hara

2016 IEEE International Conference on Cloud Computing Technology and Science (CloudCom) > 160 - 167

RDF datasets have increased rapidly over the last few years. In order to process SPARQL queries on these large datasets, much effort has been spent on developing horizontally scalable techniques, which involve data partitioning and parallel query processing. While distribution may provide storage scalability, it may also incur high communication costs for processing queries. In this paper, we present...
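The tension between storage scalability and query communication cost can be illustrated with a common baseline, hash partitioning of triples by subject. This is not the controlled distribution the authors present; the helper names and the use of `zlib.crc32` as the hash are assumptions for this sketch. Joins on a shared subject (star patterns) stay on one node, while path joins, where one triple's object is another triple's subject, may cross partitions and require network transfer:

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def partition_of(term: str, n_parts: int) -> int:
    # Stable hash (crc32) so placement does not change between runs,
    # unlike Python's salted built-in hash() for str.
    return zlib.crc32(term.encode("utf-8")) % n_parts


def partition_by_subject(triples: Sequence[Triple], n_parts: int) -> Dict[int, List[Triple]]:
    # Hypothetical baseline: each triple is stored on the node owning its subject.
    parts: Dict[int, List[Triple]] = defaultdict(list)
    for t in triples:
        parts[partition_of(t[0], n_parts)].append(t)
    return parts


def cross_partition_path_joins(triples: Sequence[Triple], n_parts: int) -> int:
    # Count path-join pairs (t1.object == t2.subject) whose triples live on
    # different nodes; each such pair implies shipping data during a query.
    count = 0
    for s1, _, o1 in triples:
        for s2, _, _ in triples:
            if o1 == s2 and partition_of(s1, n_parts) != partition_of(s2, n_parts):
                count += 1
    return count


if __name__ == "__main__":
    data = [
        ("alice", "knows", "bob"),
        ("bob", "knows", "carol"),
        ("carol", "worksAt", "acme"),
        ("alice", "worksAt", "acme"),
    ]
    for n in (1, 4):
        print(n, "partitions:", cross_partition_path_joins(data, n), "cross-partition path joins")
```

With a single partition the count is always zero; as the number of partitions grows, more path joins tend to span nodes, which is the communication cost the abstract refers to.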

chapter

Accessing and distributing large volumes of NetCDF data

Ranjeet Devarakonda, Yaxing Wei, Michele Thornton

2016 IEEE International Conference on Big Data (Big Data) > 3966 - 3967

In this paper, we will discuss how NASA's Oak Ridge National Laboratory Distributed Active Archive Center (ORNL DAAC) is distributing large volumes of ‘structured’ data using Daily Surface Weather Data and a corresponding Climatological Summaries Dataset (Daymet) as an example.

chapter

Energy-Aware Migration of Groups of Virtual Machines in Distributed Data Centers

Rodrigo A. C. da Silva, Nelson L. S. da Fonseca

2016 IEEE Global Communications Conference (GLOBECOM) > 1 - 6

This paper proposes the Topology-aware Virtual Machine Selection (TAVMS) algorithm to choose sets of communicating groups of virtual machines (VMs) to be migrated to other data centers, aiming at global energy savings. It considers the migration of groups of VMs as well as the data center network topology, selecting VM groups with network proximity in order to increase the potential number of equipments...

chapter

Identifying performance bottlenecks in Hive: Use of processor counters

Alexander C. Shulyak, Lizy K. John

2016 IEEE International Conference on Big Data (Big Data) > 2109 - 2114

Distributed SQL query engines, like Hive, Spark, and Impala, have become the de facto database setup for Decision Support Systems (DSS) with large database sizes. Unlike other distributed workloads such as graph processing and OLTP transactions, DSS queries are often CPU-bound rather than network-I/O-bound [8]. In this paper, we identify apparent anomalies in query performance on a distributed Hive database...

chapter

Comparing application performance on HPC-based Hadoop platforms with local storage and dedicated storage

Zhuozhao Li, Haiying Shen, Jeffrey Denton, Walter Ligon

2016 IEEE International Conference on Big Data (Big Data) > 233 - 242

Many high-performance computing (HPC) sites extend their clusters to support Hadoop MapReduce for a variety of applications. However, HPC clusters differ from Hadoop clusters in the configuration of storage resources. In the Hadoop Distributed File System (HDFS), data resides on the compute nodes, while in an HPC cluster, data is stored on separate nodes dedicated to storage. Dedicated storage offloads...

chapter

PPFS: A Scale-Out Distributed File System for Post-Petascale Systems

Fuyumasa Takatsu, Kohei Hiraga, Osamu Tatebe

2016 IEEE 18th International Conference on High Performance Computing and Communications; IEEE 14th International Conference on Smart City; IEEE 2nd International Conference on Data Science and Systems (HPCC/SmartCity/DSS) > 1477 - 1484

The convergence of high-performance computing and big data, which has become known as the field of extreme big data, is problematic in that file creation in storage systems such as distributed file systems is not optimized. That is, the large workload leads to the simultaneous creation of many files by many processes when creating checkpoints. The need to improve the file creation processes prompted...

chapter

A clustering approach for anonymizing distributed data streams

Mona A. Mohamed, Magdy H. Nagi, Sahar M. Ghanem

2016 11th International Conference on Computer Engineering & Systems (ICCES) > 9 - 16

Privacy-preserving data mining has been studied widely on static data, but static algorithms are not suitable for streaming data. This calls for new privacy-preserving algorithms that cope with the characteristics of data streams. Recently, effective anonymization algorithms have been studied on centralized data streams. In this paper we propose an approach for anonymizing distributed data streams...

chapter

Controlling Network Latency in Mixed Hadoop Clusters: Do We Need Active Queue Management?

Renan Fischer E Silva, Paul M. Carpenter

2016 IEEE 41st Conference on Local Computer Networks (LCN) > 415 - 423

With the advent of big data, data center applications are processing vast amounts of unstructured and semi-structured data, in parallel on large clusters, across hundreds to thousands of nodes. The highest performance for these batch big data workloads is achieved using expensive network equipment with large buffers, which accommodate bursts in network traffic and allocate bandwidth fairly even when...

chapter

IMFSSC: An In-Memory Distributed File System Framework for Super Computing

Binyang Li, Bo Li, Ming Liu

2016 7th International Conference on Cloud Computing and Big Data (CCBD) > 132 - 137

Supercomputing has been widely applied in theoretical physics, theoretical chemistry, climate modeling, biology simulation and medical research for high-performance and energy-efficient computing. Many scientific applications are I/O-sensitive, and users have to tolerate high latency when supercomputing center storage processes thousands of I/O requests. In this paper, IMFSSC, an in-memory...

chapter

Towards cost-effective capacity provisioning for fault-tolerant green distributed data centers

Rakesh Tripathi, S. Vignesh, Venkatesh Tamarapalli

2016 IEEE International Conference on Advanced Networks and Telecommunications Systems (ANTS) > 1 - 6

Many critical e-commerce and financial services predominantly depend on geo-distributed data centers for scalability and availability. Recent market surveys show that data center failures are inevitable and cause huge financial losses. Fault-tolerant distributed data centers are typically designed by provisioning spare capacity to mask failure at a site. At the same time, data center operators are...

chapter

Minimizing the cost of designing fault-tolerant CDN data centers

S. Vignesh, Rakesh Tripathi, Venkatesh Tamarapalli

2016 IEEE International Conference on Advanced Networks and Telecommunications Systems (ANTS) > 1 - 3

With an increase in the usage of data centers to power content distribution networks (CDN), minimizing the cost of deployment while handling fault-tolerance has become an important research issue. In this work, we demonstrate the importance of cost-aware capacity provisioning in fault-tolerant CDN data centers (that can tolerate failure at a single site). We propose an optimization model that exploits...

chapter

Týr: Blob Storage Meets Built-In Transactions

Pierre Matri, Alexandru Costan, Gabriel Antoniu, Jesus Montes, more

SC16: International Conference for High Performance Computing, Networking, Storage and Analysis > 573 - 584

Concurrent Big Data applications often require high-performance storage, as well as ACID (Atomicity, Consistency, Isolation, Durability) transaction support. Although blobs (binary large objects) are an increasingly popular storage model for such applications, state-of-the-art blob storage systems offer no transaction semantics. This requires users to coordinate data access carefully in order to avoid...

chapter

Multiple big data processing platforms

Bao Rong Chang, Hsiu-Fen Tsai, Yi-Sheng Chang, Chien-Feng Huang

2016 Conference on Technologies and Applications of Artificial Intelligence (TAAI) > 207 - 211

Integrating the Hive, Impala and Spark SQL platforms enables rapid data retrieval using SQL queries in a big data environment. This paper designs an optimized platform selection scheme to greatly improve data-retrieval response. It automatically chooses the best-performing platform on which to execute SQL commands. In addition, the distributed memory storage systems using Memcached...

chapter

Coded distributed computing: Fundamental limits and practical challenges

Songze Li, Qian Yu, Mohammad Ali Maddah-Ali, A. Salman Avestimehr

2016 50th Asilomar Conference on Signals, Systems and Computers > 509 - 513

In this paper, we demonstrate a coded computing framework, named Coded Distributed Computing (CDC), which optimally trades extra computation resources for communication bandwidth in a MapReduce-type distributed computing environment. We also empirically illustrate the practical impact of CDC by analyzing the performance of a distributed sorting algorithm, named CodedTeraSort, which was developed by...
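The trade-off CDC targets is usually stated, in the authors' related work on coded distributed computing, roughly as follows: with K nodes and computation load r (each Map task executed on r nodes), uncoded shuffling needs a normalized communication load of 1 - r/K, whereas CDC achieves (1/r)(1 - r/K), an r-fold reduction. The sketch below only evaluates these two expressions with exact fractions; the formulas are given as background rather than as results from this abstract, and the function names are illustrative.

```python
from fractions import Fraction


def _check(r: int, K: int) -> None:
    # Computation load r must be an integer between 1 and the number of nodes K.
    if not (1 <= r <= K):
        raise ValueError("need 1 <= r <= K")


def uncoded_load(r: int, K: int) -> Fraction:
    # Normalized shuffle load when intermediate values are sent uncoded: 1 - r/K.
    _check(r, K)
    return 1 - Fraction(r, K)


def cdc_load(r: int, K: int) -> Fraction:
    # Load achieved by coded multicasting in CDC: (1/r) * (1 - r/K).
    _check(r, K)
    return Fraction(1, r) * (1 - Fraction(r, K))


if __name__ == "__main__":
    K = 10
    for r in (1, 2, 5, 10):
        print(f"r={r}: uncoded={uncoded_load(r, K)}, coded={cdc_load(r, K)}")
```

With K = 10, raising r from 1 to 5 lowers the uncoded load from 9/10 to 1/2 but the coded load to 1/10, which is the sense in which extra computation buys back bandwidth.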

chapter

Panda: Public auditing for shared data with efficient user revocation in the cloud

Dnyanada Dongare, Vijayalakshmi Kadroli

2016 Online International Conference on Green Engineering and Technologies (IC-GET) > 1 - 3

In today's cloud computing, users can easily modify and share data as a group. The main issues in cloud computing are data privacy, data integrity, and data access by unauthorized users. A TTP (Trusted Third Party) is used to store and share data in cloud computing. To verify the integrity of data, users in the group need to compute signatures on all the blocks of shared data. In shared data different...

chapter

Data storage in big data context: A survey

A. Elomari, A. Maizate, L. Hassouni

2016 Third International Conference on Systems of Collaboration (SysCo) > 1 - 4

As the volumes of data to be processed in all domains (scientific, professional, social, etc.) increase rapidly, their management and storage raise more and more challenges. The emergence of highly scalable infrastructures has contributed to the evolution of storage management technologies. However, numerous problems have emerged, such as consistency and availability of data, scalability of...

chapter

Efficient Distributed Skyline over Imperfect Data Modeled by the Evidence Theory

Sayda Elmi, Mohamed Anis Bach Tobji, Allel Hadjali, Boutheina Ben Yaghlane

2016 IEEE 28th International Conference on Tools with Artificial Intelligence (ICTAI) > 335 - 342

Thanks to their ability to return interesting objects in a database, skyline queries have received considerable attention from the database community over the last few years. Skyline analysis is a powerful tool in a wide spectrum of real applications, including multi-criteria optimal decision making, preference answering, and many applications where uncertain, imprecise and noisy data inherently...
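To make the notion of a skyline concrete, here is a naive nested-loop skyline over ordinary, certain points where smaller values are better in every dimension. It does not model the evidence-theory (imperfect data) setting or the distributed evaluation studied in the paper, and the hotel data is hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Point, q: Point) -> bool:
    # p dominates q when p is no worse in every dimension and strictly
    # better in at least one (here "better" means smaller).
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points: Sequence[Point]) -> List[Point]:
    # Naive O(n^2) evaluation: keep every point that no other point dominates.
    # A point never dominates itself, so no self-exclusion check is needed.
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    # Hypothetical hotels as (price, distance to beach).
    hotels = [(50, 8), (80, 2), (60, 5), (90, 9), (70, 5)]
    print(skyline(hotels))
```

Here `(90, 9)` is dominated by `(50, 8)` and `(70, 5)` by `(60, 5)`, so the skyline is `[(50, 8), (80, 2), (60, 5)]`: no remaining hotel is both cheaper and closer than another.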

INFONA - science communication portal
