Traditional machine learning algorithms often require computations on centralized data, but modern datasets are collected and stored in a distributed way. In addition to the cost of moving data to centralized locations, increasing concerns about privacy and security warrant distributed approaches. We propose keybin, a distributed key-based binning clustering algorithm for high-dimensional spaces....
Big data technology refers to the rapid acquisition of valuable information from large amounts of data of various types. It can be divided into eight technologies: data acquisition, data access, infrastructure, data processing, statistical analysis, data mining, model prediction and results presentation. The paper presents an improved statistical analysis method based on big data technology. A statistical...
In this paper, we present a distributed data visualization framework for HPC environments based on the PBVR (Particle Based Volume Rendering) method. The PBVR method is a kind of point-based rendering approach in which the volumetric data to be visualized is represented as a set of small, opaque particles. This method has object-space and image-space variants, defined by the place (object or image-...
Satellites can provide remote sensing data for disaster monitoring, and various sensors are generating huge volumes of remote sensing data for disaster management. It is urgent to store and process the massive data acquired by satellites as fast as possible. A flexible and rapid service platform can realize integrated services spanning data acquisition, data production and product visualization. This article...
The goal of this work is to present a software package which is able to process binary climate data by spawning Map-Reduce tasks while introducing minimal computational overhead and without modifying existing application code. The package is formed by the combination of two tools: Pipistrello, a Java utility that allows users to execute Map-Reduce tasks over any kind of binary file, and Tina, a lightweight...
This paper introduces the development of a distributed air-defense engagement simulation model based on data distribution service (DDS). To design and develop effectively, system developers need a high-resolution engagement simulation including complex engineering-level models and operational scenario models. Increasing the resolution of the model increases its complexity, which requires...
The exponential growth of digital data sources has the potential to transform all aspects of society and our lives. However, to achieve this impact, the data has to be processed promptly to extract insights that can drive decision making. Further, traditional approaches that rely on moving data to remote data centers for processing are no longer feasible. Instead, new approaches that effectively leverage...
Compared with distributed graph computation, traditional single-node computation is ill-suited to processing large-scale graph data. The GAS (Gather, Apply and Scatter) model is a universal vertex-cut graph computation programming model based on edge-centric programs that supports graph algorithms, performing distributed graph computation after graph partitioning. In this paper, we introduce three...
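To make the Gather, Apply and Scatter phases concrete, here is a minimal single-machine sketch of a GAS-style vertex program, using PageRank as the example algorithm. The graph, damping factor, and iteration count are illustrative choices of ours, not taken from the paper or any specific engine:

```python
# Sketch of the GAS (Gather, Apply, Scatter) vertex-program abstraction,
# illustrated with PageRank on a tiny in-memory graph. In a real
# distributed engine the graph would be vertex-cut partitioned across
# machines; here all three phases run locally.

DAMPING = 0.85  # standard PageRank damping factor

def pagerank_gas(edges, num_iters=20):
    # Build adjacency: in-neighbors for Gather, out-degrees for contributions.
    vertices = {v for e in edges for v in e}
    in_nbrs = {v: [] for v in vertices}
    out_deg = {v: 0 for v in vertices}
    for src, dst in edges:
        in_nbrs[dst].append(src)
        out_deg[src] += 1

    rank = {v: 1.0 for v in vertices}
    for _ in range(num_iters):
        new_rank = {}
        for v in vertices:
            # Gather: sum contributions from in-neighbors.
            total = sum(rank[u] / out_deg[u] for u in in_nbrs[v])
            # Apply: update the vertex value from the gathered sum.
            new_rank[v] = (1 - DAMPING) + DAMPING * total
        # Scatter: a distributed engine would signal only the out-neighbors
        # of changed vertices; this sketch simply starts the next sweep.
        rank = new_rank
    return rank

ranks = pagerank_gas([("a", "b"), ("b", "c"), ("c", "a"), ("a", "c")])
print(sorted(ranks, key=ranks.get, reverse=True))
```

The value of the abstraction is that an engine can schedule, partition, and parallelize the three phases without the algorithm author writing any distribution logic.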
Due to its simplicity and scalability, MapReduce has become a de facto standard computing model for big data processing. Since the original MapReduce model was only appropriate for embarrassingly parallel batch processing, many follow-up studies have focused on improving the efficiency and performance of the model. Spark follows one of these recent trends by providing in-memory processing capability...
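The batch-processing model the abstract refers to can be sketched in a few lines: a map phase emitting key-value pairs, a shuffle grouping by key, and a reduce phase aggregating each group. This is an illustrative word-count in plain Python (function names are ours, not from Hadoop or Spark); Spark's improvement is essentially keeping such intermediate datasets cached in memory so iterative jobs avoid re-reading them from disk:

```python
# Word count in the classic MapReduce style:
#   map    -> emit (word, 1) for every word
#   shuffle-> group emitted pairs by key
#   reduce -> sum the counts for each key
from collections import defaultdict

def map_phase(lines):
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    return {key: sum(values) for key, values in groups.items()}

docs = ["big data big compute", "data moves compute waits"]
counts = reduce_phase(shuffle(map_phase(docs)))
print(counts)
```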
High-performance distributed memory applications often load or receive data in a format that differs from what the application uses. One such difference arises from how the application distributes data for parallel processing. Data must be redistributed from how it was laid out by the producer to how the application needs the data to be laid out amongst its processes. In this paper, we present a large-scale...
Cloud computing allows users to harness substantial computing power for complex data processing, which in turn generates huge and complex data. However, the virtual resources requested by users are rarely utilized to their full capacity. To mitigate this, providers often over-commit resources to maximize profit, which can result in node overloading and consequent task eviction. This paper presents a novel...
The use of large-scale machine learning and data mining methods is becoming ubiquitous in many application domains ranging from business intelligence and bioinformatics to self-driving cars. These methods heavily rely on matrix computations, and it is hence critical to make these computations scalable and efficient. These matrix computations are often complex and involve multiple steps that need to...
MapReduce has emerged as a popular programming model in the field of data-intensive computing. This is due to its simple design, which provides ease of use for programmers, and its framework implementations such as Hadoop, which have been adopted by large business and technology companies. One significant issue in practical MapReduce applications is data skew: the imbalance in the amount of data...
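The skew problem is easy to demonstrate: with default hash partitioning, every record sharing a "hot" key lands on the same reducer, so one partition dwarfs the rest. The dataset and reducer count below are made-up numbers purely for illustration:

```python
# Illustration of data skew in a MapReduce shuffle. Hash partitioning
# routes all records with the same key to one reducer, so a single hot
# key leaves the partitions badly imbalanced.
from collections import Counter

records = ["hot"] * 90 + ["a", "b", "c", "d", "e"] * 2  # 90 of 100 records share one key
NUM_REDUCERS = 4

# Simulate the partitioner: reducer index = hash(key) mod NUM_REDUCERS.
load = Counter(hash(key) % NUM_REDUCERS for key in records)

ideal = len(records) / NUM_REDUCERS
print(f"max reducer load: {max(load.values())}, ideal: {ideal:.0f}")
```

The job finishes only when the most loaded reducer does, which is why skew-mitigation techniques (range partitioning, key splitting, sampling) are a recurring research topic.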
Data mining algorithms can tackle data either centrally or in a distributed fashion. Outsourcing data can solve the issues of processing, storing, and analyzing massive data. Since existing data is spread across various places, and to improve classification results, we propose the following solution for privacy-preserving data mining. However, a critical problem that precludes free sharing of information...
Scheduling is one of the most important issues in executing tasks in grid systems. A data grid mainly deals with sharing and managing large amounts of distributed data in executing data-intensive applications. It is primarily a solution to satisfy the requirements of data-intensive task processing. OptorSim is a useful open-source simulation tool for data grids. In this paper, a new two-step data-intensive...
Traditional data mining (DM) faces certain challenges, viz. scalability, high dimensionality, and distributed data, and it often requires a huge amount of computational resources in terms of space and time to extract the hidden patterns in the data. In addition, the data has to be available at one location. But in today's era, data are often inherently distributed across several databases. Hence, due to...
The concept of workflow is used for modeling many of the data-intensive scientific applications executed on data grids. A workflow is a series of interdependent tasks through which data is processed. Scheduling workflows in grids is the process of assigning tasks to appropriate resources with the aim of achieving goals such as reducing workflow completion time while considering...
Scalable processing of large-scale RDF graphs has become a critical issue with the explosion of semantic web technologies. Most existing distributed RDF querying and reasoning solutions are designed on the MapReduce paradigm. However, MapReduce needs further optimization, since several inherent limitations, such as the lack of efficient job scheduling and iterative computing mechanisms, affect...
At present, there are many mature encryption mechanisms and access control models for protecting data content in cloud environments. However, research on the privacy protection of data attributes in the cloud is still at an initial stage; it can be classified into two types: one is the privacy protection of data attributes during data transmission, including routing information, generation...
Big data storage and sharing are becoming a major demand of the community. To meet this demand, virtually unified data facilities built on geo-distributed data centers present the user with a single unified namespace. These unified data facilities, however, lack efficient storage and analysis of data. To address these shortcomings in such unified data facilities, we designed...