Taking advantage of the computing capabilities offered by modern parallel and distributed architectures is fundamental to running large-scale simulation models based on the Parallel Discrete Event Simulation (PDES) paradigm. By relying on this computing organization, it is possible to effectively overcome both the power wall and the memory wall, which are the core factors limiting high-performance simulations...
The use of multicore clusters is one of the strategies used to achieve energy-efficient multicore architecture designs. Even though chips have multiple cores in these designs, cache constraints such as size, latency, concurrency, and scalability still apply. Multicore clusters must therefore implement alternative solutions to the shared cache access problem. Bigger or more frequently accessed caches...
Adaptive dynamic programming (ADP) is a prevalent way to solve the coupled Hamilton-Jacobi-Bellman (HJB) equations of the optimal consensus control for multi-agent systems (MAS). Neural networks (NNs) are normally used to approximate the value functions in ADP. However, NNs with manually designed features may influence the approximation ability. In this study, kernel-based methods which do not need...
Dense Wi-Fi and Bluetooth (BT) environments are becoming increasingly common, so the coexistence issue between Wi-Fi and BT is imperative to solve. In this paper, we propose BlueCoDE, a coordination scheme for multiple neighboring BT piconets, to make them collision-free and less harmful to Wi-Fi. BlueCoDE reuses BT's existing PHY and MAC design, thus making it practically feasible. We implement a prototype...
Fault-tolerance is becoming increasingly important as we enter the era of exascale computing. Increasing the number of cores results in a smaller mean time between failures, and consequently, higher probability of errors. Among the different software fault tolerance techniques, checkpoint/restart is the most commonly used method in supercomputers, the de-facto standard for large-scale systems. Although...
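The checkpoint/restart technique named above can be sketched in a few lines: periodically persist the computation state, and on startup resume from the most recent checkpoint rather than from scratch. This is a minimal, hypothetical Python sketch; the file name, checkpoint interval, and toy workload are illustrative assumptions, not taken from the paper.

```python
import os
import pickle
import tempfile

CKPT = os.path.join(tempfile.gettempdir(), "sim_state.ckpt")

def run(steps, fail_at=None):
    """Toy iterative 'simulation' with checkpoint/restart."""
    # Restart path: resume from the last checkpoint, if one exists.
    if os.path.exists(CKPT):
        with open(CKPT, "rb") as f:
            step, total = pickle.load(f)
    else:
        step, total = 0, 0
    while step < steps:
        if fail_at is not None and step == fail_at:
            raise RuntimeError("simulated node failure")
        total += step          # the 'computation'
        step += 1
        if step % 10 == 0:     # periodic checkpoint every 10 steps
            with open(CKPT + ".tmp", "wb") as f:
                pickle.dump((step, total), f)
            os.replace(CKPT + ".tmp", CKPT)  # atomic publish
    return total
```

A failed run (e.g. `run(100, fail_at=57)`) loses only the work since the last checkpoint; calling `run(100)` again resumes from step 50 instead of step 0. The write-to-temp-then-rename pattern keeps the checkpoint file consistent even if the process dies mid-write.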
As hardware vendors provision more cores and faster storage devices, attaining fast data durability for concurrent file writes becomes a pressing demand on high-performance storage systems in clusters. We approach the challenge by proposing a system that uses a small amount of fast persistent memory for buffering concurrent file writes while preserving data durability. The main issue in designing a durable...
Keeping a high-precision time base in cloud clusters remains a significant challenge, even when using the Precision Time Protocol version 2 (PTPv2) specified in IEEE 1588. One of the main factors is that there are too many uncertainties in the network path from the master clock to the slave clock, which is likely to reside on a Kernel-based Virtual Machine (KVM). The Transparent Clock (TC) of PTPv2 may be...
While GPUs are becoming common in HPC systems, the CPU is still responsible for managing both GPU-side and CPU-side compute, communication, and synchronization operations. For instance, if a result from a GPU-side computation is to be transferred to a remote destination, then the CPU must synchronize on GPU compute completion before issuing a communication operation. Both CPU cycles and energy are consumed...
The conventional OpenCL 1.x style CPU-GPU heterogeneous computing paradigm treats the CPU and GPU processors as loosely connected separate entities. At best each executes independent tasks, but, more commonly, the CPU idles while waiting for results from the GPU. No data-sharing and communications are allowed during kernel execution. This model limits the number of applications that can harness the...
The contribution focuses on the technical aspects related to the focusing and interferometric processing of bistatic data acquired by companion satellite (CS) SAR missions. In particular, the processing aspects related to the large along-track baseline configuration are addressed, since the processing needs to properly consider a potentially high squint angle. The technical challenges encompass synchronization,...
Shared memory and message passing are traditional parallel programming models used on multiprocessor system-on-chip environments. These models are traditionally meant for static scenarios where all communicating entities and their intercommunication patterns are known a priori by the software engineer. System design following such programming models has become complex due to the dynamic behavior...
NVIDIA GPUDirect is a family of technologies aimed at optimizing data movement among GPUs (P2P) or between GPUs and third-party devices (RDMA). GPUDirect Async, introduced in CUDA 8.0, is a new addition which allows direct synchronization between GPUs and third-party devices. For example, Async allows an NVIDIA GPU to directly trigger and poll for completion of communication operations queued to an InfiniBand...
Data movement is increasingly becoming the bottleneck of both performance and energy efficiency in modern computation. Until recently, there was limited freedom for communication optimization on GPUs, as conventional GPUs only provide two methods for inter-thread communication: shared memory or global memory. However, a new warp shuffle instruction has been introduced...
It is well known that TLB performance impacts memory system performance, which is critical for overall system performance. Similar to multi-level caches, multi-level TLBs have become an important lever for boosting data access performance. Applications have increasingly large working sets. Servers targeting such applications have thus been built with ever larger main memory capacities, but...
Ontology Based Information Extraction (OBIE) is being adopted in various domains in order to improve a system's precision and recall. Although the use of multiple ontologies in different semantics-based Information Extraction systems helps to improve extraction accuracy, the performance of the system degrades significantly. This paper proposes an autonomous decentralized kernel cache architecture...
Classification of human behavior is a key step to developing closed-loop Deep Brain Stimulation (DBS) systems, which may decrease the power consumption and side effects of the existing systems. Recent studies have shown that the Local Field Potential (LFP) signals from both Subthalamic Nuclei (STN) of the brain can be used to recognize human behavior. Since the DBS leads implanted in each STN can...
As computer systems increase in size and complexity, bugs become ever subtler and more difficult to detect and diagnose. A bug could exist at different layers of computer systems (e.g., applications, shared libraries, file systems, device firmware), or could be caused by the incompatibility among layers. In many cases, bugs would require a very specific combination of events to be triggered and are...
Container-based virtualization is rapidly growing in popularity as a virtualization alternative for cloud deployments and applications, due to its ease of deployment coupled with high performance. Emerging byte-addressable non-volatile memory technologies, commonly called Storage Class Memory (SCM), promise both byte-addressability and persistence at near-DRAM speeds while operating on the main memory...
The preconditioned conjugate gradient method (PCG) is a popular method for solving linear systems at scale. PCG requires frequent blocking allreduce collective operations that can limit performance at scale. We investigate PCG variations designed to reduce communication costs by decreasing the number of allreduces and by overlapping communication with computation using a non-blocking allreduce. These...
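One classic way to decrease the number of allreduces per CG iteration, in the spirit of the variants described above, is a single-reduction recurrence (often attributed to Chronopoulos and Gear): the two dot products of each iteration are computed back to back, so a distributed implementation can fuse them into one (possibly non-blocking) allreduce and overlap it with the matrix-vector product. Below is a minimal serial NumPy sketch with the identity preconditioner; it illustrates the generic single-reduction variant, not necessarily the exact formulation studied in the paper.

```python
import numpy as np

def cg_single_reduction(A, b, tol=1e-10, max_iter=200):
    """CG rearranged so both per-iteration dot products are adjacent.

    In a parallel code, gamma_new and delta would be local partial sums
    combined by a single (non-blocking) allreduce, instead of the two
    blocking allreduces of textbook CG.
    """
    x = np.zeros_like(b)
    r = b - A @ x
    u = r.copy()            # u = M^{-1} r, with M = I in this sketch
    w = A @ u
    gamma = r @ u
    delta = w @ u
    alpha = gamma / delta
    beta = 0.0
    p = np.zeros_like(b)
    s = np.zeros_like(b)
    for _ in range(max_iter):
        p = u + beta * p
        s = w + beta * s
        x = x + alpha * p
        r = r - alpha * s
        if np.linalg.norm(r) < tol:
            break
        u = r.copy()        # apply preconditioner
        w = A @ u           # matvec; could overlap the fused allreduce
        gamma_new = r @ u   # two dot products, back to back ...
        delta = w @ u       # ... one fused reduction in parallel code
        beta = gamma_new / gamma
        alpha = gamma_new / (delta - beta * gamma_new / alpha)
        gamma = gamma_new
    return x
```

On a small SPD system such as A = [[4, 1], [1, 3]], b = [1, 2], this converges to the same solution as textbook CG while exposing only one reduction point per iteration.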
PGAS models, with their lightweight synchronization and shared-memory abstraction, are seen as a good alternative to the Message Passing model for irregular communication patterns. OpenSHMEM is a library-based PGAS model. OpenSHMEM 1.3 introduced non-blocking data movement operations to provide better asynchronous progress and overlap. In this paper, we present our experiences in designing non-blocking...