Streaming applications, which are data-intensive, have been extensively run on High-Performance Computing (HPC) systems to achieve higher performance and scalability. These applications typically use broadcast operations to disseminate data in real time from a single source to multiple workers, each a multi-GPU computing site. State-of-the-art broadcast operations take advantage of...
High-performance streaming applications are beginning to leverage the compute power offered by graphics processing units (GPUs) and high network throughput offered by high performance interconnects such as InfiniBand (IB) to boost their performance and scalability. These applications rely heavily on broadcast operations to move data, which is stored in the host memory, from a single source—typically...
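The broadcast operations these abstracts refer to are often implemented in software as a binomial tree: in round k, every rank that already holds the data forwards it to rank + 2^k. An illustrative sketch of that schedule (the hardware-multicast designs the papers build on replace this tree with a single fabric-level operation):

```python
# Binomial-tree broadcast schedule for `n` ranks rooted at rank 0.
# In round k, every rank that already holds the data sends to rank + 2**k,
# so the broadcast completes in ceil(log2(n)) rounds.

def binomial_bcast_schedule(n):
    have = {0}       # ranks that currently hold the data
    rounds = []
    k = 0
    while len(have) < n:
        sends = [(src, src + 2**k) for src in sorted(have) if src + 2**k < n]
        rounds.append(sends)
        have.update(dst for _, dst in sends)
        k += 1
    return rounds

print(binomial_bcast_schedule(8))
# 3 rounds: [[(0, 1)], [(0, 2), (1, 3)], [(0, 4), (1, 5), (2, 6), (3, 7)]]
```

This is a model of the generic software algorithm only, not the GPU/InfiniBand-specific designs described in the abstracts.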
The Message Passing Interface (MPI) standard specifies the use of a (source, tag, communicator) tuple to identify whether an incoming message is what the receiver process is expecting. The cost associated with this process, commonly known as "tag matching", is tightly coupled with the communication pattern of the application and the load it generates at each individual process. Although researchers...
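MPI's matching semantics can be modeled in a few lines: each incoming message is checked against the posted-receive queue in order, with the communicator compared exactly and the MPI_ANY_SOURCE / MPI_ANY_TAG wildcards accepted for source and tag. This is an illustrative model of the standard's semantics, not the optimized matching engine the paper studies:

```python
# Illustrative model of MPI tag matching. A posted receive matches an
# incoming message when source, tag, and communicator all agree;
# ANY_SOURCE and ANY_TAG act as wildcards on the receive side.

ANY_SOURCE = -1  # stands in for MPI_ANY_SOURCE
ANY_TAG = -1     # stands in for MPI_ANY_TAG

def match(posted, incoming):
    """Return the index of the first posted receive matching the incoming
    (source, tag, comm) triple, or None if nothing matches."""
    src, tag, comm = incoming
    for i, (p_src, p_tag, p_comm) in enumerate(posted):
        if p_comm != comm:
            continue            # communicators must match exactly
        if p_src not in (ANY_SOURCE, src):
            continue
        if p_tag not in (ANY_TAG, tag):
            continue
        return i
    return None

posted = [(0, 7, "world"), (ANY_SOURCE, 7, "world")]
print(match(posted, (3, 7, "world")))  # only the wildcard receive matches -> 1
print(match(posted, (0, 7, "world")))  # the exact receive matches first -> 0
```

The linear scan here is exactly why tag-matching cost grows with queue depth under heavy load, which is the overhead the abstract refers to.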
GPU accelerators are widely used in HPC clusters due to their massive parallelism and high throughput-per-watt. Data movement continues to be the major bottleneck on GPU clusters, more so when data is non-contiguous, which is common in scientific applications. CUDA-Aware MPI libraries optimize non-contiguous data movement using latency-oriented techniques such as using GPU kernels to...
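The non-contiguous case can be illustrated with a simple pack/unpack step: strided elements are gathered into a contiguous staging buffer before transfer and scattered back on the receive side. The function names below are hypothetical, and a CUDA-Aware MPI library would do this with GPU kernels or DMA rather than Python loops:

```python
# Host-side analogue of packing a strided (non-contiguous) region, e.g. a
# matrix column, into a contiguous buffer before sending it over the wire.

def pack_strided(buf, offset, blocklen, stride, count):
    """Gather `count` blocks of `blocklen` elements, `stride` apart."""
    packed = []
    for b in range(count):
        start = offset + b * stride
        packed.extend(buf[start:start + blocklen])
    return packed

def unpack_strided(packed, buf, offset, blocklen, stride, count):
    """Scatter a contiguous buffer back into the strided layout."""
    for b in range(count):
        start = offset + b * stride
        buf[start:start + blocklen] = packed[b * blocklen:(b + 1) * blocklen]

data = list(range(12))                # a 3x4 row-major matrix, flattened
col = pack_strided(data, 1, 1, 4, 3)  # second column of the matrix
print(col)                            # -> [1, 5, 9]
```

Each pack/unpack traverses the data once; doing it with one GPU kernel per message (instead of many small copies) is the kind of latency-oriented optimization the abstract mentions.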
GPGPUs are becoming ubiquitous entities in high performance computing systems owing to their large compute capacities at low power footprints. Together with high performance interconnects such as InfiniBand (IB), GPGPUs are paving the way for highly capable, energy-efficient distributed computing systems for scientific applications. GPGPUs are throughput devices that benefit immensely from latency...
Data intensive collective operations have a notable impact on the execution time and consequently the energy consumption of HPC applications owing to the amount of memory/processor/network resources involved in the data movement. However, mechanisms such as offload and one-sided transfers that are backed by RDMA-enabled interconnects like InfiniBand along with modern transport protocols like Dynamic...
Several streaming applications in the field of high performance computing are obtaining significant speedups in execution time by leveraging the raw compute power offered by modern GPGPUs. This raw compute power, coupled with the high network throughput offered by high performance interconnects such as InfiniBand (IB), is allowing streaming applications to scale rapidly. A frequently used operation...
Non-blocking collectives have recently been standardized by the Message Passing Interface (MPI) Forum. However, intelligent designs offered by the MPI communication runtimes are likely to be the key factors that drive their adoption. While hardware based solutions for non-blocking collective operations have shown promise, they require specialized hardware support and currently have several performance...
The goal of any scheduler is to satisfy users' demands for computation and achieve good overall system utilization by efficiently assigning jobs to resources. However, the current state-of-the-art scheduling techniques do not intelligently balance node allocation based on the total bandwidth available between switches, which leads to oversubscription. Additionally, poor placement...
Graphics Processing Units (GPUs) are becoming an integral part of modern supercomputer architectures due to their high compute density and performance per watt. In order to maximize utilization, it is imperative that applications running on these clusters have low synchronization and communication overheads. Partitioned Global Address Space (PGAS) models provide an attractive approach for developing...
Over the last decade, InfiniBand has become an increasingly popular interconnect for deploying modern supercomputing systems. However, there exists no detection service that can discover the underlying network topology in a scalable manner and expose this information to runtime libraries and users of the high performance computing systems in a convenient way. In this paper, we design a novel and scalable...
Hadoop Distributed File System (HDFS) acts as the primary storage of Hadoop and has been adopted by reputed organizations (Facebook, Yahoo!, etc.) due to its portability and fault-tolerance. The existing implementation of HDFS uses the Java socket interface for communication, which delivers suboptimal performance in terms of latency and throughput. For data-intensive applications, network performance becomes...
Graph-based computations are commonly used across various data intensive computing domains ranging from social networks to biological systems. On distributed memory systems, graph algorithms involve explicit communication between processes and often exhibit sparse, irregular behavior. Minimizing these communication overheads is critical to cater to the graph-theoretic analyses demands of emerging...
Scientists across a wide range of domains increasingly rely on computer simulation for their investigations. Such simulations often spend a majority of their run-times solving large systems of linear equations that require vast amounts of computational power and memory. It is hence critical to design solvers in a highly efficient and scalable manner. Hypre is a high performance, scalable software...
The emerging trends of designing commodity based supercomputing systems have a severe detrimental impact on the Mean-Time-Between-Failures (MTBF). The MTBF for typical HEC installations is currently estimated to be between eight hours and fifteen days. Failures in the interconnect fabric account for a fair share of the total failures occurring in such systems. This will continue to degrade as system...
It is an established fact that the network topology can have an impact on the performance of scientific parallel applications. However, little work has been done to design an easy-to-use solution inside a communication library supporting a parallel programming model in which the complexities of making application performance agnostic to the network topology are hidden from the end user. Similarly, the rapid...
The upcoming MPI-3.0 standard is expected to include non-blocking collective operations. Non-blocking collectives offer a new MPI interface with which an application can decouple the initiation and completion of collective operations. However, to be effective, the MPI library should provide a high performance and scalable implementation. One of the major challenges in designing an effective non-blocking...
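The decoupling described here follows an initiate/compute/wait pattern: start the collective, overlap independent computation, then wait for completion (in real MPI code this would be, e.g., MPI_Iallreduce followed by MPI_Wait). A sketch of the pattern with a thread pool standing in for the MPI progress engine:

```python
# The initiate/compute/wait pattern behind MPI-3 non-blocking collectives.
# A thread pool stands in for the MPI progress engine; the "collective"
# is simulated, so this illustrates only the control flow, not real MPI.
from concurrent.futures import ThreadPoolExecutor
import time

def simulated_collective(data):
    time.sleep(0.05)          # stands in for network transfer time
    return sum(data)          # stands in for, e.g., an allreduce result

with ThreadPoolExecutor(max_workers=1) as pool:
    request = pool.submit(simulated_collective, [1, 2, 3, 4])  # initiation
    local = sum(i * i for i in range(1000))  # independent work overlaps it
    result = request.result()                # completion point ("MPI_Wait")
print(result, local)
```

The benefit comes precisely from the independent work executed between initiation and completion; without it, a non-blocking collective degenerates into a blocking one.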
Network congestion is an important factor affecting the performance of large scale jobs in supercomputing clusters, especially with the wide deployment of multi-core processors. The blocking nature of current day collectives makes such congestion a critical factor in their performance. On the other hand, modern interconnects like InfiniBand provide us with many novel features such as Virtual Lanes...
The rapid growth of InfiniBand, 10 Gigabit Ethernet/iWARP and IB WAN extensions is increasingly gaining momentum for designing high end computing clusters and data-centers. For typical applications such as data staging, content replication and remote site backup, FTP has been the most popular method to transfer data within and across these clusters. Although the existing sockets based FTP approaches...
Clusters based on commodity components continue to be very popular for high-performance computing (HPC). These clusters must be careful to balance both computational as well as I/O requirements of applications. This I/O requirement is generally fulfilled by a high-speed interconnect such as InfiniBand. The balance of computational and I/O performance is often changing, with the latest change being...