Raghunath Rajachandrasekar

chapter

High-Performance Design of YARN MapReduce on Modern HPC Clusters with Lustre and RDMA

Md. Wasi-ur-Rahman, Xiaoyi Lu, Nusrat Sharmin Islam, Raghunath Rajachandrasekar, et al.

2015 IEEE International Parallel and Distributed Processing Symposium > 291 - 300

2015 IEEE International Parallel and Distributed Processing Symposium (IPDPS)

The viability and benefits of running MapReduce over modern High Performance Computing (HPC) clusters, with high performance interconnects and parallel file systems, have attracted much attention in recent times due to its uniqueness of solving data analytics problems with a combination of Big Data and HPC technologies. Most HPC clusters follow the traditional Beowulf architecture with a separate...

chapter

In-memory I/O and replication for HDFS with Memcached: Early experiences

Nusrat Sharmin Islam, Xiaoyi Lu, Md. Wasi-ur-Rahman, Raghunath Rajachandrasekar, et al.

2014 IEEE International Conference on Big Data (Big Data) > 213 - 218

2014 IEEE International Conference on Big Data (Big Data)

Hadoop is the de-facto standard platform for large-scale data analytic applications. In spite of high availability and reliability guarantees, Hadoop Distributed File System (HDFS) suffers from huge I/O bottlenecks for storing the tri-replicated data blocks. The I/O overheads intrinsic to the HDFS architecture degrade the application performance. In this paper, we present a novel design (MEM-HDFS)...

chapter

High Performance Alltoall and Allgather Designs for InfiniBand MIC Clusters

Akshay Venkatesh, Sreeram Potluri, Raghunath Rajachandrasekar, Miao Luo, et al.

2014 IEEE 28th International Parallel and Distributed Processing Symposium > 637 - 646

2014 IEEE International Parallel & Distributed Processing Symposium (IPDPS)

Intel's Many-Integrated-Core (MIC) architecture aims to provide Teraflop throughput (through high degrees of parallelism) with a high FLOP/Watt ratio and x86 compatibility. However, this two-fold approach to solving power and programmability challenges for Exascale computing is constrained by certain architectural idiosyncrasies. MIC coprocessors have a memory constrained environment and its processors...

chapter

SSD-Assisted Hybrid Memory to Accelerate Memcached over High Performance Networks

Xiangyong Ouyang, Nusrat S. Islam, Raghunath Rajachandrasekar, Jithin Jose, et al.

2012 41st International Conference on Parallel Processing > 470 - 479

2012 41st International Conference on Parallel Processing (ICPP)

Many applications cache huge amounts of data in RAM to achieve high performance. A good example is Memcached, a distributed-memory object-caching software. Memcached performance directly depends on the aggregated memory pool size. Given the constraints of hardware cost, power/thermal concerns and floor plan limits, it is difficult to further scale the memory pool by packing more RAM into individual...

chapter

Minimizing Network Contention in InfiniBand Clusters with a QoS-Aware Data-Staging Framework

Raghunath Rajachandrasekar, Jai Jaswani, Hari Subramoni, Dhabaleswar K. Panda

2012 IEEE International Conference on Cluster Computing > 329 - 336

2012 IEEE International Conference on Cluster Computing (CLUSTER)

The rapid growth of supercomputing systems, both in scale and complexity, has been accompanied by degradation in system efficiencies. The sheer abundance of resources including millions of cores, vast amounts of physical memory and high-bandwidth networks are heavily under-utilized. This happens when the resources are time-shared amongst parallel applications that are scheduled to run on a subset...

chapter

Monitoring and Predicting Hardware Failures in HPC Clusters with FTB-IPMI

Raghunath Rajachandrasekar, Xavier Besseron, Dhabaleswar K. Panda

2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum > 1136 - 1143

2012 26th IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

Fault-detection and prediction in HPC clusters and Cloud-computing systems are increasingly challenging issues. Several system middleware such as job schedulers and MPI implementations provide support for both reactive and proactive mechanisms to tolerate faults. These techniques rely on external components such as system logs and infrastructure monitors to provide information about hardware/software...

chapter

CRFS: A Lightweight User-Level Filesystem for Generic Checkpoint/Restart

Xiangyong Ouyang, Raghunath Rajachandrasekar, Xavier Besseron, Hao Wang, et al.

2011 International Conference on Parallel Processing > 375 - 384

2011 International Conference on Parallel Processing (ICPP)

Checkpoint/Restart (C/R) mechanisms have been widely adopted by many MPI libraries [1-3] to achieve fault-tolerance. However, a major limitation of such mechanisms is the intensive I/O bottleneck caused by the need to dump the snapshots of all processes into persistent storage. Several studies have been conducted to minimize this overhead [4, 5], but most of these proposed optimizations are performed...

chapter

Can a Decentralized Metadata Service Layer Benefit Parallel Filesystems?

Vilobh Meshram, Xavier Besseron, Xiangyong Ouyang, Raghunath Rajachandrasekar, et al.

2011 IEEE International Conference on Cluster Computing > 484 - 493

2011 IEEE International Conference on Cluster Computing (CLUSTER)

The demand for scalable I/O continues to grow rapidly as computer clusters keep growing. Much of the research in storage systems has been focused on improving the scale and performance of I/O throughput. Scalable file systems do a good job of scaling large file access bandwidth by striping or sharing I/O resources across many servers or disks. However, the same cannot be said about scaling file metadata...

chapter

High Performance Pipelined Process Migration with RDMA

Xiangyong Ouyang, Raghunath Rajachandrasekar, Xavier Besseron, Dhabaleswar K. Panda

2011 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing > 314 - 323

2011 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid)

Coordinated Checkpoint/Restart (C/R) is a widely deployed strategy to achieve fault-tolerance. However, C/R by itself is not capable enough to meet the demands of upcoming exascale systems, due to its heavy I/O overhead. Process migration has already been proposed in literature as a pro-active fault-tolerance mechanism to complement C/R. Several popular MPI implementations have provided support...

chapter

RDMA-Based Job Migration Framework for MPI over InfiniBand

Xiangyong Ouyang, Sonya Marcarelli, Raghunath Rajachandrasekar, Dhabaleswar K Panda

2010 IEEE International Conference on Cluster Computing > 116 - 125

2010 IEEE International Conference on Cluster Computing (CLUSTER 2010)

Coordinated checkpoint and recovery is a common approach to achieve fault tolerance on large-scale systems. The traditional mechanism dumps the process image to a local disk or a central storage area of all the processes involved in the parallel job. When a failure occurs, the processes are restarted and restored to the latest checkpoint image. However, this kind of approach is unable to provide the...

INFONA - science communication portal

Search results for: Raghunath Rajachandrasekar

