Streaming applications, which are data-intensive, have been extensively run on High-Performance Computing (HPC) systems to achieve higher performance and scalability. These applications typically use broadcast operations to disseminate data in real time from a single source to multiple workers, each a multi-GPU computing site. State-of-the-art broadcast operations take advantage of...
High-performance streaming applications are beginning to leverage the compute power offered by graphics processing units (GPUs) and high network throughput offered by high performance interconnects such as InfiniBand (IB) to boost their performance and scalability. These applications rely heavily on broadcast operations to move data, which is stored in the host memory, from a single source—typically...
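The broadcast operations these abstracts refer to are often implemented in software as a binomial tree: in round k, every rank that already holds the data forwards it to rank + 2^k. An illustrative sketch of that schedule (the hardware-multicast designs the papers build on replace this tree with a single fabric-level operation):

```python
# Binomial-tree broadcast schedule for `n` ranks rooted at rank 0.
# In round k, every rank that already holds the data sends to rank + 2**k,
# so the broadcast completes in ceil(log2(n)) rounds.

def binomial_bcast_schedule(n):
    have = {0}       # ranks that currently hold the data
    rounds = []
    k = 0
    while len(have) < n:
        sends = [(src, src + 2**k) for src in sorted(have) if src + 2**k < n]
        rounds.append(sends)
        have.update(dst for _, dst in sends)
        k += 1
    return rounds

print(binomial_bcast_schedule(8))
# 3 rounds: [[(0, 1)], [(0, 2), (1, 3)], [(0, 4), (1, 5), (2, 6), (3, 7)]]
```

This is a model of the generic software algorithm only, not the GPU/InfiniBand-specific designs described in the abstracts.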
The Message Passing Interface (MPI) standard specifies the use of a (source, tag, communicator) tuple to identify whether an incoming message is what the receiver process is expecting. The cost associated with this process, commonly known as "tag matching", is tightly coupled with the communication pattern of the application and the load it generates at each individual process. Although researchers...
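MPI's matching semantics can be modeled in a few lines: each incoming message is checked against the posted-receive queue in order, with the communicator compared exactly and the MPI_ANY_SOURCE / MPI_ANY_TAG wildcards accepted for source and tag. This is an illustrative model of the standard's semantics, not the optimized matching engine the paper studies:

```python
# Illustrative model of MPI tag matching. A posted receive matches an
# incoming message when source, tag, and communicator all agree;
# ANY_SOURCE and ANY_TAG act as wildcards on the receive side.

ANY_SOURCE = -1  # stands in for MPI_ANY_SOURCE
ANY_TAG = -1     # stands in for MPI_ANY_TAG

def match(posted, incoming):
    """Return the index of the first posted receive matching the incoming
    (source, tag, comm) triple, or None if nothing matches."""
    src, tag, comm = incoming
    for i, (p_src, p_tag, p_comm) in enumerate(posted):
        if p_comm != comm:
            continue            # communicators must match exactly
        if p_src not in (ANY_SOURCE, src):
            continue
        if p_tag not in (ANY_TAG, tag):
            continue
        return i
    return None

posted = [(0, 7, "world"), (ANY_SOURCE, 7, "world")]
print(match(posted, (3, 7, "world")))  # only the wildcard receive matches -> 1
print(match(posted, (0, 7, "world")))  # the exact receive matches first -> 0
```

The linear scan here is exactly why tag-matching cost grows with queue depth under heavy load, which is the overhead the abstract refers to.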
GPU accelerators are widely used in HPC clusters due to their massive parallelism and high throughput-per-watt. Data movement continues to be the major bottleneck on GPU clusters, more so when data is non-contiguous, which is common in scientific applications. CUDA-Aware MPI libraries optimize non-contiguous data movement using latency-oriented techniques such as using GPU kernels to...
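The non-contiguous case can be illustrated with a simple pack/unpack step: strided elements are gathered into a contiguous staging buffer before transfer and scattered back on the receive side. The function names below are hypothetical, and a CUDA-Aware MPI library would do this with GPU kernels or DMA rather than Python loops:

```python
# Host-side analogue of packing a strided (non-contiguous) region, e.g. a
# matrix column, into a contiguous buffer before sending it over the wire.

def pack_strided(buf, offset, blocklen, stride, count):
    """Gather `count` blocks of `blocklen` elements, `stride` apart."""
    packed = []
    for b in range(count):
        start = offset + b * stride
        packed.extend(buf[start:start + blocklen])
    return packed

def unpack_strided(packed, buf, offset, blocklen, stride, count):
    """Scatter a contiguous buffer back into the strided layout."""
    for b in range(count):
        start = offset + b * stride
        buf[start:start + blocklen] = packed[b * blocklen:(b + 1) * blocklen]

data = list(range(12))                # a 3x4 row-major matrix, flattened
col = pack_strided(data, 1, 1, 4, 3)  # second column of the matrix
print(col)                            # -> [1, 5, 9]
```

Each pack/unpack traverses the data once; doing it with one GPU kernel per message (instead of many small copies) is the kind of latency-oriented optimization the abstract mentions.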
GPGPUs are becoming ubiquitous entities in high performance computing systems owing to their large compute capacities at low power footprints. Together with high performance interconnects such as InfiniBand (IB), GPGPUs are paving the way for highly capable, energy-efficient distributed computing systems for scientific applications. GPGPUs are throughput devices that benefit immensely from latency...
Data intensive collective operations have a notable impact on the execution time and consequently the energy consumption of HPC applications owing to the amount of memory/processor/network resources involved in the data movement. However, mechanisms such as offload and one-sided transfers that are backed by RDMA-enabled interconnects like InfiniBand along with modern transport protocols like Dynamic...
Several streaming applications in the field of high performance computing are obtaining significant speedups in execution time by leveraging the raw compute power offered by modern GPGPUs. This raw compute power, coupled with the high network throughput offered by high performance interconnects such as InfiniBand (IB), is allowing streaming applications to scale rapidly. A frequently used operation...
Non-blocking collectives have recently been standardized by the Message Passing Interface (MPI) Forum. However, intelligent designs offered by the MPI communication runtimes are likely to be the key factors that drive their adoption. While hardware based solutions for non-blocking collective operations have shown promise, they require specialized hardware support and currently have several performance...
The goal of any scheduler is to satisfy users' demands for computation and achieve good overall system utilization by efficiently assigning jobs to resources. However, the current state-of-the-art scheduling techniques do not intelligently balance node allocation based on the total bandwidth available between switches, which leads to oversubscription. Additionally, poor placement...
Graphics Processing Units (GPUs) are becoming an integral part of modern supercomputer architectures due to their high compute density and performance per watt. In order to maximize utilization, it is imperative that applications running on these clusters have low synchronization and communication overheads. Partitioned Global Address Space (PGAS) models provide an attractive approach for developing...
Over the last decade, InfiniBand has become an increasingly popular interconnect for deploying modern supercomputing systems. However, there exists no detection service that can discover the underlying network topology in a scalable manner and expose this information to runtime libraries and users of the high performance computing systems in a convenient way. In this paper, we design a novel and scalable...
Hadoop Distributed File System (HDFS) acts as the primary storage of Hadoop and has been adopted by reputed organizations (Facebook, Yahoo!, etc.) due to its portability and fault-tolerance. The existing implementation of HDFS uses the Java socket interface for communication, which delivers suboptimal performance in terms of latency and throughput. For data-intensive applications, network performance becomes...
Graph-based computations are commonly used across various data intensive computing domains ranging from social networks to biological systems. On distributed memory systems, graph algorithms involve explicit communication between processes and often exhibit sparse, irregular behavior. Minimizing these communication overheads is critical to cater to the graph-theoretic analyses demands of emerging...
Scientists across a wide range of domains increasingly rely on computer simulation for their investigations. Such simulations often spend a majority of their run-times solving large systems of linear equations that require vast amounts of computational power and memory. It is hence critical to design solvers in a highly efficient and scalable manner. Hypre is a high performance, scalable software...
The emerging trends of designing commodity based supercomputing systems have a severe detrimental impact on the Mean-Time-Between-Failures (MTBF). The MTBF for typical HEC installations is currently estimated to be between eight hours and fifteen days. Failures in the interconnect fabric account for a fair share of the total failures occurring in such systems. This will continue to degrade as system...
It is an established fact that the network topology can have an impact on the performance of scientific parallel applications. However, little work has been done to design an easy-to-use solution inside a communication library supporting a parallel programming model in which the complexities of making application performance agnostic to the network topology are hidden from the end user. Similarly, the rapid...
The upcoming MPI-3.0 standard is expected to include non-blocking collective operations. Non-blocking collectives offer a new MPI interface with which an application can decouple the initiation and completion of collective operations. However, to be effective, the MPI library should provide a high performance and scalable implementation. One of the major challenges in designing an effective non-blocking...
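The decoupling described here follows an initiate/compute/wait pattern: start the collective, overlap independent computation, then wait for completion (in real MPI code this would be, e.g., MPI_Iallreduce followed by MPI_Wait). A sketch of the pattern with a thread pool standing in for the MPI progress engine:

```python
# The initiate/compute/wait pattern behind MPI-3 non-blocking collectives.
# A thread pool stands in for the MPI progress engine; the "collective"
# is simulated, so this illustrates only the control flow, not real MPI.
from concurrent.futures import ThreadPoolExecutor
import time

def simulated_collective(data):
    time.sleep(0.05)          # stands in for network transfer time
    return sum(data)          # stands in for, e.g., an allreduce result

with ThreadPoolExecutor(max_workers=1) as pool:
    request = pool.submit(simulated_collective, [1, 2, 3, 4])  # initiation
    local = sum(i * i for i in range(1000))  # independent work overlaps it
    result = request.result()                # completion point ("MPI_Wait")
print(result, local)
```

The benefit comes precisely from the independent work executed between initiation and completion; without it, a non-blocking collective degenerates into a blocking one.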
Network congestion is an important factor affecting the performance of large scale jobs in supercomputing clusters, especially with the wide deployment of multi-core processors. The blocking nature of current day collectives makes such congestion a critical factor in their performance. On the other hand, modern interconnects like InfiniBand provide us with many novel features such as Virtual Lanes...
The rapid growth of InfiniBand, 10 Gigabit Ethernet/iWARP and IB WAN extensions is increasingly gaining momentum for designing high end computing clusters and data-centers. For typical applications such as data staging, content replication and remote site backup, FTP has been the most popular method to transfer data within and across these clusters. Although the existing sockets based FTP approaches...
Clusters based on commodity components continue to be very popular for high-performance computing (HPC). These clusters must be careful to balance both computational as well as I/O requirements of applications. This I/O requirement is generally fulfilled by a high-speed interconnect such as InfiniBand. The balance of computational and I/O performance is often changing, with the latest change being...