As more and more clusters with thousands of nodes are deployed for high performance computing (HPC), fault tolerance in cluster environments has become a critical requirement. Checkpointing and rollback recovery is a common approach to achieving fault tolerance. Although widely adopted in practice, coordinated checkpointing has a known scalability limitation. Severe contention for bandwidth...
It is critical to provide high performance for scientific applications running on chip multi-processors (CMP). A CMP architecture often comprises a shared L2 cache and lower-level storage. The shared L2 cache can reduce the number of cache misses if the data are accessed in common by several threads, but it can also lead to performance degradation due to resource contention. Sometimes running threads...
Many of the load-balancing algorithms used in parallel systems do not take response times into account: tasks (or requests) are simply dispatched to a server, which provides no guarantees about their execution times. When there is a maximum acceptable response time (i.e. a deadline) for tasks to be executed, the consequences of adopting traditional load-balancing algorithms can...
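To illustrate the difference between plain dispatching and deadline-aware dispatching, here is a minimal sketch, assuming each server processes its queue in FIFO order at a known, fixed speed. The `Server` class and `dispatch_deadline_aware` function are hypothetical names chosen for illustration; this is not the algorithm proposed in the paper, only a simple admission-control baseline.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    speed: float          # work units processed per second (assumed constant)
    backlog: float = 0.0  # queued work units not yet processed

    def completion_time(self, work: float) -> float:
        # Estimated response time if the task joins the end of this FIFO queue.
        return (self.backlog + work) / self.speed


def dispatch_deadline_aware(servers, work, deadline):
    """Send the task to the server with the earliest estimated completion
    time, but only if that time meets the deadline; otherwise reject it."""
    best = min(servers, key=lambda s: s.completion_time(work))
    if best.completion_time(work) <= deadline:
        best.backlog += work
        return best.name
    return None  # rejected: no server can meet the deadline
```

A traditional dispatcher would always enqueue the task somewhere; the deadline check makes the cost of that choice explicit by rejecting tasks that would finish too late.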
This paper presents COBRA (continuous binary re-adaptation), a runtime binary optimization framework for multithreaded applications. It is currently implemented on Itanium 2-based SMP and cc-NUMA systems. Using the OpenMP NAS Parallel Benchmarks, we show how COBRA can adaptively choose appropriate optimizations according to observed changes in runtime program behavior. Coherent cache misses caused by true/false...
Performance and power are critical design constraints in today's high-end computing systems. Reducing power consumption without impacting system performance is a challenge for the HPC community. We present a runtime system (CPU MISER) and an integrated performance model for performance-directed, power-aware cluster computing. CPU MISER supports system-wide, application-independent, fine-grain, dynamic...
High performance clusters have been widely used to provide substantial computing capability for both commercial and scientific applications. However, their high power consumption has hindered the wider deployment of large-scale clusters. Designing energy-efficient scheduling algorithms for parallel applications running on clusters, especially on high performance heterogeneous clusters, is highly desirable...
Deep Packet Inspection (DPI) is a critical function in network security applications such as Firewalls and Intrusion Detection Systems (IDS). Signature based scanners used in DPI apply multi-pattern matching algorithms to check whether the packet payload or flow content contains a specified signature in a signature set. Existing multi-pattern matching algorithms sacrifice memory space to achieve better...
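For readers unfamiliar with multi-pattern matching, the classic Aho-Corasick automaton shows the memory/speed trade-off the abstract refers to: one pass over the payload finds every signature, at the cost of storing a transition table per state. The sketch below is a textbook Aho-Corasick implementation, not the algorithm proposed in the paper; the function names `build_automaton` and `search` are illustrative. It works on `str` or `bytes` alike, since iterating over `bytes` yields integers that serve equally well as dictionary keys.

```python
from collections import deque


def build_automaton(patterns):
    """Build goto, failure and output tables for Aho-Corasick."""
    goto = [{}]     # goto[state][symbol] -> next state (the memory-hungry part)
    fail = [0]      # failure link: longest proper suffix that is also a trie path
    out = [set()]   # signatures recognised on reaching each state
    for p in patterns:
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].add(p)
    # Breadth-first pass computes failure links from shallower states first.
    queue = deque(goto[0].values())
    while queue:
        r = queue.popleft()
        for ch, s in goto[r].items():
            queue.append(s)
            f = fail[r]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[s] = goto[f].get(ch, 0)
            out[s] |= out[fail[s]]  # inherit matches that end at this suffix
    return goto, fail, out


def search(payload, automaton):
    """Scan the payload once and return sorted (start_offset, signature) hits."""
    goto, fail, out = automaton
    s = 0
    hits = []
    for i, ch in enumerate(payload):
        while s and ch not in goto[s]:
            s = fail[s]
        s = goto[s].get(ch, 0)
        for p in out[s]:
            hits.append((i - len(p) + 1, p))
    return sorted(hits)
```

With the signatures `he`, `she`, `his` and `hers`, scanning `ushers` reports `she` at offset 1 and both `he` and `hers` at offset 2, all found in a single left-to-right pass.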
This paper focuses on the performance of the network-on-a-chip (NoC) and I/O of ClearSpeed's CSX600 coprocessor, which has 96 multithreaded processing elements. Two versions of the Himeno benchmark were implemented on the CSX600 to evaluate its performance when it encounters frequent memory transfers between shared and local memories, or between local memories. In order to efficiently use the NoC bandwidth, the...
After lung cancer, breast cancer is the leading cause of cancer deaths in women. It is also one of the few cancers that can be controlled through screening of asymptomatic women, followed by effective treatment. One screening modality currently under development, microwave tomography, uses the apparent contrasts in dielectric properties between different breast tissues at microwave frequencies...
Program parallelization requires mapping computation and data to processing elements. Navigational programming (NavP), based on the principle of migrating computations, offers an alternative to conventional solutions that use an SPMD model. This paper focuses on data distribution for NavP. We introduce the navigational trace graph (NTG), a mathematical structure that captures the alignment...
Consumer electronics applications are becoming increasingly complex because of increased functionality requirements, such as watching multiple compressed video streams on a single screen. We address this complexity by allowing a programmer to specify the application in terms of independent components. Components interact using streaming communication and by sending and receiving events. From this...
Most existing search algorithms for unstructured peer-to-peer (P2P) systems share one common approach: the requesting node sends out a query, and the query message is repeatedly routed and forwarded to other peers in the overlay network. Because query forwarding involves multiple hops, the search may incur a long delay before it is answered. Furthermore, some low-capacity nodes may be easily...
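The forwarding approach described above is commonly implemented as TTL-limited flooding. The following minimal sketch, assuming the overlay is given as an adjacency list and that each peer drops queries it has already seen, measures both costs the abstract mentions: the number of hops until the first hit (delay) and the number of query messages sent (load on peers). The function name `flood_search` and the `has_item` predicate are hypothetical; this is the generic baseline, not the paper's proposed search algorithm.

```python
from collections import deque


def flood_search(overlay, source, has_item, ttl):
    """Breadth-first, TTL-limited flooding over an unstructured overlay.

    Returns (hops_to_first_hit or None, total_query_messages_sent).
    Every forward counts as a message, including duplicates that the
    receiving peer discards because it has already seen the query.
    """
    seen = {source}
    frontier = deque([(source, 0)])
    messages = 0
    first_hit = 0 if has_item(source) else None
    while frontier:
        node, hops = frontier.popleft()
        if hops == ttl:
            continue  # TTL exhausted: this peer does not forward further
        for nbr in overlay[node]:
            messages += 1
            if nbr in seen:
                continue  # duplicate query, dropped by the receiver
            seen.add(nbr)
            if first_hit is None and has_item(nbr):
                first_hit = hops + 1  # BFS order gives the minimum hop count
            frontier.append((nbr, hops + 1))
    return first_hit, messages
```

On a four-peer chain where only the last peer holds the item, a TTL of 3 finds it after 3 hops using 5 messages, while a TTL of 2 misses it entirely; larger TTLs improve the success rate but multiply the message load on every peer.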
Several scientific applications, such as 3D Jacobi iteration (Rivera and Tseng, 2000) and LQCD (Gupta, 1996), demand high computing power and run on parallel systems. Such applications mostly operate on high-dimensional data, and partitioning the data into smaller units can considerably reduce their execution time. Many algorithms such as CBP (Beaumont, 2001), dissect (Nagamochi and Abe, 2003), and...
There has recently been increasing interest in using system virtualization to improve the dependability of HPC cluster systems. However, virtualization is not cost-free: it may bring some performance degradation, uncertain QoS and loss of functionality. Meanwhile, many virtualization-enabled features such as online maintenance and fault tolerance do not require virtualization to be always on. This paper...
Gnutella overlays have evolved to use a two-tier topology. However, we observed that the new topology achieved only modest improvements in search success rates. Moreover, the two-tier topology did not reduce the message routing overhead or bandwidth consumption. In this work, we used local information at each node to construct an overlay, Makalu, that improved search performance and reduced...
The ever-increasing memory footprint of applications and the growing mainstream popularity of shared-memory parallel computing motivate us to explore the potential of memory compression in distributed shared memory (DSM) multiprocessors. This paper is the first to integrate on-the-fly cache block compression/decompression algorithms into the cache coherence protocols, by leveraging the directory structure already...
Three-dimensional network-on-chip (3-D NoC) is an emerging research topic exploring the network architecture of 3-D ICs, which stack several smaller wafers to reduce wire length and wire delay. Although the network topology of 3-D NoC has been explored for a couple of years, there is still only a narrow range of choices. In this paper, we propose a class of 3-D topologies called Xbar-connected network-on-tiers...