The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
On chip multiprocessors (CMPs) platforms, multiple co-scheduled applications can severely degrade performance and quality of service (QoS) when they contend for last-level cache (LLC) resources. Whether an application will impose destructive interference on co-scheduled applications is largely dependent on its own inherent cache access behavior characteristics. In this work, we first present case...
In cluster computers, RAM memory is spread among the motherboards hosting the running applications. In these systems, it is common to constrain the memory address space of a given processor to the local motherboard. Constraining the system in this way is much cheaper than using a full-fledged shared memory implementation among motherboards. However, in this case, memory usage might widely differ among...
Transactional Memory (TM) is an emerging technology which simplifies the concurrency control in a parallel program. In this paper we propose Quick TM, a new hardware transactional memory (HTM) architecture. It incorporates three features to address known bottlenecks in the existing HTM architectures. First, we propose hardware-only dynamic detection of true-shared variables. Our result shows that...
Improving MPI foundational software to suit multicore systems is a key issue for developing effective parallel software on high performance communication domain. Towards this issue, in this paper, we propose a novel technique, called MPI Accelerator or MPIActor in short, which is a transparent middleware to enhance conventional MPI libraries. The main idea is to optimize MPI routines for multicore...
Algorithmic skeletons encapsulate typical parallel programming patterns such that they can be easily applied by users. Existing skeleton libraries usually work on distributed memory machines. We present an extension of our skeleton library Muesli which now allows to use the same application without modifications on a variety of parallel machines ranging from multi-processor distributed memory to many-core...
Virtual application appliances (VAA) (i.e., prebuilt virtual machines (VM) for specific scientific applications) are useful mechanisms to deal with the packaging of complex software systems and heterogeneous software environments (e.g., library version conflicts on different clusters and clouds). As an experience paper, we discuss some basic techniques for creating VAAs (e.g., virtual disk repositories...
This paper presents a fast and cycle-accurate memory subsystem modeling and evaluating framework for Chip Multiprocessors (CMPs), called TSIM (Tsinghua SIMulator), which gives a flexible and extensible approach to evaluating architecture designs, models or algorithms, including the network-on-chip interconnection, cache hardware prefetcher, memory system protocol, replacement policy, etc. TSIM is...
A simple, synthetic performance proxy for scientific applications is of great interest to the scientific computing community for the development of new products, procurements, and performance related questions in general. To develop such a performance proxy, we enhance the capability of the memory performance benchmark, Apex-MAP, by adding new concepts to capture the effects of computational details...
In this paper, we present performance analysis of two NASA applications using performance tools like Tuning and Analysis Utilities (TAU) and SGI MP Inside. MITgcmUV and OVERFLOW are two production-quality applications used extensively by scientists and engineers at NASA. MITgcmUV is a global ocean simulation model, developed by the Estimating the Circulation and Climate of the Ocean (ECCO) Consortium,...
Virtualization technology is currently widely used due to its benefits on high resource utilization, flexible manageability and powerful system security. However, its use for high performance computing (HPC) is still not popular due to the unclearness of the virtualization overheads. It's worthy to evaluate the virtualization cost and to find the performance bottleneck when running HPC applications...
Process placement is a technique widely used on parallel machines with heterogeneous interconnects to reduce the overall communication time. For instance, two processes which communicate frequently are mapped close to each other. Finding the optimal mapping between threads and cores in a shared-memory environment (for example, OpenMP and Pthreads) is an even more complex task due to implicit communication...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.