The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
Multi- and many-core systems are increasingly prevalent in embedded systems. Additionally, isolation requirements between different partitions and criticalities are gaining in importance. This difficult combination is not well addressed by current software systems. Parallel systems require consistency guarantees on shared data-structures often provided by locks that use predictable resource sharing...
OSEK/VDX, a standard of automobile OS, has been widely adopted by many manufacturers to design and develop a vehicle-mounted OS. With the increasing functionalities in vehicles, more and more complex applications are developed based on the OSEK/VDX OS. However, how to ensure the reliability of developed applications is becoming a challenge for developers. As to ensure the reliability of developed...
Neither static nor dynamic data race detection methods, by themselves, have proven to be sufficient for large HPC applications, as they often result in high runtime overheads and/or low race-checking accuracy. While combined static and dynamic approaches can fare better, creating such combinations, in practice, requires attention to many details. Specifically, existing state-of-the-art dynamic race...
Trace-driven simulation of chip multiprocessor (CMP) systems offers many advantages over execution-driven simulation, such as reducing simulation time and complexity, and allowing portability, and scalability. However, trace-based simulation approaches have encountered difficulty capturing and accurately replaying multi-threaded traces due to the inherent non-determinism in the execution of multi-threaded...
We investigate parallelizing flow- and context-sensitive static analysis for JavaScript. Previous attempts to parallelize such analyses for other languages typically start with the traditional framework of sequential dataflow analysis, and then propose methods to parallelize the existing sequential algorithms within this framework. However, we show that this approach is non-optimal and propose a new...
Locks have been widely used as an effective synchronization mechanism among processes and threads. However, we observe that a large number of false inter-thread dependencies (i.e., unnecessary lock contentions) exist during the program execution on multicore processors, thereby incurring significant performance overhead. This paper presents a performance debugging framework, PerfPlay, to facilitate...
Data races make parallel programs hard to understand. Precise race detection that stops an execution on first occurrence of a race addresses this problem, but it comes with significant overhead. In this work, we exploit the insight that precisely detecting only write-after-write (WAW) and read-after-write (RAW) races suffices to provide cleaner semantics for racy programs. We demonstrate that stopping...
General-purpose computing on Graphics Processing Units (GPGPUs) became increasingly popular for a wide range of applications beyond traditional graphic rendering workloads. GPGPU exploits parallelism in applications via multithreading to hide memory latencies, and handles control complexity by barrier synchronizations. Warp scheduling algorithms have been optimized to increase memory latency hiding...
GPU programming models such as CUDA and OpenCL are starting to adopt a weaker data-race-free (DRF-0) memory model, which does not guarantee any semantics for programs with data-races. Before standardizing the memory model interface for GPUs, it is imperative that we understand the tradeoffs of different memory models for these devices. While there is a rich memory model literature for CPUs, studies...
Computation is increasingly moving to the data enter. Thus, the energy used by CPUs in the data centeris gaining importance. The centralization of computation in the data center has also led to much commonality between the applications running there. For example, there are many instances of similar or identical versions of the Apache web server running in a large data center. Many of these applications,...
The Missile Onboard Computer (OBC) is an embedded computer to perform control, guidance, target data estimation, mission sequencing and various other critical operations during flight. The real-time embedded software for the OBC must be designed to achieve deterministic response times and quite minimal jitters (in the order of 50–100µs) for the accurate control and guidance operations and precise...
Porting CUDA program to other heterogeneous and many-core platform especially native processor is very meaningful for extending the range of the CUDA application, taking advantage of many-core on target platform and supporting national industries. Traditional binary translation technique is not competent to this task. On the point of software reverse engineering, it is feasible to design a new migration...
Given the rapid change in processor architecture in the past years, there is a driving necessity to assess processor performance for a high performance computation workload. Assessing performance for a given workload is important to understand which architecture parameters the workload performance is sensitive to. A given workload can be categorized as memory bounded, compute bounded, or in between...
This research aims to develop a CUDA accelerated NEH algorithm for the permutative flowshop scheduling problem with makespan criterion. NEH has been shown in the literature as the best constructive heuristic for this particular problem. The CUDA based NEH aims to speed up the processing time by utilising the GPU cores for parallel evaluation. In order to show the versatility and scalability of the...
The Smith-Waterman algorithm is used in Bio-informatics to perform pairwise local alignment between a query sequence and a subject sequence. We present a GPU based parallel version of this algorithm that is able to perform pair-wise alignment faster than previous algorithms. In particular, it parallelizes each alignment, rather than relying on parallelism across multiple pair alignments, which many...
With the popularity of parallel computing, the serial program is unable to take advantage of the multi-core. In this paper, A MP3 audio parallel decoding algorithm based on libmad library for multicore platform is proposed to improve the decoding speed. In this method, to reach the parallel aim, more than one decoder is provided. Experimental results indicate that the algorithm improves the efficiency...
GPUs are widely used in high performance computing, due to their high computational power and high performance per Watt. Still, one of the main bottlenecks of GPU-accelerated cluster computing is the data transfer between distributed GPUs. This not only affects performance, but also power consumption. The most common way to utilize a GPU cluster is a hybrid model, in which the GPU is used to accelerate...
We present an original GPU task generator that can be attached to the rasterization based rendering process, in order to provide a dynamic parallelism. Compared to other existing GPU task generators or schedulers our method creates new tasks using the GPU graphics pipeline task scheduler. These are generated in the geometry shader, by means of rasterizing new geometry that produces additional fragments...
Large graph processing is now a critical component of many data analytics. Graph processing is used from social networking web sites that provide context-aware services from user connectivity data to medical informatics that diagnose a disease from a given set of symptoms. Graph processing has several inherently parallel computation steps interspersed with synchronization needs. Graphics processing...
Thread-level speculation can speed up a single-thread application by splitting its execution into multiple tasks and speculatively executing those tasks in multiple threads. Efficient thread-level speculation requires hardware support for memory conflict detection, store buffering, and execution rollback, and in addition, previous research has also proposed advanced optimization facilities, such as...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.