The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
Automation systems must primarily be deterministic and reliable, especially in safety-critical environments. With recent trends such as mass customization or Industry 4.0, there is an increasing need for automation systems to be dynamic. Changing parts of the software of today's automation systems, however, typically requires rebooting the controller, which makes software updates a complex and costly...
Power states in power-scalable systems are managed to maximize performance and reduce energy waste. Power-scalable processor capabilities (e.g., Intel Turbo Boost) embrace a "faster is better" approach to power management. While these technologies can vastly improve performance and energy efficiency, there is a growing body of evidence that "faster is not always better". For example,...
Programming of high performance computing systems has become more complex over time. Several layers of parallelism need to be exploited to efficiently utilize the available resources. To support application developers and performance analysts we propose a technique for identifying the most performance critical optimization targets in distributed heterogeneous applications. We have developed CASITA,...
Within this paper an adaptive approach for parallel simulation of SystemC RTL models on future many-core architectures like the Single-chip Cloud Computer (SCC) from Intel is presented. It is based on a configurable parallel SystemC kernel that preserves the partial order defined by the SystemC delta cycles while avoiding global synchronization as far as possible. The underlying algorithm relies on...
The past few years have witnessed debate on how to improve link utilization of high-speed tiny-size buffer routers. Widely argued proposals for TCP traffic to realize acceptable link capacities mandate: (i) over-provisioned core link bandwidth; and (ii) non-bursty flows; and (iii) tens of thousands of asynchronous flows. However, in high speed access networks where flows are bursty, sparse and synchronous,...
In this paper we introduce exact and non-exact real-time waits in reactive Globally Asynchronous Locally Synchronous (GALS) programming languages and synchronous languages as their subset. The language constructs that allow use of real-time waits are illustrated on the SystemJ GALS language. They allow system designers to explicitly use, at the specification level, not only logical time but also the...
A number of NUMA-aware synchronization algorithms have been proposed lately to stress the scalability inefficiencies of existing locks. However their presupposed local lock granularity, a physical processor, is often not the optimum configuration for various workloads. This paper further explores the design space by taking into consideration the physical affinity between the cores within a single...
We present a kernel data interception technique that is undetectable by existing approaches to malware detection, and propose practical methods to detect it. The technique is based on breaking concurrency in a way that enables the attack code to take over the synchronization established by target kernel modules. That level of control allows the attack code to interpose between those modules, and thus...
Multi-core operating systems inherently face the problem of concurrent access to internal kernel state held in shared memory. Previous work on the Sloth real-time kernel proposed to offload the scheduling decisions to the interrupt hardware, thus removing the need for a software scheduler, no state has to be managed in software. While our existing design covers single-core platforms only, we now present...
Hybrid nodes containing GPUs are rapidly becoming the norm in parallel machines. We have conducted some experiments regarding how to plug GPU-enabled computational kernels into PSBLAS, a MPI-based library specifically geared towards sparse matrix computations. In this paper, we present our findings on which strategies are more promising in the quest for the optimal compromise among raw performance,...
Synchronization is an issue of significant importance in large-scale, distributed and high-speed systems. Traditional globally synchronous approach is no longer viable due to severe wire delay. Solutions such as “Globally Asynchronous, Locally Synchronous (GALS)” approaches suffer from metastability risk limiting their use in many-core SoC for critical applications, such as aerospace, military or...
As the system complexity increases, the simulation performance becomes one of the most important issues in virtual prototyping. Parallel simulation is a fascinating technique for high-speed simulation utilizing state of the art multi-core processors on a host workstation, but the efficiency of the parallel simulation is low because of the synchronization and communication overhead and unbalanced workloads...
Current technology trends for efficient use of infrastructures dictate that storage converges with computation by placing storage devices, such as NVM-based cards and drives, in the servers themselves. With converged storage the role of the interconnect among servers becomes more important for achieving high I/O throughput. Given that Ethernet is emerging as the dominant technology for datacenters,...
In this paper, we present CLUE, a system event analytics tool for black-box performance diagnosis in production Cloud Computing systems. CLUE provides an unified and extensible means of profiling service transactional behaviors, and builds structured data called event sketches. CLUE further offers a set of analytic tools for summarizing and analyzing event sketches by integrating data mining and statistical...
Utilizing accelerators in heterogeneous systems is an established approach for designing peta-scale applications. Today, CUDA offers a rich programming interface for GPU accelerators but requires developers to incorporate several layers of parallelism on both CPU and GPU. From this increasing program complexity emerges the need for sophisticated performance tools. This work contributes by analyzing...
Manycore architecture promotes a massively parallel computing on the accelerators. Especially GPU is one of the main series of the high performance computing, which is also employed by top supercomputers in the world. The programming method on such accelerators includes development of a control program. The accelerator executes it to schedule the invocation timing of the accelerator's kernel program...
Nowadays, driven by the consumer demands, the multimedia market is booming and the video coding standards evolve rapidly. A dynamically coarse grain reconfigurable architecture REMUS-II (REconfigurable MUltimedia System 2) is developed as a multi-standards, high resolution, power efficient, and real-time multimedia decoding processor. The hierarchical pipeline is adopted in the REMUS-II for multimedia...
This work addresses the implications of using the Time-Division Unbalanced Carrier Sense Multiple Access (TDuCSMA) coordination function to support triple-play services. Firstly, the theoretical background of TDuCSMA is reported, presenting its advantages and discussing its full compliance with the IEEE 802.11 standard. Secondly, a prototype of TDuCSMA is discussed in details. Then, a set of experiments...
Data races in parallel programs are notoriously difficult to detect and resolve. Existing research has mostly focused on data race detection at the user level and significant progress has been made in this regard. It is difficult to apply detection methods designed for user-level applications to identify OS kernel level races. In this paper, we present a new detection tool that is able to effectively...
Virtualization is one of the main technologies currently used to deploy computing systems due to the high reliability and rapid crash recovery it offers in comparison to physical nodes. These features are mainly achieved by continuously producing snapshots of the status of running virtual machines. In earlier works, the snapshot of each individual VM is performed independently, ignoring the memory...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.