Nowadays, many embedded systems with different architectures incorporate GPUs. However, it is difficult to develop CPU-GPU embedded systems using component-based development (CBD), since existing CBD approaches have no support for GPU development. In this context, when targeting a particular CPU-GPU platform, the component developer is forced to construct hardware-specific components,...
State machines are a common technique for describing state-dependent systems such as communication protocols. Although such state machines typically incorporate events to switch between states, a description based on a pure event-based system is quite challenging. In this work, we describe the factors that complicate event-based state machines and present solutions. These solutions are developed especially...
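The kind of event-driven state machine the abstract refers to can be sketched as a transition table keyed by (state, event) pairs. The states and events below model a hypothetical toy connection protocol and are not taken from the paper:

```python
# Minimal event-driven state machine (illustrative; state/event names are
# hypothetical, not the paper's example).
TRANSITIONS = {
    ("IDLE", "connect"): "CONNECTING",
    ("CONNECTING", "ack"): "CONNECTED",
    ("CONNECTED", "close"): "IDLE",
}

class ProtocolMachine:
    def __init__(self):
        self.state = "IDLE"

    def dispatch(self, event):
        # Unknown (state, event) pairs are silently ignored here; deciding
        # what to do with such events is one of the complications a purely
        # event-based description must address.
        self.state = TRANSITIONS.get((self.state, event), self.state)
        return self.state
```

Note how the table makes the legal event orderings explicit, which is exactly where out-of-order or unexpected events complicate a pure event-based design.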
Humanoid robots are a typical application of real-time systems and require timing constraints, low latency, and parallel/distributed processing to achieve fine-grained real-time execution. Therefore, we have developed the Dependable Responsive Multithreaded Processor I (D-RMTP I), which has one Responsive Multithreaded Processing Unit with an 8-way prioritized Simultaneous Multithreading architecture...
The worst-case response time (WCRT) – the time span from release to completion of a real-time task – is a crucial property of real-time systems. However, WCRT analysis is complex in practice, as it depends not only on the realistic examination of worst-case execution times (WCET), but also on system-level overheads and blocking/preemption times. While the implicit path enumeration technique (IPET)...
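The relationship between WCET and WCRT that the abstract describes is commonly captured by the classical fixed-point response-time recurrence for fixed-priority uniprocessor scheduling (a standard textbook analysis, not necessarily the paper's IPET-based method): R = C + B + Σ ⌈R/Tj⌉·Cj over higher-priority tasks j, where C is the task's WCET, B its blocking time, and Cj, Tj the WCET and period of each higher-priority task.

```python
import math

def wcrt(C, B, higher_prio):
    """Classical response-time analysis fixed point (illustrative sketch):
    C = task WCET, B = blocking time,
    higher_prio = [(C_j, T_j), ...] for each higher-priority task.
    Assumes the iteration converges (task set is schedulable)."""
    R = C + B
    while True:
        # Each higher-priority task j preempts ceil(R / T_j) times within R.
        nxt = C + B + sum(math.ceil(R / T_j) * C_j for C_j, T_j in higher_prio)
        if nxt == R:
            return R
        R = nxt
```

For example, a task with C = 2 and two higher-priority tasks (C, T) = (1, 4) and (2, 6) converges to a WCRT of 6.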
Single-ISA heterogeneous multi-core processors have been demonstrated to improve the performance and efficiency of general-purpose workloads. However, these designs leave some performance on the table due to the common assumption that the cost of migrating a program from one core to another is high. This high cost is due to the reliance on the operating system for a migration via a context switch...
Cyber-Physical Systems (CPS) are tight integrations of computational and physical worlds for various kinds of applications. For example, a humanoid robot, which is a typical application of CPS, requires timing constraints, low-latency execution, and parallel processing to achieve fine-grained real-time execution. Therefore, low-latency parallel real-time computing is an important factor for CPS...
Memory access tracing is a program analysis technique with many different applications, ranging from architectural simulation to (on-line) data placement optimization and security enforcement. In this article we propose a memory access tracing approach based on static x86 binary instrumentation. Unlike non-selective schemes, which instrument all the memory access instructions, our proposal selectively...
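The selective-versus-non-selective distinction can be illustrated with a toy instrumentation pass over a linear instruction stream: only instructions classified as memory accesses receive a tracing hook. The mnemonics and the list-of-strings "IR" below are invented for illustration and have nothing to do with real x86 encoding or the paper's selection criteria:

```python
# Toy selective instrumentation pass (hypothetical mnemonics).
MEM_OPS = {"mov_load", "mov_store"}

def instrument(program):
    """Insert a trace hook before every memory-access instruction only;
    a non-selective scheme would hook every instruction instead."""
    out = []
    for ins in program:
        if ins in MEM_OPS:
            out.append(("trace", ins))  # tracing hook for this access
        out.append(ins)                 # original instruction preserved
    return out
```

The point of selectivity is that the non-memory instructions (`add`, `sub`, ...) pass through untouched, avoiding their instrumentation overhead.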
Recent high-level synthesis tools offer the capability to generate multi-threaded micro-architectures to hide memory access latencies. In many HLS flows, this is often achieved by just creating multiple processing-element instances (one for each thread). However, more advanced compilers can synthesize hardware in a spatial form of the barrel-processor or simultaneous multi-threading (SMT) approaches,...
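The latency-hiding effect of a barrel-style schedule can be shown with a back-of-the-envelope cycle model (a deliberately simplified model, not the timing of any real HLS output): a thread can issue its next operation only after its memory access completes, so the round-robin period is the larger of the thread count and the access latency.

```python
def barrel_cycles(threads, ops_per_thread, mem_latency):
    """Approximate total cycles for a barrel-style round-robin schedule.
    Each op takes 1 issue cycle plus mem_latency stall cycles; a thread
    re-issues every max(threads, 1 + mem_latency) cycles."""
    period = max(threads, 1 + mem_latency)
    return ops_per_thread * period
```

With a 3-cycle memory latency, a single thread needs 40 cycles for 10 operations, while 4 interleaved threads complete 40 operations (10 each) in the same 40 cycles: the stalls are fully overlapped.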
Heterogeneous computing is a promising approach to tackle the thermal, power and energy constraints posed by modern desktop and embedded computing systems. Moreover, by allowing the migration of application threads to the most appropriate cores, significant performance and energy-efficiency gains can be attained. Nevertheless, the considerable overheads usually imposed by software-based...
Emerging services applications operate on vast datasets that are kept in DRAM to minimize latency and improve throughput. A considerable portion of these applications exhibit irregular memory references, which cause serious locality problems. This paper presents SLIM, a Software-based LIghtweight Multithreading framework, to address this problem on commodity hardware while keeping the simple style of multithreading...
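One common way to realize software-based lightweight multithreading (a generic sketch of the idea, not SLIM's actual design) is to express each lightweight thread as a coroutine that yields at points where a long-latency memory access would stall, letting a tiny round-robin scheduler switch to another thread instead of waiting:

```python
from collections import deque

def worker(name, steps, log):
    """A lightweight 'thread': each yield marks a would-be memory stall
    where the scheduler may switch away."""
    for i in range(steps):
        log.append((name, i))  # do one unit of work
        yield                  # long-latency access: switch out

def run(workers):
    """Round-robin scheduler over generator-based lightweight threads."""
    ready = deque(workers)
    while ready:
        w = ready.popleft()
        try:
            next(w)
            ready.append(w)    # requeue after each step
        except StopIteration:
            pass               # thread finished

log = []
run([worker("a", 2, log), worker("b", 2, log)])
```

The log shows the interleaving a0, b0, a1, b1: while one thread's (simulated) access is outstanding, another makes progress, which is how locality stalls are overlapped.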
We describe extending the hardware/software co-compiler Nymble to automatically generate multi-threaded (SIMT) hardware accelerators. In contrast to prior work that simply duplicated complete compute units for each thread, Nymble-MT reuses the actual computation elements, and adds just the required data storage and context switching logic. On the CHStone benchmark suite and a sample configuration...
Traditionally, operating systems (OSes) suffer from a bifid priority space dictated by the co-existence of threads managed by the kernel scheduler and asynchronous interrupt handlers scheduled by hardware. In real-time systems, where reliability and determinism play a critical role, this approach presents a notable shortcoming, as any interrupt handler can interrupt an execution thread, regardless of its...
GPUs are being increasingly adopted as compute accelerators in many domains, spanning environments from mobile systems to cloud computing. These systems usually run multiple applications, from one or several users. However, GPUs do not provide the support for resource sharing traditionally expected in these scenarios. Thus, such systems are unable to provide key multiprogrammed workload requirements,...
On a resource-sharing platform, running software subcomponents in isolation is critical to protecting users' privacy and data security. In client-server applications, thread isolation is required to prevent private data that belongs only to certain threads from being read or modified by other, unauthorized threads running in the same address space. However, current programming languages (C/C++) and...
We present a new system, KCoFI, that is the first we know of to provide complete Control-Flow Integrity protection for commodity operating systems without using heavyweight complete memory safety. Unlike previous systems, KCoFI protects commodity operating systems from classical control-flow hijack attacks, return-to-user attacks, and code segment modification attacks. We formally verify a subset...
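The core idea behind control-flow integrity can be illustrated with a toy guard on indirect calls: a target is legal only if it belongs to a set of entry points derived ahead of time from the program's control-flow graph. This is a generic CFI illustration, not KCoFI's kernel-level mechanism, and all names below are hypothetical:

```python
# Toy control-flow integrity check (illustrative only).
def legitimate_handler():
    return "ok"

def attacker_gadget():
    return "hijacked"

# In a real CFI system this set is computed from the control-flow graph;
# here it is hand-written for the example.
ALLOWED_TARGETS = {legitimate_handler}

def guarded_indirect_call(target):
    """Refuse any indirect branch whose target is not a known entry point,
    which is what defeats classical control-flow hijack attacks."""
    if target not in ALLOWED_TARGETS:
        raise RuntimeError("CFI violation: unexpected branch target")
    return target()
```

A hijacked function pointer aimed at `attacker_gadget` then faults at the check instead of transferring control.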
As discovered in our previous benchmarking work, a small number of workloads in the PARSEC benchmark suite suffer heavy performance loss in a virtualized execution environment, and the major loss exhibits a fairly strong connection with thread synchronization operations. This paper examines one workload of this kind that makes heavy use of thread synchronization operations, and shows the performance...
The evolution of commodity hardware makes it a very attractive platform to develop high-performance networking applications that are affordable to deploy. All but the most trivial applications must copy packets into user-space for further analysis. Therefore, the allocation of memory for these copies becomes a performance-critical operation. In this work, we present a multi-layer slice memory allocator...
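The general shape of such an allocator (a slab-style sketch of the idea, not the paper's multi-layer design) is to pre-allocate one large block, hand out fixed-size slices from it, and recycle freed slices through a free list, so the per-packet fast path never touches the general-purpose allocator:

```python
# Toy fixed-size slice allocator (illustrative, hypothetical API).
class SliceAllocator:
    def __init__(self, slice_size, slices_per_block):
        self.slice_size = slice_size
        # One up-front allocation backs every slice.
        self.block = bytearray(slice_size * slices_per_block)
        self.free = list(range(slices_per_block))  # free slice indices

    def alloc(self):
        """Return a (memoryview, index) pair for one slice; O(1), no copy."""
        if not self.free:
            raise MemoryError("block exhausted")
        i = self.free.pop()
        view = memoryview(self.block)[i * self.slice_size:(i + 1) * self.slice_size]
        return view, i

    def release(self, i):
        """Recycle a slice index back onto the free list."""
        self.free.append(i)
```

Packet copies then land in recycled slices, turning the performance-critical allocation into a free-list pop.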
To harness the potential of CMPs for scalable, energy-efficient performance in general-purpose computers, the Apple-CORE project has co-designed a general machine model and concurrency control interface with dedicated hardware support for concurrency management across multiple cores. Its SVP interface combines dataflow synchronisation with imperative programming, towards the efficient use of parallelism...
Since the introduction of fully programmable vertex shader hardware, GPU computing has made tremendous advances. Exception support and speculative execution are the next steps to expand the scope and improve the usability of GPUs. However, traditional mechanisms to support exceptions and speculative execution are highly intrusive to GPU hardware design. This paper builds on two related insights to...
Single-Instruction Multiple-Thread (SIMT) micro-architectures implemented in Graphics Processing Units (GPUs) run fine-grained threads in lockstep by grouping them into units, referred to as warps, to amortize the cost of instruction fetch, decode and control logic over multiple execution units. As individual threads take divergent execution paths, their processing takes place sequentially, defeating part...
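The serialization cost of divergence can be modeled with a one-line rule (a simplified model, not real GPU timing): if any thread of the warp takes a path, the whole warp spends that path's cycles under an active mask, so a divergent branch costs the sum of both paths.

```python
def warp_cycles(branch_taken, then_len, else_len):
    """Toy SIMT divergence model. branch_taken: per-thread branch outcomes
    for one warp; then_len/else_len: cycle counts of the two paths.
    A path is executed (serially, under a mask) iff some thread needs it."""
    takes_then = any(branch_taken)       # at least one thread goes 'then'
    takes_else = not all(branch_taken)   # at least one thread goes 'else'
    return then_len * takes_then + else_len * takes_else
```

A uniform warp pays for only one path, while a half-and-half warp pays for both, which is the lost lockstep parallelism the abstract describes.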