Recent advances in genomic microarray design make it possible to retrieve hundreds of thousands of significant genetic features from patients at affordable cost. Understanding whether non-linear interactions (epistatic relationships) between these features determine the onset of complex, common multifactorial genetic diseases is a critical task for human geneticists. The algorithms...
Processors and memory systems suffer from a growing performance gap between them. Each technology generation increases on-chip performance capabilities; however, memory bandwidth increases at a much slower pace. Therefore, overall performance improvements are constrained by the available memory bandwidth. In this paper, we address the memory bandwidth problem of vector processors by introducing...
Energy efficient sensor nodes are among the rapidly expanding applications for embedded systems technology. Typically, the processing resources in sensor nodes are based on programmable micro-controllers and digital signal processors, and the same processing architecture is used regardless of the actual task of the node. This regularly results in at least an order of magnitude over-provisioning of...
We present our parametric hardware architecture of the NIST approved Lucas probabilistic primality test. To our knowledge, our work is the first hardware architecture for the Lucas test. Our main contributions are a hardware architecture for calculating the Jacobi symbol based on the binary Jacobi algorithm, a pipelined modular add-shift module for calculating the Lucas sequences, methods for dependence...
Over the last decade, a trend can be observed towards multi-processor Systems-on-Chip (MPSoC) platforms for satisfying the high computational requirements of modern multimedia applications. The research community has mainly focused on communication issues (e.g. bus vs. networks-on-chip). Real-time operating systems for MPSoCs, however, have received very little attention. Existing techniques like rate-monotonic...
With the increasing proliferation of heterogeneous and reconfigurable computing, it has become essential to have efficient prediction models to drive early HW-SW partitioning and co-design. In this paper, we present a high level quantitative prediction modeling approach that accurately models the relation between hardware and software metrics, based on several statistical techniques. The proposed...
Virtual prototypes are widely employed in today's development of embedded hardware and software. To model and simulate the VPs, SystemC has been adopted as a standard language tool. With SystemC, hardware modules and software codes can be modeled as processes. To model concurrency, one process can be suspended and then the SystemC scheduler selects the next process to resume. This is also known as...
Reconfigurable processor architectures can dynamically switch their instruction set and instruction format at run time. They offer a new flexibility for adapting to changing applications' requirements in order to optimize performance and enable resource-awareness. While programmability is a key issue of such architectures, today's software toolchains are limited to static ISA architectures and must...
Functional simulators find widespread use as sub-systems within microarchitectural simulators. The speed of functional simulators is strongly influenced by the implementation style of the functional simulator, e.g. interpreted vs. binary-translated simulation. Speed is also strongly influenced by the level of detail of the interface the functional simulator presents to the rest of the timing simulator...
This paper introduces a Y-chart methodology for performance estimation based on high-level models for both application and architecture. As embedded devices become increasingly complex, choosing the best-suited architecture, not only in terms of processing power but also in terms of power consumption, becomes a tedious task. In this context, estimation tools are key components in architecture choice methodology...
In recent years, road vehicles have seen a tremendous increase in driver assistance systems such as lane departure warning, traffic sign recognition, and pedestrian detection. The development of efficient and cost-effective electronic control units that meet the necessary real-time performance for these systems is a complex challenge. Often, Electronic System-Level design tackles the challenge by simulation-based...
In recent years multi-core processors have seen broad adoption in application domains ranging from embedded systems through general-purpose computing to large-scale data centres. Simulation technology for multi-core systems, however, lags behind and does not provide the simulation speed required to effectively support design space exploration and parallel software development. While state-of-the-art...
Obtaining tight worst-case execution-time (WCET) estimations of real-time tasks is crucial since overly pessimistic estimations are deemed impractical. One way of making WCET estimations tighter is to incorporate more program-flow information, e.g., context-sensitive loop bounds, infeasible-path and same-path information, etc. In this paper we present and evaluate a completely automatic analysis that...
In this paper, a flexible HW architecture for video-based driver assistance applications is presented. It comprises a customizable and extensible processor template and several task-specific HW accelerators. The proposed heterogeneous architecture allows utilization of the programmable processor core for control and low data rate tasks. For the acceleration of computationally intensive tasks of the...
Future multi-core processors will necessitate exploitation of fine-grain, architecture-independent parallelism from applications to utilize many cores with relatively small local memories. We use c264, an end-to-end H.264 video encoder for the Cell processor based on x264, to show that exploiting fine-grain parallelism remains challenging and requires significant advancement in runtime support. Our...
An innovative high-throughput and scalable multi-transform architecture for H.264/AVC is presented in this paper. This structure can be used as a hardware accelerator in modern embedded systems to efficiently compute the 4×4 forward/inverse integer DCT, as well as the 2-D 4×4 / 2×2 Hadamard transforms. Moreover, its highly flexible design and hardware efficiency allow it to be easily scaled in terms...
High detection complexity is known to be one of the major challenges in MIMO communications based on spatial multiplexing. Tuple Search Detector (TSD) was recently introduced, significantly reducing detection complexity in comparison to conventional algorithms while achieving close to full max-log-APP BER performance. Besides high computational complexity, irregular control flow and sequential nature...
When transposing large matrices using SDRAM memories, typically a control overhead significantly reduces the data throughput. In this paper, a new address mapping scheme is introduced, taking advantage of multiple banks and burst capabilities of modern SDRAMs. Other address mapping strategies minimize the total number of SDRAM page-opens while traversing the two-dimensional index-space in row or column...
Designing interconnection networks for systems-on-a-chip is getting more complex due to the increasing number and heterogeneity of the elements they connect, the variety of technologies adopted to transmit and route information, and the performance and cost requirements and constraints they have to satisfy. The complexity of such transmission fabrics thus approaches that of telecommunication networks...
In throughput-aware CMPs like GPUs and DSPs, software-managed streaming memory systems are an effective way to tolerate high latencies. For example, the Cell/B.E. incorporates local memories, and data transfers to/from those memories are overlapped with computation using DMAs. In such designs, the latency of the memory system has little impact on performance; instead, memory bandwidth becomes critical. With...