In this paper we propose a new degree of flexibility for soft processor design in which only the instructions relevant to the task at hand are implemented as a subset of the Instruction Set Architecture (ISA). These customized processors execute software kernels in the usual way, yet can be implemented with a fraction of the hardware resources used by other full-ISA soft processor cores. We present...
In order to reduce data access conflicts and improve memory access efficiency in radar signal processing, a linearly varying step-size data management strategy combined with a hierarchical memory structure is proposed. By proposing a logical mapping strategy between the reconfigurable arrays and the multi-bank memory, the memory access performance of the reconfigurable processor is improved and...
Automatic approximations of brain volumes are very useful in research and clinical practice. Conventional hand tracing is time-consuming, and its accuracy depends on the individual. The present work aims at the automatic estimation of brain volume and 3-D visualization using VTK in a Python environment after edge enhancement and unsharp masking by quadratic filters for...
Modern high-level synthesis (HLS) tools commonly employ pipelining to achieve efficient loop acceleration by overlapping the execution of successive loop iterations. However, existing HLS techniques provide inadequate support for pipelining irregular loop nests that contain dynamic-bound inner loops, where unrolling is either very expensive or not even applicable. To overcome this major limitation,...
Two light beams that are perceived as having the same colour but that have different spectra are said to be metameric. The colour of a light beam is based on the readings of several photodetectors with different spectral responses, and metamerism results when a set of photodetectors is unable to resolve two spectra. We are interested in exploring the concept...
Information retrieval is a technique used in search engines, advertisement placement and cognitive databases. With increasing amounts of data and stringent response time requirements, improving the underlying implementation of document retrieval becomes critical. To this end, we consider a Bloom filter, a simple randomized data structure that answers membership queries with no false negatives and customizable...
Programming accelerators such as GPUs with low-level APIs and languages such as OpenCL and CUDA is difficult, error-prone, and not performance-portable. Automatic parallelization and domain specific languages (DSLs) have been proposed to hide complexity and regain performance portability. We present PENCIL, a rigorously-defined subset of GNU C99 -- enriched with additional language constructs -- that...
GPUs can enable significant performance improvements for certain classes of data parallel applications and are widely used in recent computer systems. However, GPU execution currently requires explicit low-level operations such as 1) managing memory allocations and transfers between the host system and the GPU, 2) writing GPU kernels in a low-level programming model such as CUDA or OpenCL, and 3)...
The polyhedral model is a powerful algebraic framework that has enabled significant advances to analysis and transformation of sequential affine (sub)programs, relative to traditional AST-based approaches. However, given the rapid growth of parallel software, there is a need for increased attention to using polyhedral frameworks to optimize explicitly parallel programs. An interesting side effect of supporting...
The Laue diffraction microscopy experiment uses the polychromatic Laue micro-diffraction technique to examine the structure of materials with sub-micron spatial resolution in all three dimensions. During this experiment, local crystallographic orientations, orientation gradients and strains are measured as properties which will be recorded in HDF5 image format. The recorded images will be processed...
For applications that deal with large amounts of high dimensional multi-aspect data, it is natural to represent such data as tensors or multi-way arrays. Tensor computations, such as tensor decompositions, are increasingly being used to extract and explain properties of such data. An important class of tensors is the symmetric tensor, which shows up in real-world applications such as signal processing,...
Using multiple accelerators, such as GPUs or Xeon Phis, is attractive to improve the performance of large data parallel applications and to increase the size of their workloads. However, writing an application for multiple accelerators remains challenging today because going from a single accelerator to multiple ones requires dealing with potentially non-uniform domain decomposition, inter-accelerator...
Owing to its high performance and low power consumption, the Loongson-1 processor has wide application prospects in industrial control, high-performance embedded systems, and other fields. Currently, the Loongson series platforms are mostly based on the Linux operating system. However, VxWorks is a better choice for its high real-time performance and high reliability in the field of industrial control and high-performance...
Recent advances in graphics processing units (GPUs) have resulted in massively parallel hardware that is widely available to achieve high performance in desktop, notebook, and even mobile computer systems. While multicore technology has become the norm in modern computers, programming such systems requires an understanding of the underlying hardware architecture and hence poses a great challenge for...
In order to improve the real-time performance and reliability of the drive system for an infrared image array, this paper designs an embedded drive system. With the MPC8315 as the processing core, this system uses a reflective memory network as the transmission unit. In order to verify and analyze the performance of the embedded drive system for the infrared image array, this paper sets up a test platform...
The square and rectangular shape of the pixels in the digital images for sensing and display purposes introduces several inaccuracies in the representation of digital images. The major disadvantage of square pixel shapes is the inability to accurately capture and display the details in the objects having variable orientations to edges, shapes and regions. This effect can be observed by the inaccurate...
In this paper we investigate static memory access predictability in GPGPU workloads, at the thread block granularity. We first show that a significant share of accessed memory addresses can be predicted using thread block identifiers. We build on this observation and introduce a hardware-software prefetching scheme to reduce average memory access time. Our proposed scheme issues the memory requests...
We present JolokiaC++, a compiler framework to ease coding of irregular data applications on GPUs. The effectiveness of the compiler and runtime systems of JolokiaC++ is tested using three kernels, IRREG, MOLDYN and NBF, executed on NVIDIA GPUs. We developed extensions for the generic parallel constructs that allow portable and efficient programming of codes with irregular accesses on the GPU. We present...
This contribution presents a Direction of Arrival (DoA) estimation algorithm based on the complex Watson distribution to incorporate both phase and level differences of captured microphone array signals. The derived algorithm is reviewed in the context of the Generalized State Coherence Transform (GSCT) on the one hand and a kernel density estimation method on the other hand. A thorough simulative...
Process migration is one of the most important features in parallel and distributed computing. It enables dynamic load balancing and better utilization of computing resources. Post-copy is a very efficient migration algorithm, but it requires the process to resume on the destination node with an incomplete address space, which may significantly reduce its efficiency, especially in the initial phase. To solve this...