The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
We present a compilation-based technique for providing on-demand structural redundancy for massively parallel processor arrays. Thereby, application programmers gain the capability to trade throughput for reliability according to application requirements. To protect parallel loop computations against errors, we propose to apply the well-known fault tolerance schemes dual modular redundancy (DMR) and...
Space computing systems commonly use field-programmable gate arrays to provide fault tolerance by applying triple modular redundancy (TMR) to existing register-transfer-level (RTL) code. Although effective, this approach has a 3× area overhead that can be prohibitive for many designs that often allocate resources before considering effects of redundancy. Although a designer could modify existing RTL...
In this paper, we perform a systematic comparison to study the energy cost of varying data formats and data types w.r.t. arithmetic logic and data movement for accelerator-based heterogeneous systems in which both compute-intensive (FFT accelerator) and data-intensive accelerators (DLT accelerator) are added. We explore evaluation for a wide range of design processes (e.g. 32nm bulk-CMOS and projected...
Custom architectures are often adopted as more efficient alternatives to general purpose processors in terms of performance and power. However, the design of such architectures requires experts both in hardware and the application domain. In this paper we propose a method for speeding up the design space exploration. Our method, based on Pareto points, identifies sets of solutions in terms of scalar...
This paper discusses an optimized double-precision floating-point multiplier that can handle both denormalized and normalized IEEE 754 floating-point numbers. Discussions of the optimizations are given and compared versus similar implementations, however, the main objective is keeping compliant for denormalized IEEE 754 floating-point numbers while still maintaining high performance operations for...
The CHIME Pathfinder is a new interferometric radio telescope that uses a hybrid FPGA/GPU FX correlator. The GPU-based X-engine of this correlator processes over 819 Gb/s of 4+4-bit complex astronomical data from N=256 inputs across a 400MHz radio band. A software framework is presented to manage this real-time data flow, which allows each of 16 processing servers to handle 51.2 Gb/s of astronomical...
Data centers are a highly competitive environment that demands high performance and energy efficiency and, in many cases, low latency. Custom hardware can provide significant improvements over conventional microprocessors on those metrics. Microsoft has been investigating the use of reconfigurable logic, in the form of field programmable gate arrays, to accelerate its data centers. In this talk, I...
Power efficiency - performance relative to power - is one of the most important concerns when designing RADAR processing systems. This paper analyzes power and performance trade-offs for a typical Space Time Adaptive Processing (STAP) application. We study STAP implementations for CUDA and OpenMP on two architectures, Intel Haswell Core I7-4770TE and NVIDIA Kayla with a GK208 GPU. We analyze the power...
The rapid development of mobile games highlights the power consumption problem in the mobile platform. Most of the power saving techniques use the prediction-based dynamic voltage frequency scaling (DVFS) scheme. However, the prediction could be inaccurate resulting from the frequent interactions of user when playing games. We have observed that frame rate is near-linear to CPU frequency, but there...
This paper presents the mixed-signal circuit implementation of reduced complexity algorithms for decoding low-density parity check (LDPC) codes. Based on modified differential decoding using binary message passing (MDD-BMP), binary addition using discrete-time digital circuits is replaced by continuous-time analog-current summation. Potential degradation due to the mismatch between current sources,...
Recently, the usage of GPU is not limited to the jobs associated with graphics and a wide variety of applications take advantage of the flexibility of GPUs to accelerate the computing performance. Among them, one of the most emerging applications is the fully homomorphic encryption (FHE) scheme, which enables arbitrary computations on encrypted data. Despite much research effort, it cannot be considered...
In many scientific computing applications, sparse Cholesky factorization is used to solve large sparse linear equations in distributed environment. GPU computing is a new way to solve the problem. However, sparse Cholesky factorization on GPU is hardly to achieve excellent performance due to the structure irregularity of matrix and the low GPU resource utilization. A hybrid CPU-GPU implementation...
Embedded systems require low overhead security approaches to ensure that they are protected from attacks. In this paper, we propose a hardware-based approach to secure the operation of an embedded processor instruction-by-instruction, where deviations from expected program behavior are detected within the execution of an instruction. These security-enabled embedded processors provide effective defenses...
Private Information Retrieval (PIR) protocols allow users to search for data items stored at an untrusted server, without disclosing to the server the search attributes. Several computational PIR protocols provide cryptographic-strength guarantees for the privacy of users, building upon well-known hard mathematical problems, such as factorisation of large integers. Unfortunately, the computational-intensive...
Vector quantization (VQ) is a general data compression technique that has a scalable implementation complexity and potentially a high compression ratio. In this paper, a novel implementation of VQ using stochastic circuits is proposed and its performance is evaluated. The stochastic and binary designs are compared for the same compression quality and the circuits are synthesized for an industrial...
Interferometric Synthetic Aperture Radar (InSAR) is a remote sensing technology used for estimating displacement of the earth's surface. Phase unwrapping is the most important step in InSAR processing and relies on successful selection of points that appear stable across a set of satellite images taken over time. This paper presents a new algorithm for selecting these points, a problem known as persistent...
We welcome you to the 26th IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP 2015). This year's event takes place in Toronto, Canada on the campus of the University of Toronto. Prior to this year's visit to Toronto, the conference has been held in many places around the globe including Oxford (1986), San Diego (1988), Killarney (1989), Princeton (1990),...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.