We present an improved three-step pipeline for the stereo matching problem and introduce multiple novelties at each stage. We propose a new highway network architecture for computing the matching cost at each possible disparity, based on multilevel weighted residual shortcuts, trained with a hybrid loss that supports multilevel comparison of image patches. A novel post-processing step is then introduced,...
The significance of computer architecture simulators in advancing computer architecture research is widely acknowledged. Computer architects have developed numerous simulators in the past few decades and their number continues to rise. This paper explores different simulation techniques and surveys many x86 simulators. Comparing simulators with each other and validating their correctness has been...
High performance computing (HPC) applications are becoming more data-intensive and place increasingly large I/O demands on storage systems. New storage devices such as SSDs, which have almost no seek latency and high throughput, are widely used together with HDDs in hybrid storage systems. To solve the I/O bottleneck problem, existing hybrid storage solutions such as Burst Buffer have...
Advanced Computer Architecture is an upper-level required course offered by the Department of Computer Science and Engineering at the University of Alaska-Anchorage (UAA). Course content is structured to provide students with a qualitative and quantitative approach to computer architecture, which addresses both the hardware and software aspects of parallelism in modern computing systems. Historically,...
The Trusted Platform Module (TPM) has gained popularity in computing systems as a hardware security approach. TPM provides boot-time security by verifying platform integrity, including hardware and software. However, once the software is loaded, TPM can no longer protect the software execution. In this work, we propose a dynamic TPM design, which performs control flow checking to protect the...
With the shift from computation-centric to communication-centric designs in the Chip Multi Processor (CMP) era, the interconnect fabric is gaining importance. An NoC that is efficient in terms of power, area and average flit latency has a huge impact on the overall performance of a CMP. In the current work, we propose MinBSD, a minimally buffered, single-cycle deflection router. It incorporates...
Automatic parallelization of sequential applications is key to the efficient use and optimization of current and future embedded multi-core systems. However, existing approaches often fail to balance tasks efficiently across the heterogeneous cores of an MPSoC. One reason is often insufficient knowledge of the underlying architecture's performance. In this paper, we present a novel...
Architectural advances in hardware implementations of Java increase performance. Java processors reduce the execution-time and memory-access overhead of traditional JVM implementations in embedded systems. To improve the performance of Java processors and decrease the execution time, we customize a processor called JOP. We design a Reconfigurable Functional Unit (RFU) which...
The massive parallelism provided by modern graphics processing units (GPUs) makes them attractive processors for accelerating applications with high data-level parallelism. As a result, GPU architecture has recently gained a lot of attention in the research community. However, advances in GPU architecture are impeded by the limited documentation released by the major GPU vendors. Furthermore,...
Field-programmable gate arrays (FPGAs) are increasingly used to implement embedded digital systems; however, the hardware design necessary to do so is time-consuming and tedious. The amount of hardware design can be reduced by employing a microprocessor for less-critical computation in the system. Often this microprocessor is implemented using the FPGA reprogrammable fabric as a soft processor which...
In the computer hardware industry, there are currently two highly successful instruction set architectures (ISAs): the CISC x86 ISA, which is an established standard architecture in the personal computer and server markets, and the RISC ARM ISA, which has become the standard in the fast-growing market for ultra-mobile computing devices such as smartphones and tablets. Program binaries that run on one...
The rapid advancements in the computational capabilities of the graphics processing unit (GPU) as well as the deployment of general programming models for these devices have made the vision of a desktop supercomputer a reality. It is now possible to assemble a system that provides several TFLOPs of performance on scientific applications for the cost of a high-end laptop computer. While these devices...
Hardware customization is an effective approach for meeting application performance requirements while achieving high levels of energy efficiency. Application-specific processors achieve high performance at low energy by tailoring their designs towards a specific workload, i.e., an application or application domain of interest. A fundamental question that has remained unanswered so far though is to...
Modern superscalar processors squash all wrong-path instructions when a branch is mispredicted. In deeper pipelines, the branch misprediction penalty increases significantly owing to the large number of squashed instructions. Exploiting control independence has been proposed for reducing this penalty. The control independence method reuses control independent instructions (CI instructions) without squashing...
Nowadays, multimedia applications (MMAs) form an important workload for general purpose processors. Although the vector architecture is considered the most promising candidate for media processing, the traditional vector architecture executes MMAs inefficiently. This paper proposes a media-oriented vector architecture, which improves the traditional one with a load-forwarding mechanism. The...
Single-instruction-multiple-data (SIMD) devices have been widely incorporated into baseline instruction level parallelism (ILP) processors to enable more efficient data level parallelism (DLP) support. This paper addresses the unsolved problem of the need to permute the SIMD elements packed in registers for maximum parallelism performance. An implicit data permutation (IDP) mechanism is proposed for...
To address the challenges in processor design for next-generation wireless communication systems, this paper first proposes a system-level design flow for communication-domain-specific processors, and then uses this design flow to propose GAEA, a novel processor architecture for next-generation wireless communication. GAEA is a shared memory multi-core SoC based on Software Controlled...
This article introduces a fast algorithm for Connected Component Labeling of binary images called Light Speed Labeling. It uses segment-based, line-relative labeling and was designed especially for RISC computers. An extensive benchmark on both structured and unstructured images substantiates that the algorithm, the way it is designed, is faster and more runtime predictable than Wu's algorithm...
In current instruction set architecture (ISA) design, fixed-length instructions are beneficial for improving the efficiency of instruction dispatching. But in embedded computers where memory is limited, variable-length instructions are much better in terms of memory cost. In this VLIW (very long instruction word) architecture, a two-stage pipeline is used to expand and dispatch the variable-length instructions...
Translation Lookaside Buffers (TLBs) are a staple in modern computer systems and have a significant impact on overall system performance. Numerous prior studies have addressed TLB designs to lower access times and miss rates; these, however, have been targeted towards uniprocessor architectures. As the computer industry embraces chip multiprocessor (CMP) architectures, it is important to study the...