As the size of Deep Neural Networks (DNNs) continues to grow to increase accuracy and solve more complex problems, their energy footprint also scales. Weight pruning reduces DNN model size and computation by removing redundant weights. However, we implemented weight pruning for several popular networks on a variety of hardware platforms and observed surprising results. For many networks, the network...
Accurate, real-time Automatic Speech Recognition (ASR) comes at a high energy cost, so accuracy often has to be sacrificed to fit the strict power constraints of mobile systems. However, accuracy is extremely important for the end-user, and today's systems are still unsatisfactory for many applications. The most critical component of an ASR system is the acoustic scoring, as it has a large...
Flow path plays an important role in hydrological analysis and modeling, especially in the dynamic simulation of surface flow discharge. The existing flow-path network model (FPN) can extract the flow path from a random flow-source point to the basin outlet and simplify the three-dimensional terrain surface to a one-dimensional representation. However, with the increase in the number of flow source...
Modern high-performance computing and cloud computing infrastructures often leverage Graphics Processing Units (GPUs) to provide accelerated, massively parallel computational power. This performance gain, however, may also introduce higher energy consumption. The energy challenge becomes more and more pronounced as the system scales. To address this challenge, we propose Archon, a framework for...
We describe our approach to extend the BEAGLE library for high-performance statistical phylogenetic inference (maximum likelihood estimation and Bayesian analysis) in order to support a wider range of modern accelerators and multicore CPUs, and present the corresponding performance results from these platforms. Our solution includes a shared code design providing a uniform interface for a variety...
Nowadays, applications must often handle large amounts of data and apply complex algorithms to them. Performing the computation in parallel is a promising and popular way to meet these performance requirements. Since GPUs are designed to carry out highly parallel computations efficiently, the CPU+GPU heterogeneous architecture has gained increasing popularity in computation-intensive applications...
Availability of affordable hardware that in effect enables desktop supercomputing has made possible more ambitious neural simulations driven by more complex software. However, this opportunity comes with costs: long learning curves to take advantage of the performance possibilities of idiosyncratic, architecturally heterogeneous hardware, and decreasing ability to be confident in the quality of...
Scientists who want to exploit the computing power of the latest parallel architectures are faced with a diverse set of architectures and a number of programming languages, models and approaches. Among several such programming techniques are directive-based programming models, OpenMP and OpenACC. This paper explores the similarities and the functionality gaps between both models and presents insights...
Pre-silicon simulation is one of the key toolsets for computer architects to evaluate and optimize their future designs. As Graphics Processing Units (GPUs) have become the platform of choice in many computing communities due to their impressive processing capabilities, computer architecture researchers need a simulation framework that allows them to quantitatively consider design tradeoffs. In this...
High-performance computing platforms are moving from homogeneous individual units to heterogeneous systems, where each unit is a combination of homogeneous cores and accelerator devices. Accelerators such as GPUs, FPGAs, and DSPs are usually designed for specific, intensive types of computing tasks. The presence of these devices has created fresh and attractive development platforms for...
As throughput-oriented accelerators, GPUs provide tremendous processing power by running a massive number of threads in parallel. However, exploiting high degrees of thread-level parallelism (TLP) does not always translate to the peak performance that GPUs can offer, leaving the GPU's resources often under-utilized. Compared to compute resources, memory resources can tolerate considerably lower levels...
With the end of Dennard scaling, architects have increasingly turned to special-purpose hardware accelerators to improve the performance and energy efficiency of some applications. Unfortunately, accelerators don't always live up to their expectations and may under-perform in some situations. Understanding the factors which affect the performance of an accelerator is crucial for both architects and...
The employment of five distinct benchmarks on the Distributed Environment for Academic Computing (DEAC) Cluster at Wake Forest University provides meaningful metrics of cluster processor and memory performance. Given the heterogeneous nature of the DEAC Cluster, the benchmarks chosen account for the specific processor architectures comprising the cluster. The data obtained will be assessed via two modeling...
The GPU has become an important component of high-performance computing systems, and its principal duty is parallel computing rather than graphical display. Determining power and energy consumption is necessary for scaling GPUs. This paper presents a statistical model to evaluate the power and energy consumption of AMD's integrated GPU (iGPU). By collecting performance-counter data from...
GPUs have emerged as general-purpose accelerators in high-performance computing (HPC) and scientific applications. However, the reliability characteristics of GPU applications have not been investigated in depth. While error propagation has been extensively investigated for non-GPU applications, GPU applications have a very different programming model which can have a significant effect on error propagation...
Over the last decade, CUDA and the underlying GPU hardware architecture have continuously gained popularity in various high-performance computing application domains such as climate modeling, computational chemistry, or machine learning. Despite this popularity, we lack a single coherent programming model for GPU clusters. We therefore introduce the dCUDA programming model, which implements device-side...
Computing platforms for high-performance and parallel applications have changed rapidly during the past few years, from single to multiple cores, and from traditional Central Processing Units (CPUs) to hybrid systems which combine CPUs with accelerators such as Graphics Processing Units (GPUs), Intel Xeon Phi, etc. These developments bring more and more challenges to application developers, especially...
Accelerator-based platforms are heterogeneous in nature, yet most applications avoid heterogeneity, and focus on acceleration alone. Platform-level heterogeneity can bring significant performance improvement, as it essentially means using additional resources for the same computation. But is the performance gained using these additional resources worth the effort to program and deploy heterogeneous...
Object detection is a fundamental challenge facing intelligent applications. Image processing is a promising approach to this end, but its computational cost is often a significant problem. This paper presents schemes for accelerating the deformable part models (DPM) on graphics processing units (GPUs). DPM is a well-known algorithm for image-based object detection, and it achieves high detection...
Porting applications to new hardware or programming models is a tedious and error-prone process. Any help that eases these burdens saves developer time that can then be invested in advancing the application itself instead of preserving the status quo on a new platform. The Alpaka library defines and implements an abstract hierarchical redundant parallelism model. The model exploits...