As the size of Deep Neural Networks (DNNs) continues to grow to increase accuracy and solve more complex problems, their energy footprint also scales. Weight pruning reduces DNN model size and computation by removing redundant weights. However, we implemented weight pruning for several popular networks on a variety of hardware platforms and observed surprising results. For many networks, the network...
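Magnitude-based pruning, the most common form of weight pruning, keeps the largest-magnitude weights and zeroes the rest. The function below is an illustrative pure-Python sketch of that idea, not the implementation evaluated in the abstract:

```python
def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude weights so that roughly a
    `sparsity` fraction of them is removed (illustrative sketch)."""
    k = int(sparsity * len(weights))  # number of weights to drop
    if k == 0:
        return list(weights)
    # the k-th smallest absolute value acts as the pruning threshold
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]

w = [0.9, -0.05, 0.4, 0.01, -0.7, 0.2]
pruned = magnitude_prune(w, 0.5)  # drops the 3 smallest-magnitude weights
```

In practice pruning is applied per layer to tensors rather than flat lists, and is usually followed by fine-tuning to recover accuracy.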
Detecting community structure in epidemic networks is crucial for assessing epidemic dynamics and effectively controlling disease spread by targeting the individuals that bridge communities. Common community detection models (e.g., cut-criteria and modularity-criteria based models) are efficient at finding high-quality network partitions. However, most of these approaches fail to consider the dynamic...
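As a concrete example of a modularity criterion, Newman's modularity Q can be computed directly from an edge list and a community assignment. The function below is an illustrative sketch (the graph and names are ours, not from the paper):

```python
def modularity(edges, communities):
    """Newman modularity Q of an undirected graph partition.
    edges: list of (u, v) pairs; communities: dict node -> community label."""
    m = len(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    q = 0.0
    # fraction of edges that fall inside a community...
    for u, v in edges:
        if communities[u] == communities[v]:
            q += 1.0 / m
    # ...minus the fraction expected under a random degree-preserving rewiring
    for u in degree:
        for v in degree:
            if communities[u] == communities[v]:
                q -= degree[u] * degree[v] / (4.0 * m * m)
    return q

edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
comm = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
q = modularity(edges, comm)  # Q ≈ 0.357 for two triangles joined by one edge
```

Higher Q indicates denser connections within communities than expected by chance; modularity-based detectors search for the partition maximizing Q.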
A convolutional neural network (CNN) extracts features from big data by using a multilayer network structure. Due to its high effectiveness, CNN has achieved great success in many fields such as computer vision and speech analysis. However, CNN training is quite challenging because computing the gradients through multiple layers is time-consuming. In this paper, we propose to accelerate the computation...
With the rapidly increasing applications of deep learning, LSTM-RNNs are widely used. Meanwhile, complex data dependences and intensive computation limit the performance of accelerators. In this paper, we first propose a hybrid network expansion model to exploit fine-grained data parallelism. Based on the model, we implemented a Reconfigurable Processing Unit (RPU) using Processing In Memory (PIM)...
Dataflow models of computation have been shown to provide an excellent basis for describing signal processing applications and mapping them to heterogeneous computing platforms that consist of multicore CPUs and graphics processing units (GPUs). Recently, several efficient dataflow-based programming frameworks have been introduced for such needs. Most contemporary signal processing applications...
In this paper, we introduce a cognitive model, inspired by a rough description of the human cognition process, to provide a more efficient parallel architecture for autonomous reactions in real systems. The model is a non-hierarchical structure composed of three main parallel blocks (mind, reason, and action) in a nested double closed-loop configuration for control and supervision; the mind is a practical...
The rapidly growing design complexity has become a big obstacle and has dramatically increased the time required for SystemC simulation. In this case study, we exploit different levels of parallelism, including thread- and data-level parallelism, to accelerate the simulation of a Bitcoin miner model in SystemC. Our experiments are performed on two multi-core processors and one many-core Intel Xeon...
Smoothed particle hydrodynamics (SPH) is a meshless numerical method for simulating free-surface flow problems. In this paper, water wave impact on a floating object is studied by implementing the SPH method. The open-source DualSPHysics code, which is developed based on SPH theory, is used to simulate three-dimensional (3D) free-surface flow with a floating object. Graphics processing unit (GPU) parallel...
This paper explores hardware acceleration to significantly improve the runtime of computing the forward algorithm on Pair-HMM models, a crucial step in analyzing mutations in sequenced genomes. We describe 1) the design and evaluation of a novel accelerator architecture that can efficiently process real sequence data without performing wasteful work; and 2) aggressive memoization techniques that can...
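For reference, the single-sequence HMM forward recurrence that Pair-HMMs generalize (to a pair of sequences, with match/insert/delete states) can be sketched as follows; the toy two-state model and its numbers are illustrative, not from the paper:

```python
def forward(obs, states, start_p, trans_p, emit_p):
    """HMM forward algorithm: total probability of an observation sequence.

    This is the classic single-sequence recurrence; the Pair-HMM forward
    algorithm used in genome analysis generalizes it to two sequences.
    """
    # alpha[s] = P(observations so far, current hidden state = s)
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    for o in obs[1:]:
        alpha = {s: emit_p[s][o] * sum(alpha[r] * trans_p[r][s] for r in states)
                 for s in states}
    return sum(alpha.values())

# Toy two-state model over the alphabet {a, b} (all numbers illustrative)
states = ('H', 'C')
start_p = {'H': 0.5, 'C': 0.5}
trans_p = {'H': {'H': 0.7, 'C': 0.3}, 'C': {'H': 0.4, 'C': 0.6}}
emit_p = {'H': {'a': 0.9, 'b': 0.1}, 'C': {'a': 0.2, 'b': 0.8}}
```

Every cell of the dynamic-programming table depends only on the previous column, which is what makes the recurrence amenable to the kind of hardware pipelining the abstract targets.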
Modern task parallel programming models provide sophisticated runtime task schedulers for handling the scheduling of logical tasks on a large and varying number of hardware parallel resources at runtime. The performance of these programming models increasingly relies on how fast their runtime schedulers do their job. The more delay a scheduler incurs in matching a ready task to a free processor core...
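A minimal centralized version of such a scheduler, where free worker threads pull ready tasks from a shared queue, can be sketched as follows (a pure-Python illustration of the matching step, not any particular runtime such as Nanox):

```python
import queue
import threading

def run_tasks(tasks, num_workers=4):
    """Run a list of ready tasks (zero-argument callables) on worker
    threads pulling from a shared queue; returns results in task order."""
    ready = queue.Queue()
    results = {}
    lock = threading.Lock()
    for i, t in enumerate(tasks):
        ready.put((i, t))

    def worker():
        while True:
            try:
                i, task = ready.get_nowait()  # match a ready task to this free worker
            except queue.Empty:
                return                        # no more ready tasks
            r = task()
            with lock:
                results[i] = r

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return [results[i] for i in range(len(tasks))]
```

The single shared queue is exactly the serialization point the abstract alludes to: every dispatch contends on it, which is why production schedulers move to per-core queues with work stealing.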
Aggregators are market participants that bridge the gap between the bulk electricity market and the emerging active end-user (smart home) by efficiently scheduling or allocating resources to meet certain objectives in the electricity grid. The computational burden and processing time of such allocation problems increase with the number of resources. Using high performance computing and parallel processing...
Finite State Automata (FSA) are powerful computational models for extracting patterns from large streams (TBs/PBs) of unstructured data such as system logs, social media posts, emails, and news articles. FSA are also widely used in network security [6] and bioinformatics [4] to enable efficient pattern matching. Compute-centric architectures like CPUs and GPGPUs perform poorly on automata processing...
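At its core, automata processing is one table-driven state transition per input symbol, which is why it is memory-bound rather than compute-bound on CPUs and GPUs. A minimal deterministic example (the automaton below is our illustration, not from the cited works):

```python
def run_dfa(transitions, start, accepting, text):
    """Run a DFA over `text`.
    transitions: dict mapping (state, char) -> next state;
    a missing transition rejects the input (illustrative sketch)."""
    state = start
    for ch in text:
        state = transitions.get((state, ch))
        if state is None:
            return False
    return state in accepting

# DFA over the alphabet {a, b} accepting strings that end in "ab"
dfa = {
    (0, 'a'): 1, (0, 'b'): 0,
    (1, 'a'): 1, (1, 'b'): 2,
    (2, 'a'): 1, (2, 'b'): 0,
}
```

Each input symbol triggers one dependent table lookup, so the traversal is a chain of irregular memory accesses, the access pattern that memory-centric automata accelerators are designed around.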
This paper presents a computational acceleration of image inpainting using parallel processing based on the Graphics Processing Unit (GPU) Compute Unified Device Architecture (CUDA). We use a parabolic partial differential equation (PDE), the heat equation, as the model equation. The heat equation is discretized numerically using the finite difference method. The semi-algebraic equations thus formed are then solved...
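The discretization described above amounts to an explicit finite-difference update of the heat equation applied only at the unknown (masked) pixels. A serial pure-Python sketch of that scheme (the paper parallelizes the per-pixel update with CUDA; parameter values here are illustrative):

```python
def inpaint(image, mask, steps=500, dt=0.2):
    """Fill masked pixels by diffusing the heat equation u_t = u_xx + u_yy,
    discretized with an explicit finite-difference scheme.
    image: 2D list of floats; mask: 2D list, True where the pixel is unknown.
    Stability of this explicit scheme requires dt <= 0.25 on a unit grid."""
    h, w = len(image), len(image[0])
    u = [row[:] for row in image]
    for _ in range(steps):
        nxt = [row[:] for row in u]
        for i in range(1, h - 1):
            for j in range(1, w - 1):
                if mask[i][j]:  # only unknown pixels evolve
                    lap = (u[i-1][j] + u[i+1][j] + u[i][j-1] + u[i][j+1]
                           - 4 * u[i][j])
                    nxt[i][j] = u[i][j] + dt * lap
        u = nxt
    return u
```

Because each pixel's update depends only on its four neighbors from the previous iteration, every pixel can be updated independently, which is what makes the scheme map naturally onto one CUDA thread per pixel.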
This article describes methods for implementing fuzzy operations based on a model of a 3D associative information storage and processing device. The proposed methods differ in their application of binary matrix comparison, based on masked associative comparison with row-wise shifts.
The need for systems capable of conducting inferential analysis and predictive analytics is ubiquitous in a global information society. With recent advances in predictive machine learning models and massively parallel computing, a new set of resources is now potentially available to the computer science community for researching and developing new, truly intelligent and innovative...
Embedded software systems are first designed and validated with high-level models such as MATLAB/Simulink functional models. However, implementing a Simulink functional model on a multicore architecture is not trivial. Designers may first need to select an adequate multicore architecture that provides higher performance for a given Simulink model. Hence, it is important to have a set of performance...
Gene expression is computed block-wise to match the GPU's parallel thread structure: a dual-parallel mode is designed according to the structural characteristics of GPU threads, and texture cache memory is used to achieve high efficiency. On the CPU side, basic blocks are further subdivided into sub-blocks according to the L2 cache capacity to improve the cache hit rate, a technique that reduces the...
In this paper, we present a graph-based computing model that performs the basic arithmetic operations using the DNA computing model with maximum parallelization capability. In other words, we propose a mathematical transformation model that maps the basic arithmetic operations to the Hamiltonian Path Problem (HPP), which can be solved easily and efficiently with DNA computers. Our analyses and simulations...
GPUs have a natural affinity for streaming applications exhibiting consistent, predictable dataflow. However, many high-impact irregular streaming applications, including sequence pattern matching, decision-tree and decision-cascade evaluation, and large-scale graph processing, exhibit unpredictable dataflow due to data-dependent filtering or expansion of the data stream. Existing GPU frameworks do...
We present a family of policies that, integrated within a runtime task scheduler (Nanox), pursue the goal of improving the energy efficiency of task-parallel executions with no intervention from the programmer. The proposed policies tackle the problem by modifying the core operating frequency via DVFS mechanisms, or by enabling/disabling the mapping of tasks to specific cores at selected execution...