Search results

chapter

Strategies to Improve the Performance of a Geophysics Model for Different Manycore Systems

Matheus S. Serpa, Eduardo H.M. Cruz, Matthias Diener, Arthur M. Krause, et al.

2017 International Symposium on Computer Architecture and High Performance Computing Workshops (SBAC-PADW) > 49 - 54

Many software mechanisms for geophysics exploration in the Oil & Gas industry are based on wave propagation simulation. To perform such simulations, state-of-the-art HPC architectures are employed, generating results faster and with more accuracy at each generation. The software must evolve to support the new features of each design to keep performance scaling. Furthermore, it is important to understand...

chapter

Comparing Performance of C Compilers Optimizations on Different Multicore Architectures

Roger S. Machado, Ricardo B. Almeida, Andre D. Jardim, Ana M. Pernas, et al.

2017 International Symposium on Computer Architecture and High Performance Computing Workshops (SBAC-PADW) > 25 - 30

Multithreaded programming tools have become popular for exploiting high-performance processing with the spread of multicore processors. In this context, it is also popular to exploit compiler optimization to improve performance at execution time. In this work, we evaluate the performance achieved by the use of flags -O1, -O2, and -O3 of two C compilers (GCC and ICC) associated with five different...

chapter

Vectorization-Aware Loop Optimization with User-Defined Code Transformations

Hiroyuki Takizawa, Thorsten Reimann, Kazuhiko Komatsu, Takashi Soga, et al.

2017 IEEE International Conference on Cluster Computing (CLUSTER) > 685 - 692

The cost of maintaining an application code would significantly increase if the application code is branched into multiple versions, each of which is optimized for a different architecture. In this work, default and vector versions of a real-world application code are refactored to be a single version, and the differences between the versions are expressed as user-defined code transformations. As a...

chapter

Loop Overhead Reduction Techniques for Coarse Grained Reconfigurable Architectures

Kanishkan Vadivel, Mark Wijtvliet, Roel Jordans, Henk Corporaal

2017 Euromicro Conference on Digital System Design (DSD) > 14 - 21

Due to their flexibility and high performance, Coarse Grained Reconfigurable Arrays (CGRAs) are a topic of increasing research interest. However, CGRAs also have the potential to achieve very high energy efficiency in comparison to other reconfigurable architectures when hardware optimizations are applied. Some of these optimizations are common for more traditional processors but can also lead to large...

chapter

Automating Compiler-Directed Autotuning for Phased Performance Behavior

Tharindu Rusira, Mary Hall, Protonu Basu

2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) > 1362 - 1371

We describe an integration of the CHiLL compiler with OpenTuner to reduce the programmer burden in using autotuning. We use as a case study optimizing the smooth operator and its associated stencil computations in the context of Geometric Multigrid (GMG), a hierarchical linear solver that operates in multiple grid resolutions (levels). Smooth is the most performance-critical operation that runs multiple...

chapter

Optimization-based computation with spiking neurons

Stephen J. Verzi, Craig M. Vineyard, Eric D. Vugrin, Meghan Galiardi, et al.

2017 International Joint Conference on Neural Networks (IJCNN) > 2015 - 2022

Considerable effort is currently being spent designing neuromorphic hardware for addressing challenging problems in a variety of pattern-matching applications. These neuromorphic systems offer low power architectures with intrinsically parallel and simple spiking neuron processing elements. Unfortunately, these new hardware architectures have been largely developed without a clear justification for...

chapter

Outer-loop vectorization - revisited for short SIMD architectures

Dorit Nuzman, Ayal Zaks

2008 International Conference on Parallel Architectures and Compilation Techniques (PACT) > 2 - 11

Vectorization has been an important method of using data-level parallelism to accelerate scientific workloads on vector machines such as Cray for the past three decades. In the last decade it has also proven useful for accelerating multimedia and embedded applications on short SIMD architectures such as MMX, SSE and AltiVec. Most of the focus has been directed at innermost loops, effectively executing...

chapter

Automatic generation of fast BLAS3-GEMM: A portable compiler approach

Xing Su, Xiangke Liao, Jingling Xue

2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) > 122 - 133

GEMM is the main computational kernel in BLAS3. Its micro-kernel is either hand-crafted in assembly code or generated from C code by general-purpose compilers (guided by architecture-specific directives or auto-tuning). Therefore, either performance or portability suffers. We present a POrtable Compiler Approach, Poca, implemented in LLVM, to automatically generate and optimize this micro-kernel in...

chapter

A robust state-transfer architecture for distributed and asynchronous optimization

Tarek A. Lahlou, Thomas A. Baran

2016 IEEE Global Conference on Signal and Information Processing (GlobalSIP) > 525 - 529

This paper presents a distributed architecture for asynchronously implementing a class of nonlinear signal processing systems as web services, which in turn can be used to solve a broad class of optimization problems. As opposed to requiring specialized servers, the presented architecture requires only the use of commodity database backends as a central resource, as might typically be used to serve...

chapter

High level abstractions and automatic optimization techniques for the programming of irregular algorithms

David Padua

2016 6th Workshop on Irregular Applications: Architecture and Algorithms (IA3) > 1

article

Parallel Optimization Framework for Cloud-Based Small Cell Networks

Wibowo Hardjawana, Nur Ilyana Anwar Apandi, Branka Vucetic

IEEE Transactions on Wireless Communications > 2016 > 15 > 11 > 7286 - 7298

Cloud-based small cell networks (C-SCNs) have recently been proposed as new wireless cellular architecture. In cloud-based networks, optimization of radio resources at the base station (BS) is moved to a cloud data center for centralized optimization. In the center, multiple processors referred to as the cloud computational unit (CCU) are used for the optimization. As the cell size and networks become,...

chapter

An ultra-fast multi-objective optimization algorithm for VLIW architecture

Samira Nazari, Maryam Hassani, Ali Azarpeyvand

2016 IEEE East-West Design & Test Symposium (EWDTS) > 1 - 7

In this paper, a novel ultra-fast multi-objective optimization algorithm for VLIW architecture design space exploration is proposed. This method, which is based on design space pruning, is applicable to any architecture objectives, such as the number of issue widths, ALUs, and register file clusters. The proposed method could be utilized for optimizing the configuration to meet various...

chapter

Polyhedral Source-to-Source Compiler

Dominik Adamski, Grzegorz Jablonski, Piotr Perek, Andrzej Napieralski

2016 MIXDES - 23rd International Conference Mixed Design of Integrated Circuits and Systems > 458 - 463

This paper describes a tool which enables source-to-source compilation. The implemented Polyhedral Source-to-Source Compiler (PSSC) is based on the Polly compiler and LLVM infrastructure, and it enables automatic recognition of parallel regions of C/C++ code and annotating them with OpenMP / OpenACC pragmas. The analysis of the input code is done by the Polly compiler and then the results are mapped to original...

chapter

Process Assignment in Multi-core Clusters Using Job Assignment Algorithm

Chapram Sudhakar, Pankaj Adhikari, T. Ramesh

2016 Second International Conference on Computational Intelligence & Communication Technology (CICT) > 259 - 264

Modern high performance cluster systems for parallel processing employ multi-core processors and high speed interconnection networks. Efficient mapping of the processes of a parallel application onto the cores of such a cluster system plays a vital role in improving the performance of that application. A parallel application can be modelled as a weighted graph showing the communication among the...

chapter

POSTER: An optimization of dataflow architectures for scientific applications

Xiaowei Shen, Xiaochun Ye, Xu Tan, Da Wang, et al.

2016 International Conference on Parallel Architecture and Compilation Techniques (PACT) > 441 - 442

Dataflow computing has proved promising in high-performance computing. However, traditional dataflow architectures are general-purpose and not efficient enough when dealing with typical scientific applications due to low utilization of function units. In this paper, we propose an optimization of dataflow architectures for scientific applications. The optimization introduces a request for operands...

chapter

Inter-FPGA routing environment for performance exploration of multi-FPGA systems

Umer Farooq, Roselyne Chotin-Avot, Moazam Azeem, Maminionja Ravoson, et al.

2016 International Symposium on Rapid System Prototyping (RSP) > 1 - 7

Multi-FPGA platforms are a popular choice today for complex system prototyping because they offer high execution speed, low cost, and real world testing experience. However, performance of multi-FPGA based systems is severely affected by widening logic to I/O gap in FPGAs. In order to address the performance issue, in this work, we propose an exploration and optimization flow for multi-FPGA based...

chapter

Comparison of single-ISA heterogeneous versus wide dynamic range processors for mobile applications

Hamid Reza Ghasemi, Ulya R. Karpuzcu, Nam Sung Kim

2015 33rd IEEE International Conference on Computer Design (ICCD) > 304 - 310

Mobile computing devices demand processors to offer a wide range of performance/power trade-offs so that they can provide much needed high performance or low power consumption depending on a given operating requirement. While dynamic voltage/frequency scaling (DVFS) has been the most powerful technique to provide such trade-offs, few processor vendors have the capability to provide a sufficient DVFS...

chapter

Polyhedral user mapping and assistant visualizer tool for the r-stream auto-parallelizing compiler

Eric Papenhausen, Bing Wang, M. Harper Langston, Muthu Baskaran, et al.

2015 IEEE 3rd Working Conference on Software Visualization (VISSOFT) > 180 - 184

Existing high-level, source-to-source compilers can accept input programs in a high-level language (e.g., C) and perform complex automatic parallelization and other mappings using various optimizations. These optimizations often require trade-offs and can benefit from the user's involvement in the process. However, because of the inherent complexity, the barrier to entry for new users of these high-level...

chapter

MPC related computational capabilities of ARMv7A processors

Gianluca Frison, John Bagterp Jorgensen

2015 European Control Conference (ECC) > 3414 - 3421

In recent years, the mass market of mobile devices has pushed the demand for increasingly fast but cheap processors. ARM, the world leader in this sector, has developed the Cortex-A series of processors with focus on computationally intensive applications. If properly programmed, these processors are powerful enough to solve the complex optimization problems arising in MPC in real-time, while keeping...

chapter

Power optimizations for transport triggered SIMD processors

Joonas Multanen, Timo Viitanen, Henry Linjamaki, Heikki Kultala, et al.

2015 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS) > 303 - 309

Power consumption in modern processor design is a key aspect. Optimizing the processor for power leads to direct savings in battery energy consumption in case of mobile devices. At the same time, many mobile applications demand high computational performance. In case of large scale computing, low power compute devices help in thermal design and in reducing the electricity bill. This paper presents...

INFONA - science communication portal
