Search results

chapter

Core number optimization based scheduler to order/map hardware/software applications

Asma Rebaya, Imen Amari, Kaouther Gasmi, Salem Hasnaoui

2017 25th International Conference on Software, Telecommunications and Computer Networks (SoftCOM) > 1 - 6

2017 25th International Conference on Software, Telecommunications and Computer Networks (SoftCOM)

Over the last few years, the number of cores in digital signal and general-purpose processors has increased spectacularly. Concurrently, significant research has been done to benefit from this high degree of parallelism. This research focuses on providing efficient scheduling of hardware/software systems onto multicore architectures. The scheduling process consists of statically choosing...

chapter

Performance optimization of Hadoop workflows in public clouds through adaptive task partitioning

Tong Shu, Chase Q. Wu

IEEE INFOCOM 2017 - IEEE Conference on Computer Communications > 1 - 9

IEEE INFOCOM 2017 - IEEE Conference on Computer Communications

Cloud computing provides a cost-effective computing platform for big data workflows where moldable parallel computing models such as MapReduce are widely applied to meet stringent performance requirements. The granularity of task partitioning in each moldable job has a significant impact on workflow completion time and financial cost. We investigate the properties of moldable jobs and design a big-data...

chapter

From exaflop to exaflow

Tobias Becker, Pavel Burovskiy, Anna Maria Nestorov, Hristina Palikareva, more

Design, Automation & Test in Europe Conference & Exhibition (DATE), 2017 > 404 - 409

2017 Design, Automation & Test in Europe Conference & Exhibition (DATE)

Exascale computing is facing a gap between the ever-increasing demand for application performance and the underlying chip technology, which no longer delivers the expected exponential increases in CPU performance. The industry is now progressively moving towards dedicated accelerators to deliver high performance and better energy efficiency. However, the question of programmability still remains...

chapter

Optimistic loop optimization

Johannes Doerfert, Tobias Grosser, Sebastian Hack

2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) > 292 - 304

2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)

Compilers use static analyses to justify program optimizations. As every optimization must preserve the semantics of the original program, static analyses typically fall back to conservative approximations. Consequently, the set of states for which the optimization is invalid is overapproximated and potential optimization opportunities are missed. Instead of justifying the optimization statically,...

chapter

Left-Preconditioned Communication-Avoiding Conjugate Gradient Methods for Multiphase CFD Simulations on the K Computer

Akie Mayumi, Yasuhiro Idomura, Takuya Ina, Susumu Yamada, more

2016 7th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA) > 17 - 24

2016 7th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA)

The left-preconditioned communication-avoiding conjugate gradient (LP-CA-CG) method is applied to the pressure Poisson equation in the multiphase CFD code JUPITER. The arithmetic intensity of the LP-CA-CG method is analyzed, and is dramatically improved by loop splitting for inner product operations and for three-term recurrence operations. Two LP-CA-CG solvers with block Jacobi preconditioning and...

chapter

Demo: SLP-aware word length optimization

Ali Hassan El Moussawi, Steven Derrien

2016 Conference on Design and Architectures for Signal and Image Processing (DASIP) > 233 - 234

2016 Conference on Design and Architectures for Signal and Image Processing (DASIP)

Many embedded processors do not support floating-point arithmetic, but they generally provide support for SIMD as a means to improve performance at near-zero cost overhead. Achieving good performance when targeting such processors requires the use of fixed-point arithmetic and efficient SIMDization. To reduce application time-to-market, automatic SIMDization and floating-point conversion methodologies...

chapter

Applying parameterized model checking to real-life cache coherence protocols

Vladimir Burenkov, Alexander Kamkin

2016 IEEE East-West Design & Test Symposium (EWDTS) > 1 - 4

2016 IEEE East-West Design & Test Symposium (EWDTS)

This paper overviews a technique for verifying cache coherence protocols described in the Promela language. The approach is comprised of the following steps. First, a model written for a certain configuration of the memory system is generalized to the model being parameterized with the number of processors. Second, the parameterized model is abstracted from the exact number of processors. Finally,...

chapter

Polyhedral compilation for energy efficiency

Benoit Pradelle, Muthu Baskaran, Tom Henretty, Benoit Meister, more

2016 IEEE High Performance Extreme Computing Conference (HPEC) > 1 - 7

2016 IEEE High Performance Extreme Computing Conference (HPEC)

In the last decade, the scope of software optimizations expanded to encompass energy consumption on top of the classical runtime minimization objective. In that context, several optimizations have been developed to improve the software energy efficiency. However, these optimizations commonly rely on long profiling steps and are often implemented as unstable runtime systems, which limits their applicability...

chapter

Opening polyhedral compiler's black box

Lenaic Bagneres, Oleksandr Zinenko, Stephane Huot, Cedric Bastoul

2016 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) > 128 - 138

2016 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)

While compilers offer a fair trade-off between productivity and executable performance in single-threaded execution, their optimizations remain fragile when addressing compute-intensive code for parallel architectures with deep memory hierarchies. Moreover, these optimizations operate as black boxes, impenetrable for the user, leaving them with no alternative to time-consuming and error-prone manual...

chapter

Have abstraction and eat performance, too: Optimized heterogeneous computing with parallel patterns

Kevin J. Brown, HyoukJoong Lee, Tiark Rompf, Arvind K. Sujeeth, more

2016 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) > 194 - 205

2016 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)

High performance in modern computing platforms requires programs to be parallel, distributed, and run on heterogeneous hardware. However, programming such architectures is extremely difficult due to the need to implement the application using multiple programming models and combine them in ad hoc ways. To optimize distributed applications both for modern hardware and for modern programmers...

chapter

Multi-objective scheduling for divisible load in heterogeneous distributed system

Hejun Xuan, Yuping Wang, Shanshan Hao, Xiaoli Wang

2016 IEEE Congress on Evolutionary Computation (CEC) > 3378 - 3384

2016 IEEE Congress on Evolutionary Computation (CEC)

Scheduling divisible load in a heterogeneous distributed system is a well-known NP-hard problem. The problem is even more complex and challenging when its model has more than one objective. The difficulty is to satisfy multiple objectives that may be of a conflicting nature. This paper investigates a multi-objective scheduling problem for divisible load in heterogeneous distributed systems. First,...

chapter

Performance Models for Split-Execution Computing Systems

Travis S. Humble, Alexander J. McCaskey, Jonathan Schrock, Hadayat Seddiqi, more

2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) > 545 - 554

2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

Split-execution computing leverages the capabilities of multiple computational models to solve problems, but splitting program execution across different computational models incurs costs associated with the translation between domains. We analyze the performance of a split-execution computing system developed from conventional and quantum processing units (QPUs) by using behavioral models that track...

chapter

Late Parallelization and Feedback Approaches for Distributed Computation of Evolutionary Multiobjective Optimization Algorithms

O. Tolga Altinoz, Kalyanmoy Deb

2015 Second International Conference on Soft Computing and Machine Intelligence (ISCMI) > 40 - 44

2015 Second International Conference on Soft Computing and Machine Intelligence (ISCMI)

Distributing a multiobjective optimization algorithm across various devices in a parallel fashion is a method for reducing the computation time of multiobjective evolutionary algorithms (MOEAs). When the number of processors is increased, the gain from parallelization decreases. Therefore, the aim of the parallelization method is not only to decrease the overall algorithm execution time,...

chapter

A new emigrant creation strategy for parallel Artificial Bee Colony algorithm

Dervis Karaboga, Selcuk Aslan

2015 9th International Conference on Electrical and Electronics Engineering (ELECO) > 689 - 694

2015 9th International Conference on Electrical and Electronics Engineering (ELECO)

The Artificial Bee Colony algorithm, inspired by the foraging behaviour of real honey bees, is one of the most popular swarm-intelligence-based optimization techniques. Like other population-based evolutionary computation approaches, the Artificial Bee Colony algorithm is intrinsically suitable for distributed architectures. However, determining which food source should be chosen to distribute between sub-colonies...

chapter

Power optimizations for transport triggered SIMD processors

Joonas Multanen, Timo Viitanen, Henry Linjamaki, Heikki Kultala, more

2015 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS) > 303 - 309

2015 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS)

Power consumption in modern processor design is a key aspect. Optimizing the processor for power leads to direct savings in battery energy consumption in case of mobile devices. At the same time, many mobile applications demand high computational performance. In case of large scale computing, low power compute devices help in thermal design and in reducing the electricity bill. This paper presents...

chapter

PLC Introduction and Committees

Sunita Chandrasekaran

2015 IEEE International Parallel and Distributed Processing Symposium Workshop > 656 - 657

2015 IEEE International Parallel and Distributed Processing Symposium Workshop (IPDPSW)

chapter

Model-Led Optimisation of a Geometric Multigrid Application

Richard Bunt, Simon Pennycook, Stephen Jarvis, Leigh Lapworth, more

2013 IEEE 10th International Conference on High Performance Computing and Communications & 2013 IEEE International Conference on Embedded and Ubiquitous Computing > 742 - 753

2013 IEEE International Conference on High Performance Computing and Communications (HPCC) & 2013 IEEE International Conference on Embedded and Ubiquitous Computing (EUC)

This paper details the construction of an analytical performance model of HYDRA, a production nonlinear multigrid solver used by Rolls-Royce for computational fluid dynamics simulations. The model captures both the computational behaviour of HYDRA's key subroutines and the behaviour of its proprietary communication library, OPlus, with an absolute error consistently under 16% on up to 384 cores of...

chapter

Performance Study of SIMD Programming Models on Intel Multicore Processors

Peter Kristof, Hongtao Yu, Zhiyuan Li, Xinmin Tian

2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum > 2423 - 2432

2012 26th IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

Modern multicore hardware employs a variety of parallel execution units, including multiple CPU cores for executing multiple threads simultaneously, vector units such as the Intel SIMD on the CPU cores, as well as GPU-like processing arrays. The availability of such an unprecedented level of parallelism on mainstream computers offers an enormous potential to enable a new generation of computation-intensive...

chapter

Optimizing the Execution of Statistical Simulations for Human Evolution in Hyper-threaded Multicore Architectures

Raquel Dias, Cesar A.F. De Rose, Antonio Tadeu Azevedo Gomes, Nelson J.R. Fagundes

2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum > 699 - 705

2012 26th IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

Simulations of statistical models have been used to validate theories of past events in the evolution of species. Studies concerning human evolution are important for understanding our history and biodiversity. However, these approaches use complex statistical models, leading to high computational cost. The present paper proposes optimization techniques for Hyper-threaded multicore architectures...

chapter

Mesh Interface Resolution and Ghost Exchange in a Parallel Mesh Representation

Timothy J. Tautges, Jason A. Kraftcheck, Nathan Bertram, Vipin Sachdeva, more

2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum > 1670 - 1679

2012 26th IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

Algorithms are described for the resolution of shared vertices and higher-dimensional interfaces on domain-decomposed parallel mesh, and for ghost exchange between neighboring processors. Performance data is given for large (up to 64M tet and 32M hex element) meshes on up to 16k processors. Shared interface resolution for structured mesh is also described. Small modifications are required to enable...

INFONA - science communication portal
