Many software mechanisms for geophysical exploration in the oil and gas industry are based on wave propagation simulation. To perform such simulations, state-of-the-art HPC architectures are employed, generating results faster and with more accuracy at each generation. The software must evolve to support the new features of each design to keep performance scaling. Furthermore, it is important to understand...
Multithreaded programming tools have become popular for exploiting high-performance processing as multicore processors have spread. In this context, it is also popular to exploit compiler optimization to improve execution-time performance. In this work, we evaluate the performance achieved by the use of the flags -O1, -O2, and -O3 of two C compilers (GCC and ICC) associated with five different...
The cost of maintaining an application code increases significantly if the code is branched into multiple versions, each of which is optimized for a different architecture. In this work, default and vector versions of a real-world application code are refactored into a single version, and the differences between the versions are expressed as user-defined code transformations. As a...
Due to their flexibility and high performance, Coarse-Grained Reconfigurable Arrays (CGRAs) are a topic of increasing research interest. However, CGRAs also have the potential to achieve very high energy efficiency in comparison to other reconfigurable architectures when hardware optimizations are applied. Some of these optimizations are common for more traditional processors but can also lead to large...
We describe an integration of the CHiLL compiler with OpenTuner to reduce the programmer burden in using autotuning. As a case study, we optimize the smooth operator and its associated stencil computations in the context of Geometric Multigrid (GMG), a hierarchical linear solver that operates at multiple grid resolutions (levels). Smooth is the most performance-critical operation that runs multiple...
Considerable effort is currently being spent designing neuromorphic hardware for addressing challenging problems in a variety of pattern-matching applications. These neuromorphic systems offer low power architectures with intrinsically parallel and simple spiking neuron processing elements. Unfortunately, these new hardware architectures have been largely developed without a clear justification for...
Vectorization has been an important method of using data-level parallelism to accelerate scientific workloads on vector machines such as Cray for the past three decades. In the last decade it has also proven useful for accelerating multimedia and embedded applications on short SIMD architectures such as MMX, SSE and AltiVec. Most of the focus has been directed at innermost loops, effectively executing...
GEMM is the main computational kernel in BLAS3. Its micro-kernel is either hand-crafted in assembly code or generated from C code by general-purpose compilers (guided by architecture-specific directives or auto-tuning). Therefore, either performance or portability suffers. We present a POrtable Compiler Approach, Poca, implemented in LLVM, to automatically generate and optimize this micro-kernel in...
This paper presents a distributed architecture for asynchronously implementing a class of nonlinear signal processing systems as web services, which in turn can be used to solve a broad class of optimization problems. As opposed to requiring specialized servers, the presented architecture requires only the use of commodity database backends as a central resource, as might typically be used to serve...
Cloud-based small cell networks (C-SCNs) have recently been proposed as a new wireless cellular architecture. In cloud-based networks, optimization of radio resources at the base station (BS) is moved to a cloud data center for centralized optimization. In the center, multiple processors, referred to as the cloud computational unit (CCU), are used for the optimization. As the cell size and networks become,...
In this paper, a novel ultra-fast multi-objective optimization algorithm for VLIW architecture design space exploration is proposed. This method, which is based on design space pruning, is applicable to any architecture objectives, such as the issue width, the number of ALUs, and the number of register file clusters. The proposed method could be utilized for optimizing the configuration to meet various...
This paper describes a tool that enables source-to-source compilation. The implemented Polyhedral Source-to-Source Compiler (PSSC) is based on the Polly compiler and the LLVM infrastructure, and it enables automatic recognition of parallel regions of C/C++ code and their annotation with OpenMP / OpenACC pragmas. The analysis of the input code is done by the Polly compiler, and then the results are mapped to original...
Modern high-performance cluster systems for parallel processing employ multi-core processors and high-speed interconnection networks. Efficient mapping of the processes of a parallel application onto the cores of such a cluster system plays a vital role in improving the performance of that application. A parallel application can be modelled as a weighted graph showing the communication among the...
Dataflow computing has proved promising in high-performance computing. However, traditional dataflow architectures are general-purpose and not efficient enough when dealing with typical scientific applications due to low utilization of function units. In this paper, we propose an optimization of dataflow architectures for scientific applications. The optimization introduces a request for operands...
Multi-FPGA platforms are a popular choice today for complex system prototyping because they offer high execution speed, low cost, and real-world testing experience. However, the performance of multi-FPGA based systems is severely affected by the widening logic-to-I/O gap in FPGAs. To address this performance issue, we propose an exploration and optimization flow for multi-FPGA based...
Mobile computing devices demand processors to offer a wide range of performance/power trade-offs so that they can provide much needed high performance or low power consumption depending on a given operating requirement. While dynamic voltage/frequency scaling (DVFS) has been the most powerful technique to provide such trade-offs, few processor vendors have the capability to provide a sufficient DVFS...
Existing high-level, source-to-source compilers can accept input programs in a high-level language (e.g., C) and perform complex automatic parallelization and other mappings using various optimizations. These optimizations often require trade-offs and can benefit from the user's involvement in the process. However, because of the inherent complexity, the barrier to entry for new users of these high-level...
In recent years, the mass market of mobile devices has pushed the demand for increasingly fast but cheap processors. ARM, the world leader in this sector, has developed the Cortex-A series of processors with a focus on computationally intensive applications. If properly programmed, these processors are powerful enough to solve the complex optimization problems arising in MPC in real time, while keeping...
Power consumption is a key aspect of modern processor design. Optimizing the processor for power leads to direct savings in battery energy consumption in the case of mobile devices. At the same time, many mobile applications demand high computational performance. In large-scale computing, low-power compute devices help in thermal design and in reducing the electricity bill. This paper presents...