The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
Presents the introductory welcome message from the conference proceedings. May include the conference officers' congratulations to all involved with the conference event and publication of the proceedings record.
In this work we propose an accelerated stochastic learning system for very large-scale applications. Acceleration is achieved by mapping the training algorithm onto massively parallel processors: we demonstrate a parallel, asynchronous GPU implementation of the widely used stochastic coordinate descent/ascent algorithm that can provide up to 35× speed-up over a sequential CPU implementation. In order...
Deep Learning (DL) algorithms have become ubiquitous in data analytics. As a result, major computing vendors — including NVIDIA, Intel, AMD and IBM — have architectural road-maps influenced by DL workloads. Furthermore, several vendors have recently advertised new computing products as accelerating DL workloads. Unfortunately, it is difficult for data scientists to quantify the potential of these...
RNNLM (Recurrent Neural Network Language Model) can save the historical information of the training dataset by the last hidden layer and can also as input for training. It has become an interesting topic in the field of Natural Language Processing research. However, the immense training time overhead is a big problem. The large output layer, hidden layer, last hidden layer and the connections among...
We consider the implementation of an in-situ machine learning system with the computing model promoted by Qarnot computing. Qarnot introduced an utility computing model in which servers are distributed in homes and offices where they serve as heaters. The Qarnot servers also embed several sensors for temperature, humidity, CO 2 etc. Qarnot offers an adequate platform to develop in-situ workflows for...
Presents the introductory welcome message from the conference proceedings. May include the conference officers' congratulations to all involved with the conference event and publication of the proceedings record.
We present the design and analysis of a parallel approximation algorithm for the problem of scheduling jobs on parallel identical machines to minimize makespan. The design of the parallel approximation algorithm is based on the best existing polynomial-time approximation scheme (PTAS) for the problem. To the best of our knowledge, this is the first practical parallel approximation algorithm for the...
Presents the introductory welcome message from the conference proceedings. May include the conference officers' congratulations to all involved with the conference event and publication of the proceedings record.
N-Body simulation simulates the evolution of a system that is composed of N particles, where each element receives a force that is due to the interaction with all the other elements within the system. Usually, the influence of external physical forces, such as gravity, is involved too. This methodology is widely used in different fields that range from astrophysics, where it is used to study the interaction...
Some of the newer processor architectures are no longer based on registers in order to increase their potential of instruction-level parallelism. Instead, they expose their data paths to the compiler so that the program is able to directly move data values between function units using suitable instructions. Some of these architectures require a synchronous transfer of data values while others use...
Mapping problems to Coarse Grained Reconfigurable Arrays (CGRA) has been researched for many years now. Yet, no feasible mapping algorithms are known that can be considered optimal or even near optimal. The main reason for this deficit is the complex nature of the mapping problem. It can be considered as a combined scheduling, binding and routing problem. It involves several constraints that need...
Reproducibility of the execution of scientific applications on parallel and distributed systems is a growing concern, underlying the trustworthiness of the experiments and the conclusions derived from experiments. Dynamic loop scheduling (DLS) techniques are an effective approach towards performance improvement of scientific applications via load balancing. These techniques address algorithmic and...
Maximizing parallelism level in applications can be achieved by minimizing overheads due to load imbalances and waiting time due to memory latencies. Compiler optimization is one of the most effective solutions to tackle this problem. The compiler is able to detect the data dependencies in an application and is able to analyze the specific sections of code for parallelization potential. However, all...
Hierarchical Multipole Methods (HMMs) are an important class of methods in scientific and engineering applications. They are challenging to parallelize for contemporary and emerging platforms using existing programming models. Asynchronous many-tasking (AMT) execution models provide abstractions suitable for HMMs and promise scalability in the context of future exascale systems. In our work we (1)...
The unprecedented amounts of data generated from large scientific simulations impose a grand challenge in data analytics, and I/O simply becomes a major performance bottleneck. To address this challenge, we present an application-aware I/O optimization technique in support of interactive large-scale scientific visualization. We partition a scientific data into blocks, and carefully place data blocks...
As an important numerical analysis method of rock mechanics, discontinuous deformation analysis (DDA) has been widely used in rock engineering. DDA has certain advantages such as the large time step and the large deformation, at the cost of relatively low computing efficiency. To address the efficiency bottleneck of DDA, this paper proposes a complete graphics processing unit (GPU)-based version....
The contribution of the present work relies on an innovative and judicious combination of several optimization techniques for achieving high performance when using automatic vectorization and hybrid MPI/OpenMP parallelism in a Particle-in-Cell (PIC) code. The domain of application is plasma physics: the code simulates 2d2v Vlasov-Poisson systems on Cartesian grids with periodic boundary conditions...
With the increase in the complexity and number of nodes in large-scale high performance computing (HPC) systems, the probability of applications experiencing failures has increased significantly. As the computational demands of applications that execute on HPC systems increase, projections indicate that applications executing on exascale-sized systems are likely to operate with a mean time between...
Provides an abstract of the keynote presentation and a brief professional biography of the presenter. The complete presentation was not made available for publication as part of the conference proceedings.
Providing new parallel programming models/abstractions as a set of library functions has the huge advantage that it allows for an relatively easy incremental porting path for legacy HPC applications, in contrast to the huge effort needed when novel concepts are only provided in new programming languages or language extensions. However, performance issues are to be expected with fine granular usage...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.