In this paper we propose a low-overhead optimizer for the ubiquitous sparse matrix-vector multiplication (SpMV) kernel on the Intel Xeon Phi manycore processor. The architectural differences of such processors compared to their multicore counterparts overly expose inherent structural weaknesses of different sparse matrices, intensifying performance issues beyond the traditionally reported memory bandwidth...
In this paper, we address cloud VoIP scheduling strategies that provide appropriate levels of quality of service to users and of cost to VoIP service providers. This bi-objective focus is reasonable and representative for real installations and applications. We conduct comprehensive simulations on real data of twenty-three on-line non-clairvoyant scheduling strategies with a fixed utilization threshold...
Heterogeneous chip-multiprocessors with integrated CPU and GPU cores on the same die allow sharing of critical memory system resources among the applications executing on the two types of cores. In this paper, we explore memory system management driven by the quality of service (QoS) requirements of the GPU applications executing simultaneously with CPU applications in such heterogeneous platforms. Our...
In this paper we present D.A.V.I.D.E. (Development for an Added Value Infrastructure Designed in Europe), an innovative and energy-efficient High Performance Computing cluster designed by E4 Computer Engineering for PRACE (Partnership for Advanced Computing in Europe). D.A.V.I.D.E. is built using best-in-class components (IBM’s POWER8-NVLink CPUs, NVIDIA TESLA P100 GPUs, Mellanox InfiniBand EDR 100...
When floating-point arithmetic is executed on a processor, round-off and truncation errors occur in every calculation. These errors cause precision issues in large simulations that require a great number of calculations. Therefore, we have developed quadruple-precision basic linear algebra subprograms (QPBLAS) based on Bailey's double-double arithmetic. The multiplication operation of...
Deep Learning (DL) algorithms have become ubiquitous in data analytics. As a result, major computing vendors — including NVIDIA, Intel, AMD and IBM — have architectural road-maps influenced by DL workloads. Furthermore, several vendors have recently advertised new computing products as accelerating DL workloads. Unfortunately, it is difficult for data scientists to quantify the potential of these...
Determining key characteristics of High Performance Computing machines that allow users to predict their performance is an old and recurrent dream. This was, for example, the rationale behind the design of the LogP model that later evolved into many variants (LogGP, LogGPS, LoGPS, …) to cope with the evolution and complexity of network technology. Although the network has received a lot of attention,...
Hardware accelerators have become a de facto standard for achieving high performance on current supercomputers, and there are indications that this trend will continue in the future. Modern accelerators feature high-bandwidth memory next to the computing cores. For example, the Intel Knights Landing (KNL) processor is equipped with 16 GB of high-bandwidth memory (HBM) that works together with conventional...
Today's supercomputers are moving towards deployment of many-core processors like Intel Xeon Phi Knights Landing (KNL), to deliver high compute and memory capacity. Applications executing on such many-core platforms with improved vectorization require high memory bandwidth. To improve performance, architectures like Knights Landing include a high bandwidth and low capacity in-package high bandwidth...
Programming accelerators today usually requires managing separate virtual and physical memories, such as allocating space in and copying data between host and device memories. The OpenACC API provides data directives and clauses to control this behavior where it is required. This paper describes how the data model is supported in current OpenACC implementations, ranging from research compilers (OpenUH...