State-of-the-art storage devices with parallel capability have significantly reduced the performance gap between processor and storage I/O. However, their internal parallelism makes it difficult to measure utilization, which serves as a basis for load balancing, a critical feature for improving the performance of parallel systems. When utilization of storage reaches one hundred percent,...
This article describes methods for implementing fuzzy operations based on a model of a 3D associative information storage and processing device. The proposed methods are distinguished by their use of binary matrix comparison based on masked associative comparison with row shifts.
High performance computing applications are far more difficult to write; therefore, practitioners expect well-tuned software to last long and provide optimized performance even when the hardware is upgraded. It may also be necessary to write software using sufficient abstraction over the hardware so that it is capable of running on heterogeneous architectures. Therefore, it is required to have a...
Hybrid storage is widely implemented as it satisfies the requirements of capacity and performance in an economically viable fashion. With rapid technical improvement, hybrid storage systems consisting of several types of SSDs will gradually be adopted. Existing works mostly concentrate on fully utilizing the high-performance device but neglect the capability of the low-performance device. This paper...
Network simulation is an important technique for designing interconnection networks and communication libraries. Network simulations are also useful for analyzing internal communication behavior in parallel applications. This paper introduces a new interconnection network simulator, NSIM-ACE. This simulator enables us to evaluate RDMA directly while existing simulators do not have such capability...
Low-power asymmetric multi-core processors (AMPs) are nowadays present in a wide variety of mobile and hand-held devices, and have attracted a lot of attention due to their appealing energy efficiency. However, these processors contain cores with different performance capabilities, calling for solutions specifically tailored to exploit their full potential. In this paper, we provide two architecture-aware...
The goal of reaching exascale computing is made especially challenging by the highly heterogeneous nature of modern platforms and the energy they consume. As compute nodes typically utilize multiple multi-core CPUs and are increasingly equipped with PCIe-based accelerators, both contribute to an ever more dynamic power consumption. In our study we evaluate our target application on a variety...
To enhance the performance of memory-bound applications, hardware designs have been developed to hide memory latency, such as the out-of-order (OoO) execution engine, at the price of increased energy consumption. Contemporary processor cores span a wide range of performance and energy efficiency options: from fast and power-hungry OoO processors to efficient, but slower in-order processors. The more...
Dynamic parallelism (DP) is a promising feature for GPUs, which allows on-demand spawning of kernels on the GPU without any CPU intervention. However, this feature has two major drawbacks. First, launching GPU kernels can incur significant performance penalties. Second, dynamically generated kernels are not always able to utilize the GPU cores efficiently due to hardware limits. To address...
In this paper, we propose a novel parallel Elliptic-Curve-Cryptography (ECC) point multiplication implementation over the binary Galois field GF(2^m) that exploits concurrent operation in a homogeneous multi-core processor to yield improved performance. A modified Lopez-Dahab (LD) mixed-coordinates point multiplication algorithm is developed that exploits concurrency and enables operation...
Solid state disks (SSDs) are becoming increasingly popular in personal devices and data centers. Flash chips can be packaged in hard disk drive (HDD) form factors and provide the same interface as HDDs. This characteristic makes it easy for SSDs to replace HDDs in existing storage systems. PCIe-based SSDs can provide higher I/O performance, but they are still relatively expensive. This paper studies the feasibility of...
Numerous TOP500 supercomputers are based on a torus interconnection network. The torus topology is indeed one of the most popular interconnection networks for massively parallel systems due to its interesting topological properties such as symmetry and simplicity. For instance, the world-famous supercomputers Fujitsu K, IBM Blue Gene/L, IBM Blue Gene/P and Cray XT3 are all torus-based. In this...
This paper presents a detailed performance evaluation of a spanning-tree-based algorithm that automatically exploits parallelism and determines an execution order of multiple kernel programs in a distributed environment. In stream-based computing, efficient parallel execution requires careful scheduling of the invocation of the kernel programs. By mapping a kernel to a node and an I/O stream between...
WebCL is a browser version of the Khronos OpenCL standard. It allows a web browser to exploit the GPU and CPU for parallel processing by embedding OpenCL kernel code into JavaScript code, which leads to significant speedups of compute-intensive applications such as physics and image processing. This paper presents a working prototype of a WebCL-enabled browser that runs on Android-powered mobile devices,...
LocalMaxs extracts relevant multiword terms based on their cohesion but is computationally intensive, a critical issue for very large natural language corpora. The corpus properties concerning n-gram distribution determine the algorithm complexity and were empirically analyzed for corpora up to 982 million words. A parallel LocalMaxs implementation exhibits almost linear relative efficiency, speedup,...
In recent years, we have observed exponential growth in the mobile device market. In this scenario, the rate at which mobile devices are replaced assumes particular relevance. According to the International Telecommunication Union, smartphone owners replace their device every 20 months, on average. The side effect of this trend is to deal with the disposal of an increasing amount...
This paper analyzes the parallelization efficiency of Menge [1], an open source virtual crowd simulation system widely used for algorithm benchmarking, focusing on three aspects: the performance of the existing parallel processing scheme, the bottlenecks of parallel processing, and opportunities to improve the parallel efficiency of the system. First, we calculate the speedup ratio of each Menge module...
This article presents and assesses a massively parallel processing model for basic operations on k-mers from genomic sequences, based on functions defined in terms of N-dimensional spaces. The model is implemented using a set of OpenCL cores available at github.com/bioinfud/k-merscl and assessed using a heterogeneous CPU/GPU platform and a dataset of randomly generated k-mers. The results...
The three-level page-mapping FTL scheme exploits the characteristics of the SSD hardware system by dividing a plane into several parts called block-groups. A block-group has a fixed number of physical blocks. In this scheme, a series of logical pages is stored in a block-group. Inside the block-group, the mapping between logical pages and physical pages is fully associative. This scheme decreases...
Solid State Drives (SSDs) using flash memory storage technology present a promising storage solution for data-intensive applications due to their low latency, high bandwidth, and low power consumption compared to traditional hard disk drives. SSDs achieve these desirable characteristics using internal parallelism - parallel access to multiple internal flash memory chips - and a Flash Translation Layer...