Search results

chapter

Neural network for saturation prediction of solid state drives

Jaehyung Kim, Jinuk Park, Sanghyun Park

2017 IEEE International Conference on Systems, Man, and Cybernetics (SMC) > 2069 - 2074

2017 IEEE International Conference on Systems, Man and Cybernetics (SMC)

State-of-the-art storage devices that have parallel capability have significantly reduced the performance gap between processor and storage I/O. However, the internal parallelism makes it difficult to measure utilization that can be used as a basis of load balancing, which is a critical feature of performance improvement of parallel systems. When utilization of storage reaches one hundred percent,...

chapter

Associative methods of fuzzy operations implementation

M.M. Zernov, V.V. Mladov

2017 Second Russia and Pacific Conference on Computer Technology and Applications (RPC) > 199 - 204

2017 Second Russian-Pacific Conference on Computer Technology and Applications (RPC)

This article describes methods for implementing fuzzy operations based on the model of a 3D associative information storage and processing device. The proposed methods are distinguished by their use of binary matrix comparison based on masked associative comparison with row shifts.

chapter

An Empirical Evaluation of Design Abstraction and Performance of Thrust Framework

Ajai V. George, Sankar Manoj, Sanket Rajan Gupte, Santonu Sarkar

2017 46th International Conference on Parallel Processing Workshops (ICPPW) > 233 - 242

2017 46th International Conference on Parallel Processing Workshops (ICPPW)

High performance computing applications are far more difficult to write; therefore, practitioners expect well-tuned software to last long and provide optimized performance even when the hardware is upgraded. It may also be necessary to write software using sufficient abstraction over the hardware so that it is capable of running on heterogeneous architectures. Therefore, it is required to have a...

chapter

CFStore: Boosting Hybrid storage performance by device crossfire

Wei Zhou, Dan Feng, Zhipeng Tan

2017 IEEE 28th International Conference on Application-specific Systems, Architectures and Processors (ASAP) > 99 - 106

2017 IEEE 28th International Conference on Application-specific Systems, Architectures and Processors (ASAP)

Hybrid storage is widely implemented as it satisfies the requirements of capacity and performance in an economically viable fashion. With the fast technical improvement, Hybrid storage systems consisting of several types of SSDs will be adopted gradually. Existing works mostly concentrate on thoroughly utilizing high-performance device but neglect the capability of low-performance device. This paper...

chapter

NSIM-ACE: An interconnection network simulator for evaluating remote direct memory access

Ryutaro Susukita, Yoshiyuki Morie, Takeshi Nanri, Hidetomo Shibamura

2016 6th International Conference on Simulation and Modeling Methodologies, Technologies and Applications (SIMULTECH) > 1 - 8

2016 6th International Conference on Simulation and Modeling Methodologies, Technologies and Applications (SIMULTECH)

Network simulation is an important technique for designing interconnection networks and communication libraries. Network simulations are also useful for analyzing internal communication behavior in parallel applications. This paper introduces a new interconnection network simulator, NSIM-ACE. This simulator enables us to evaluate RDMA directly, while existing simulators do not have such capability...

chapter

Tiles- and WPP-based HEVC Decoding on Asymmetric Multi-core Processors

Rafael Rodriguez-Sanchez, Enrique S. Quintana-Orti

2017 IEEE Third International Conference on Multimedia Big Data (BigMM) > 299 - 302

2017 IEEE Third International Conference on Multimedia Big Data (BigMM)

Low-power asymmetric multi-core processors (AMPs) are nowadays present in a wide variety of mobile and hand-held devices, and have attracted a lot of attention due to their appealing energy efficiency. However, these processors contain cores with different performance capabilities asking for solutions specifically tailored to exploit all their potential. In this paper, we provide two architecture-aware...

chapter

Towards exascale computing with heterogeneous architectures

Kenneth O'Brien, Lorenzo Di Tucci, Gianluca Durelli, Michaela Blott

Design, Automation & Test in Europe Conference & Exhibition (DATE), 2017 > 398 - 403

2017 Design, Automation & Test in Europe Conference & Exhibition (DATE)

The goal of reaching exascale computing is made especially challenging by the highly heterogeneous nature of modern platforms and the energy they consume. As compute nodes typically utilize multiple multi-core CPU and are increasingly equipped with PCIe based accelerators, both are contributing to an ever more dynamic power consumption. In our study we evaluate our target application on a variety...

chapter

Clairvoyance: Look-ahead compile-time scheduling

Kim-Anh Tran, Trevor E. Carlson, Konstantinos Koukos, Magnus Sjalander, et al.

2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) > 171 - 184

2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)

To enhance the performance of memory-bound applications, hardware designs have been developed to hide memory latency, such as the out-of-order (OoO) execution engine, at the price of increased energy consumption. Contemporary processor cores span a wide range of performance and energy efficiency options: from fast and power-hungry OoO processors to efficient, but slower in-order processors. The more...

chapter

Controlled Kernel Launch for Dynamic Parallelism in GPUs

Xulong Tang, Ashutosh Pattnaik, Huaipan Jiang, Onur Kayiran, et al.

2017 IEEE International Symposium on High Performance Computer Architecture (HPCA) > 649 - 660

2017 IEEE International Symposium on High Performance Computer Architecture (HPCA)

Dynamic parallelism (DP) is a promising feature for GPUs, which allows on-demand spawning of kernels on the GPU without any CPU intervention. However, this feature has two major drawbacks. First, the launching of GPU kernels can incur significant performance penalties. Second, dynamically-generated kernels are not always able to efficiently utilize the GPU cores due to hardware-limits. To address...

chapter

Parallel Implementation of ECC Point Multiplication on a Homogeneous Multi-Core Microcontroller

M.S. Albahri, Mohammed Benaissa, Zia Uddin Ahamed Khan

2016 12th International Conference on Mobile Ad-Hoc and Sensor Networks (MSN) > 386 - 389

2016 12th International Conference on Mobile Ad-Hoc and Sensor Networks (MSN)

In this paper, we propose a novel parallel Elliptic-Curve-Cryptography (ECC) point multiplication implementation over binary Galois Field, GF(2^m), by exploiting the advantage of concurrent operation in a homogeneous multi-core processor to yield an improved performance. A modified Lopez-Dahab (LD) mix-coordinates point multiplication algorithm is developed that exploits concurrency and enables operation...

chapter

An Experimental Study of Redundant Array of Independent SSDs and Filesystems

Yuxuan Xing, Ya Feng, Songping Yu, Zhengguo Chen, et al.

2016 IEEE 18th International Conference on High Performance Computing and Communications; IEEE 14th International Conference on Smart City; IEEE 2nd International Conference on Data Science and Systems (HPCC/SmartCity/DSS) > 42 - 49

2016 IEEE 18th International Conference on High Performance Computing and Communications; IEEE 14th International Conference on Smart City; IEEE 2nd International Conference on Data Science and Systems (HPCC/SmartCity/DSS)

Solid state disks (SSDs) are becoming more and more popular in personal devices and data centers. Flash chips can be packaged in hard disk drive (HDD) form factors and provide the same interface as HDDs. This characteristic lets SSDs easily replace HDDs in existing storage systems. PCIe-based SSDs can provide higher I/O performance, but they are still somewhat expensive. This paper studies the feasibility of...

chapter

Set-to-Set Disjoint Paths in Tori

Keiichi Kaneko, Antoine Bossard

2016 Fourth International Symposium on Computing and Networking (CANDAR) > 91 - 97

2016 Fourth International Symposium on Computing and Networking (CANDAR)

Numerous TOP500 supercomputers are based on a torus interconnection network. The torus topology is effectively one of the most popular interconnection networks for massively parallel systems due to its interesting topological properties such as symmetry and simplicity. For instance, the world-famous supercomputers Fujitsu K, IBM Blue Gene/L, IBM Blue Gene/P and Cray XT3 are all torus-based. In this...

chapter

Performance Evaluation of Parallelizing Algorithm Using Spanning Tree for Stream-Based Computing

Guyue Wang, Koichi Wada, Shinichi Yamagiwa

2016 Fourth International Symposium on Computing and Networking (CANDAR) > 497 - 503

2016 Fourth International Symposium on Computing and Networking (CANDAR)

This paper presents a detailed performance evaluation of an algorithm using a spanning tree that automatically exploits parallelism and determines an execution order of multiple kernel programs in a distributed environment. In stream-based computing, efficient parallel execution requires careful scheduling of the invocation of the kernel programs. By mapping a kernel to a node and an I/O stream between...

chapter

WebCL prototype for high performance browser running on Android-powered mobile device

Jae-Ho Lee, Hyun-Woo Cho, Chang-Hoon Jung, Dong-Hyun Kim, et al.

2016 International Conference on Information and Communication Technology Convergence (ICTC) > 1039 - 1041

2016 International Conference on Information and Communication Technology Convergence (ICTC)

WebCL is a browser version of the Khronos OpenCL standard. It allows a web browser to exploit GPU and CPU for parallel processing by embedding OpenCL kernel code into JavaScript code, which leads to significant speedups of compute-intensive applications such as physics and image processing. This paper presents a working prototype of WebCL-enabled browser that runs on Android-powered mobile devices,...

chapter

An n-gram cache for large-scale parallel extraction of multiword relevant expressions with LocalMaxs

Carlos Goncalves, Joaquim F. Silva, Jose C. Cunha

2016 IEEE 12th International Conference on e-Science (e-Science) > 120 - 129

2016 IEEE 12th International Conference on e-Science (e-Science)

LocalMaxs extracts relevant multiword terms based on their cohesion but is computationally intensive, a critical issue for very large natural language corpora. The corpus properties concerning n-gram distribution determine the algorithm complexity and were empirically analyzed for corpora up to 982 million words. A parallel LocalMaxs implementation exhibits almost linear relative efficiency, speedup,...

chapter

Towards Distributed Mobile Computing

G. Massari, M. Zanella, W. Fornaciari

2016 Mobile System Technologies Workshop (MST) > 29 - 35

2016 Mobile System Technologies Workshop (MST)

In recent years, we have observed exponential growth in the mobile device market. In this scenario, the rate at which mobile devices are replaced assumes particular relevance. According to the International Telecommunication Union, in fact, smartphone owners replace their device every 20 months, on average. The side effect of this trend is to deal with the disposal of an increasing amount...

chapter

How Parallelization Helps Crowd Simulation: Study of an OpenMP-Based System

Edwin Lobo-Hernandez, Xun Luo, Gustavo Alomia-Penafiel, Nan Liu, et al.

2016 International Conference on Virtual Reality and Visualization (ICVRV) > 354 - 357

2016 International Conference on Virtual Reality and Visualization (ICVRV)

This paper analyzes the parallelization efficiency of Menge [1], an open source virtual crowd simulation system widely used for algorithm benchmarking, with focuses on three aspects: performance of the existing parallel processing scheme, bottleneck of parallel processing, and improvement opportunities for parallel efficiency of the system. First, we calculate the speedup ratio of each Menge module...

chapter

Basic k-mer operations using massive parallel processing on heterogeneus architectures

Nelson Enrique Vera-Parra, Cristian Alejandro Rojas-Quintero, Jose Nelson Perez-Castillo

2016 7th IEEE International Conference on Software Engineering and Service Science (ICSESS) > 193 - 196

2016 7th IEEE International Conference on Software Engineering and Service Science (ICSESS)

This article presents and assesses a massively parallel processing model for basic operations with k-mers from genomic sequences, based on functions defined in terms of N-dimensional spaces. The model is implemented using a set of OpenCL cores available at github.com/bioinfud/k-merscl and assessed using a heterogeneous CPU/GPU platform and a dataset based on randomly generated k-mers. The results...

chapter

A kind of FTL Scheme Which Keeps the High Performance and Lowers the Capacity of RAM Occupied by Mapping Table

Yang Hu, Xiaoming Dong

2016 IEEE International Conference on Networking, Architecture and Storage (NAS) > 1 - 2

2016 IEEE International Conference on Networking, Architecture and Storage (NAS)

The three-level page-mapping FTL scheme exploits the characteristics of the SSD hardware system and divides a plane into several parts called block-groups. A block-group has a fixed number of physical blocks. In this scheme, a series of logical pages is stored in a block-group. Inside the block-group, the mapping relationship between logical pages and physical pages is fully associative. This scheme decreases...

chapter

Parallel-DFTL: A Flash Translation Layer That Exploits Internal Parallelism in Solid State Drives

Wei Xie, Yong Chen, Philip C. Roth

2016 IEEE International Conference on Networking, Architecture and Storage (NAS) > 1 - 10

2016 IEEE International Conference on Networking, Architecture and Storage (NAS)

Solid State Drives (SSDs) using flash memory storage technology present a promising storage solution for data-intensive applications due to their low latency, high bandwidth, and low power consumption compared to traditional hard disk drives. SSDs achieve these desirable characteristics using internal parallelism - parallel access to multiple internal flash memory chips - and a Flash Translation Layer...

INFONA - science communication portal
