Search results

Items from 141 to 160 out of 273 results

chapter

Implementation and Analysis of AES Encryption on GPU

Qinjian Li, Chengwen Zhong, Kaiyong Zhao, Xinxin Mei, et al.

2012 IEEE 14th International Conference on High Performance Computing and Communication & 2012 IEEE 9th International Conference on Embedded Software and Systems > 843 - 848

2012 IEEE 14th Int'l Conf. on High Performance Computing and Communication (HPCC) & 2012 IEEE 9th Int'l Conf. on Embedded Software and Systems (ICESS)

The GPU is continuing its trend of vastly outperforming the CPU while becoming more general purpose. In order to improve the efficiency of the AES algorithm, this paper proposed a CUDA implementation of the Electronic Codebook (ECB) mode encoding process and the Cipher Block Chaining (CBC) mode decoding process on GPU. In our implementation, the frequently accessed T-boxes were allocated in on-chip shared memory and the granularity...

chapter

Optimizing Data Warehousing Applications for GPUs Using Kernel Fusion/Fission

Haicheng Wu, Gregory Diamos, Jin Wang, Srihari Cadambi, et al.

2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum > 2433 - 2442

2012 26th IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

Data warehousing applications represent an emergent application arena that requires the processing of relational queries and computations over massive amounts of data. Modern general purpose GPUs are high core count architectures that potentially offer substantial improvements in throughput for these applications. However, there are significant challenges that arise due to the overheads of data movement...

chapter

On the role of burst buffers in leadership-class storage systems

Ning Liu, Jason Cope, Philip Carns, Christopher Carothers, et al.

2012 IEEE 28th Symposium on Mass Storage Systems and Technologies (MSST) > 1 - 11

2012 IEEE 28th Symposium on Mass Storage Systems and Technologies (MSST)

The largest-scale high-performance computing (HPC) systems are stretching parallel file systems to their limits in terms of aggregate bandwidth and numbers of clients. To further sustain the scalability of these file systems, researchers and HPC storage architects are exploring various storage system designs. One proposed storage system design integrates a tier of solid-state burst buffers into the storage...

chapter

VoIPiggy: Implementation and evaluation of a mechanism to boost voice capacity in 802.11 WLANs

Pablo Salvador, Francesco Gringoli, Vincenzo Mancuso, Pablo Serrano, et al.

2012 Proceedings IEEE INFOCOM > 2931 - 2935

IEEE INFOCOM 2012 - IEEE Conference on Computer Communications

Supporting voice traffic in existing WLANs is extremely inefficient, given the large overheads of the protocol operation and the need to prioritize this traffic over, e.g., bulky transfers. In this paper we propose a simple scheme to improve the efficiency of WLANs when voice traffic is present. The mechanism is based on piggybacking voice frames over the acknowledgments, which reduces both frame...

chapter

Simulation and Experimental Evaluation of Multipath Congestion Control Strategies

Thomas Dreibholz, Hakim Adhari, Martin Becke, Erwin P. Rathgeb

2012 26th International Conference on Advanced Information Networking and Applications Workshops > 1113 - 1118

2012 IEEE Workshops of International Conference on Advanced Information Networking and Applications (WAINA)

The need for service resilience is leading to a steadily growing number of multi-homed Internet sites. In consequence, this results in a growing demand for utilising multiple Internet accesses simultaneously, in order to improve application payload throughput during normal operation. Multi-path Transport Layer protocol extensions - like Multi-Path TCP (MPTCP) for TCP and Concurrent Multipath Transfer...

chapter

Latency tolerance for Throughput Computing: Designer track

Chien-Ping Lu, Brian Ko

2012 IEEE/ACM International Conference on Computer-Aided Design (ICCAD) > 524 - 525

2012 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)

In Throughput Computing, the data can be processed independently by a substantial number of threads running similar programs, referred to as kernels, or as shaders for graphics-specific workloads. A Throughput Computing device, such as a GPU, requires task latency tolerance to hold the context of the outstanding threads, and data latency tolerance to hold space for memory requests issued from the threads...

chapter

High-Performance Traffic Workload Architecture for Testing DPI Systems

Alysson Santos, Stenio Fernandes, Rafael Antonello, Geza Szabo, et al.

2011 IEEE Global Telecommunications Conference - GLOBECOM 2011 > 1 - 5

GLOBECOM 2011 - 2011 IEEE Global Communications Conference

Traffic identification and classification are essential tasks performed by Internet Service Provider (ISP) administrators. Deep Packet Inspection (DPI) is currently playing a key role in traffic identification and classification due to its increased expressive power. To allow fair comparison among different DPI techniques and systems, workload generators should have the following characteristics:...

chapter

A performance study on operator-based stream processing systems

Miyuru Dayarathna, Souhei Takeno, Toyotaro Suzumura

2011 IEEE International Symposium on Workload Characterization (IISWC) > 79

2011 IEEE International Symposium on Workload Characterization (IISWC)

This short paper compares and contrasts the performance characteristics of System S and S4, two stream processing systems which use an operator-based programming model. Our aim is to investigate and characterize which architecture is better at handling which type of stream processing workload and to observe the reasons for such characteristics.

chapter

The implementation and application of user-space RSS technology in traffic monitoring system

Shu Li, Fan Yang, Yinan Dou, Zhenming Lei

2011 4th IEEE International Conference on Broadband Network and Multimedia Technology > 359 - 364

2011 4th IEEE International Conference on Broadband Network & Multimedia Technology (IC-BNMT 2011)

Microsoft's Receive-side scaling (RSS) is a network driver layer technology that enables the efficient distribution of received packets. However, Microsoft's RSS technology is implemented in hardware. In this paper, we implement RSS technology in user-space and apply it in a traffic monitoring system which runs on the DELL R710 multi-core server. We distribute received packets, according to certain...

chapter

Multi-ASIP based parallel and scalable implementation of motion estimation kernel for high definition videos

Hong Chinh Doan, Haris Javaid, Sri Parameswaran

2011 9th IEEE Symposium on Embedded Systems for Real-Time Multimedia > 56 - 65

2011 9th IEEE Symposium on Embedded Systems for Real-Time Multimedia (ESTIMedia)

Parallel implementations of motion estimation for high definition videos typically exploit various forms of parallelism (GOP, frame-, slice- and macroblock-level) to deliver real-time throughput. Although parallel implementations deliver real-time throughput, they often suffer from limited flexibility and scalability due to the form of parallelism and architecture used. In this work, we use Group...

chapter

CCIndex for Cassandra: A Novel Scheme for Multi-dimensional Range Queries in Cassandra

Chen Feng, Yongqiang Zou, Zhiwei Xu

2011 Seventh International Conference on Semantics, Knowledge and Grids > 130 - 136

2011 Seventh International Conference on Semantics Knowledge and Grid (SKG)

Multi-dimensional range queries are fundamental requirements in large scale Internet applications using Distributed Ordered Tables. Apache Cassandra is a Distributed Ordered Table when it employs order-preserving hashing as data partitioner. Cassandra supports multi-dimensional range queries with poor performance and with a limitation that there must be one dimension with an equal operator. Based...

chapter

Can a Decentralized Metadata Service Layer Benefit Parallel Filesystems?

Vilobh Meshram, Xavier Besseron, Xiangyong Ouyang, Raghunath Rajachandrasekar, et al.

2011 IEEE International Conference on Cluster Computing > 484 - 493

2011 IEEE International Conference on Cluster Computing (CLUSTER)

The demand for scalable I/O continues to grow rapidly as computer clusters keep growing. Much of the research in storage systems has been focused on improving the scale and performance of I/O throughput. Scalable file systems do a good job of scaling large file access bandwidth by striping or sharing I/O resources across many servers or disks. However, the same cannot be said about scaling file metadata...

chapter

Experience on Comparison of Operating Systems Scalability on the Multi-core Architecture

Yan Cui, Yingxin Wang, Yu Chen, Yuanchun Shi

2011 IEEE International Conference on Cluster Computing > 205 - 215

2011 IEEE International Conference on Cluster Computing (CLUSTER)

Multi-core processor architectures have become ubiquitous in today's computing platforms, especially in parallel computing installations, with their power and cost advantages. While the technology trend continues towards having hundreds of cores on a chip in the foreseeable future, an urgent question posed to system designers as well as application users is whether applications can receive sufficient...

chapter

A kernel interleaved scheduling method for streaming applications on soft-core vector processors

Chengwei Zheng, John McAllister, Yun Wu

2011 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation > 278 - 285

2011 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS XI)

Massively parallel networks of highly efficient, high performance Single Instruction Multiple Data (SIMD) processors have been shown to enable FPGA-based implementation of real-time signal processing applications with performance and cost comparable to dedicated hardware architectures. This is achieved by exploiting simple datapath units with deep processing pipelines. However, these architectures...

chapter

Analysis and benchmarking performance of Real Time Patch Linux and Xenomai in serving a real time application

Mastura Diana Marieska, Achmad Imam Kistijantoro, Muhammad Subair

Proceedings of the 2011 International Conference on Electrical Engineering and Informatics > 1 - 6

2011 International Conference on Electrical Engineering and Informatics (ICEEI)

Every Real Time Operating System (RTOS) has different characteristics. Testing is needed to determine which kind of real-time application is suitable to be implemented using a given RTOS. In this research, benchmarking is performed on two Linux-based RTOSes: Real Time Patch Linux and Xenomai. Benchmarking is done by running an encryption application on each RTOS. RTOS performance is assessed through encryption...

chapter

Hybrid Co-scheduling Optimizations for Concurrent Applications in Virtualized Environments

Yulong Yu, Yuxin Wang, He Guo, Xubin He

2011 IEEE Sixth International Conference on Networking, Architecture, and Storage > 20 - 29

2011 6th IEEE International Conference on Networking, Architecture, and Storage (NAS)

Concurrent applications in virtualized environments (VE) encounter synchronization problems such as Lock Holder Preemption (LHP). Hybrid co-scheduling is an effective approach to address such problems. However, the contention and exclusiveness between multiple concurrent domains in hybrid co-scheduling cause a serious performance degradation and unfairness. To keep the benefits brought by hybrid co-scheduling...

chapter

Using Eager Strategies to Improve NFS I/O Performance

Stephen Rago, Aniruddha Bohra, Cristian Ungureanu

2011 IEEE Sixth International Conference on Networking, Architecture, and Storage > 258 - 267

2011 6th IEEE International Conference on Networking, Architecture, and Storage (NAS)

Typical NFS clients write in a lazy fashion: they leave dirty pages in the page cache and defer writing to the server until later. This reduces network traffic when applications repeatedly modify the same set of pages. However, this approach can lead to memory pressure, when the number of available pages on the client system is so low that the system must work harder to reclaim dirty pages. System...

chapter

Throughput-precision computation for generic matrix multiplication: Toward a computation channel for high-performance digital signal processing

Davide Anastasia, Yiannis Andreopoulos

2011 17th International Conference on Digital Signal Processing (DSP) > 1 - 6

2011 17th International Conference on Digital Signal Processing (DSP)

The generic matrix multiply (GEMM) subprogram is the core element of high-performance linear algebra software used in computationally-demanding digital signal processing (DSP) systems. We propose an acceleration technique for GEMM based on dynamically adjusting the precision of computation. Our technique employs DSP methods (such as scalar companding and rounding), followed by a new form of tight...

chapter

GPU optimized computation of stencil based algorithms

L.M. Itu, C. Suciu, F. Moldoveanu, A. Postelnicu, et al.

2011 RoEduNet International Conference 10th Edition: Networking in Education and Research > 1 - 6

2011 RoEduNet International Conference 10th Edition: Networking in Education and Research

The paper describes an optimized GPU based approach for stencil based algorithms. The simulations have been performed for a two dimensional steady state heat conduction problem, which has been solved through the red black point successive over relaxation method. Two kernels have been developed and their performance has been greatly improved through coalesced memory accesses and special shared memory...

chapter

Near real-time Fast Bilateral Stereo on the GPU

Stefano Mattoccia, Marco Viti, Florian Ries

CVPR 2011 WORKSHOPS > 136 - 143

2011 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPR Workshops)

State-of-the-art local stereo correspondence algorithms that adapt their supports to image content can infer very accurate disparity maps, often comparable to those of algorithms based on global disparity optimization methods. However, despite their effectiveness, accurate local approaches based on this methodology are also computationally expensive and several simplifications aimed at reducing their computational...

Keywords:
KERNEL
THROUGHPUT



INFONA - science communication portal
