GPUs continue to vastly outperform CPUs while becoming more general purpose. To improve the efficiency of the AES algorithm, this paper proposes a CUDA implementation of the Electronic Codebook (ECB) mode encoding process and the Cipher Block Chaining (CBC) mode decoding process on the GPU. In our implementation, the frequently accessed T-boxes were allocated in on-chip shared memory and the granularity...
Data warehousing applications represent an emergent application arena that requires the processing of relational queries and computations over massive amounts of data. Modern general purpose GPUs are high core count architectures that potentially offer substantial improvements in throughput for these applications. However, there are significant challenges that arise due to the overheads of data movement...
The largest-scale high-performance computing (HPC) systems are stretching parallel file systems to their limits in terms of aggregate bandwidth and numbers of clients. To further sustain the scalability of these file systems, researchers and HPC storage architects are exploring various storage system designs. One proposed storage system design integrates a tier of solid-state burst buffers into the storage...
Supporting voice traffic in existing WLANs is extremely inefficient, given the large overheads of the protocol operation and the need to prioritize this traffic over, e.g., bulky transfers. In this paper we propose a simple scheme to improve the efficiency of WLANs when voice traffic is present. The mechanism is based on piggybacking voice frames over the acknowledgments, which reduces both frame...
The need for service resilience is leading to a steadily growing number of multi-homed Internet sites. This in turn creates a growing demand for utilising multiple Internet accesses simultaneously, in order to improve application payload throughput during normal operation. Multi-path Transport Layer protocol extensions - like Multi-Path TCP (MPTCP) for TCP and Concurrent Multipath Transfer...
In Throughput Computing, data can be processed independently by a large number of threads running similar programs, referred to as kernels, or shaders for graphics-specific workloads. A Throughput Computing device, such as a GPU, requires task latency tolerance to hold the context of the outstanding threads, and data latency tolerance to hold spaces for memory requests issued from the threads...
Traffic identification and classification are essential tasks performed by Internet Service Provider (ISP) administrators. Deep Packet Inspection (DPI) is currently playing a key role in traffic identification and classification due to its increased expressive power. To allow fair comparison among different DPI techniques and systems, workload generators should have the following characteristics:...
This short paper compares and contrasts the performance characteristics of System S and S4, two stream processing systems that use an operator-based programming model. Our aim is to investigate and characterize which architecture better handles which type of stream processing workload, and to observe the reasons for such characteristics.
Microsoft's Receive-side scaling (RSS) is a network driver layer technology that enables the efficient distribution of received packets. However, Microsoft's RSS technology is implemented in hardware. In this paper, we implement RSS technology in user-space and apply it in a traffic monitoring system which runs on the DELL R710 multi-core server. We distribute received packets, according to certain...
Parallel implementations of motion estimation for high definition videos typically exploit various forms of parallelism (GOP, frame-, slice- and macroblock-level) to deliver real-time throughput. Although parallel implementations deliver real-time throughput, they often suffer from limited flexibility and scalability due to the form of parallelism and architecture used. In this work, we use Group...
Multi-dimensional range queries are fundamental requirements in large-scale Internet applications using Distributed Ordered Tables. Apache Cassandra is a Distributed Ordered Table when it employs order-preserving hashing as its data partitioner. Cassandra supports multi-dimensional range queries, but with poor performance and with the limitation that one dimension must use an equality operator. Based...
The demand for scalable I/O continues to grow rapidly as computer clusters keep growing. Much of the research in storage systems has been focused on improving the scale and performance of I/O throughput. Scalable file systems do a good job of scaling large file access bandwidth by striping or sharing I/O resources across many servers or disks. However, the same cannot be said about scaling file metadata...
Multi-core processor architectures have become ubiquitous in today's computing platforms, especially in parallel computing installations, with their power and cost advantages. While the technology trend continues towards having hundreds of cores on a chip in the foreseeable future, an urgent question posed to system designers as well as application users is whether applications can receive sufficient...
Massively parallel networks of highly efficient, high performance Single Instruction Multiple Data (SIMD) processors have been shown to enable FPGA-based implementation of real-time signal processing applications with performance and cost comparable to dedicated hardware architectures. This is achieved by exploiting simple datapath units with deep processing pipelines. However, these architectures...
Every Real Time Operating System (RTOS) has different characteristics. Testing is needed to determine which kinds of real-time application are suitable to be implemented using a given RTOS. In this research, benchmarking is performed on two Linux-based RTOSes: Real Time Patch Linux and Xenomai. Benchmarking is done by running an encryption application on each RTOS. RTOS performance is assessed through encryption...
Concurrent applications in virtualized environments (VE) encounter synchronization problems such as Lock Holder Preemption (LHP). Hybrid co-scheduling is an effective approach to address such problems. However, the contention and exclusiveness between multiple concurrent domains in hybrid co-scheduling cause serious performance degradation and unfairness. To keep the benefits brought by hybrid co-scheduling...
Typical NFS clients write in a lazy fashion: they leave dirty pages in the page cache and defer writing to the server until later. This reduces network traffic when applications repeatedly modify the same set of pages. However, this approach can lead to memory pressure, when the number of available pages on the client system is so low that the system must work harder to reclaim dirty pages. System...
The generic matrix multiply (GEMM) subprogram is the core element of high-performance linear algebra software used in computationally demanding digital signal processing (DSP) systems. We propose an acceleration technique for GEMM based on dynamically adjusting the precision of computation. Our technique employs DSP methods (such as scalar companding and rounding), followed by a new form of tight...
The paper describes an optimized GPU-based approach for stencil-based algorithms. The simulations have been performed for a two-dimensional steady-state heat conduction problem, which has been solved through the red-black point successive over-relaxation method. Two kernels have been developed and their performance has been greatly improved through coalesced memory accesses and special shared memory...
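To make the method named in this abstract concrete, the following is a minimal serial sketch of red-black point successive over-relaxation for the two-dimensional steady-state heat (Laplace) equation. It is not the paper's GPU kernels: it runs on the CPU in plain Python, and the function name `red_black_sor`, the relaxation factor `omega=1.5`, the tolerance, and the grid layout are all illustrative assumptions. The property it shows is that cells of one colour depend only on cells of the other colour, so each half-sweep could be updated in parallel.

```python
def red_black_sor(grid, omega=1.5, tol=1e-10, max_sweeps=10_000):
    """Relax the interior of `grid` in place toward the discrete Laplace solution.

    `grid` is a list of lists of floats whose outer ring holds fixed boundary
    temperatures. All names and defaults here are hypothetical choices.
    Returns the number of sweeps performed.
    """
    rows, cols = len(grid), len(grid[0])
    for sweep in range(1, max_sweeps + 1):
        max_change = 0.0
        # Colour 0 ("red") holds cells with (i + j) even, colour 1 ("black") odd.
        # Every neighbour of a red cell is black and vice versa, so all cells
        # of one colour are independent within a half-sweep.
        for colour in (0, 1):
            for i in range(1, rows - 1):
                # First interior column j >= 1 with (i + j) % 2 == colour.
                start = 1 + ((i + 1 + colour) % 2)
                for j in range(start, cols - 1, 2):
                    neighbour_avg = 0.25 * (
                        grid[i - 1][j] + grid[i + 1][j]
                        + grid[i][j - 1] + grid[i][j + 1]
                    )
                    new_value = (1.0 - omega) * grid[i][j] + omega * neighbour_avg
                    max_change = max(max_change, abs(new_value - grid[i][j]))
                    grid[i][j] = new_value
        if max_change < tol:
            return sweep
    return max_sweeps


if __name__ == "__main__":
    # Hypothetical plate: top edge held at 100 degrees, other edges at 0.
    n = 16
    plate = [[100.0 if i == 0 else 0.0 for _ in range(n)] for i in range(n)]
    sweeps = red_black_sor(plate)
    print(f"converged after {sweeps} sweeps; centre value {plate[n // 2][n // 2]:.3f}")
```

In a GPU version, each half-sweep would typically map to one kernel launch, which is consistent with the two kernels the abstract mentions, although the abstract does not say how the work is actually divided.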
State-of-the-art local stereo correspondence algorithms that adapt their supports to image content make it possible to infer very accurate disparity maps, often comparable to those of algorithms based on global disparity optimization methods. However, despite their effectiveness, accurate local approaches based on this methodology are also computationally expensive and several simplifications aimed at reducing their computational...