Today, cache size and the number of levels in the cache hierarchy play an important role in improving computer performance. Using full-system simulation in gem5, the variation in memory bandwidth, system bus throughput, and L1 and L2 cache misses is measured by running the SPLASH-2 benchmarks on ARM and ALPHA processors. In this work we calculate cache misses, memory bandwidth and system bus throughput by running...
Although time-sharing the CPU has been an essential technique for virtualizing CPUs among threads and virtual machines, most commercial operating systems and hypervisors maintain relatively coarse-grained time slices to mitigate the cost of context switching. However, the proliferation of system virtualization poses a new challenge for coarse-grained time-sharing techniques, since operating systems...
The recent release of Altera's SDK for OpenCL has greatly eased the development of FPGA-based systems. Research has shown performance improvements brought by OpenCL on a single FPGA device. However, to meet the objectives of high performance computing, OpenCL needs to be evaluated using multiple FPGAs. This work proposes a scalable FPGA architecture for high performance computing. The design...
Heterogeneous systems consisting of multiple CPUs and GPUs are increasingly attractive as platforms for high performance computing. Such platforms are usually programmed using OpenCL, which provides program portability by allowing the same program to execute on different types of device. As such systems become more mainstream, they will move from application-dedicated devices to platforms that need...
Pipelining is an important technique in high-level synthesis, which overlaps the execution of successive loop iterations or threads to achieve high throughput for loop/function kernels. Since existing pipelining techniques typically enforce in-order thread execution, a variable-latency operation in one thread would block all subsequent threads, resulting in considerable performance degradation. In...
A number of simple performance measurements of disk speed, CPU, memory and network throughput were made on a dual ARM Cortex-A7 machine running Linux inside a Xen virtual machine that communicates with the outside through frontend and backend drivers. The average performance overhead of the Xen virtual machine is between 3 and 7 percent when the host is lightly loaded (running only the system software...
Traffic classification is one of the kernel applications in network management. Many Machine Learning (ML) traffic classification algorithms are based on decision trees. While most existing implementations of decision trees are hardware-based, a new trend in network applications is to use software-based solutions. The decision tree used for traffic classification is highly unbalanced, which makes it challenging...
Cellular providers are rapidly deploying multiple technologies such as cell biasing, carrier aggregation, and coordinated interference control/scheduling to improve capacity and coverage. In this paper, we explore a complementary transport-layer approach based on multipath TCP that can concurrently use multiple interfaces to boost the throughput of users with poor coverage and improve fairness. Multipath TCP...
Virtual switches, like Open vSwitch, have emerged as an important part of cloud networking architectures. They connect interfaces of virtual machines and establish the connection to the outer network via physical network interface cards. Today, all important cloud frameworks support Open vSwitch as the default virtual switch. However, general understanding about the performance implications of Open...
TCP Cubic is designed to better utilize high bandwidth-delay product paths in IP networks. It is currently the default TCP version in the Linux kernel. Our objective in this work is to better understand the performance of TCP Cubic in scenarios with a large number of competing long-lived TCP flows, as can be observed, e.g., in cloud environments. In such situations, Cubic connections tend to synchronize...
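The synchronization behavior studied above is driven by Cubic's window growth function, which regrows the congestion window toward the size it had before the last loss. A minimal sketch of that function follows, using the default constants from RFC 8312 and the Linux implementation (not values taken from this paper):

```python
# Sketch of the CUBIC window growth function (RFC 8312 defaults).
C = 0.4      # cubic scaling constant
BETA = 0.7   # fraction of the window kept after a loss event

def cubic_window(t, w_max):
    """Congestion window t seconds after the last loss event,
    where w_max is the window size just before that loss."""
    # k is the time needed to regrow the window back to w_max
    k = ((w_max * (1 - BETA)) / C) ** (1 / 3)
    return C * (t - k) ** 3 + w_max
```

At t = 0 this yields BETA * w_max (the post-loss window), and at t = k it returns exactly to w_max; the concave-then-convex shape around k is what lets competing Cubic flows probe aggressively far from w_max but gently near it.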
Soft vector processors can augment and extend the capability of embedded hard vector processors in FPGA-based SoCs such as the Xilinx Zynq. We develop a compiler framework and an auto-tuning runtime that optimizes and executes data-parallel computation either on the scalar ARM processor, the embedded NEON engine or the Vectorblox MXP soft vector processor as appropriate. We consider computational...
For any network connection, the data throughput of the end device is related to its TCP (Transmission Control Protocol) buffer size, network latency and network bandwidth. In regions where open-market devices are popular, such as the European and Asian markets, devices ship with static buffer sizes that are independent of the operator's network conditions, resulting in either low throughput...
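The relationship between buffer size, latency and bandwidth in the abstract above is the bandwidth-delay product: a TCP buffer smaller than it caps throughput below the available bandwidth. A small sketch with illustrative numbers (not figures from the paper):

```python
def required_buffer_bytes(bandwidth_bps, rtt_seconds):
    """Minimum TCP send/receive buffer needed to keep the pipe full:
    the bandwidth-delay product, converted from bits to bytes."""
    return bandwidth_bps / 8 * rtt_seconds

# Example: a 100 Mbit/s path with 60 ms RTT needs roughly
# 100e6 / 8 * 0.060 = 750,000 bytes (~732 KiB) of buffer.
```

A static buffer below this value wastes bandwidth on high-RTT paths, while an oversized one inflates memory use and can add queuing delay, which is why a one-size-fits-all setting fits no operator's network.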
In this paper, a new implementation of a 3GPP LTE standards-compliant turbo decoder based on GPGPU is proposed. It uses the newest GPU, the Tesla K20c, which is based on the Kepler GK110 architecture. The new architecture has more powerful parallel computing capability, and we use it to fully exploit the parallelism in the turbo decoding algorithm in novel ways. Meanwhile, we use various memory hierarchies...
Device-to-Device (D2D) communications can efficiently support the growth in mobile data traffic by offloading part of the traffic from the cellular infrastructure. D2D communications are influenced by the propagation conditions between mobile devices that depend on the antenna heights, presence of obstacles, and mobility of devices. Analytical and simulation studies have shown that link-aware opportunistic...
A number of NUMA-aware synchronization algorithms have been proposed lately to address the scalability inefficiencies of existing locks. However, their presupposed local lock granularity, a physical processor, is often not the optimal configuration for various workloads. This paper further explores the design space by taking into consideration the physical affinity between the cores within a single...
This paper explores the possibility of efficiently using multicores in conjunction with multiple GPU accelerators under a parallel task programming paradigm. In particular, we address the challenge of extending a parallel_for template to allow its exploitation on heterogeneous systems. The extension is based on a two-stage pipeline engine which is responsible for partitioning and scheduling the chunks...
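The first stage of such a pipeline, splitting the iteration space into chunks that a scheduler can then hand to CPU cores or GPUs, can be sketched as follows. The function name and chunking policy are illustrative assumptions, not the paper's actual implementation:

```python
def partition_range(start, stop, chunk_size):
    """Split the iteration range [start, stop) of a parallel_for
    into contiguous chunks, as a partitioning stage would before
    a second stage schedules them across heterogeneous devices."""
    for lo in range(start, stop, chunk_size):
        yield (lo, min(lo + chunk_size, stop))
```

A fixed chunk size is the simplest policy; a real heterogeneous scheduler would typically size chunks adaptively, since the throughput gap between a CPU core and a GPU makes equal-sized chunks load-imbalanced.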
Commodity graphics processing units (GPUs) have rapidly evolved into high performance accelerators for data-parallel computing through a large array of processing cores and the CUDA programming model with a C-like interface. However, optimizing an application for maximum performance based on the GPU architecture is not a trivial task, owing to the tremendous change from conventional multi-core to the...
Achieving high I/O throughput on modern servers presents significant challenges. With increasing core counts, server memory architectures become less uniform, both in terms of latency as well as bandwidth. In particular, the bandwidth of the interconnect among NUMA nodes is limited compared to local memory bandwidth. Moreover, interconnect congestion and contention introduce additional latency on...
Current technology trends for efficient use of infrastructures dictate that storage converges with computation by placing storage devices, such as NVM-based cards and drives, in the servers themselves. With converged storage the role of the interconnect among servers becomes more important for achieving high I/O throughput. Given that Ethernet is emerging as the dominant technology for datacenters,...
Frequency table computation is a key step in decision tree learning algorithms. In this paper we present a novel implementation targeted at a dataflow architecture implemented on a field-programmable gate array (FPGA). Consistent with the dataflow model of computation, the kernel views the input dataset as synchronous streams of attribute and class values. The kernel was benchmarked using key functions from...
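The frequency table this kernel computes is a count of co-occurrences of attribute values and class labels, which decision tree learners use to score candidate splits. A minimal software sketch of the same stream view (example data is illustrative, not from the paper's benchmark):

```python
from collections import Counter

def frequency_table(attribute_values, class_values):
    """Count (attribute value, class label) co-occurrences, consuming
    the two columns as synchronized streams -- the same view the
    dataflow kernel takes of the input dataset."""
    return Counter(zip(attribute_values, class_values))

# Example: three rows of a weather dataset.
freq = frequency_table(['sunny', 'rain', 'sunny'], ['no', 'yes', 'no'])
# freq[('sunny', 'no')] counts how often 'sunny' coincides with 'no'.
```

Because each (value, label) pair is folded into a counter independently, the computation maps naturally onto a streaming dataflow pipeline: one pair in per cycle, no random access to the dataset.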