Search results

Items from 141 to 160 out of 506 results


chapter

Optimal scheduling of in-situ analysis for large-scale scientific simulations

Preeti Malakar, Venkatram Vishwanath, Todd Munson, Christopher Knight, et al.

SC15: International Conference for High Performance Computing, Networking, Storage and Analysis > 1 - 11

Today's leadership computing facilities have enabled the execution of transformative simulations at unprecedented scales. However, analyzing the huge amount of output from these simulations remains a challenge. Most analyses of this output are performed in post-processing mode at the end of the simulation. The time to read the output for the analysis can be significantly high due to poor I/O bandwidth,...

chapter

PGX.D: a fast distributed graph processing engine

Sungpack Hong, Siegfried Depner, Thomas Manhardt, Jan Van Der Lugt, et al.

SC15: International Conference for High Performance Computing, Networking, Storage and Analysis > 1 - 12

Graph analysis is a powerful method in data analysis. Although several frameworks have been proposed for processing large graph instances in distributed environments, their performance is much lower than using efficient single-machine implementations provided with enough memory. In this paper, we present a fast distributed graph processing system, namely PGX.D. We show that PGX.D outperforms other...

chapter

AnalyzeThis: an analysis workflow-aware storage system

Hyogi Sim, Youngjae Kim, Sudharshan S. Vazhkudai, Devesh Tiwari, et al.

SC15: International Conference for High Performance Computing, Networking, Storage and Analysis > 1 - 12

The need for novel data analysis is urgent in the face of a data deluge from modern applications. Traditional approaches to data analysis incur significant data movement costs, moving data back and forth between the storage system and the processor. Emerging Active Flash devices enable processing on the flash, where the data already resides. An array of such Active Flash devices allows us to revisit...

chapter

DaDianNao: A Machine-Learning Supercomputer

Yunji Chen, Tao Luo, Shaoli Liu, Shijin Zhang, et al.

2014 47th Annual IEEE/ACM International Symposium on Microarchitecture > 609 - 622

2014 47th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)

Many companies are deploying services, either for consumers or industry, which are largely based on machine-learning algorithms for sophisticated processing of large amounts of data. The state-of-the-art and most popular such machine-learning algorithms are Convolutional and Deep Neural Networks (CNNs and DNNs), which are known to be both computationally and memory intensive. A number of neural network...

chapter

Consideration on the performance of kernel adaptive filters for the mixture of linear and non-linear environments

Kiyoshi Nishikawa, Felix Albu

Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2014 Asia-Pacific > 1 - 7

2014 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)

In this paper, we consider the characteristics of the kernel adaptive filters for the mixture of linear and non-linear environments. We first consider employing a linear kernel as one of the kernels in multi-kernel adaptive filters. It is pointed out that the convergence characteristics of the filter corresponding to the linear kernel are affected by the selection of the other kernels. Then, we propose...

chapter

On the Cache Behavior of SPLASH-2 Benchmarks on ARM and ALPHA Processors in Gem5 Full System Simulator

B. Vikas, Basavaraj Talawar

2014 3rd International Conference on Eco-friendly Computing and Communication Systems > 5 - 8

2014 3rd International Conference on Eco-friendly Computing and Communication Systems (ICECCS)

Today, cache size and the hierarchy level of caches play an important role in improving computer performance. Using full system simulations in gem5, the variation in memory bandwidth, system bus throughput, and L1 and L2 cache size misses is measured by running SPLASH-2 Benchmarks on ARM and ALPHA Processors. In this work we calculate cache misses, memory bandwidth and system bus throughput by running...

chapter

An intelligent vehicle tracking technology based on SURF feature and Mean-shift algorithm

Liu Yang, Wang Zhong-li, Cai Bai-gen

2014 IEEE International Conference on Robotics and Biomimetics (ROBIO 2014) > 1224 - 1228

2014 IEEE International Conference on Robotics and Biomimetics (ROBIO)

In traffic video surveillance systems, target-level tracking and feature-level tracking are two important areas for research. Therefore, the combination of the two is an interesting question. Mean-shift is a traditional target-level tracking algorithm with no adaptation to vehicle scale and orientation change. In order to solve the problem, algorithm combine SURF (speed-up robust feature) feature...

chapter

Characterization of OpenCL on a scalable FPGA architecture

Shanyuan Gao, Jeremy Chritz

2014 International Conference on ReConFigurable Computing and FPGAs (ReConFig14) > 1 - 6

2014 International Conference on ReConFigurable Computing and FPGAs (ReConFig)

The recent release of Altera's SDK for OpenCL has greatly eased the development of FPGA-based systems. Research has shown performance improvements brought by OpenCL using a single FPGA device. However, to meet the objectives of high performance computing, OpenCL needs to be evaluated using multiple FPGAs. This work proposes a scalable FPGA architecture for high performance computing. The design...

chapter

Kernel-centric acceleration of high accuracy stereo-matching

Tobias Kenter, Henning Schmitz, Christian Plessl

2014 International Conference on ReConFigurable Computing and FPGAs (ReConFig14) > 1 - 8

2014 International Conference on ReConFigurable Computing and FPGAs (ReConFig)

Stereo-matching algorithms recently received a lot of attention from the FPGA acceleration community. Presented solutions range from simple, very resource efficient systems with modest matching quality for small embedded systems to sophisticated algorithms with several processing steps, implemented on big FPGAs. In order to achieve high throughput, most implementations strongly focus on pipelining...

chapter

Multi-GPU System Design with Memory Networks

Gwangsun Kim, Minseok Lee, Jiyun Jeong, John Kim

2014 47th Annual IEEE/ACM International Symposium on Microarchitecture > 484 - 495

2014 47th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)

GPUs are being widely used to accelerate different workloads and multi-GPU systems can provide higher performance with multiple discrete GPUs interconnected together. However, there are two main communication bottlenecks in multi-GPU systems -- accessing remote GPU memory and the communication between GPU and the host CPU. Recent advances in multi-GPU programming, including unified virtual addressing...

chapter

High performance MPI library over SR-IOV enabled infiniband clusters

Jie Zhang, Xiaoyi Lu, Jithin Jose, Mingzhe Li, et al.

2014 21st International Conference on High Performance Computing (HiPC) > 1 - 10

Virtualization has come to play a central role in HPC Cloud due to easy management and low cost of computation and communication. Recently, Single Root I/O Virtualization (SR-IOV) technology has been introduced for high-performance interconnects such as InfiniBand and can attain near-native performance for inter-node communication. However, the SR-IOV scheme lacks locality aware communication support,...

chapter

A multilevel compressed sparse row format for efficient sparse computations on multicore processors

Humayun Kabir, Joshua Dennis Booth, Padma Raghavan

2014 21st International Conference on High Performance Computing (HiPC) > 1 - 10

We seek to improve the performance of sparse matrix computations on multicore processors with non-uniform memory access (NUMA). Typical implementations use a bandwidth reducing ordering of the matrix to increase locality of accesses with a compressed storage format to store and operate only on the non-zero values. We propose a new multilevel storage format and a companion ordering scheme as an explicit...

chapter

Improving Random Read Performance of Glibc

Mei Wang, Yuanyuan Zhou, Feng Xiao, Qiuming Luo

2014 13th International Symposium on Distributed Computing and Applications to Business, Engineering and Science > 78 - 82

2014 13th International Symposium on Distributed Computing and Applications to Business, Engineering and Science (DCABES)

Cloud data services, specifically key/value stores and NoSQL databases, require a large number of index lookups that fetch small amounts of data. Random I/O becomes the critical performance factor. However, compared with sequential read, the efficiency of random read is very low. Our experiment will explain this. File I/O operation is closely associated with the implementation of I/O mechanism...

chapter

The Implementation of TCP Sequence Number Reference Model in Linux Kernel

Dhananjay M. Dakhane, Prashant R. Deshmukh

2014 International Conference on Computational Intelligence and Communication Networks > 444 - 447

2014 International Conference on Computational Intelligence and Communication Networks (CICN)

It is observed that covert channels can be easily implemented in the TCP/IP stack. This is easily achieved by embedding the covert message in the various header fields seemingly filled with "Random" data, such as TCP Sequence Number (SQN), IP Identification (ID), etc. Such manipulation of these fields which seems "random" at first sight but might be detected with the help of various techniques...

chapter

Analyzing Performance Improvements and Energy Savings in Infiniband Architecture using Network Compression

Branimir Dickov, Miquel Pericas, Paul M. Carpenter, Nacho Navarro, et al.

2014 IEEE 26th International Symposium on Computer Architecture and High Performance Computing > 73 - 80

2014 26th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)

One of the greatest challenges in HPC is total system power and energy consumption. Whereas HPC interconnects have traditionally been designed with a focus on bandwidth and latency, there is an increasing interest in minimising the interconnect's energy consumption. This paper complements ongoing efforts related to power reduction and energy proportionality, by investigating the potential benefits...

chapter

Microarchitectural performance characterization of irregular GPU kernels

Molly A. O'Neil, Martin Burtscher

2014 IEEE International Symposium on Workload Characterization (IISWC) > 130 - 139

GPUs are increasingly being used to accelerate general-purpose applications, including applications with data-dependent, irregular memory access patterns and control flow. However, relatively little is known about the behavior of irregular GPU codes, and there has been minimal effort to quantify the ways in which they differ from regular GPGPU applications. We examine the behavior of a suite of optimized...

chapter

Relative density estimation using Self-Organizing Maps

Denny

2014 International Conference on Advanced Computer Science and Information System > 233 - 238

2014 International Conference on Advanced Computer Science and Information Systems (ICACSIS)

Organizations need knowledge of change, such as changes in customer purchasing behaviour, to adapt business strategies in response to changing circumstances. To understand what has changed, analysts have to be able to relate new knowledge acquired from a newer dataset to that acquired from an earlier dataset. This paper presents a method to detect changes in clustering structure over time. Discovering...

chapter

Study of task scheduling for HEVC video decoder in heterogeneous computational environment

Roman Arzumanyan, Alexey Fartukov, Jeik Kim

2014 IEEE 3rd Global Conference on Consumer Electronics (GCCE) > 577 - 579

With the fast development of mobile technologies and wireless Internet access, the share of video streaming is growing rapidly. The new HEVC (High Efficiency Video Coding) standard was introduced by ITU-T in 2013. It increases the compression rate and, at the same time, the computational effort. Thus the development of efficient decoding systems for mobile platforms becomes a pressing problem. This paper is dedicated to analysis...

chapter

Understanding synchronization in TCP Cubic

Sonia Belhareth, Dino Lopez-Pacheco, Lucile Sassatelli, Denis Collange, et al.

2014 26th International Teletraffic Congress (ITC) > 1 - 9

TCP Cubic is designed to better utilize high bandwidth-delay product paths in IP networks. It is currently the default TCP version in the Linux kernel. Our objective in this work is to better understand the performance of TCP Cubic in scenarios with a large number of competing long-lived TCP flows, as can be observed, e.g., in cloud environments. In such situations, Cubic connections tend to synchronize...

chapter

Particle filtering-based Maximum Likelihood Estimation for financial parameter estimation

Jinzhe Yang, Binghuan Lin, Wayne Luk, Terence Nahar

2014 24th International Conference on Field Programmable Logic and Applications (FPL) > 1 - 4

This paper presents a novel method for estimating parameters of financial models with jump diffusions. It is a Particle Filter based Maximum Likelihood Estimation process, which uses particle streams to enable efficient evaluation of constraints and weights. We also provide a CPU-FPGA collaborative design for parameter estimation of Stochastic Volatility with Correlated and Contemporaneous Jumps model...

Keywords:
KERNEL
BANDWIDTH
