The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
Today's leadership computing facilities have enabled the execution of transformative simulations at unprecedented scales. However, analyzing the huge amount of output from these simulations remains a challenge. Most analyses of this output is performed in post-processing mode at the end of the simulation. The time to read the output for the analysis can be significantly high due to poor I/O bandwidth,...
Graph analysis is a powerful method in data analysis. Although several frameworks have been proposed for processing large graph instances in distributed environments, their performance is much lower than using efficient single-machine implementations provided with enough memory. In this paper, we present a fast distributed graph processing system, namely PGX.D. We show that PGX.D outperforms other...
The need for novel data analysis is urgent in the face of a data deluge from modern applications. Traditional approaches to data analysis incur significant data movement costs, moving data back and forth between the storage system and the processor. Emerging Active Flash devices enable processing on the flash, where the data already resides. An array of such Active Flash devices allows us to revisit...
Many companies are deploying services, either for consumers or industry, which are largely based on machine-learning algorithms for sophisticated processing of large amounts of data. The state-of-the-art and most popular such machine-learning algorithms are Convolutional and Deep Neural Networks (CNNs and DNNs), which are known to be both computationally and memory intensive. A number of neural network...
In this paper, we consider the characteristics of the kernel adaptive filters for the mixture of linear and non-linear environments. We first consider employing a linear kernel as one of the kernels in multi-kernel adaptive filters. It is pointed out that the convergence characteristics of the filter corresponding to the linear kernel is affected by the selection of the other kernels. Then, we propose...
Today cache size and hierarchy level of caches play an important role in improving computer performance. By using full system simulations of gem5, the variation in memory bandwidth, system bus throughput, L1 and L2 cache size misses are measured by running SPLASH-2 Benchmarks on ARM and ALPHA Processors. In this work we calculate cache misses, memory bandwidth and system bus throughput by running...
In traffic video surveillance system, target-level tracking and feature-level tracking are two important areas for research. Therefore, the combination between them is an interesting question. Mean-shift is a traditional target-level tracking algorithm with no adaptation to vehicle scale and orientation change. In order to solve the problem, algorithm combine SURF (speed-up robust feature) feature...
The recent release of Altera's SDK for OpenCL has greatly eased the development of FPGA-based systems. Research have shown performance improvements brought by OpenCL using a single FPGA device. However, to meet the objectives of high performance computing, OpenCL needs to be evaluated using multiple FPGAs. This work has proposed a scalable FPGA architecture for high performance computing. The design...
Stereo-matching algorithms recently received a lot of attention from the FPGA acceleration community. Presented solutions range from simple, very resource efficient systems with modest matching quality for small embedded systems to sophisticated algorithms with several processing steps, implemented on big FPGAs. In order to achieve high throughput, most implementations strongly focus on pipelining...
GPUs are being widely used to accelerate different workloads and multi-GPU systems can provide higher performance with multiple discrete GPUs interconnected together. However, there are two main communication bottlenecks in multi-GPU systems -- accessing remote GPU memory and the communication between GPU and the host CPU. Recent advances in multi-GPU programming, including unified virtual addressing...
Virtualization has become a central role in HPC Cloud due to easy management and low cost of computation and communication. Recently, Single Root I/O Virtualization (SR-IOV) technology has been introduced for high-performance interconnects such as InfiniBand and can attain near to native performance for inter-node communication. However, the SR-IOV scheme lacks locality aware communication support,...
We seek to improve the performance of sparse matrix computations on multicore processors with non-uniform memory access (NUMA). Typical implementations use a bandwidth reducing ordering of the matrix to increase locality of accesses with a compressed storage format to store and operate only on the non-zero values. We propose a new multilevel storage format and a companion ordering scheme as an explicit...
The Cloud data services, specifically, key/value stores and NoSQL database that require a large number of index lookups that fetch small amount of data. Random I/O becomes the critical performance factor. However, compared with sequential read, the efficiency of random read is very low. Our experiment will explain this. File I/O operation is closely associated with the implementation of I/O mechanism...
It is observed that covert channels can be easily implemented in TCP/IP stack. It is easily achieved by embedding the covert message in the various header fields seemingly filled with "Random" data such as TCP Sequence Number (SQN), IP Identification (ID) etc. Such manipulation of these fields which seems "random" at first sight but might be detected with the help of various techniques...
One of the greatest challenges in HPC is total system power and energy consumption. Whereas HPC interconnects have traditionally been designed with a focus on bandwidth and latency, there is an increasing interest in minimising the interconnect's energy consumption. This paper complements ongoing efforts related to power reduction and energy proportionality, by investigating the potential benefits...
GPUs are increasingly being used to accelerate general-purpose applications, including applications with data-dependent, irregular memory access patterns and control flow. However, relatively little is known about the behavior of irregular GPU codes, and there has been minimal effort to quantify the ways in which they differ from regular GPGPU applications. We examine the behavior of a suite of optimized...
Organizations need knowledge of change, such as changes in customer purchasing behaviour, to adapt business strategies in response to changing circumstances. To understand what has changed, analysts have to be able to relate new knowledge acquired from a newer dataset to that acquired from an earlier dataset. This paper presents a method to detect changes in clustering structure over time. Discovering...
With the fast development of mobile technologies and wireless Internet access, video streaming share grows rapidly. New HEVC (High Efficiency Video Codec) standard was introduced by ITU-T in 2013. It increases compression rate and, in the same time, computational efforts. Thus development of efficient decoding systems for mobile platforms becomes actual problem. This paper is dedicated to analysis...
TCP Cubic is designed to better utilize high bandwidth-delay product paths in IP networks. It is currently the default TCP version in the Linux kernel. Our objective in this work is to better understand the performance of TCP Cubic in scenarios with a large number of competing long-lived TCP flows, as can be observed, e.g., in cloud environments. In such situations, Cubic connections tend to synchronize...
This paper presents a novel method for estimating parameters of financial models with jump diffusions. It is a Particle Filter based Maximum Likelihood Estimation process, which uses particle streams to enable efficient evaluation of constraints and weights. We also provide a CPU-FPGA collaborative design for parameter estimation of Stochastic Volatility with Correlated and Contemporaneous Jumps model...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.