The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
High utilization of hardware resources is the key for designing performance and power optimized GPUapplications. The efficiency of applications and kernels, which do not fully utilize the GPU resources, can be improved through concurrent execution with independent kernels and/or applications. Hyper-Q enables multiple CPU threads or processes to launch work on a single GPU simultaneously for increased...
The clustering coefficient and the transitivity ratio are concepts often used in network analysis, which creates a need for fast practical algorithms for counting triangles in large graphs. Previous research in this area focused on sequential algorithms, MapReduce parallelization, and fast approximations. In this paper we propose a parallel triangle counting algorithm for CUDA GPU. We describe the...
FPGA-enabled datacenters have shown great potential for providing performance and energy efficiency improvement, and captured a great amount of attention from both academia and industry. In this paper we aim to answer one key question: how can we efficiently integrate FPGAs into state-of-the-art big-data computing frameworks? Although very important, this problem has not been well studied, especially...
Many scientific applications, ranging from national security to medical advances, require solving a number of relatively small-size independent problems. As the size of each individual problem does not provide sufficient parallelism for the underlying hardware, especially accelerators, these problems must be solved concurrently as a batch in order to saturate the hardware with enough work, hence the...
Numerical approach to frequency response problems usually requires that the system governing equation is solved repeatedly at many frequencies. The computational efficiency of the overall process can be increased by departing from traditional sequential computing model in favor of utilizing the parallel processing capability commonly offered by modern hardware. In this paper, we consider a hybrid...
As the complexity of applications continues to grow, each new generation of GPUs has been equipped with advanced architectural features and more resources to sustain its performance acceleration capability. Recent GPUs have been featured with concurrent kernel execution, which is designed to improve the resource utilization by executing multiple kernels simultaneously. However, prior systems only...
The unprecedented prevalence of GPGPU is largely attributed to its abundant on-chip register resources, which allow massively concurrent threads and extremely fast context switch. However, due to internal memory size constraints, there is a tradeoff between the per-thread register usage and the overall thread concurrency. This becomes a design problem in terms of performance tuning, since the performance...
Traditional suites used for benchmarking high-performance computing platforms or for architectural design space exploration use much simpler virtual memory layouts and multitasking/ multithreading schemes, which means that they cannot be used to study the complex interactions among the layers of the Android software stack. To demonstrate this, we present memory reference and concurrency data showing...
Prototyping and debugging of operating systems and drivers are very tough tasks because of hardware volatility, kernel panics, blue screens of death, long periods of time required to expose the bug, perturbation of the drivers by the debugger, and non-determinism of multi-threaded environment. This paper shows how the deterministic replay of the virtual machine execution can be used to reduce the...
In recent years the power wall has prevented the continued scaling of single core performance. This has lead to the rise of dark silicon and motivated a move toward parallelism and specialization. As a result, energy-efficient high-throughput GPU cores are increasingly favored for accelerating data-parallel applications. However, the best way to efficiently communicate and synchronize across heterogeneous...
Effective use of the memory hierarchy is crucial to cloud computing. Platform memory subsystems must be carefully provisioned and configured to minimize overall cost and energy for cloud providers. For cloud subscribers, the diversity of available platforms complicates comparisons and the optimization of performance. To address these needs, we present X-Mem, a new open-source software tool that characterizes...
The Fast Fourier Transform (FFT) is an important algorithm in the fields of science and engineering, where it is used in diverse areas such as communications, signal processing, instrumentation, image and video analysis, etc. The algorithm is essentially a fast implementation of the Discrete Fourier Transform which allows it to reduce the asymptotic complexity of the latter from O(n2) to the former's...
Novel security detection technology are needed to meet the growing demand of subway operation. In this paper, an improved faulting detection algorithm for subway tunnel segment is proposed. A combined denoising technique is used to convert the depth image of faulting acquired by Kinect into binary image of the height difference which can be processed by digital image. The obvious advantage of this...
Privilege escalation attack is one of the serious threats to Linux. So the protection of the root user is an important requirement for Linux systems and SELinux has tackled this issue in some degree. But by exploiting kernel privilege-escalation vulnerabilities, the attackers can tamper security identifiers allocated for the process's security contexts, which are the foundation of SELinux enforcing...
Gedae has developed automated software engineering technology for computers and software. This paper presents the research, prototypes, and documented software engineering improvements from real-world case studies that led to the Gedae technology. Gedae's technology is based on the creation and analysis of software models, specifically dataflow software models. The dataflow software model is implemented...
Edge detection is one of the most important paradigm of Image processing. Images contain millions of pixel and each pixel information is independent of its neighbouring pixel. Hence this paper puts to test the capability of Graphics Processing Unit (GPU) to compute in parallel against the millions of pixel calculations involved in image processing. Each pixel operation is independent from other thus...
In recent years, the real-time diagnosis in the Ehealthis a widely used practice. Employing distributed computingsystems, it is possible to obtain excellent results, avoiding longdelays and invasive processes. However, the data processing stage, generally assigned on standard computational CPU environments, is a critical aspect, especially when the computational complexity of the numerical method...
Visual pattern recognition is a key research topic in the field of image processing and computer vision. Texture analysis based on steerable Riesz wavelets is powerful, but requires computing pixel -- wise operations resulting in a run time in the order of days when large volumes of data are processed. To overcome this limitation we propose a Graphics Processing Unit (GPU) based solution. A standard...
Modern Graphics Processing Units (GPUs) have evolved to high performance general purpose processors, forming an alternative to CPUs. However, programming them effectively has proven to be a challenge, not only due to the mandatory requirement of extracting massive fine grained parallelism but also due to its susceptible performance on memory traffic. Apart from regular memory caches, GPUs feature...
Modern Graphics Processing Units (GPUs) have evolved to high performance general purpose processors, forming an alternative to CPUs. However, programming them effectively has proven to be a challenge, not only due to the mandatory requirement of extracting massive fine grained parallelism but also due to its susceptible performance on memory traffic. Apart from regular memory caches, GPUs feature...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.