GPUs use thousands of threads to provide high performance and efficiency. In general, if one thread of a kernel uses one of the resources (compute, bandwidth, data cache) more heavily than the others, the large number of identical concurrent threads causes significant contention for that resource. This contention eventually saturates the kernel's performance at the bottleneck...
GPUs are being widely used to accelerate different workloads and multi-GPU systems can provide higher performance with multiple discrete GPUs interconnected together. However, there are two main communication bottlenecks in multi-GPU systems -- accessing remote GPU memory and the communication between GPU and the host CPU. Recent advances in multi-GPU programming, including unified virtual addressing...
A GPU is often equipped with a complex memory system, including global memory, texture memory, shared memory, constant memory, and various levels of cache. Where the data are placed is important for the performance of a GPU program. However, the decision is difficult for a programmer to make because of the architecture's complexity and because suitable data placements are sensitive to input and architecture changes. This...
As many real-world data can elegantly be represented as graphs, various graph kernels and methods for computing them have been proposed. Surprisingly, many of the recent graph kernels do not employ the kernel trick anymore but rather compute an explicit feature map and report higher efficiency. So, is there really no benefit of the kernel trick when it comes to graphs? Triggered by this question,...
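The contrast between the kernel trick and an explicit feature map can be sketched with a deliberately simple graph kernel, the vertex-label histogram kernel. This is an assumed illustration, not one of the kernels the paper analyzes, and the toy graphs and function names are hypothetical. Both functions compute the same kernel value: one compares all vertex pairs directly, the other takes an inner product of label histograms.

```python
from collections import Counter

# Hypothetical toy graphs, represented only by their vertex labels,
# because the vertex-label histogram kernel ignores edges.

def kernel_trick(labels_a, labels_b):
    """k(G, G') = sum over vertex pairs (v, v') of [label(v) == label(v')]."""
    return sum(1 for a in labels_a for b in labels_b if a == b)

def explicit_feature_map(labels):
    """phi(G): sparse histogram of vertex labels."""
    return Counter(labels)

def kernel_explicit(labels_a, labels_b):
    """k(G, G') = <phi(G), phi(G')>, computed on the sparse histograms."""
    phi_a = explicit_feature_map(labels_a)
    phi_b = explicit_feature_map(labels_b)
    # Counter returns 0 for labels missing from phi_b
    return sum(count * phi_b[label] for label, count in phi_a.items())

g1 = ["C", "C", "O", "N"]
g2 = ["C", "O", "O"]
print(kernel_trick(g1, g2), kernel_explicit(g1, g2))  # prints: 4 4
```

Here the explicit map needs time linear in the number of vertices per graph, while the pairwise version is quadratic, which is the kind of efficiency gain that motivates explicit feature maps; the kernel trick remains necessary when the feature space is too large or infinite to enumerate.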
The growth of online data gives users access to information on the Internet but also makes it challenging to obtain valuable knowledge. In this paper we focus on news text classification, which helps information providers organize and display news and also helps users reach valuable information easily. A hierarchical method based on LDA and SVM is proposed...
Heterogeneous systems consisting of multiple CPUs and GPUs are increasingly attractive as platforms for high performance computing. Such platforms are usually programmed using OpenCL which provides program portability by allowing the same program to execute on different types of device. As such systems become more mainstream, they will move from application dedicated devices to platforms that need...
Application-tailored networks are customized networks optimized for application requirements. They use custom protocol stacks and network virtualization to provide flexible and efficient communication. End user nodes run a framework called NENA to connect to such networks at runtime. The current NENA implementation runs on top of the operating system's network stack and uses the Socket API. It allows...
The Stochastic On-Time Arrival (SOTA) problem has recently been studied as an alternative to traditional shortest-path formulations in situations with hard deadlines. The goal is to find a routing strategy that maximizes the probability of reaching the destination within a pre-specified time budget, with the edge weights of the graph being random variables with arbitrary distributions. While this...
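A minimal dynamic-programming sketch of the SOTA objective, assuming independent edge travel times that are positive integers with discrete distributions (the network, node names, and function name are hypothetical, and this is not necessarily the algorithm the paper proposes). With u_d(t) = 1 for t ≥ 0 and u_i(t) = 0 for t < 0, each node's value is u_i(t) = max over successors j of Σ_k p_ij(k) · u_j(t − k).

```python
from functools import lru_cache

# Hypothetical network: EDGES[i][j] = {travel_time: probability}.
# Travel times are assumed to be independent positive integers,
# which guarantees that the recursion terminates.
EDGES = {
    "s": {"a": {1: 0.5, 3: 0.5}, "d": {3: 0.9, 6: 0.1}},
    "a": {"d": {1: 1.0}},
    "d": {},
}
DEST = "d"

@lru_cache(maxsize=None)
def arrival_prob(node, budget):
    """Maximum probability of reaching DEST from `node` within `budget`
    time units, re-choosing the next edge at every node (adaptive policy)."""
    if budget < 0:
        return 0.0
    if node == DEST:
        return 1.0
    best = 0.0
    for nxt, dist in EDGES[node].items():
        # Expected success probability if we traverse edge (node, nxt) next
        p = sum(prob * arrival_prob(nxt, budget - t) for t, prob in dist.items())
        best = max(best, p)
    return best

for budget in (2, 3, 4):
    print(budget, arrival_prob("s", budget))
```

With a budget of 3 the best first move from s is the direct edge (probability 0.9), while with a budget of 4 the detour through a always arrives in time (probability 1.0). Because the best move depends on the remaining budget, the result is a routing strategy rather than a single fixed path.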
Epistatic interactions between genes are believed to be a critical component in the genetic architecture of complex diseases. Genome Wide Association Studies (GWAS) may be able to detect such genetic interactions indirectly, via the identification of associated SNP markers. Major obstacles to progress in this area are: the unknown nature of epistatic interactions, little understanding of the capabilities...
Power and energy have become dominant aspects of hardware and software design in High Performance Computing (HPC). Recently, the Department of Defense (DOD) has required that applications and architectures attain 75 GFLOPS/Watt in order to support future missions. This requires a significant research effort toward power and energy optimization. The OpenMP programming model is...
GPUs have gained tremendous popularity in a broad range of application domains. These applications possess varying grains of parallelism and place high demands on compute resources -- many times imposing real-time constraints, requiring flexible work schedules, and relying on concurrent execution of multiple kernels on the device. These requirements present a number of challenges when targeting current...
In the field of autonomous and intelligent vehicles, the goal of pedestrian classification is to reduce the number of accidents. Object classification accuracy depends on the type of classifier and on the object features extracted for classification. The Support Vector Machine (SVM) is considered the most effective classifier for this task. However, it depends on a number of factors that require researchers...
Existing generative classifiers (e.g., Naïve Bayes) estimate the joint probability distribution p(x,y) or the likelihood p(x|y) with the help of various density estimators, which are not suitable for large data sets due to their high time and space complexities. These classifiers also make various assumptions: they allow only limited dependencies among attributes and estimate one-dimensional likelihoods. A new...
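To make the baseline concrete, here is a minimal categorical Naïve Bayes sketch (not the new classifier the paper proposes). It assumes discrete attributes, so simple counting stands in for a density estimator, and it uses Laplace smoothing with an assumed `alpha`. It estimates p(y) and one one-dimensional likelihood p(x_i|y) per attribute, and the conditional-independence assumption lets it score each class by their product. The data set and function names are hypothetical.

```python
import math
from collections import Counter

def fit_naive_bayes(X, y, alpha=1.0):
    """Estimate p(y) and one-dimensional likelihoods p(x_i | y) by counting,
    with Laplace smoothing `alpha` (an assumed choice)."""
    n_attrs = len(X[0])
    class_counts = Counter(y)
    # value_counts[c][i][v] = number of class-c rows whose attribute i equals v
    value_counts = {c: [Counter() for _ in range(n_attrs)] for c in class_counts}
    domains = [set() for _ in range(n_attrs)]
    for row, c in zip(X, y):
        for i, v in enumerate(row):
            value_counts[c][i][v] += 1
            domains[i].add(v)
    return class_counts, value_counts, domains, alpha, len(y)

def predict(model, row):
    class_counts, value_counts, domains, alpha, n = model

    def log_joint(c):
        # log p(y=c) + sum_i log p(x_i | y=c): attributes assumed independent given y
        score = math.log(class_counts[c] / n)
        for i, v in enumerate(row):
            num = value_counts[c][i][v] + alpha
            den = class_counts[c] + alpha * len(domains[i])
            score += math.log(num / den)
        return score

    return max(class_counts, key=log_joint)

X = [("sunny", "hot"), ("sunny", "mild"), ("rain", "mild"),
     ("rain", "cool"), ("overcast", "hot")]
y = ["no", "no", "yes", "yes", "yes"]
model = fit_naive_bayes(X, y)
print(predict(model, ("rain", "mild")))  # prints: yes
```

Each attribute contributes only its own one-dimensional likelihood, so dependencies between attributes (for example, between outlook and temperature) are ignored. This is the kind of limiting assumption the abstract points out.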
To address the trade-off between certification and resource efficiency, researchers have recently been trying to apply a criticality mode change mechanism to mixed-criticality systems. However, the actual implementation of the criticality mode change has not been studied rigorously. In this paper, we suggest a practical design for implementing the criticality mode change framework for Real-Time Operating...
Automation systems must primarily be deterministic and reliable, especially in safety-critical environments. With recent trends such as mass customization or Industry 4.0, there is an increasing need for automation systems to be dynamic. Changing parts of the software of today's automation systems, however, typically requires rebooting the controller, which makes software updates a complex and costly...
LIKWID is a set of performance-related command-line tools targeting x86 processors. Besides affinity-related tools, it also includes likwid-perfctr, which counts hardware performance events. LIKWID builds upon the Linux msr kernel module, which provides access to model-specific registers (MSRs) via a device file interface. In addition to a set of convenient functional features such as a logical...
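The msr device file interface can be illustrated with a short sketch (this is not LIKWID's own code, which is written in C). With the Linux msr module loaded, each CPU exposes `/dev/cpu/N/msr`, the file offset selects the register, and each access transfers 8 bytes. Reading a real device usually requires root privileges. The `path_template` parameter is a hypothetical addition that lets the function be exercised on an ordinary file.

```python
import os
import struct

IA32_TIME_STAMP_COUNTER = 0x10  # architectural x86 MSR address

def read_msr(cpu, reg, path_template="/dev/cpu/{}/msr"):
    """Read one 64-bit MSR through the Linux msr device file:
    the file offset is the register address, and 8 bytes are read."""
    fd = os.open(path_template.format(cpu), os.O_RDONLY)
    try:
        data = os.pread(fd, 8, reg)
    finally:
        os.close(fd)
    if len(data) != 8:
        raise OSError(f"short read of MSR {reg:#x} on CPU {cpu}")
    return struct.unpack("<Q", data)[0]  # x86 stores the value little-endian
```

On a machine with the module loaded, `read_msr(0, IA32_TIME_STAMP_COUNTER)` would return the current time-stamp counter of CPU 0. likwid-perfctr builds its event counting on this kind of register access, programming and reading the performance-counter MSRs.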
While most microkernel-based systems implement non-essential software components as user space tasks and strictly separate those tasks during runtime, they often rely on a static configuration and composition of their software components to ensure safety and security. In this paper, we extend a microkernel-based system architecture with a Trusted Platform Module (TPM) and propose a verification mechanism...
Significant application performance improvements can be achieved by heterogeneous compute technologies, such as multi-core CPUs, GPUs and FPGAs. The HARNESS project is developing architectural principles that enable the next generation cloud platforms to incorporate such devices thereby vastly increasing performance, reducing energy consumption, and lowering associated cost profiles. Along with management...
Pattern libraries are important tools for high-productivity application development. Achieving the best performance is complicated by the fact that they execute user-provided code, which is not known when the library is created. This makes pattern libraries good candidates for automatic software tuning. In this paper, we deal with automatic online parameter tuning of the HyPHI hybrid pattern...
This paper presents an adaptive approach for parallel simulation of SystemC RTL models on future many-core architectures such as Intel's Single-chip Cloud Computer (SCC). It is based on a configurable parallel SystemC kernel that preserves the partial order defined by the SystemC delta cycles while avoiding global synchronization as far as possible. The underlying algorithm relies on...