With FPGAs emerging as a promising accelerator for general-purpose computing, there is a strong demand to make them accessible to software developers. Recent advances in OpenCL compilers for FPGAs pave the way for synthesizing FPGA hardware from OpenCL kernel code. To enable broader adoption of this paradigm, significant challenges remain. This paper presents our efforts in developing dynamic profiling...
Big Data analytics and new problems in social networks, computational biology, and web connectivity have led to renewed research interest in graph processing. Due to the "irregularity" of graph computations, efficient parallel graph processing faces a set of software and hardware challenges debated in the literature. In this paper, by utilizing hardware performance counters, we characterize system...
Modern embedded systems are typically implemented using both programmable processors and application-specific hardware in order to meet real-time design goals alongside other metrics such as performance, area, and cost. The availability of programmable processors and application-specific hardware enables an application architect to partition the execution of the given application code (specified in...
As power becomes one of the most important resources to provision while building modern HPC systems and applications, it becomes crucial to obtain deeper insights into applications' power and thermal characteristics. There exists a need to correlate application context with processor-level and system-level power and thermal measurements. Existing profiling tools to monitor power and thermal measurements...
Embedded system design involves meeting strict design goals such as performance, area, and power consumption. In order to meet these design goals, embedded systems are implemented using programmable processors and application-specific hardware. Hardware/software partitioning is thus a critical step in the realization of embedded systems. The initial software description of the application is profiled...
Today, on-chip monitoring solutions should be characterized by reduced software and hardware overhead. This work therefore deals with techniques to profile the computational behavior and communication patterns of hardware/software components belonging to systems with multiple processing elements, i.e. a more general representation of on-chip embedded systems. In particular, the paper focuses on profiling...
The electrical activity of the heart can be measured with an electrocardiogram (ECG). Automatic arrhythmia-diagnosis systems that achieve high accuracy rates for inpatients and outpatients are still an important area of research. The accuracy of such a system depends on the accuracy of its classification system. All such classification systems require qualitative features for classification...
Robust high throughput computing requires effective monitoring and enforcement of a variety of resources including CPU cores, memory, disk, and network traffic. Without effective monitoring and enforcement, it is easy to overload machines, causing failures and slowdowns, or underutilize machines, which results in wasted opportunities. This paper explores how to describe, measure, and enforce resources...
Energy efficiency has become a primary concern for data centers in recent years. Understanding where energy is spent within software is fundamental to the study of energy efficiency as a whole. In this paper, we take a first step in this direction by building an energy profiling module on top of IgProf. IgProf is an application profiler developed at CERN for scientific computing workloads...
In the Dataflow model, instructions are executed as soon as their input operands are ready, allowing the natural exploitation of instruction-level parallelism (ILP), which makes it extremely useful for increasing applications' performance on multicore machines. However, the lack of accurate information on the parallel code can make it more difficult for programmers to perform code analysis and optimization...
LIKWID is a set of performance-related command-line tools targeting x86 processors. Besides affinity-related tools, it also includes likwid-perfctr, which counts hardware performance events. LIKWID builds upon the Linux msr kernel module, which provides access to model-specific registers (MSRs) via a device file interface. In addition to a set of convenient functional features such as a logical...
Profiling is of great assistance in understanding and optimizing applications' behavior. Today's profiling techniques help developers focus on the pieces of code leading to the highest penalties according to a given performance metric. In this paper we describe a pair of tools we have extended to complement the traditional algorithm-oriented analysis. Our extended tools provide new object-differentiated...
High Performance Computing systems expect applications to make the most of their processing power. This need is even more pressing for applications such as Monte Carlo simulations, which require considerable CPU time and memory. Optimizing applications is one approach to reduce the consumption of these resources. Before optimizing, it is mandatory to profile the application in order to pinpoint...
Usage of GPU-based architectures for scientific computing has been steadily increasing in recent years. This new paradigm for both programming and execution has been applied to solve several classic problems much faster than the conventional multiprocessor and/or multicomputer approach. These architectures allow an increase in performance, compared to conventional CPUs, for specific...
More and more computationally intensive scientific applications make use of hardware accelerators like general-purpose graphics processing units (GPGPUs). Compared to software development for typical multi-core processors, their programming is fairly complex and needs hardware-specific optimizations to utilize the full computing power. To achieve high performance, critical parts of a program have to...
We achieve very small runtime overhead: approximately a 1.2-10 times slowdown and moderate memory consumption. We demonstrate the effectiveness of Parallel Prophet on eight benchmarks from the OmpSCR and NAS Parallel benchmarks by comparing our predictions with actual parallelized code. Our simple memory model also identifies performance limitations resulting from memory system contention. We present...
This paper deals with the binary analysis of executable programs, with the goal of understanding how they access memory. It explains how to statically build a formal model of all memory accesses. Starting with a control-flow graph of each procedure, well-known techniques are used to structure this graph into a hierarchy of loops in all cases. The paper shows that much more information can be extracted...
Modern parallel performance measurement systems collect performance information either through probes inserted in the application code or via statistical sampling. Probe-based techniques measure performance metrics directly using calls to a measurement library that execute as part of the application. In contrast, sampling-based systems interrupt program execution to sample metrics for statistical...
With the growing need for advanced functionality in modern embedded systems, it is now necessary to integrate multiple processors in the system, preferably on a single chip, to support the required computing complexity. The problem is that such a multiprocessor system-on-chip (MPSoC) architecture is very complex and its internal behavior is very difficult to track. An effective tool for profiling...
Monitoring circuitry is presented that extracts properties and features from a complex system built on a system-on-chip device to support ICmetrics, a novel security concept that aims to uniquely identify and secure an embedded system based on its own behavioural identity. The circuits utilise a novel approach to profiling the instruction fetches and data accesses associated with each of the...