Protecting critical files in file systems is vital to computer systems. To protect critical files, VMI-based real-time file-system monitoring tools are promising options. However, these tools are operation-based and introduce high overhead. Operation-based approaches intercept a particular kind of file operation to monitor critical files. The selected file operation is intercepted by...
A multiple-structure filter design methodology to improve the convergence characteristics of Lucy-Richardson deconvolution (LRDec) is proposed. The deconvolution is required for decoupling Random Telegraph Noise (RTN) tail effects from the overall VLSI time-dependent operating margin characteristics. The proposed parallel filter design alleviates unwanted phase misalignment between the two distributions...
Checkpointing is a key enabler of hibernation, live migration and fault-tolerance for virtual machines (VMs) in mobile devices. However, checkpointing a VM is usually heavy-weight: the VM's entire memory needs to be dumped to storage, which induces a significant amount of (slow) I/O operations, degrading system performance and user experience. In this paper, we propose FLIC, a fast and lightweight...
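The FLIC abstract is truncated before its design details, but the problem it states, dumping the VM's entire memory on every checkpoint, is commonly reduced by dumping only pages that changed since the last checkpoint. A minimal, generic incremental-checkpoint sketch in Python (page size, hashing scheme, and function names are illustrative assumptions, not necessarily FLIC's approach):

```python
import hashlib

PAGE_SIZE = 4096

def checkpoint(memory, previous_hashes):
    """Generic incremental checkpoint: return only the pages whose
    content changed since the last checkpoint, plus fresh hashes."""
    dump = {}
    hashes = {}
    for offset in range(0, len(memory), PAGE_SIZE):
        page = bytes(memory[offset:offset + PAGE_SIZE])
        digest = hashlib.sha256(page).digest()
        hashes[offset] = digest
        if previous_hashes.get(offset) != digest:
            dump[offset] = page  # only dirty pages hit storage
    return dump, hashes

# Demo: 4 pages of guest "memory".
mem = bytearray(4 * PAGE_SIZE)
dump1, h1 = checkpoint(mem, {})      # first checkpoint: all 4 pages dumped
mem[2 * PAGE_SIZE] = 1               # dirty exactly one page
dump2, h2 = checkpoint(mem, h1)      # second checkpoint: only that page dumped
```

Real systems would track dirty pages via hardware dirty bits or write protection rather than hashing every page, but the I/O saving is the same: checkpoint cost scales with the working set, not total memory size.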
Effective use of the memory hierarchy is crucial to cloud computing. Platform memory subsystems must be carefully provisioned and configured to minimize overall cost and energy for cloud providers. For cloud subscribers, the diversity of available platforms complicates comparisons and the optimization of performance. To address these needs, we present X-Mem, a new open-source software tool that characterizes...
Video tracking is a challenging task for computing professionals. The performance of video tracking techniques is greatly affected by the background detection and elimination process. In our approach, we explore the concurrent computational ability of GPGPUs (general-purpose graphics processing units) to address this problem. A Gaussian Mixture Model (GMM) with adaptive weighted kernels is used for...
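The GMM background model referenced above maintains per-pixel Gaussians and flags pixels that deviate from them as foreground. A minimal sketch of the idea, simplified to a single Gaussian per pixel rather than a full mixture (the learning rate `alpha` and threshold `k` are assumed values):

```python
import numpy as np

def update_background(mean, var, frame, alpha=0.05, k=2.5):
    """One step of a simplified per-pixel Gaussian background model.
    Pixels farther than k standard deviations from the model are
    foreground; background pixels update via exponential averaging."""
    foreground = np.abs(frame - mean) > k * np.sqrt(var)
    background = ~foreground
    # Update only pixels classified as background.
    mean = np.where(background, (1 - alpha) * mean + alpha * frame, mean)
    var = np.where(background,
                   (1 - alpha) * var + alpha * (frame - mean) ** 2, var)
    return mean, var, foreground

# Demo: static gray background with a bright 2x2 object entering.
mean = np.full((8, 8), 0.5)
var = np.full((8, 8), 0.01)
frame = np.full((8, 8), 0.5)
frame[2:4, 2:4] = 1.0                       # the moving object
mean, var, fg = update_background(mean, var, frame)
```

Each pixel's update is independent of all others, which is exactly why this computation maps well onto the massively parallel GPGPU execution model the abstract exploits.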
Processing data in or near memory (PIM), as opposed to in conventional computational units in a processor, can greatly alleviate the performance and energy penalties of data transfers from/to main memory. Graphics Processing Unit (GPU) architectures and applications, where main memory bandwidth is a critical bottleneck, can benefit from the use of PIM. To this end, an application should be properly...
As massive multi-threading in GPU imposes tremendous pressure on memory subsystems, efficient bandwidth utilization becomes a key factor affecting the GPU throughput. In this work, we propose thread batch enabled memory partitioning (TEMP), to improve GPU performance through the improvement of memory bandwidth utilization. In particular, TEMP clusters multiple thread blocks sharing the same set of...
A novel scheme for fast browser launch is presented. Our scheme caches the frame buffer data of a launched browser in non-volatile memory and reuses the cached data when the browser launches later. Through an implementation, we show that our scheme significantly reduces browser launch time.
Deep learning techniques like Convolutional Neural Networks (CNNs) are gaining traction for the classification of objects (e.g., traffic signs, pedestrians, vehicles) in Advanced Driver Assistance Systems (ADAS). Typical trained CNN networks pose huge computational complexity in the feed-forward path during operation, due to multiple layers and within-layer operations like 2D convolution, spatial...
On embedded devices, physical memory is a critical resource. RAM should be used very efficiently without affecting the performance of the device. In-kernel memory swapping is a Linux feature that creates a RAM-based swap area and provides a form of virtual memory compression. It increases performance by using a compressed block device in RAM for paging instead of disk. Since in-kernel memory swapping...
The OMAP-L138 DSP+ARM processor is a dual-core SoC developed by Texas Instruments. It offers high speed, small size, and power efficiency, and is broadly deployed in advanced portable devices. This paper focuses on retrieving the Linux kernel code, analysing specific aspects of the modules supported by the OMAP-L138 processor, and altering the Linux kernel source code as per the...
Kernel methods suffer from high time and space complexity because they require a large kernel matrix over the training data, so they must be sped up; low-rank approximation addresses this problem. In this paper, we compare two sampling-based low-rank approximation techniques implemented for large kernel matrices. The first is the standard Nystrom method, and the second...
The increasing adoption of GPUs as mainstream computing devices, coupled with the imminent availability of large high-bandwidth caches based on die-stacked memory makes it important to analyze and understand modern GPU compute applications from the perspective of their memory access and data reuse characteristics. This paper presents detailed workload characterization studies on four GPU compute applications...
We present preliminary results with the TyTra design flow. Our aim is to create a parallelising compiler for high-performance scientific code on heterogeneous platforms, with a focus on Field-Programmable Gate Arrays (FPGAs). Using the functional language Idris, we show how this programming paradigm facilitates the generation of different correct-by-construction program variants through type transformations...
ARTICo3 is an architecture that makes it possible to dynamically set up an arbitrary number of reconfigurable hardware accelerators, each containing a given number of threads fixed at design time according to High-Level Synthesis constraints. However, the replication of these modules can be decided at runtime to accelerate kernels by increasing the overall number of threads, to add modular redundancy to increase...
Increasing computation demands with limited power budget require more energy-efficient designs without performance degradation in embedded systems and mobile computing platforms. Reconfigurable computing is an alternative to optimize both performance and power consumption. However, due to the complexity of hardware design, implementing dedicated accelerators usually lacks flexibility and productivity...
The amount of free memory has a great influence on system stability, because running out of memory causes performance degradation, unexpected process terminations, and so on. Thus, designing a memory utilization plan based on the characteristics of the processes is an important administration task. However, processes sometimes demand a large amount of main memory rapidly and unexpectedly...
Recent advancements in the architecture of Graphics Processing Units (GPUs) enable the acceleration of many general-purpose applications. Even with high memory bandwidth, GPUs still face the challenge of accelerating highly memory-intensive applications. To overcome this challenge, this paper investigates the impact of scaling up the memory partitions and also scaling the frequency of the...
Graphics Processing Units (GPUs) based on the Single Instruction Multiple Thread (SIMT) architecture are emerging as more efficient platforms than Multiple Instruction Multiple Data (MIMD) architectures in exploiting parallelism. A GPU has numerous shader cores and thousands of simultaneous fine-grained active threads. These threads are grouped into Cooperative Thread Arrays (CTAs). All the threads within...
A sensor platform is a station equipped with extensive sensor and communication systems, which provide space-based detection and alert capabilities. It consists of low-power, embedded computing devices known as motes, which use sensors to collect measurements from the physical world and its inhabitants. In this paper, an ARM-based sensor platform running a Linux operating system is designed and implemented...