Protecting critical files in file systems is vital to computer systems. To protect critical files, VMI-based real-time file-system monitoring tools are promising options. However, these tools are operation-based and introduce high overhead. Operation-based approaches intercept a particular kind of file operation to monitor critical files. The selected file operation is intercepted by...
A multiple-structure filter design methodology to improve the convergence characteristics of Lucy-Richardson deconvolution (LRDec) is proposed. The deconvolution is required for decoupling Random Telegraph Noise (RTN) tail effects from the overall VLSI time-dependent operating margin characteristics. The proposed parallel filter design alleviates unwanted phase misalignment between the two distributions...
Checkpointing is a key enabler of hibernation, live migration and fault-tolerance for virtual machines (VMs) in mobile devices. However, checkpointing a VM is usually heavy-weight: the VM's entire memory needs to be dumped to storage, which induces a significant amount of (slow) I/O operations, degrading system performance and user experience. In this paper, we propose FLIC, a fast and lightweight...
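The FLIC abstract is truncated before its design details, but the problem it states, dumping the VM's entire memory on every checkpoint, is commonly reduced by dumping only pages that changed since the last checkpoint. A minimal, generic incremental-checkpoint sketch in Python (page size, hashing scheme, and function names are illustrative assumptions, not necessarily FLIC's approach):

```python
import hashlib

PAGE_SIZE = 4096

def checkpoint(memory, previous_hashes):
    """Generic incremental checkpoint: return only the pages whose
    content changed since the last checkpoint, plus fresh hashes."""
    dump = {}
    hashes = {}
    for offset in range(0, len(memory), PAGE_SIZE):
        page = bytes(memory[offset:offset + PAGE_SIZE])
        digest = hashlib.sha256(page).digest()
        hashes[offset] = digest
        if previous_hashes.get(offset) != digest:
            dump[offset] = page  # only dirty pages hit storage
    return dump, hashes

# Demo: 4 pages of guest "memory".
mem = bytearray(4 * PAGE_SIZE)
dump1, h1 = checkpoint(mem, {})      # first checkpoint: all 4 pages dumped
mem[2 * PAGE_SIZE] = 1               # dirty exactly one page
dump2, h2 = checkpoint(mem, h1)      # second checkpoint: only that page dumped
```

Real systems would track dirty pages via hardware dirty bits or write protection rather than hashing every page, but the I/O saving is the same: checkpoint cost scales with the working set, not total memory size.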
Effective use of the memory hierarchy is crucial to cloud computing. Platform memory subsystems must be carefully provisioned and configured to minimize overall cost and energy for cloud providers. For cloud subscribers, the diversity of available platforms complicates comparisons and the optimization of performance. To address these needs, we present X-Mem, a new open-source software tool that characterizes...
Video tracking is a challenging task for computing professionals. The performance of video tracking techniques is greatly affected by the background detection and elimination process. In our approach, we explore the concurrent computational ability of GPGPUs (general-purpose graphics processing units) to address this problem. A Gaussian Mixture Model (GMM) with adaptive weighted kernels is used for...
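The GMM background model referenced above maintains per-pixel Gaussians and flags pixels that deviate from them as foreground. A minimal sketch of the idea, simplified to a single Gaussian per pixel rather than a full mixture (the learning rate `alpha` and threshold `k` are assumed values):

```python
import numpy as np

def update_background(mean, var, frame, alpha=0.05, k=2.5):
    """One step of a simplified per-pixel Gaussian background model.
    Pixels farther than k standard deviations from the model are
    foreground; background pixels update via exponential averaging."""
    foreground = np.abs(frame - mean) > k * np.sqrt(var)
    background = ~foreground
    # Update only pixels classified as background.
    mean = np.where(background, (1 - alpha) * mean + alpha * frame, mean)
    var = np.where(background,
                   (1 - alpha) * var + alpha * (frame - mean) ** 2, var)
    return mean, var, foreground

# Demo: static gray background with a bright 2x2 object entering.
mean = np.full((8, 8), 0.5)
var = np.full((8, 8), 0.01)
frame = np.full((8, 8), 0.5)
frame[2:4, 2:4] = 1.0                       # the moving object
mean, var, fg = update_background(mean, var, frame)
```

Each pixel's update is independent of all others, which is exactly why this computation maps well onto the massively parallel GPGPU execution model the abstract exploits.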
Processing data in or near memory (PIM), as opposed to in conventional computational units in a processor, can greatly alleviate the performance and energy penalties of data transfers from/to main memory. Graphics Processing Unit (GPU) architectures and applications, where main memory bandwidth is a critical bottleneck, can benefit from the use of PIM. To this end, an application should be properly...
As massive multi-threading in GPU imposes tremendous pressure on memory subsystems, efficient bandwidth utilization becomes a key factor affecting the GPU throughput. In this work, we propose thread batch enabled memory partitioning (TEMP), to improve GPU performance through the improvement of memory bandwidth utilization. In particular, TEMP clusters multiple thread blocks sharing the same set of...
A novel scheme for fast browser launch is presented. Our scheme caches the frame buffer data of a launched browser in non-volatile memory and reuses the cached data when the browser launches later. Through an implementation, we show that our scheme significantly reduces browser launch time.
Deep learning techniques like Convolutional Neural Networks (CNNs) are gaining traction for the classification of objects (e.g., traffic signs, pedestrians, vehicles) in Advanced Driver Assistance Systems (ADAS). Typical trained CNN networks pose huge computational complexity in the feed-forward path during operation, due to multiple layers and within-layer operations like 2D convolution, spatial...
On embedded devices, physical memory is a critical resource. RAM should be used very efficiently without affecting the performance of the device. In-kernel memory swapping is a Linux feature that creates a RAM-based swap area and provides a form of virtual memory compression. It increases performance by using a compressed block device in RAM for paging instead of disk. Since in-kernel memory swapping...
The OMAP-L138 DSP+ARM processor is a dual-core SoC developed by Texas Instruments. It offers high speed, small size, and power efficiency, and is broadly deployed in advanced portable devices. This paper focuses on retrieving the Linux kernel code, analysing specific aspects of the modules supported by the OMAP-L138 processor, and altering the Linux kernel source code as per the...
Kernel methods suffer from high time and space complexity because they require a large kernel matrix over the training data, so they must be sped up; low-rank approximation addresses this problem. In this paper, we compare two sampling-based low-rank approximation techniques implemented for large kernel matrices. The first is the standard Nystrom method, and the second...
The increasing adoption of GPUs as mainstream computing devices, coupled with the imminent availability of large high-bandwidth caches based on die-stacked memory makes it important to analyze and understand modern GPU compute applications from the perspective of their memory access and data reuse characteristics. This paper presents detailed workload characterization studies on four GPU compute applications...
We present preliminary results with the TyTra design flow. Our aim is to create a parallelising compiler for high-performance scientific code on heterogeneous platforms, with a focus on Field-Programmable Gate Arrays (FPGAs). Using the functional language Idris, we show how this programming paradigm facilitates the generation of different correct-by-construction program variants through type transformations...
ARTICo3 is an architecture that makes it possible to dynamically set up an arbitrary number of reconfigurable hardware accelerators, each containing a given number of threads fixed at design time according to High-Level Synthesis constraints. However, the replication of these modules can be decided at runtime to accelerate kernels by increasing the overall number of threads, to add modular redundancy to increase...
Increasing computation demands with limited power budget require more energy-efficient designs without performance degradation in embedded systems and mobile computing platforms. Reconfigurable computing is an alternative to optimize both performance and power consumption. However, due to the complexity of hardware design, implementing dedicated accelerators usually lacks flexibility and productivity...
The amount of free memory has a great influence on system stability, because running out of memory causes performance degradation, unexpected process terminations, and so on. Thus, designing a memory utilization plan based on the characteristics of the processes is an important administration task. However, processes sometimes demand a large amount of main memory rapidly and unexpectedly...
Recent advancements in the architecture of Graphics Processing Units (GPUs) enable the acceleration of many general-purpose applications. Even with high memory bandwidth, GPUs still face the challenge of accelerating highly memory-intensive applications. To overcome this challenge, this paper investigates the impact of scaling up the memory partitions and also scaling the frequency of the...
Graphics Processing Units (GPUs) based on the Single Instruction Multiple Thread (SIMT) architecture are emerging as more efficient platforms than Multiple Instruction Multiple Data (MIMD) architectures in exploiting parallelism. A GPU has numerous shader cores and thousands of simultaneous fine-grained active threads. These threads are grouped into Cooperative Thread Arrays (CTAs). All the threads within...
A sensor platform is a station equipped with extensive sensor and communication systems, which provide space-based detection and alert capabilities. It consists of low-power, embedded computing devices known as motes, which use sensors to collect measurements from the physical world and its inhabitants. In this paper, an ARM-based sensor platform running a Linux operating system is designed and implemented...