FPGAs are becoming an attractive choice as a heterogeneous computing unit for scientific computing because FPGA vendors are adding floating-point-optimized architectures to their product lines. Additionally, high-level synthesis (HLS) tools such as Altera OpenCL SDK are emerging, which could potentially break the FPGA programming wall and provide a streamlined flow for domain experts in scientific...
In this paper we propose a novel CNN hardware accelerator, called AIScale, capable of accelerating the convolutional, pooling, fully-connected and addition layers of CNNs. In contrast to most existing solutions, AIScale offers a complete solution for full CNN acceleration. AIScale is designed as a coarse-grained reconfigurable architecture that uses rapid, dynamic reconfiguration during CNN layer processing...
RDMA (Remote Direct Memory Access) is a technology that enables user applications to perform direct data transfer between the virtual memory of processes on remote endpoints, without operating system involvement or intermediate data copies. Achieving zero intermediate data copies using RDMA requires specialized network interface hardware. Software RDMA drivers emulate RDMA semantics in software to...
Heterogeneous platforms with large numbers of processing elements (PEs) have been proposed to satisfy the computational requirements of computer vision applications. Limiting the incurred communication cost is key to meeting the power constraints of embedded devices. We present a new heuristic to reduce communication among PEs and with external memory by aggregating inter-process communication and...
This paper presents the design and implementation of a hardwired OS kernel circuitry inside a Java application processor to provide the system services that are traditionally implemented in software. The hardwired system functions in the proposed SoC include the thread manager, the memory manager, and the I/O subsystem interface. There are many advantages in making the OS kernel a hardware component,...
This paper describes the implementation of approximate memory support in the Linux operating system kernel. The new functionality allows the kernel to distinguish between normal memory banks, which are composed of standard memory cells that retain data without corruption, and approximate memory banks, where memory cells are subject to read/write faults with controlled probability. Approximate memories...
High-Level Synthesis (HLS) has been widely recognized as an efficient compilation process targeting FPGAs for algorithm evaluation and product prototyping. However, massively parallel memory access demands and the extremely high cost of multi-port single-bank memories have impeded loop pipelining performance. Thus, based on an alternative multi-bank memory architecture, a joint approach...
To protect the integrity of operating system kernels, we present the Vigilare system, a kernel integrity monitor architected to snoop the bus traffic of the host system from separate, independent hardware. This snoop-based monitoring, enabled by the Vigilare system, overcomes the limitations of the snapshot-based monitoring employed in previous kernel integrity monitoring solutions. Being based...
Graphics Processing Units (GPUs) are designed to exploit large amounts of parallelism. However, warp-level divergence, arising from differing amounts of work, differing memory access latencies, etc., results in the warps of a thread block (TB) finishing kernel execution at different points in time. This, in effect, reduces the utilization of SM resources and hence the performance of the GPU. We propose...
For many compute-intensive tasks, simultaneous access into multi-dimensional data arrays is highly restricted by the data mapping strategy and memory port constraints. As such, to increase memory access bandwidth, innovative memory partitioning and mapping algorithms have been proposed to simultaneously access multiple memory blocks through physically distributing data elements in the same...
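The partitioning idea in the abstract above can be illustrated with the common cyclic scheme, in which array element i is placed in bank i mod N; any N consecutive elements then land in distinct banks and can be fetched in the same cycle. A minimal sketch follows; the bank count and mapping functions are illustrative assumptions, not details taken from the paper:

```python
# Cyclic memory partitioning: element i of an array is stored in
# bank (i mod N) at offset (i // N). With N banks, any window of N
# consecutive elements touches N distinct banks, so a pipelined loop
# reading a[i], a[i+1], ..., a[i+N-1] each cycle sees no bank
# conflict. Real partitioners also handle block and block-cyclic
# mappings and multi-dimensional arrays.

N_BANKS = 4

def bank(i):
    return i % N_BANKS

def offset(i):
    return i // N_BANKS

def conflict_free(indices):
    """True if all accesses issued in one cycle hit distinct banks."""
    banks = [bank(i) for i in indices]
    return len(set(banks)) == len(banks)

# A sliding window of N_BANKS consecutive elements is always
# conflict-free under cyclic partitioning...
assert all(conflict_free(range(i, i + N_BANKS)) for i in range(64))

# ...whereas a stride equal to the bank count serializes accesses.
assert not conflict_free([0, 4, 8, 12])
```

The stride-4 failure case is why partitioning must be chosen jointly with the loop's access pattern, as the joint approach above suggests.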
Convolution operations dominate the total execution time of deep convolutional neural networks (CNNs). In this paper, we aim at enhancing the performance of the state-of-the-art convolution algorithm (called Winograd convolution) on the GPU. Our work is based on two observations: (1) CNNs often have abundant zero weights and (2) the performance benefit of Winograd convolution is limited mainly due...
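Winograd convolution, named above, trades multiplications for cheap additions. As a hedged illustration of the general algorithm (not the paper's GPU kernels), the minimal 1-D variant F(2,3) computes two outputs of a 3-tap filter with 4 multiplies instead of the 6 needed directly:

```python
# Winograd minimal filtering F(2,3): two outputs of a 3-tap filter
# from a 4-element input tile using only 4 multiplications.
# Illustrative sketch of the textbook algorithm, not the paper's
# sparse/GPU-optimized implementation.

def winograd_f23(d, g):
    """d: 4 input values, g: 3 filter taps -> 2 outputs."""
    m0 = (d[0] - d[2]) * g[0]
    m1 = (d[1] + d[2]) * (g[0] + g[1] + g[2]) / 2
    m2 = (d[2] - d[1]) * (g[0] - g[1] + g[2]) / 2
    m3 = (d[1] - d[3]) * g[2]
    return [m0 + m1 + m2, m1 - m2 - m3]

def direct(d, g):
    """Reference: direct sliding-window computation (6 multiplies)."""
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]

d = [1.0, 2.0, 3.0, 4.0]
g = [0.5, -1.0, 2.0]
assert winograd_f23(d, g) == direct(d, g)  # both give [4.5, 6.0]
```

Note that the multiplies disappear entirely wherever a transformed weight term is zero, which is why the zero-weight observation in the abstract interacts with Winograd's benefit.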
Publishing scientific results without the detailed execution environments describing how the results were collected makes it difficult or even impossible for the reader to reproduce the work. However, the configurations of the execution environments are too complex to be described easily by authors. To solve this problem, we propose a framework facilitating the conduct of reproducible research by...
Despite its popularity, deploying Convolutional Neural Networks (CNNs) on a portable system is still challenging due to large data volume, intensive computation and frequent memory access. Although previous FPGA acceleration schemes generated by high-level synthesis tools (e.g., HLS, OpenCL) have allowed for fast design optimization, hardware inefficiency still exists when allocating FPGA resources...
This paper introduces a software policy for memory management in heterogeneous memory systems in order to improve the trade-offs between performance and power consumption, while attempting to make the best use of different characteristics of the underlying memory technologies. In this policy, the operating system and the application co-schedule page management in order to make informed decisions about...
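As a loose illustration of such a co-scheduled policy (the tier names, hotness threshold, and hint mechanism here are invented for the sketch, not taken from the paper), a placement decision might route frequently accessed pages to fast but power-hungry memory and cold pages to a slow, low-power tier, while letting the application override the OS default:

```python
# Toy page-placement policy for a two-tier heterogeneous memory
# system: hot pages go to fast DRAM (higher power), cold pages to a
# slower, low-power tier such as NVM. Threshold and tier names are
# illustrative assumptions.

FAST_TIER = "dram"
SLOW_TIER = "nvm"
HOT_THRESHOLD = 100  # accesses per sampling interval (assumed)

def place_page(access_count, app_hint=None):
    """Pick a tier from measured hotness; an application-supplied
    hint overrides the OS default, modeling the OS/application
    co-scheduling described above."""
    if app_hint in (FAST_TIER, SLOW_TIER):
        return app_hint
    return FAST_TIER if access_count >= HOT_THRESHOLD else SLOW_TIER

assert place_page(500) == "dram"                  # hot -> fast tier
assert place_page(3) == "nvm"                     # cold -> low-power tier
assert place_page(3, app_hint="dram") == "dram"   # application hint wins
```

The hint path is the "informed decision" ingredient: the application knows future access patterns that raw OS-level counters cannot predict.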
In this paper, we advocate the use of code polymorphism as an efficient means to improve security at several levels in electronic devices. We analyse the threats that polymorphism could help thwart, and present the solution that we plan to demonstrate in the scope of a collaborative research project called COGITO. We expect our solution to be effective to improve security, to comply with the computing...
In the presence of known and unknown vulnerabilities in program code and control flow, virtual-machine-like isolation and sandboxing, which confine a malicious process by monitoring and controlling the behaviour of the untrusted application, are an effective strategy. A confined malicious application cannot affect system resources or other applications running on the same operating system. But present...
In the new era of cyber-physical systems, software must adapt itself to ever-changing environmental conditions and situations. This is currently not reflected in the design of embedded operating systems, since they are primarily optimized for fixed usage scenarios with tight resource constraints. We discuss the idea of interpreted operating system kernels, which can form a new foundation for highly...
The survivability of the OS is critical to the whole system, because the OS is the foundation of any information or network system. Based on an analysis of the resources, services and functions of the OS, and owing to the particularity of OS survivability, this paper proposes the concept of an integrity running environment (IRE) and then puts forward a new definition, namely that OS survivability is that...
High utilization of hardware resources is key to designing performance- and power-optimized GPU applications. The efficiency of applications and kernels that do not fully utilize GPU resources can be improved through concurrent execution with independent kernels and/or applications. Hyper-Q enables multiple CPU threads or processes to launch work on a single GPU simultaneously for increased...
Traditional PC-based operating systems load most of their components during the boot process along with the kernel. This mechanism, though effective for a broad range of objectives, is seldom fully utilized by the majority of users, as they usually perform a specific job that does not require every component of the OS. It has been observed that operating systems which are designed keeping in mind the nature of the job,...