Search results for: Jingling Xue

Items from 1 to 5 out of 5 results

chapter

Exploiting mixed SIMD parallelism by reducing data reorganization overhead

Hao Zhou, Jingling Xue

2016 IEEE/ACM International Symposium on Code Generation and Optimization (CGO), pp. 59-69

Existing loop vectorization techniques can exploit either intra- or inter-iteration SIMD parallelism alone in a code region if one part of the region vectorized for one type of parallelism has data dependences (called mixed-parallelism-inhibiting dependences) on the other part of the region vectorized for the other type of parallelism. In this paper, we consider a class of loops that exhibit both types...

chapter

Design and Implementation of a Highly Efficient DGEMM for 64-Bit ARMv8 Multi-core Processors

Feng Wang, Hao Jiang, Ke Zuo, Xing Su, et al.

2015 44th International Conference on Parallel Processing (ICPP), pp. 200-209

This paper presents the design and implementation of a highly efficient Double-precision General Matrix Multiplication (DGEMM) based on OpenBLAS for 64-bit ARMv8 eight-core processors. We adopt a theory-guided approach by first developing a performance model for this architecture and then using it to guide our exploration. The key enabler for a highly efficient DGEMM is a highly optimized inner kernel...

chapter

Lifetime holes aware register allocation for clustered VLIW processors

Xuemeng Zhang, Hui Wu, Haiyan Sun, Jingling Xue

2014 Design, Automation & Test in Europe Conference & Exhibition (DATE), pp. 1-4

This paper presents an on-the-fly register allocator which dynamically detects and utilises lifetime holes for clustered VLIW processors. A lifetime hole is an interval in which a variable does not contain a valid value. A register holding a lifetime hole can be allocated to another variable whose live range fits in the lifetime hole, leading to more efficient utilisation of registers. We propose...

chapter

Thread-Sensitive Modulo Scheduling for Multicore Processors

Lin Gao, Quan Hoang Nguyen, Lian Li, Jingling Xue, et al.

2008 37th International Conference on Parallel Processing (ICPP), pp. 132-140

This paper describes a generalisation of modulo scheduling to parallelize loops for SpMT processors that simultaneously exploits both instruction-level and thread-level parallelism while preserving the simplicity and effectiveness of modulo scheduling. Our generalisation is simple, drops easily into traditional modulo scheduling algorithms such as Swing in GCC 4.1.1 and produces good speedups...

chapter

A gather/scatter hardware support for efficient Fast Fourier Transform

A.K.-A. Ku, J.Y.-C. Kuo, Jingling Xue

2008 13th Asia-Pacific Computer Systems Architecture Conference (ACSAC), pp. 1-8

The increase of operating frequency of microprocessors has begun to meet more obstacles. Performance of single-thread applications no longer benefits from running under a faster processor. As a result, the performance increase has to come from additional hardware support which makes use of the large number of transistors available. This paper presents a novel hardware support called distTree to speed...

INFONA - science communication portal
