Search results for: Feiqi Su

Items from 1 to 6 out of 6 results

chapter

Overlapping dependent loads with addressless preload

Zhen Yang, Xudong Shi, Feiqi Su, Jih-Kwon Peir

2006 International Conference on Parallel Architectures and Compilation Techniques (PACT) > 275 - 284

Modern out-of-order processors with non-blocking caches exploit Memory-Level Parallelism (MLP) by overlapping cache misses in a wide instruction window. The exploitation of MLP, however, can be limited due to long-latency operations in producing the base address of a cache miss load. When the parent instruction is also a cache miss load, a serialization of the two loads must be enforced to satisfy...

chapter

Directory Lookaside Table: Enabling scalable, low-conflict, many-core cache coherence directory

Xudong Shi, Feiqi Su, Jih-Kwon Peir

2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS) > 111 - 118

Maintaining hardware cache coherence on future CMPs becomes increasingly important and difficult as the number of cores keeps accelerating in mainstream multicore chips. The simple snooping-bus coherence scheme is not suitable due to its limited scalability. The sparse coherence directory approach may incur extra cache invalidations due to a topological mismatch between the coherence directory and...

chapter

Weak execution ordering - exploiting iterative methods on many-core GPUs

Jianmin Chen, Zhuo Huang, Feiqi Su, Jih-Kwon Peir, et al.

2010 IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS) > 154 - 163

On NVIDIA's many-core GPUs, there is no synchronization function among parallel thread blocks. When fine-grained data communication and synchronization are required for large-scale parallel programs executed by multiple thread blocks, frequent host synchronizations are necessary, and they incur a significant overhead. In this paper, we investigate a class of applications which uses a chaotic...

article

Modeling and Stack Simulation of CMP Cache Capacity and Accessibility

Xudong Shi, Feiqi Su, Jih-Kwon Peir, Ye Xia, et al.

IEEE Transactions on Parallel and Distributed Systems > 2009 > 20 > 12 > 1752 - 1763

Performance trade-offs between fast data access by local data replication and cache capacity maximization by global data sharing have been extensively studied for many-core Chip Multiprocessors (CMPs). Costly simulations over a wide spectrum of the design space are generally required to gain insight for a sound design. To lower the cost, we develop an abstract model for understanding the performance...

chapter

Comparative evaluation of multi-core cache occupancy strategies

Feiqi Su, Xudong Shi, Gang Liu, Ye Xia, et al.

2007 International Conference on Parallel and Distributed Systems > 2 > 1 - 8

Intelligently sharing cache space among multiple cores on a Chip Multiprocessor (CMP) has become an important research topic. There are many design options to trade off and many possible performance metrics to evaluate. It generally requires costly simulations to gain insights over a wide spectrum of cache sharing and partitioning methods. In this paper, we use an efficient single-pass stack simulation...

chapter

Modeling and Single-Pass Simulation of CMP Cache Capacity and Accessibility

Xudong Shi, Feiqi Su, Jih-Kwon Peir, Ye Xia, et al.

2007 IEEE International Symposium on Performance Analysis of Systems & Software > 126 - 135

Future chip multiprocessors (CMPs) with a large number of cores face difficult issues in efficiently utilizing on-chip storage space. Tradeoffs between data accessibility and effective on-chip capacity have been studied extensively. Costly simulations are required to understand a wide spectrum of design spaces. In this paper, we first develop an abstract model for understanding the performance impact...

Publication type

book (5)
article (1)

Keywords

CACHE STORAGE (3)
ANALYTICAL MODELS (2)
DATA REPLICATION (2)
DELAY (2)
MICROPROCESSOR CHIPS (2)
MULTIPROCESSING SYSTEMS (2)
PER-CORE PRIVATE STACK (2)
SHARED STACK (2)
SINGLE-PASS STACK SIMULATION (2)
ABSTRACT MODEL (1)
APPLICATION SOFTWARE (1)
AVERAGE MEMORY ACCESS TIME (1)
CACHE ACCESSIBILITY (1)
CACHE CAPACITY (1)
CACHE COHERENCE (1)
CACHE MEMORIES (1)
CACHE SHARING (1)
CHAOTIC COMMUNICATION (1)
CHIP MULTIPROCESSOR (1)
CHIP MULTIPROCESSORS (1)
CHIP-MULTIPROCESSOR (1)
CMP (1)
CMP CACHE (1)
COMPARATIVE EVALUATION (1)
COMPUTER GRAPHICS (1)
COMPUTER VISION (1)
COPROCESSORS (1)
COSTS (1)
DATA ACCESSIBILITY (1)
DATA COMMUNICATION (1)
DATA COMMUNICATION EQUIPMENT (1)
DATA PREFETCHING (1)
DIRECTORY-BASED PROTOCOL (1)
FREQUENCY (1)
GLOBAL DATA SHARING (1)
GLOBAL STACK (1)
GRAPHICS PROCESSING UNIT (1)
HOST SYNCHRONIZATION (1)
INFORMATION SCIENCE (1)
INSTRUCTION AND ISSUE WINDOW (1)
ITERATIVE METHODS (1)
LARGE-SCALE SYSTEMS (1)
MANY-CORE GPU (1)
MEASUREMENT (1)
MEMORY ARCHITECTURE (1)
MEMORY-LEVEL PARALLELISM (1)
MULTI-CORE CACHE OCCUPANCY STRATEGIES (1)
MULTI-THREADING (1)
MULTICORE PROCESSING (1)
MULTIPLE CACHE ORGANIZATION (1)
ON-CHIP CACHE CAPACITY (1)
ON-CHIP STORAGE SPACE (1)
PARALLEL THREAD BLOCKS (1)
PARTIAL DIFFERENTIAL EQUATIONS (1)
PARTITIONING METHODS (1)
PERFORMANCE EVALUATION (1)
PERFORMANCE MODELING (1)
POINTER-CHASING LOADS (1)
POISSON IMAGE EDITING (1)
PREDICTIVE MODELS (1)
REAL TIME SYSTEMS (1)
REUSE DISTANCES (1)
RUNTIME (1)
SHAPE FROM SHADING (1)
SHAPE MEASUREMENT (1)
SINGLE SIMULATION PASS (1)
SINGLE-PASS SIMULATION (1)
SNOOPING-BUS PROTOCOL (1)
SPARSE DIRECTORY (1)
STACK SIMULATION (1)
TESLA C1060 (1)
TRAFFIC CONTROL (1)
VIRTUAL MACHINING (1)
WEAK EXECUTION ORDERING (1)
WIRING (1)

INFONA - science communication portal
