Search results

chapter

TSS: Applying two-stage sampling in micro-architecture simulations

Zhibin Yu, Hai Jin, Jian Chen, L.K. John

2009 IEEE International Symposium on Modeling, Analysis & Simulation of Computer and Telecommunication Systems > 1 - 9

2009 IEEE International Symposium on Modeling, Analysis & Simulation of Computer and Telecommunication Systems (MASCOTS)

Accelerating micro-architecture simulation is becoming increasingly urgent as the complexity of workload and simulated processor increases. This paper presents a novel two-stage sampling (TSS) scheme to accelerate the sampling-based simulation. It firstly selects some large samples from a dynamic instruction stream as candidates of detail simulation and then samples some small groups from each selected...

chapter

Using Integer Linear Programming in Test-bench Generation for Evaluating Communication Processors

E. Senn, D. Monnereau, A. Rossi, N. Julien

2009 12th Euromicro Conference on Digital System Design, Architectures, Methods and Tools > 217 - 220

2009 12th EUROMICRO Conference on Digital System Design, Architectures, Methods and Tools (DSD 2009)

This paper presents an innovative way to build flexible benchmarks based on micro-architecture independent characteristics. The proposed approach enables the testing and stressing of processors in order to reflect the real nature of applications and give meaningful information to the designers. The use of a limited number of basic blocks hand-coded in assembly, wisely chosen and arranged, enables...

chapter

Performance Comparison of Four-Socket Server Architecture on HPC Workload

H. Kasim, V. March, S. See

2009 International Conference on Computational Science and Engineering > 1 > 306 - 311

2009 International Conference on Computational Science and Engineering (CSE)

Recent server architectures embrace a common technology feature: on-chip parallelism via multi-core and CMT (Chip Multi Threading) technologies. However, they also significantly differ in a number of key aspects including clock speed, micro-architecture, cache hierarchy, and memory sub-system. Such differences may lead to different levels of application performance. This paper presents a performance...

chapter

Design of AXI bus based MPSoC on FPGA

Fu-ming Xiao, Dong-sheng Li, Gao-ming Du, Yu-kun Song, more

2009 3rd International Conference on Anti-counterfeiting, Security, and Identification in Communication > 560 - 564

2009 3rd International Conference on Anti-counterfeiting, Security, and Identification in Communication (2009 ASID)

While the computational core is becoming faster and faster, the communication efficiency between the processors has become a bottleneck which limits the performance of multiprocessor system-on-chip (MPSoC). This paper focuses on design and implementation of AXI bus protocol-based MPSoC architecture. Firstly, the RTL models of 4 NIOS II processors using AXI communication architecture are developed...

chapter

Evaluation Method of Synchronization for Shared-Memory On-Chip Many-Core Processor

Fenglong Song, Zhiyong Liu, Dongrui Fan, He Huang, more

2009 IEEE International Symposium on Parallel and Distributed Processing with Applications > 571 - 576

2009 IEEE International Symposium on Parallel and Distributed Processing with Applications (ISPA)

On-chip many-core architecture is an emerging and promising computation platform. High speed on-chip communication and abundant chipped resources are two outstanding advantages of this architecture, which provide an opportunity to implement an efficient synchronization scheme. The practical execution efficiency of the synchronization scheme is critical to this platform. However, there is little research...

chapter

Balancing Parallel Applications on Multi-core Processors Based on Cache Partitioning

Guang Suo, Xue-jun Yang

2009 IEEE International Symposium on Parallel and Distributed Processing with Applications > 190 - 195

2009 IEEE International Symposium on Parallel and Distributed Processing with Applications (ISPA)

Load balancing is an important problem for parallel applications. Recently, many supercomputers have been built on multi-core processors, which usually share the last-level cache. On the one hand, accesses from different cores conflict with each other; on the other hand, different cores have different workloads, resulting in load imbalance. In this paper, we present a novel technique for balancing...

chapter

Performance study of Core2Duo desktop processors

A.R.A. Saif, K. Bin Jumari

2009 International Conference on Electrical Engineering and Informatics > 2 > 532 - 536

2009 International Conference on Electrical Engineering and Informatics (ICEEI)

Multicore processors have opened the door to parallel, high-performance computing on the desktop. This paper presents a performance study of these systems, carried out on Intel's Core2Duo processor with OpenMP programming integrated into Microsoft Visual Studio C++ 2005 and the Intel C++ 10.1.020 compiler. Using multithreaded programming,...

chapter

Evaluating Various Branch-Prediction Schemes for Biomedical-Implant Processors

C. Strydis, G.N. Gaydadjiev

2009 20th IEEE International Conference on Application-specific Systems, Architectures and Processors > 169 - 176

2009 20th IEEE International Conference on Application-specific Systems, Architectures and Processors

This paper evaluates various branch-prediction schemes under different cache configurations in terms of performance, power, energy and area on suitably selected biomedical workloads. The benchmark suite used consists of compression, encryption and data-integrity algorithms as well as real implant applications, all executed on realistic biomedical input datasets. Results are used to drive the (micro)architectural...

chapter

Implementation of a hardware branch-predictor evaluation platform based on FPGAs

E. Sedano, D. Chaver, J. Resano

2009 Ph.D. Research in Microelectronics and Electronics > 44 - 47

2009 Ph.D. Research in Microelectronics and Electronics (PRIME)

Branch prediction is an important topic in modern computer architecture research. Predictors attempt to improve the performance of a processor with a reasonable hardware cost. In the last decade, many prediction schemes have been developed in order to achieve this objective, each of them with different cost/performance tradeoffs. Identifying the optimal predictor for a given architecture and set of...

chapter

Evaluating Alpha-induced soft errors in embedded microprocessors

P. Rech, S. Gerardin, A. Paccagnella, P. Bernardi, more

2009 15th IEEE International On-Line Testing Symposium > 69 - 74

2009 15th IEEE International On-Line Testing Symposium (IOLTS 2009)

This paper presents the results of alpha single event upsets tests of an embedded 8051 microprocessor. Cross sections for the different memory resources (i.e., internal registers, code RAM, and user memory) are reported as well as the error rate for different codes implemented as test benchmarks. Test results are then discussed to find the contribution of each available resource to the overall device...

chapter

Orthogonal Instruction Encoding for a 16-bit Embedded Processor with Dynamic Implied Addressing Mode

J.M. Youn, Daeho Kim, Minwook Ahn, Yongjoo Kim, more

2009 11th IEEE International Conference on High Performance Computing and Communications > 545 - 550

2009 11th IEEE International Conference on High Performance Computing and Communications (HPCC)

Although 32-bit architectures are becoming the norm for modern microprocessors, 16-bit ones are still employed by many low-end processors, for which small size and low power consumption are of high priority. However, 16-bit architectures have a critical disadvantage for embedded processors that they do not provide enough encoding space to add special instructions coined for certain applications. To...

chapter

On pinning issues on multicore systems

C. Trinitis

2009 International Conference on High Performance Computing & Simulation > 81

2009 International Conference on High Performance Computing & Simulation (HPCS)

In recent years, a trend towards multi-core architectures with a growing number of cores for all standard instruction set architectures can be observed. To utilize the full potential of such novel microprocessor architectures, applications running on them must be efficiently parallelized and carefully analyzed regarding runtime, speedup, and parallel efficiency. With multi-core architectures becoming...

chapter

A Quantitative Study of Memory System Interference in Chip Multiprocessor Architectures

M. Jahre, M. Grannaes, L. Natvig

2009 11th IEEE International Conference on High Performance Computing and Communications > 622 - 629

2009 11th IEEE International Conference on High Performance Computing and Communications (HPCC)

The potential for destructive interference between running processes is increased as Chip Multiprocessors (CMPs) share more on-chip resources. We believe that understanding the nature of memory system interference is vital to achieve good fairness/complexity/performance trade-offs in CMPs. Our goal in this work is to quantify the latency penalties due to interference in all hardware-controlled, shared...

chapter

Efficient Heuristic Algorithm for Rapid Custom-Instruction Selection

Tao Li, Wu Jigang, Siew-Kei Lam, T. Srikanthan, more

2009 Eighth IEEE/ACIS International Conference on Computer and Information Science > 266 - 270

2009 8th IEEE/ACIS International Conference on Computer and Information Science (ICIS)

Custom-instruction selection is an essential phase in custom-instruction generation. It determines the most profitable custom instruction candidates for hardware implementation. In this paper, a practical computing model is proposed for the problem of custom-instruction selection that takes into account the hardware area constraint. Based on the new computing model, a novel heuristic algorithm is...

chapter

Double Throughput Multiply-Accumulate unit for FlexCore processor enhancements

T.T. Hoang, M. Sjalander, P. Larsson-Edefors

2009 IEEE International Symposium on Parallel & Distributed Processing > 1 - 7

2009 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)

As a simple five-stage General-Purpose Processor (GPP), the baseline FlexCore processor has a limited set of datapath units. By utilizing a flexible datapath interconnect and a wide control word, a FlexCore processor is explicitly designed to support integration of special units that, on demand, can accelerate certain data-intensive applications. In this paper, we propose the integration of a novel...

chapter

A Simulation Times Model of Multi-core Simulation

Zhibin Yu, Hai Jin, Yabin Hu

2009 WRI World Congress on Software Engineering > 1 > 7 - 11

2009 WRI World Congress on Software Engineering. WCSE 2009

Chip multi-processors (CMPs) increase processor throughput by duplicating resources for many threads. As the clock frequency of a single processor approaches its limit, CMPs are becoming more and more popular. However, it is not well studied how to evaluate a new CMP design by simulation. This paper analyzes the possible organizations of cores on a CMP and then presents a mathematical model for the...

chapter

Exploring compiler optimizations for enhancing power gating

S. Roy, N. Ranganathan, S. Katkoori

2009 IEEE International Symposium on Circuits and Systems > 1004 - 1007

2009 IEEE International Symposium on Circuits and Systems - ISCAS 2009

Power gating is a circuit level technique for reducing standby leakage in a circuit block by cutting off paths in it between the supply and the ground. A processor architecture that supports power gating of its resources may provide instructions that activate and deactivate those resources as part of the instruction set architecture level. Adequate compiler support is then required so that the power...

chapter

Energy-Efficient Encoding for High-Performance Buses with Staggered Repeaters

S. Jayaprakash, N.R. Mahapatra

2009 IEEE Computer Society Annual Symposium on VLSI > 252 - 257

2009 IEEE Computer Society Annual Symposium on VLSI (ISVLSI 2009)

High-performance buses often use staggered repeaters to mitigate the adverse impact on latency of worst-case capacitive crosstalk between adjacent wires by exploiting the data-dependent nature of crosstalk. An undesirable side effect of staggered repeaters is that they may increase the overall energy of a bus carrying highly correlated traffic associated with real-world benchmarks. In this paper,...

chapter

A Framework for Modeling Impact of Intrinsic Parameter Fluctuations at Architectural-Level

R.A. Ahmed, K. Samsudin, F.Z. Rokhani

2009 International Conference on Signal Processing Systems > 574 - 577

2009 International Conference on Signal Processing Systems (ICSPS)

As semiconductor process technology continues to scale deeper into the nanometer region, intrinsic parameter fluctuations will aggressively affect the performance of future microprocessors. Therefore, one of the challenges of advanced CMOS manufacturing lies in modeling and simulating the intrinsic parameter fluctuations for accurately assessing the performance and the yield of the corresponding...

chapter

Core-aware memory access scheduling schemes

Zhibin Fang, Xian-He Sun, Yong Chen, S. Byna

2009 IEEE International Symposium on Parallel & Distributed Processing > 1 - 12

2009 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)

Multi-core processors have changed the conventional hardware structure and require a rethinking of system scheduling and resource management to utilize them efficiently. However, current multi-core systems are still using conventional single-core memory scheduling. In this study, we investigate and evaluate traditional memory access scheduling techniques, and propose a core-aware memory scheduling...

INFONA - science communication portal
