Accelerating micro-architecture simulation is becoming increasingly urgent as the complexity of workloads and simulated processors increases. This paper presents a novel two-stage sampling (TSS) scheme to accelerate sampling-based simulation. It first selects some large samples from a dynamic instruction stream as candidates for detailed simulation and then samples some small groups from each selected...
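The two-stage idea can be illustrated with a minimal Python sketch. The abstract is truncated and does not say how the large samples or the small groups are chosen, so uniform random selection of fixed-size, non-overlapping windows is an assumption here, and `two_stage_sample` and its parameters are hypothetical names, not the paper's interface.

```python
import random


def two_stage_sample(stream_len, large_size, n_large, small_size, n_small, seed=0):
    """Hypothetical two-stage sampler over a dynamic instruction stream.

    Returns sorted (start, end) instruction intervals, end exclusive, that
    would be handed to the detailed simulator.
    """
    rng = random.Random(seed)
    n_windows = stream_len // large_size          # candidate large samples
    groups_per_window = large_size // small_size  # candidate small groups per window
    if n_large > n_windows or n_small > groups_per_window:
        raise ValueError("requested more samples than the stream provides")

    intervals = []
    # Stage 1: choose which large windows become detailed-simulation candidates.
    for w in sorted(rng.sample(range(n_windows), n_large)):
        base = w * large_size
        # Stage 2: inside each chosen window, choose a few small groups.
        for g in sorted(rng.sample(range(groups_per_window), n_small)):
            start = base + g * small_size
            intervals.append((start, start + small_size))
    return intervals
```

For example, `two_stage_sample(10_000_000, 1_000_000, 3, 10_000, 5)` returns 15 intervals of 10,000 instructions, five inside each of three 1M-instruction windows; everything outside those intervals would be fast-forwarded rather than simulated in detail.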
This paper presents an innovative way to build flexible benchmarks based on micro-architecture independent characteristics. The proposed approach enables the testing and stressing of processors in order to reflect the real nature of applications and give meaningful information to the designers. The use of a limited number of basic blocks hand-coded in assembly, wisely chosen and arranged, enables...
Recent server architectures embrace a common technology feature: on-chip parallelism via multi-core and CMT (Chip Multi Threading) technologies. However, they also differ significantly in a number of key aspects, including clock speed, micro-architecture, cache hierarchy, and memory subsystem. Such differences may lead to different levels of application performance. This paper presents a performance...
While the computational core is becoming ever faster, communication efficiency between processors has become a bottleneck that limits the performance of multiprocessor systems-on-chip (MPSoC). This paper focuses on the design and implementation of an MPSoC architecture based on the AXI bus protocol. First, RTL models of four NIOS II processors using an AXI communication architecture are developed...
On-chip many-core architecture is an emerging and promising computing platform. High-speed on-chip communication and abundant on-chip resources are two outstanding advantages of this architecture, which provide an opportunity to implement an efficient synchronization scheme. The practical execution efficiency of the synchronization scheme is critical to this platform. However, there is little research...
Load balancing is an important problem for parallel applications. Many recent supercomputers are built from multi-core processors whose cores usually share the last-level cache. On the one hand, accesses from different cores conflict with each other; on the other hand, different cores carry different workloads, resulting in load imbalance. In this paper, we present a novel technique for balancing...
Multicore processors have opened the door to parallel, high-performance computing on the desktop. In this paper, a performance study of these systems is presented; the studies were carried out on Intel's Core 2 Duo processor using OpenMP programming with Microsoft Visual Studio C++ 2005 and the Intel C++ 10.1.020 compiler. Using multithreaded programming,...
This paper evaluates various branch-prediction schemes under different cache configurations in terms of performance, power, energy and area on suitably selected biomedical workloads. The benchmark suite used consists of compression, encryption and data-integrity algorithms as well as real implant applications, all executed on realistic biomedical input datasets. Results are used to drive the (micro)architectural...
Branch prediction is an important topic in modern computer architecture research. Predictors attempt to improve the performance of a processor at a reasonable hardware cost. In the last decade, many prediction schemes have been developed to achieve this objective, each with different cost/performance trade-offs. Identifying the optimal predictor for a given architecture and set of...
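The abstract does not name specific schemes. As a point of reference, here is a minimal Python sketch of the simplest dynamic scheme, a bimodal table of 2-bit saturating counters indexed by low PC bits; the class name, table size and initial counter state are illustrative choices, not taken from the paper.

```python
class BimodalPredictor:
    """Table of 2-bit saturating counters indexed by the low bits of the PC.

    Counter values 0-1 predict not-taken, 2-3 predict taken.
    """

    def __init__(self, index_bits=10):
        self.mask = (1 << index_bits) - 1
        self.table = [1] * (1 << index_bits)  # start every entry weakly not-taken

    def predict(self, pc):
        return self.table[pc & self.mask] >= 2

    def update(self, pc, taken):
        i = pc & self.mask
        if taken:
            self.table[i] = min(3, self.table[i] + 1)  # saturate at strongly taken
        else:
            self.table[i] = max(0, self.table[i] - 1)  # saturate at strongly not-taken


def accuracy(predictor, trace):
    """Fraction of (pc, taken) outcomes predicted correctly, training as we go."""
    correct = 0
    for pc, taken in trace:
        if predictor.predict(pc) == taken:
            correct += 1
        predictor.update(pc, taken)
    return correct / len(trace) if trace else 0.0
```

On a loop branch that is taken nine times and then falls through, repeated ten times (`[(0x400, i % 10 != 9) for i in range(100)]`), the table mispredicts only the first iteration and each loop exit, for an accuracy of 0.89. The second counter bit provides the hysteresis that keeps a single not-taken outcome from flipping the prediction; richer schemes trade more table area for capturing longer patterns.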
This paper presents the results of alpha-particle single event upset (SEU) tests of an embedded 8051 microprocessor. Cross sections for the different memory resources (i.e., internal registers, code RAM, and user memory) are reported, as well as the error rate for different codes implemented as test benchmarks. Test results are then discussed to determine the contribution of each available resource to the overall device...
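For readers unfamiliar with the quantities reported, the conventional definitions are given below; the paper's exact normalization (per device, per resource, or per bit) is not visible in the truncated abstract, so treat this as the usual convention rather than the authors' formulas. Here $N_{\mathrm{SEU}}$ is the number of observed upsets, $\Phi$ the particle fluence (particles/cm²), $N_{\mathrm{bits}}$ the number of bits in the exposed resource, and $\varphi$ the particle flux (particles/cm²/s), so $\sigma$ is in cm² and $R$ in upsets per second.

```latex
\sigma = \frac{N_{\mathrm{SEU}}}{\Phi}, \qquad
\sigma_{\mathrm{bit}} = \frac{N_{\mathrm{SEU}}}{\Phi \, N_{\mathrm{bits}}}, \qquad
R = \sigma \, \varphi
```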
Although 32-bit architectures are becoming the norm for modern microprocessors, 16-bit architectures are still employed by many low-end processors, for which small size and low power consumption are high priorities. However, 16-bit architectures have a critical disadvantage for embedded processors: they do not provide enough encoding space to add special instructions tailored to particular applications. To...
In recent years, a trend toward multi-core architectures with a growing number of cores has emerged across all standard instruction set architectures. To utilize the full potential of such microprocessor architectures, applications running on them must be efficiently parallelized and carefully analyzed with regard to runtime, speedup, and parallel efficiency. With multi-core architectures becoming...
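Speedup and parallel efficiency are commonly defined from measured wall-clock runtimes as $S(p) = T(1)/T(p)$ and $E(p) = S(p)/p$ for $p$ cores. A minimal Python helper, with a hypothetical name, makes the arithmetic concrete; the paper may use a different baseline (for example, the best sequential code rather than the parallel code run on one core).

```python
def speedup_and_efficiency(t_serial, t_parallel, cores):
    """Return (speedup, parallel efficiency) from wall-clock runtimes.

    t_serial:   runtime on one core (the baseline)
    t_parallel: runtime on `cores` cores
    """
    if t_parallel <= 0 or cores <= 0:
        raise ValueError("runtime and core count must be positive")
    speedup = t_serial / t_parallel
    return speedup, speedup / cores
```

For example, a kernel taking 12.0 s on one core and 4.0 s on four cores gives `speedup_and_efficiency(12.0, 4.0, 4) == (3.0, 0.75)`: a 3x speedup, with each core doing useful work 75% of the time relative to ideal linear scaling.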
The potential for destructive interference between running processes is increased as Chip Multiprocessors (CMPs) share more on-chip resources. We believe that understanding the nature of memory system interference is vital to achieve good fairness/complexity/performance trade-offs in CMPs. Our goal in this work is to quantify the latency penalties due to interference in all hardware-controlled, shared...
Custom-instruction selection is an essential phase in custom-instruction generation. It determines the most profitable custom instruction candidates for hardware implementation. In this paper, a practical computing model is proposed for the problem of custom-instruction selection that takes into account the hardware area constraint. Based on the new computing model, a novel heuristic algorithm is...
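Under an area constraint, custom-instruction selection resembles a 0/1 knapsack problem: each candidate has an estimated performance gain and a hardware area cost. The paper's own heuristic is cut off in the abstract, so the sketch below is a different, generic baseline (greedy by gain-per-area) that assumes candidates are independent; the `Candidate` type and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    gain: float  # estimated cycles saved if implemented in hardware
    area: float  # hardware area cost (must be positive)


def greedy_select(candidates, area_budget):
    """Pick candidates in decreasing gain/area order while they fit the budget."""
    if any(c.area <= 0 for c in candidates):
        raise ValueError("every candidate needs a positive area")
    chosen, used = [], 0.0
    # sorted() is stable, so ties keep their input order.
    for c in sorted(candidates, key=lambda c: c.gain / c.area, reverse=True):
        if used + c.area <= area_budget:
            chosen.append(c)
            used += c.area
    return chosen
```

With candidates A (gain 100, area 10), B (gain 60, area 4) and C (gain 90, area 9) and a budget of 14, the sketch picks B then A for a total gain of 160. Greedy selection is fast but not optimal in general, and it ignores overlaps between candidate patterns, which a practical selection model must handle.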
As a simple five-stage General-Purpose Processor (GPP), the baseline FlexCore processor has a limited set of datapath units. By utilizing a flexible datapath interconnect and a wide control word, a FlexCore processor is explicitly designed to support integration of special units that, on demand, can accelerate certain data-intensive applications. In this paper, we propose the integration of a novel...
A chip multiprocessor (CMP) increases processor throughput by duplicating resources for many threads. Because the clock frequency of a single processor is approaching its limit, CMPs are becoming more and more popular. However, how to evaluate a new CMP design by simulation has not been well studied. This paper analyzes the possible organizations of cores on a CMP and then presents a mathematical model for the...
Power gating is a circuit-level technique for reducing standby leakage in a circuit block by cutting off its paths between the supply and ground. A processor architecture that supports power gating of its resources may provide instructions that activate and deactivate those resources as part of its instruction set architecture. Adequate compiler support is then required so that the power...
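One way such compiler support can work is to gate a unit only across idle intervals long enough to recover the energy spent switching it off and on again (often called the break-even time). The minimal Python sketch below assumes the compiler already knows, for one unit, the sorted cycles at which it is used; the function name and parameters are hypothetical, and wake-up latency is ignored, so a real pass would also have to issue the activate instruction early enough.

```python
def gating_points(use_cycles, break_even):
    """Hypothetical compiler pass: find idle gaps worth power-gating.

    use_cycles: sorted cycle numbers at which a gated unit is used.
    break_even: minimum idle length (cycles) for which gating saves energy.
    Returns (off_after, on_before) pairs: emit a deactivate instruction
    after cycle off_after and an activate instruction before cycle on_before.
    """
    points = []
    for prev, nxt in zip(use_cycles, use_cycles[1:]):
        idle = nxt - prev - 1  # cycles strictly between two consecutive uses
        if idle > break_even:
            points.append((prev, nxt))
    return points
```

For uses at cycles 0, 5, 100 and 102 with a break-even of 20 cycles, only the 94-cycle gap qualifies, so the pass returns `[(5, 100)]`; the short gaps are left ungated because switching would cost more energy than it saves.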
High-performance buses often use staggered repeaters to mitigate the adverse impact on latency of worst-case capacitive crosstalk between adjacent wires by exploiting the data-dependent nature of crosstalk. An undesirable side effect of staggered repeaters is that they may increase the overall energy of a bus carrying highly correlated traffic associated with real-world benchmarks. In this paper,...
As semiconductor process technology continues to scale deeper into the nanometer regime, intrinsic parameter fluctuations will strongly affect the performance of future microprocessors. Therefore, one of the challenges of advanced CMOS manufacturing lies in modeling and simulating intrinsic parameter fluctuations to accurately assess the performance and yield of the corresponding...
Multi-core processors have changed the conventional hardware structure and require a rethinking of system scheduling and resource management to utilize them efficiently. However, current multi-core systems still use conventional single-core memory scheduling. In this study, we investigate and evaluate traditional memory access scheduling techniques, and propose a core-aware memory scheduling...
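A widely used traditional policy is FR-FCFS (first-ready, first-come-first-served): among pending requests, serve row-buffer hits first, and break ties by age. The minimal Python sketch below shows only that baseline; the `Request` fields are hypothetical, and the `core` field is carried but deliberately unused, since FR-FCFS does not look at which core issued a request. The paper's core-aware policy is not described in the truncated abstract and is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Request:
    arrival: int  # cycle at which the request entered the queue
    core: int     # issuing core (ignored by FR-FCFS)
    bank: int
    row: int


def fr_fcfs_pick(queue, open_rows):
    """Pick the next request: row-buffer hits first, then the oldest.

    open_rows maps bank -> currently open row in that bank's row buffer.
    """
    if not queue:
        return None

    def priority(r):
        hit = open_rows.get(r.bank) == r.row
        return (0 if hit else 1, r.arrival)  # lower tuple = higher priority

    return min(queue, key=priority)
```

With an older request to row 5 and a younger one to row 7 in the same bank, and row 7 currently open, FR-FCFS serves the younger request first because it avoids a precharge and activate. This throughput-oriented choice is also why a core that streams row hits can starve other cores, which is the kind of inter-core effect a core-aware scheduler would need to account for.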