The validation of transaction level models described in System-level Description Languages (SLDLs) often relies on extensive simulation. However, traditional Discrete Event (DE) simulation of SLDLs is cooperative and cannot utilize the available parallelism in modern multi-core CPU hosts. In this work, we study the SLDL execution semantics of concurrent threads and present a multi-core parallel simulation...
Virtual platform simulation is an essential technique for early-stage system-level design space exploration and embedded software development. In order to explore the hardware behavior and verify the embedded software, simulation speed and accuracy are the two most critical factors. However, given the increasing complexity of the Multi-Processor System-on-Chip (MPSoC) designs, even the state-of-the-art...
Matrix computations such as matrix-vector and matrix multiplication are very challenging computational kernels arising in scientific computing. In this paper, we study and evaluate a number of different data decomposition schemes for matrix computations on multicore architectures using the OpenMP programming model. Further, in this work we propose a simple and fast analytical model to predict the...
Current trends signal an imminent crisis in the simulation of future CMPs (Chip Multiprocessors). Future micro-architectures will offer more and more thread contexts to execute parallel programs, but the execution speed of each thread will not improve at the same pace. CMPs with tens or even hundreds of cores are envisioned. Simulating these future CMPs efficiently without compromising accuracy is a...
An approach to carrying out asynchronous distributed simulation of multiprocessor message passing architectures is presented. Aiming at achieving better performance on Conservative DEVS-based simulations, we introduce the GLM protocol which borrows the idea of safe processing intervals from the conservative time window algorithm and maintains global synchronization in a fashion similar to the distributed...
This work evaluates the I/O performance in a multicore cluster environment for an atmosphere model for weather and climate simulations. It contains large data sets for I/O in scientific applications. The analysis demonstrates that the scalability of the system gets worse as we increase the number of cores per machine, with greater impact on output operations. We also demonstrate poor capacity of the...
An important topic in the field of Multi Robot Systems focuses on motion coordination and synchronization for formation keeping. Although several works have addressed this problem, little attention has been devoted to studying the computational complexity within the framework of large-scale systems. This paper presents our current work on how to achieve high computational performance for systems composed...
The Chip Multicore Processor (CMP) has become the mainstream microprocessor architecture in today's industry and academic literature. As CMP hardware development and research progress, software issues become more and more prominent. Coupled with these developments, many institutes and universities have changed the curricula of their computer architecture courses. But the problem is: do we really...
System noise or Jitter is the activity of hardware, firmware, operating system, runtime system, and management software events. It is shown to disproportionately impact application performance in current generation large-scale clustered systems running general-purpose operating systems (GPOS). Jitter mitigation techniques such as co-scheduling jitter events across operating systems improve application...
Time-consuming cycle-accurate MPSoC simulation is often needed for debugging and verification. Its practicability is put at risk by growing MPSoC complexity. This work presents a conservative synchronous parallel simulation approach along with a SystemC framework to accelerate tightly-coupled MPSoC simulations on multi-core hosts. The key contribution is the implementation strategy, which utilizes...
This paper introduces the Graphite open-source distributed parallel multicore simulator infrastructure. Graphite is designed from the ground up for exploration of future multi-core processors containing dozens, hundreds, or even thousands of cores. It provides high performance for fast design space exploration and software development. Several techniques are used to achieve this including: direct...
The design of embedded computing systems faces a serious productivity gap due to the increasing complexity of their hardware and software components. One solution to address this problem is the modeling at higher levels of abstraction. However, manually writing proper executable system models is challenging, error-prone, and very time-consuming. We aim to automate critical coding tasks in the creation...
This paper presents a task-centric memory model for 1000-core compute accelerators. Visual computing applications are emerging as an important class of workloads that can exploit 1000-core processors. In these workloads, we observe data sharing and communication patterns that can be leveraged in the design of memory systems for future 1000-core processors. Based on these insights, we propose a memory...
We present a design for a hardware supported global synchronization unit that would be implemented on-chip and directly accessible by all processors in a multi-core architecture. This global synchronization unit will provide all processors with access to global state information from all other processors in just a few clock ticks, and can be used to perform highly efficient and scalable time synchronization...
As high-end computing systems continue to grow in scale, the performance that applications can achieve on such large scale systems depends heavily on their ability to avoid explicitly synchronized communication with other processes in the system. Accordingly, several modern and legacy parallel programming models (such as MPI, UPC, global arrays) have provided many programming constructs that enable...
The development of computer processors has entered the multi-core era, providing a good opportunity to spread parallel discrete event simulation. The parallel programming model and the synchronization problem that arise when parallelizing discrete event simulation on a multi-core platform are discussed. A parallel discrete event simulator based on a multi-core platform was designed and implemented...
As multiprocessors become mainstream, techniques to address efficient simulation of multi-threaded workloads are needed. Multi-threaded simulation presents a new challenge: non-determinism across simulations for different architecture configurations. If the execution paths between two simulation runs of the same benchmark with the same input are too different, the simulation results cannot be used...
An FPGA-based multiprocessor SoC (MPSoC) is a fully programmable on-chip multiprocessor that can reduce development cost and meet performance requirements. In order to provide an MPSoC with low-overhead communication and synchronization methods, this paper attempts to introduce the TSVM (tagged shared variable memory) cache to a snooping cache on the MPSoC. The TSVM cache can improve...
MPSS simulates the behavior of a high traffic transaction processing system. An effective use of MPSS is the analysis of the impact of exclusive control of system resources over multiple processes. MPSS consists of a control process and multiple application processes. The control process is designed to simulate a transaction processing monitor. It initiates and oversees multiple application processes...
Fine-grained dynamic voltage/frequency scaling (DVFS) demonstrates great promise for improving the energy-efficiency of chip-multiprocessors (CMPs), which have emerged as a popular way for designers to exploit growing transistor budgets. We examine the tradeoffs involved in the choice of both DVFS control scheme and method by which the processor is partitioned into voltage/frequency islands (VFIs)...