The next-generation enterprise Xeon® processor consists of 10 Westmere 32nm cores and a shared inclusive L3 cache (LLC) integrated on a monolithic die, with link-based I/Os. This paper focuses on the innovations and circuit optimizations over the predecessor targeting idle power reduction, robust high-speed I/O links, and performance per watt improvements. The processor is implemented in 32nm CMOS...
Transactional Memories (TM) have attracted much interest as an alternative to lock-based synchronization in shared-memory multiprocessors. Considering the use of TM on an embedded, NoC-based MPSoC, this work evaluates a LogTM implementation. It is shown that the time an aborted transaction waits before restarting its execution (the backoff delay) can seriously affect the overall performance and energy...
Codesigning applications and communication libraries to leverage underlying network features is imperative for achieving optimal performance on modern computing clusters.
Development of an efficient processor architecture with appropriate clocking mechanisms and datapath organization is one of the most challenging design issues for 32-/64-bit RSFQ processors. The cell-level design of a 32-bit RSFQ dual-lane integer processor has been developed at Stony Brook University in an effort to identify and study techniques capable of tolerating significant delay variations...
The increasing complexity of multicore embedded systems makes careful construction of a virtual prototyping system crucial to shortening design turnaround time, given the growing demand for simulation time. Parallel simulation aims to accelerate simulation by running component simulators concurrently. However, the extra overhead of communication and synchronization between simulators may overshadow the benefits...
We demonstrate speedup of barrier synchronization for parallel computing via wavelength parallelism of the optical switch using a k-ary tree to collect updates without incurring contention, and optical broadcast to distribute the notifications.
Estimating the execution time of programs has always been a concern in computer science. With the emergence of multi-core processors, this concern has found new perspectives and new parameters affect the runtime performance of parallel applications. To estimate the execution time of parallel applications, we investigate the overheads caused by parallelizing an application by identifying the overheads...
Matrix computations such as matrix-vector and matrix multiplication are very challenging computational kernels arising in scientific computing. In this paper, we study and evaluate a number of different data decomposition schemes for matrix computations on multicore architectures using the OpenMP programming model. Further, in this work we propose a simple and fast analytical model to predict the...
Partial Reconfiguration (PR) is an FPGA feature that allows the modification of certain parts of an FPGA while the rest of it continues to operate without disruption. This distinctive characteristic of FPGAs has many potential benefits but also challenges. The lack of good CAD tools and the need for deep hardware knowledge make the feature hard to use. In this paper, the new Partition-based...
Per-core local (scratchpad) memories allow direct inter-core communication, with latency and energy advantages over coherent cache-based communication, especially as CMP architectures become more distributed. A multicore FPGA platform with cache-integrated network interfaces (NIs) is presented, appropriate for scalable multicores, that combine the best of two worlds: the flexibility of caches (using...
This paper presents a fully programmable frame synchronization architecture for OFDM systems implemented on a multi-core processor platform. By utilizing the guard interval in OFDM signals, the coarse symbol synchronization (CSS) and the fractional carrier frequency offset estimation (CFO) are considered simultaneously. The multi-core processor platform is a two-dimensional mesh array of SIMD (Single Instruction...
Modeling the execution of a processor and its instructions is a challenging problem, in particular in the presence of long pipelines, parallelism, and out-of-order execution. A naive approach based on finite state automata inevitably leads to an explosion in the number of states and is thus only applicable to simple minimalistic processors. During their execution, instructions may only proceed forward...
This paper describes the design and application of an execution-driven parallel simulator for predicting the performance of large-scale parallel computers. The simulator can be used in hardware validation and software development for large-scale parallel computers. It simulates the processors of each node, network components and disk I/O components. To illustrate the capabilities of our simulator, we describe...
A personal high-performance computer (PHPC) requires low cost and high performance. Teraflops PHPC systems with special accelerator units such as GPGPUs have been presented, but they have difficulties in programming, compatibility and applicability. In this paper, we present HPP-PHPC, a hybrid architecture of heterogeneous processors connected by a non-coherent off-chip system bus. The performance of...
We use the polyhedral process network (PPN) model of computation to program and map streaming media applications onto embedded Multi-Processor System-on-Chip (MPSoC) platforms. In previous works, it has been shown how to apply different process network transformations in isolation. In this work, we present a holistic approach combining the process splitting and merging transformations and show that...
An approach to carrying out asynchronous distributed simulation of multiprocessor message passing architectures is presented. Aiming at achieving better performance on Conservative DEVS-based simulations, we introduce the GLM protocol which borrows the idea of safe processing intervals from the conservative time window algorithm and maintains global synchronization in a fashion similar to the distributed...
The algebraic path problem (APP) unifies a number of related combinatorial or numerical problems into one that can be resolved by a generic algorithmic schema. In this paper, we propose a linear SPMD model based on the Warshall-Floyd procedure coupled with a systematic shift-toroïdal. Our scheduling requires a number of processors that equals the size of the input matrix. With a fewer number of processors,...
Concurrency is an integral part of many robotics applications, due to the need for handling inherently parallel tasks such as motion control and sensor monitoring. Writing programs for this complex domain can be hard, in particular because of the difficulties of retaining a robust modular design. We propose to use SCOOP, an object-oriented programming model for concurrency which by construction is...
The Fast Multipole Method (FMM) and Multi-Level Fast Multipole Algorithm (MLFMA) have been used to solve electromagnetic scattering problems for many years. Parallel implementation of MLFMA is currently a hot topic because it is capable of solving scattering problems with tens of millions of unknowns, with complexity O(N log N), where N is the number of unknowns. In this paper, we discuss a new perfectly...
A Dynamic Traffic Assignment (DTA) system [Ben-Akiva et al., 1991] [Mahmassani, 2001] benefits travelers by providing accurate estimates of current traffic conditions, consistent anticipatory network information, and reliable route guidance. Over the years, two types of model adjustment schemes have been studied - DTA off-line calibration [Balakrishna, 2006] [Toledo et al., 2003] [van der Zijpp,...