Sparse Matrix-Vector multiplication (SpMV) is a fundamental kernel for many scientific and engineering applications. However, SpMV performance and efficiency are poor on commercial off-the-shelf (COTS) architectures, especially when the data size exceeds the capacity of on-chip memory or the last-level cache (LLC). In this work we present a hardware accelerator, co-optimized with the algorithm, for large SpMV problems. We start...
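To show why SpMV stresses the memory hierarchy, here is a minimal sketch of the kernel over the Compressed Sparse Row (CSR) format. CSR is an assumption made for illustration; the abstract does not say which storage format the accelerator uses. Each nonzero costs one multiply-add but requires an indirect, data-dependent load of `x[col_idx[k]]`. When `x` and the matrix arrays no longer fit in on-chip memory or the LLC, those scattered loads dominate run time.

```python
def csr_spmv(values, col_idx, row_ptr, x):
    """Compute y = A @ x, with A stored in CSR format (illustrative sketch)."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # Nonzeros of row i occupy values[row_ptr[i]:row_ptr[i + 1]].
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # Indirect access into x: the irregular pattern that defeats caches.
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# A = [[1, 0, 2],
#      [0, 0, 3],
#      [4, 5, 0]]
values = [1.0, 2.0, 3.0, 4.0, 5.0]
col_idx = [0, 2, 2, 0, 1]
row_ptr = [0, 2, 3, 5]
print(csr_spmv(values, col_idx, row_ptr, [1.0, 1.0, 1.0]))  # [3.0, 3.0, 9.0]
```

Only the nonzeros are stored and touched. The trade-off is that the accesses to `x` depend on the sparsity pattern, which is what a co-designed accelerator would aim to exploit.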
The design of an electrical impedance spectroscopy acquisition and processing system in a 0.13 μm CMOS technology, with a functional frequency range of 1 kHz to 10 GHz, is presented. The system is based on a quadrature modulator in a lock-in architecture. The design of each module of the system is explained, and post-layout simulations are used to validate the main features of the design...
Current hybrid network-on-chip designs in manycore systems are agnostic to application requirements and are therefore built for the general case. This results in high design cost for manycore systems, as well as wasted energy and performance. We observe that the cost of network-on-chip designs can be reduced by optimizing how application-specific traffic is mapped onto the system. This paper presents mincostflow-based...
We propose a flexible multi-frequency channel correlator upgrade for the MUSER-I array. The upgraded correlator has a more flexible architecture that can be extended to additional signal-receiving elements. It can process 1024 frequency channels in the IF band, and correlation sensitivity is also improved by using 4-bit quantization in pre-correlation. The enhanced multi-frequency channel processing capability...
In this paper we discuss the potential of the integrated GPU to accelerate sorting by performing a partial sort prior to a comparison-based CPU sort. We experiment with several comparison-based CPU sorting algorithms and outline the performance gain for a random input data set. We then analyze different x86 SoC architectures, and show that by sorting chunks stored inside the on-chip GPU memory,...
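The two-stage idea can be sketched under a few assumptions. First, a "GPU" stage sorts fixed-size chunks independently; here a plain CPU `sorted` call stands in for it. Second, a comparison-based CPU stage combines the chunks. The hypothetical `chunk_size` parameter plays the role of the on-chip GPU memory capacity. The CPU stage uses a k-way merge via `heapq.merge`, which is an illustrative choice and not necessarily one of the algorithms evaluated in the paper.

```python
import heapq
import random


def chunked_presort(data, chunk_size):
    """Stand-in for the GPU stage: sort each fixed-size chunk independently.

    chunk_size is a hypothetical proxy for how much data fits in on-chip GPU memory.
    """
    return [sorted(data[i:i + chunk_size]) for i in range(0, len(data), chunk_size)]


def merge_sorted_chunks(chunks):
    """CPU stage: comparison-based k-way merge of the presorted chunks."""
    return list(heapq.merge(*chunks))


data = [5, 3, 9, 1, 7, 2, 8]
chunks = chunked_presort(data, 3)      # [[3, 5, 9], [1, 2, 7], [8]]
print(merge_sorted_chunks(chunks))     # [1, 2, 3, 5, 7, 8, 9]

# Sanity check on a random input, mirroring the abstract's random data set.
rand = [random.randint(0, 10**6) for _ in range(10_000)]
assert merge_sorted_chunks(chunked_presort(rand, 1024)) == sorted(rand)
```

Presorting shifts work off the CPU. Afterwards the CPU stage only has to combine already-ordered runs, which takes roughly n·log(k) comparisons for k chunks rather than n·log(n) for a sort from scratch.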
High Performance Computing (HPC) applications are highly optimized to make the most of the resources allocated to the job, such as compute resources, memory and storage. Optimal performance for MPI applications requires the best possible affinity across all the allocated resources. Typically, setting process affinity to compute resources is well defined, i.e., MPI processes on a compute node have processor affinity...
Three Ge-on-Si photodetector architectures with different contacting schemes are compared, with emphasis on their bandwidth. The study shows that bandwidth > 50 GHz and responsivity > 1 A/W at 1490 nm can be achieved using a commercial silicon photonics process.
HPCG and Graph500 can be regarded as the two most relevant benchmarks for high-performance computing systems. Existing supercomputer designs, however, tend to focus on peak floating-point performance, a metric less relevant for these two benchmarks. This leaves resources underutilized and has yielded little performance improvement on these benchmarks over time. In this work, we analyze the implementation...
The popularity of smart devices across a variety of activities is driving greater use of wireless broadband networks. Over-the-top (OTT) refers to the delivery of video, audio and other media over the Internet. Video-related applications and services pose major challenges to future network performance. It is important to achieve network and service traffic offloading to overcome high-speed, real-time,...
To satisfy the growing computational demands of modern applications, significant enhancements have been introduced in contemporary processor architectures to increase their attainable performance, such as an increased number of cores, an improved memory subsystem and enhancements to the processor pipeline [1]. Therefore, the performance improvements are usually coupled with an...
The recent increase in the complexity of processor architectures poses significant challenges when designing and optimizing the execution of real-world applications, even on general-purpose hardware. To help in this process, tools for fast and insightful visualization of architecture and application execution bottlenecks are particularly useful for computer architects and application engineers,...
We propose an extended OpenFlow-based control mechanism for multiple connections in the OpenScale architecture to achieve Coflow-aware bandwidth scheduling. An experimental demonstration verifies its overall feasibility.
Distributed file systems enable the reliable storage of exabytes of information on thousands of servers distributed throughout a network. These systems achieve reliability and performance by storing multiple copies of data blocks in different locations across the network. The management of these copies of data is commonly handled by intermediate servers that track and coordinate the placement of data...
Ultra-wideband (UWB) signal processing is a technology with tremendous potential to advance communication and information technology. However, it also presents challenges to the signal processing community, in particular to sampling theory. This article outlines a UWB signal processing system based on basis projection, using a basis system designed specifically for UWB signals. The...
Software-Defined Networking (SDN) is an innovative approach to provisioning and delivering Quality of Service (QoS), yet it still lacks context-differentiated services. In this paper we propose a network application (Autonomic QoS Broker) and a controller module that implements the Open vSwitch Database Management Protocol (OVSDB). These two components were implemented and validated...
Extending the notion of Software-Defined Networking (SDN) from packet switching in Layers 2 and 3 to circuit switching in the transport layer is a promising scenario for service providers to meet requirements for high burstiness and high bandwidth. For service providers to have a multilayer, multi-domain controller, which can provide automated controller-based restoration and protection even in unprotected...
The current design drivers for multi-cores, namely performance per watt, scalability and flexibility, make Networks-on-Chip (NoCs) the de facto on-chip interconnect. State-of-the-art NoCs can exploit heterogeneous solutions and complex DVFS techniques to also accommodate variability in application requirements. Relevant showstoppers to the design of a truly flexible NoC fitting all the possible...
Different applications require different communication performance between subnets in a global hybrid network-on-chip (NoC) of a heterogeneous CPU-GPU architecture (HSA). It is impractical to deploy (at design time) or switch on (at runtime) all the hybrid routers in the network for an application that needs only several hybrid routers for communication. Reconfiguring the customized global hybrid...
Sea ice modeling is a typical high performance computing problem. CPU- and GPU-based parallel methods have been proposed to accelerate the simulation, but they still struggle to meet the demands of large-scale calculation because the model is compute-intensive. The Sunway TaihuLight supercomputer uses the SW26010 processor as its computing unit and achieves high performance for large-scale scientific...
Internet of Things (IoT) traffic will become increasingly heterogeneous, not only in terms of traditional metrics such as required bandwidth and maximum latency, but also in terms of functional requirements such as compute power and temporary storage. Service providers must adopt sophisticated planning and engineering approaches to account for this heterogeneity, which is inherent in IoT applications. Metropolitan...