This paper investigates the consistency of phased array element performance by extracting information from the Full Matrix Capture (FMC) of a reflection from a planar interface. The purpose of this work is to generate a robust methodology for tracking phased array performance over time, thereby ensuring the reliability of measured data. To achieve this, a calibration method has been developed that...
We demonstrate the development, performance and application of a GaN-based micro-light emitting diode array sharing a common p-electrode with individually addressed n-electrodes. These individually addressed n-electrodes minimize the series-resistance difference from conductive paths, and offer compatibility with n-type metal-oxide-semiconductor transistor-based drivers for faster modulation.
High-power, high-linearity CC-MUTC photodiodes, directly integrated into connected and tightly coupled array antennas, enable ultra-wideband (UWB) phased array operation with improved size, weight, and power (SWaP). Presented are the high-fidelity beam steering and bandwidth performance of several of these one-dimensional photodiode-integrated antenna arrays.
FPGA-based neural networks typically leave performance on the table because the DSP resources run at less than a third of the peak clock rate. This paper presents a processing array architected to consistently achieve timing closure at 100% of the peak DSP clock rate with standard FPGA tools. In the HDL design environment, our processing array operates at the peak DSP clock rates on Xilinx UltraScale...
Some modern high-level synthesis (HLS) tools [1] permit the synthesis of multi-threaded software into parallel hardware, where concurrent software threads are realized as concurrently operating hardware units. A common performance bottleneck in any parallel implementation (whether it be hardware or software) is memory bandwidth — parallel threads demand concurrent access to memory resulting in contention...
Efficiently reading data from and writing data to storage systems is necessary for most scientific simulations to achieve good performance at scale. Many software solutions have been developed to decrease the I/O bottleneck. One well-known strategy, in the context of collective I/O operations, is the two-phase I/O scheme. This strategy consists of selecting a subset of processes to aggregate contiguous pieces...
5G, the next generation of wireless communications, is focusing on modern antenna technologies like massive MIMO, phased arrays and the mm-wave band to obtain data rates up to 10 Gbps. In this paper, we propose a new 64-element, 8×8 phased series-fed patch antenna array for 28 GHz mm-wave band 5G mobile base station antennas. The phased array steers its beam along the horizontal axis to provide...
The growing prominence of Deep Neural Networks (DNNs) and the computational challenges they impose have fueled the design of specialized accelerator architectures and associated dataflows to improve their implementation efficiency. Each of these solutions serves as a data point on the throughput vs. energy trade-offs for a given DNN and a set of architectural constraints. In this paper, we set out to explore...
Graph algorithms such as breadth-first search (BFS) have been gaining ever-increasing importance in the era of Big Data. However, the memory bandwidth remains the key performance bottleneck for graph processing. To address this problem, we utilize processing-in-memory (PIM), combined with non-volatile metal-oxide resistive random access memory (ReRAM), to improve the performance of both computation...
ASKAP has recently started its Early Science program with 12 MkII PAF-equipped antennas and 36 beams simultaneously covering a 30 square degree field of view. The first observations have focused on mapping extragalactic neutral hydrogen in galaxy groups and clusters selected by the ‘WALLABY’ Survey Science Team. Significant efforts from engineers, software designers, and scientists are overcoming...
Parity declustering is widely deployed in erasure-coded storage systems to provide fast recovery and high data availability. However, to scale such RAIDs, it is necessary to preserve the parity-declustered data layout so as to guarantee RAID performance after scaling. Unfortunately, existing scaling algorithms fail to achieve this goal, so they cannot be applied for scaling...
Ever-increasing amounts of data are created and processed in internet-scale companies such as Google, Facebook, and Amazon. The efficient storage of such copious amounts of data has thus become a fundamental and acute problem in modern computing. No single machine can possibly satisfy such immense storage demands. Therefore, distributed storage systems (DSS), which rely on tens of thousands of storage...
Binary maximum distance separable (MDS) array codes are a special class of erasure codes for distributed storage that not only provide fault tolerance with minimum storage redundancy, but also achieve low computational complexity. They are constructed by encoding k information columns into r parity columns, in which each element in a column is a bit, such that any k out of the k + r columns suffice...
This paper presents an explicit construction for an $((n = 2qt,\ k = 2q(t-1),\ d = n - (q+1)),\ (\alpha = q(2q)^{t-1},\ \beta = \alpha/q))$ regenerating code over a field $\mathbb{F}_Q$ operating at the Minimum Storage Regeneration (MSR) point. The MSR code can be constructed to have rate $k/n$ as close to 1 as desired, sub-packetization level $\alpha \le r^{n/r}$ for $r = (n - k)$, field size $Q$ no larger than $n$ and where all code symbols can be...
A phased array lens has limited bandwidth due to the phase shifters that collimate and scan the beam. A wideband signal requires time delay units in place of phase shifters. This paper investigates the feasibility of implementing time-delay units in array lens antennas. Time-delay compensation mechanisms for array lens antennas are outlined and investigations are carried out to determine the required...
The increasing use of machine learning algorithms, such as Convolutional Neural Networks (CNNs), makes the hardware accelerator approach very compelling. However, the question of how best to design an accelerator for a given CNN has not yet been answered, even on a very fundamental level. This paper addresses that challenge by providing a novel framework that can universally and accurately evaluate...
In recent years, the demand for memory performance has grown rapidly due to the increasing number of cores on a single CPU, along with the integration of graphics processing units and other accelerators. Caching has been a very effective way to relieve bandwidth demand and to reduce average memory latency. As shown by the cache feature table in Fig. 23.9.1, there is a big latency gap between SRAM...
High-level synthesis (HLS) is getting increasing attention from both academia and industry for high-quality and high-productivity designs. However, when inferring primitive-type arrays in HLS designs into on-chip memory buffers, commercial HLS tools fail to effectively organize FPGAs' on-chip BRAM building blocks to realize high-bandwidth data communication; this often leads to suboptimal quality...
Matrix multiplication is one of the most widely used computational kernels in scientific computing and machine learning. Using a dedicated circuit for matrix multiplication can reduce the computational time and energy consumption. Traditional matrix multipliers always adopt a linear array architecture, which works inefficiently when the size of the matrix sub-block is much smaller than the array length. Using...
For many intensive computing tasks, simultaneous data access into multi-dimensional data arrays is highly restricted by the data mapping strategy and memory port constraints. As such, to increase memory access bandwidth, innovative memory partitioning and mapping algorithms have been proposed to simultaneously access multiple memory blocks through physically distributing data elements in the same...