Very-long-instruction-word (VLIW) architectures are widely adopted in high-performance and low-power digital signal processors (DSP) due to their simplicity from extensive software optimizations. However, their poor code density (usually > 2× code size for a given application) and corresponding instruction accesses can overwhelm the energy savings on DSP datapaths. This paper presents variable-length...
The growing importance of three-dimensional radiotherapy treatment has been associated with the active presence of advanced computational workflows that can simulate conventional x-ray films from computed tomography (CT) volumetric data to create digitally reconstructed radiographs (DRR). These simulated x-ray images are used to continuously verify the patient alignment in image-guided therapies with...
The Parallella is a hybrid computing platform that came into existence as the result of a Kickstarter project by Adapteva. It is composed of the high-performance, energy-efficient, manycore Epiphany chip (used as a co-processor) and one Zynq-7000 series chip, which normally runs a regular Linux OS version, serves as the main processor, and implements "glue logic" in its internal...
Recent advances in FPGA technology and the proliferation of High Level Synthesis (HLS) tools make it possible to implement complex System on Chip (SoC) designs that realize complete applications in a single FPGA device. To be able to exploit the large performance vs. area search space of such modern FPGA-based SoCs, system architects must have the appropriate performance analysis tools to evaluate, preferably...
Various architecture-based techniques have been proposed to reduce power consumption in GPGPUs. However, these techniques mostly ignore the temperature of GPGPUs. In this paper, we focus on the register file and propose a new technique to reduce its peak temperature. The register file in GPGPUs is very large, even larger than the caches, to support thousands of simultaneously executing threads. This makes register...
In this paper, we address the labeled multi-object tracking problem for the generic observation model (GOM) in the framework of finite set statistics. First, we derive a product-labeled multi-object (P-LMO) filter, which is a closed-form solution to the labeled multi-object Bayesian filter under the standard multi-object transition kernel and generic multi-object likelihood, and thus can be used as...
Power is a limiting factor in the design of embedded processors. For this reason, adding more instruction extensions is not a scalable option. To overcome this issue, we study the effects of replacing the NEON unit of an ARM SoC with an FPGA-like reconfigurable fabric. We measure the gap between the conventional hard-NEON and a soft-NEON implementation. We found that the soft-NEON has an overhead of...
General-Purpose Graphics Processing Units (GPGPUs) exploit several levels of caches to hide memory latency and provide data for thousands of simultaneously executing threads. The L1 data cache and L2 cache are critical to the performance of GPGPUs, as an L1 data cache should provide data for all threads within the corresponding Streaming Multiprocessor (SM) and the L2 cache should service memory requests...
Correlation filter-based trackers achieve very good performance in visual tracking, but traditional correlation tracking methods fail to exploit the color information of the image sequence. To solve this problem, we propose a novel and robust scale-adaptive tracker combined with color attributes in the correlation filter framework, which extracts not only gray but also color information as the feature...
Learning from crowds, in which the labels of instances are collected through crowdsourcing, has recently become an important research topic. The Personal Classifier (PC) approach is a representative approach for learning from crowds due to its convex optimization formulation. The PC approach makes assumptions about the distribution of its parameters and is therefore a parametric approach. However, these assumptions...
Compression is a promising technique to increase the effective capacity of caches. Due to the latency overhead of decompression, most previous studies applied compression to lower-level caches. General-Purpose Graphics Processing Units (GPGPUs) are throughput-oriented computing platforms that execute hundreds to thousands of threads simultaneously. The massive number of threads makes GPGPUs less sensitive...
Many background subtraction algorithms have been proposed in the last fifteen years, and an important issue is to provide a way to evaluate and compare the most popular models against a set of criteria. This paper presents a comparison among eleven models using the BMC dataset and gives guidelines for choosing algorithms for different scenes by computing the F-measure, Peak Signal-to-Noise Ratio, Structural...
During driving, providing accurate collision warnings and effective advice about acceleration or deceleration in advance is beneficial to traffic safety. Furthermore, it reduces the probability of vehicle collision. Many studies based on models or infrastructure have been proposed to solve this problem. However, their prediction accuracy is limited and few works utilize the large amount of historical...
To monitor virtual machines and applications in IaaS cloud environments, cloud providers require installing agents in tenants' virtual machines. This is inconvenient for users and is prone to cyber attacks. In this paper, we propose oMon, an out-of-the-box application monitoring framework for cloud applications. oMon does not require installing any agent in the guest OS of tenants' virtual machines,...
Recognition of handwritten numerals has gained much interest in recent years due to its many potential applications. Progress on handwritten Bangla numeral recognition is well behind that on Roman, Chinese and Arabic scripts, although Bangla is a major language in the Indian subcontinent and is the first language of Bangladesh. Handwritten numeral classification is a high-dimensional complex task and existing methods use...
Hadoop is an open-source software framework that is commonly used for distributed processing and distributed storage of large datasets (big data). It processes and stores data across all nodes in a cluster. However, the deployment and operational costs of a physical cluster are high for several reasons. A physical cluster consumes a lot of energy and is rigid to manage. As a solution, virtualization...
Using multiple streams can improve the overall system performance by mitigating the data transfer overhead on heterogeneous systems. Prior work focuses largely on GPUs, but little is known about the performance impact on (Intel Xeon) Phi. In this work, we apply multiple streams to six real-world applications on Phi. We then systematically evaluate the performance benefits of using multiple streams...
Classification is one of the core tasks in machine learning and data mining. One of several classification models is classification rules, which use a set of if-then rules to describe a classification model. In this paper we present a set of FPGA-based compute kernels for accelerating classification rule induction. The kernels can be combined to perform specific procedures in the rule induction process,...
SIMD divergence is one of the critical factors that decrease hardware utilization in contemporary GPGPUs (General-Purpose Graphics Processing Units). Both the reconvergence scheme and control flow detection have to be well considered. On the emerging HSA (Heterogeneous System Architecture) platform, we develop an effective dynamic stack-based reconvergence scheme that can be implemented without...
The stringent power constraints of complex microcontroller-based devices (e.g. smart sensors for the IoT) represent an obstacle to the introduction of sophisticated functionality. Programmable accelerators would be extremely beneficial to provide the flexibility and energy efficiency required by fast-evolving IoT applications; however, the integration complexity and sub-10mW power budgets have been...