As the size of Deep Neural Networks (DNNs) continues to grow to increase accuracy and solve more complex problems, their energy footprint also scales. Weight pruning reduces DNN model size and computation by removing redundant weights. However, we implemented weight pruning for several popular networks on a variety of hardware platforms and observed surprising results. For many networks, the network...
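Magnitude-based pruning, the most common form of weight pruning, keeps the largest-magnitude weights and zeroes the rest. The function below is an illustrative pure-Python sketch of that idea, not the implementation evaluated in the abstract:

```python
def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude weights so that roughly a
    `sparsity` fraction of them is removed (illustrative sketch)."""
    k = int(sparsity * len(weights))  # number of weights to drop
    if k == 0:
        return list(weights)
    # the k-th smallest absolute value acts as the pruning threshold
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]

w = [0.9, -0.05, 0.4, 0.01, -0.7, 0.2]
pruned = magnitude_prune(w, 0.5)  # drops the 3 smallest-magnitude weights
```

In practice pruning is applied per layer to tensors rather than flat lists, and is usually followed by fine-tuning to recover accuracy.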
Detecting community structure in epidemic networks is crucial for assessing epidemic dynamics and effectively controlling disease spread by targeting the individuals that bridge communities. Common community detection models (e.g., cut-criteria and modularity-criteria based models) are efficient at finding high-quality network partitions. However, most of these approaches fail to consider the dynamic...
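As a concrete example of a modularity criterion, Newman's modularity Q can be computed directly from an edge list and a community assignment. The function below is an illustrative sketch (the graph and names are ours, not from the paper):

```python
def modularity(edges, communities):
    """Newman modularity Q of an undirected graph partition.
    edges: list of (u, v) pairs; communities: dict node -> community label."""
    m = len(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    q = 0.0
    # fraction of edges that fall inside a community...
    for u, v in edges:
        if communities[u] == communities[v]:
            q += 1.0 / m
    # ...minus the fraction expected under a random degree-preserving rewiring
    for u in degree:
        for v in degree:
            if communities[u] == communities[v]:
                q -= degree[u] * degree[v] / (4.0 * m * m)
    return q

edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
comm = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
q = modularity(edges, comm)  # Q ≈ 0.357 for two triangles joined by one edge
```

Higher Q indicates denser connections within communities than expected by chance; modularity-based detectors search for the partition maximizing Q.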
A convolutional neural network (CNN) extracts features from big data by using a multilayer network structure. Due to its high effectiveness, CNN has achieved great success in many fields such as computer vision and speech analysis. However, CNN training is quite challenging because computing the gradients through multiple layers is time-consuming. In this paper, we propose to accelerate the computation...
With the rapidly increasing applications of deep learning, LSTM-RNNs are widely used. Meanwhile, complex data dependences and intensive computation limit the performance of accelerators. In this paper, we first propose a hybrid network expansion model to exploit fine-grained data parallelism. Based on the model, we implemented a Reconfigurable Processing Unit (RPU) using Processing In Memory (PIM)...
Dataflow models of computation have been shown to provide an excellent basis for describing signal processing applications and mapping them to heterogeneous computing platforms that consist of multicore CPUs and graphics processing units (GPUs). Recently, several efficient dataflow-based programming frameworks have been introduced for such needs. Most contemporary signal processing applications...
In this paper, we introduce a cognitive model, inspired by a rough description of the human cognition process, to provide a more efficient parallel architecture for autonomous reactions in real systems. The model is a non-hierarchical structure composed of three main parallel blocks (mind, reason, and action) in a nested double closed-loop configuration for control and supervision; the mind is a practical...
The rapidly growing design complexity has become a big obstacle and has dramatically increased the time required for SystemC simulation. In this case study, we exploit different levels of parallelism, including thread- and data-level parallelism, to accelerate the simulation of a Bitcoin miner model in SystemC. Our experiments are performed on two multi-core processors and one many-core Intel Xeon...
Smoothed particle hydrodynamics (SPH) is a meshless numerical method for simulating free-surface flow problems. In this paper, water wave impact on a floating object is studied by implementing the SPH method. The open-source DualSPHysics code, which is developed based on SPH theory, is used to simulate three-dimensional (3D) free-surface flow with a floating object. Graphics processing unit (GPU) parallel...
This paper explores hardware acceleration to significantly improve the runtime of computing the forward algorithm on Pair-HMM models, a crucial step in analyzing mutations in sequenced genomes. We describe 1) the design and evaluation of a novel accelerator architecture that can efficiently process real sequence data without performing wasteful work; and 2) aggressive memoization techniques that can...
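For reference, the single-sequence HMM forward recurrence that Pair-HMMs generalize (to a pair of sequences, with match/insert/delete states) can be sketched as follows; the toy two-state model and its numbers are illustrative, not from the paper:

```python
def forward(obs, states, start_p, trans_p, emit_p):
    """HMM forward algorithm: total probability of an observation sequence.

    This is the classic single-sequence recurrence; the Pair-HMM forward
    algorithm used in genome analysis generalizes it to two sequences.
    """
    # alpha[s] = P(observations so far, current hidden state = s)
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    for o in obs[1:]:
        alpha = {s: emit_p[s][o] * sum(alpha[r] * trans_p[r][s] for r in states)
                 for s in states}
    return sum(alpha.values())

# Toy two-state model over the alphabet {a, b} (all numbers illustrative)
states = ('H', 'C')
start_p = {'H': 0.5, 'C': 0.5}
trans_p = {'H': {'H': 0.7, 'C': 0.3}, 'C': {'H': 0.4, 'C': 0.6}}
emit_p = {'H': {'a': 0.9, 'b': 0.1}, 'C': {'a': 0.2, 'b': 0.8}}
```

Every cell of the dynamic-programming table depends only on the previous column, which is what makes the recurrence amenable to the kind of hardware pipelining the abstract targets.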
Modern task parallel programming models provide sophisticated runtime task schedulers for handling the scheduling of logical tasks on a large and varying number of hardware parallel resources at runtime. The performance of these programming models increasingly relies on how fast their runtime schedulers do their job. The more delay a scheduler incurs in matching a ready task to a free processor core...
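A minimal centralized version of such a scheduler, where free worker threads pull ready tasks from a shared queue, can be sketched as follows (a pure-Python illustration of the matching step, not any particular runtime such as Nanox):

```python
import queue
import threading

def run_tasks(tasks, num_workers=4):
    """Run a list of ready tasks (zero-argument callables) on worker
    threads pulling from a shared queue; returns results in task order."""
    ready = queue.Queue()
    results = {}
    lock = threading.Lock()
    for i, t in enumerate(tasks):
        ready.put((i, t))

    def worker():
        while True:
            try:
                i, task = ready.get_nowait()  # match a ready task to this free worker
            except queue.Empty:
                return                        # no more ready tasks
            r = task()
            with lock:
                results[i] = r

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return [results[i] for i in range(len(tasks))]
```

The single shared queue is exactly the serialization point the abstract alludes to: every dispatch contends on it, which is why production schedulers move to per-core queues with work stealing.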
Aggregators are market participants that bridge the gap between the bulk electricity market and the emerging active end-user (smart home) by efficiently scheduling or allocating resources to meet certain objectives in the electricity grid. The computational burden and processing time of such allocation problems increase with the number of resources. Using high performance computing and parallel processing...
Finite State Automata (FSA) are powerful computational models for extracting patterns from large streams (TBs/PBs) of unstructured data such as system logs, social media posts, emails, and news articles. FSA are also widely used in network security [6] and bioinformatics [4] to enable efficient pattern matching. Compute-centric architectures like CPUs and GPGPUs perform poorly on automata processing...
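At its core, automata processing is one table-driven state transition per input symbol, which is why it is memory-bound rather than compute-bound on CPUs and GPUs. A minimal deterministic example (the automaton below is our illustration, not from the cited works):

```python
def run_dfa(transitions, start, accepting, text):
    """Run a DFA over `text`.
    transitions: dict mapping (state, char) -> next state;
    a missing transition rejects the input (illustrative sketch)."""
    state = start
    for ch in text:
        state = transitions.get((state, ch))
        if state is None:
            return False
    return state in accepting

# DFA over the alphabet {a, b} accepting strings that end in "ab"
dfa = {
    (0, 'a'): 1, (0, 'b'): 0,
    (1, 'a'): 1, (1, 'b'): 2,
    (2, 'a'): 1, (2, 'b'): 0,
}
```

Each input symbol triggers one dependent table lookup, so the traversal is a chain of irregular memory accesses, the access pattern that memory-centric automata accelerators are designed around.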
This paper presents a computational acceleration of image inpainting using parallel processing based on the Graphics Processing Unit (GPU) Compute Unified Device Architecture (CUDA). We use a parabolic partial differential equation (PDE), the heat equation, as the model equation. The heat equation is discretized numerically using the finite difference method. The semi-algebraic equations thus formed are then solved...
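The discretization described above amounts to an explicit finite-difference update of the heat equation applied only at the unknown (masked) pixels. A serial pure-Python sketch of that scheme (the paper parallelizes the per-pixel update with CUDA; parameter values here are illustrative):

```python
def inpaint(image, mask, steps=500, dt=0.2):
    """Fill masked pixels by diffusing the heat equation u_t = u_xx + u_yy,
    discretized with an explicit finite-difference scheme.
    image: 2D list of floats; mask: 2D list, True where the pixel is unknown.
    Stability of this explicit scheme requires dt <= 0.25 on a unit grid."""
    h, w = len(image), len(image[0])
    u = [row[:] for row in image]
    for _ in range(steps):
        nxt = [row[:] for row in u]
        for i in range(1, h - 1):
            for j in range(1, w - 1):
                if mask[i][j]:  # only unknown pixels evolve
                    lap = (u[i-1][j] + u[i+1][j] + u[i][j-1] + u[i][j+1]
                           - 4 * u[i][j])
                    nxt[i][j] = u[i][j] + dt * lap
        u = nxt
    return u
```

Because each pixel's update depends only on its four neighbors from the previous iteration, every pixel can be updated independently, which is what makes the scheme map naturally onto one CUDA thread per pixel.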
This article describes methods for implementing fuzzy operations based on a model of a 3D associative information storage and processing device. The proposed methods differ in their application of binary matrix comparison, based on masked associative comparison with row-wise shifts.
The need for systems capable of conducting inferential analysis and predictive analytics is ubiquitous in a global information society. With recent advances in predictive machine learning models and massively parallel computing, a new set of resources is now potentially available to the computer science community for researching and developing new, truly intelligent and innovative...
Embedded software systems are first designed and validated with high-level models such as MATLAB/Simulink functional models. However, implementing a Simulink functional model on a multicore architecture is not trivial. Designers may first need to select an adequate multicore architecture that provides higher performance for a given Simulink model. Hence, it is important to have a set of performance...
Gene expression is computed block-wise to match the GPU's parallel thread structure: a dual-parallel mode is designed according to the structural characteristics of GPU threads, and texture cache memory is used to achieve high efficiency. On the CPU side, basic blocks are further subdivided into sub-blocks according to the L2 cache capacity to improve the cache hit rate, a technique that reduces the...
In this paper, we present a graph-based computing model that performs the basic arithmetic operations using the DNA computing model with maximum parallelization capability. In other words, we propose a mathematical transformation model that maps the basic arithmetic operations to the Hamiltonian Path Problem (HPP), which can be solved easily and efficiently with DNA computers. Our analyses and simulations...
GPUs have a natural affinity for streaming applications exhibiting consistent, predictable dataflow. However, many high-impact irregular streaming applications, including sequence pattern matching, decision-tree and decision-cascade evaluation, and large-scale graph processing, exhibit unpredictable dataflow due to data-dependent filtering or expansion of the data stream. Existing GPU frameworks do...
We present a family of policies that, integrated within a runtime task scheduler (Nanox), pursue the goal of improving the energy efficiency of task-parallel executions with no intervention from the programmer. The proposed policies tackle the problem by modifying the core operating frequency via DVFS mechanisms, or by enabling/disabling the mapping of tasks to specific cores at selected execution...