Identifying biases in articles published in the news media is one of the most fundamental problems in the realm of journalism and communication, and automatic mechanisms for detecting whether a piece of news is biased have been studied for decades. In this paper, we compare the WiSARD classifier, a lightweight, efficient weightless neural network architecture, against Logistic Regression, Gradient Tree...
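The abstract names WiSARD but is cut off before giving any details. As a rough illustration of how a WiSARD classifier works in general, here is a minimal Python sketch: a fixed pseudo-random mapping splits a binary input into tuples, each tuple addresses one RAM node in every class discriminator, training marks the addresses seen, and prediction picks the class whose discriminator recognizes the most addresses. The class name, `tuple_size`, the seed, and the assumption that each news article has already been encoded as a fixed-length bit vector are hypothetical choices for this sketch, not the paper's configuration.

```python
import random
from typing import Dict, List, Sequence, Set


class WiSARD:
    """Minimal WiSARD weightless classifier (hypothetical sketch, not the paper's setup).

    Assumes inputs are already encoded as fixed-length lists of 0/1 bits.
    """

    def __init__(self, input_size: int, tuple_size: int, seed: int = 0) -> None:
        if input_size % tuple_size != 0:
            raise ValueError("input_size must be a multiple of tuple_size")
        self.tuple_size = tuple_size
        # Fixed pseudo-random mapping of input bits to RAM nodes, shared by all classes.
        rng = random.Random(seed)
        self.mapping = list(range(input_size))
        rng.shuffle(self.mapping)
        self.num_rams = input_size // tuple_size
        # One discriminator per class: a list of RAM nodes, each holding the addresses it has seen.
        self.discriminators: Dict[str, List[Set[int]]] = {}

    def _addresses(self, bits: Sequence[int]) -> List[int]:
        """Turn each tuple of mapped input bits into an integer RAM address."""
        addrs = []
        for r in range(self.num_rams):
            addr = 0
            for i in self.mapping[r * self.tuple_size:(r + 1) * self.tuple_size]:
                addr = (addr << 1) | bits[i]
            addrs.append(addr)
        return addrs

    def train(self, bits: Sequence[int], label: str) -> None:
        """One-shot training: write a 1 (here: record the address) in every RAM node."""
        rams = self.discriminators.setdefault(label, [set() for _ in range(self.num_rams)])
        for ram, addr in zip(rams, self._addresses(bits)):
            ram.add(addr)

    def scores(self, bits: Sequence[int]) -> Dict[str, int]:
        """Count, per class, how many RAM nodes recognize the input's addresses."""
        addrs = self._addresses(bits)
        return {label: sum(addr in ram for ram, addr in zip(rams, addrs))
                for label, rams in self.discriminators.items()}

    def predict(self, bits: Sequence[int]) -> str:
        s = self.scores(bits)
        return max(s, key=s.get)
```

For example, after training on `[1] * 8` as class `"a"` and `[0] * 8` as class `"b"` with `input_size=8, tuple_size=2`, the all-ones input scores 4 for `"a"` and 0 for `"b"`. Storing each RAM node as a set of seen addresses, rather than a dense table of `2 ** tuple_size` bits, is a simplification chosen here for brevity.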
The power consumed by the memory system in GPUs is a significant fraction of the total chip power. As thread-level parallelism increases, GPUs are likely to stress cache and memory bandwidth even more, thereby exacerbating power consumption. We observe that neighboring concurrent thread arrays (CTAs) within GPU applications share a considerable amount of data. However, the default GPU scheduling policy...
Today, machine learning based on neural networks has become mainstream in many application domains. A small subset of machine learning algorithms, called Convolutional Neural Networks (CNNs), are considered state-of-the-art for many applications (e.g. video/audio classification). The main challenge in implementing CNNs in embedded systems is their large computation, memory, and bandwidth...
Exascale systems are expected to feature hundreds of thousands of compute nodes with hundreds of hardware threads and complex memory hierarchies with a mix of on-package and persistent memory modules. In this context, the Argo project is developing a new operating system for exascale machines. Targeting production workloads using workflows or coupled codes, we improve the Linux kernel on several fronts...
Next-generation non-volatile memories, like Resistive RAM, Spin-Transfer Torque Magnetic RAM and Phase Change Memory, are byte-addressable with very low latency, bridging the large performance gap between DRAM memory and NAND flash storage. For this reason we think of them as Storage Class Memories (SCMs), meaning their main use could ideally be as main memory but the non-volatility and high density...
Autonomous vehicles are an exemplar for forward-looking safety-critical real-time systems where significant computing capacity must be provided within strict size, weight, and power (SWaP) limits. A promising way forward in meeting these needs is to leverage multicore platforms augmented with graphics processing units (GPUs) as accelerators. Such an approach is being strongly advocated by NVIDIA,...
It is imperative to accelerate convolutional neural networks (CNNs) because their application areas keep widening, spanning servers, mobile devices, and IoT devices. Based on the fact that CNNs are characterized by a significant number of zero values in both kernel weights and activations, we propose a novel hardware accelerator for CNNs that exploits zero weights and activations. We also report a zero-induced load...
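The abstract only states that the accelerator exploits zero weights and activations. The general idea behind such zero skipping can be sketched in a few lines of Python: a multiply-accumulate (MAC) contributes nothing when either operand is zero, so only positions where both the weight and the activation are nonzero need to be computed. The function name is hypothetical, and the sketch models neither the paper's hardware dataflow nor the zero-induced load imbalance it mentions.

```python
from typing import Sequence, Tuple


def zero_skipping_dot(weights: Sequence[float], activations: Sequence[float]) -> Tuple[float, int]:
    """Dot product that skips every MAC with a zero operand (hypothetical sketch).

    Returns the result and the number of MACs actually performed.
    """
    if len(weights) != len(activations):
        raise ValueError("weights and activations must have the same length")
    # Keep only the nonzero entries of each operand, as a sparse encoding would.
    nz_w = {i: w for i, w in enumerate(weights) if w != 0}
    nz_a = {i: a for i, a in enumerate(activations) if a != 0}
    # Only indices where both operands are nonzero can contribute to the sum.
    common = nz_w.keys() & nz_a.keys()
    total = sum(nz_w[i] * nz_a[i] for i in sorted(common))
    return total, len(common)
```

For instance, `zero_skipping_dot([0, 2, 0, 1.5], [3, 4, 0, 0])` returns `(8, 1)`: one MAC is performed instead of the four a dense dot product would issue.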
Many applications these days require data processing that is both efficient and reliable. Distributed databases are one way to meet these requirements, but they must be updated using distributed transactions. For managing foreign key constraints, secondary indices, and materialized views in distributed environments, read atomic multi-partition (RAMP) transactions have proven highly efficient. RAMP transactions...
Conventional yield optimization approaches rely on accurate yield estimation for given design parameters, which is computationally intensive. In this paper, a novel Bayesian yield optimization approach is proposed for analog and SRAM circuits. An equivalent problem is formulated by applying Bayes' theorem to the augmented yield problem. The yield optimization problem is converted to identifying...
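The abstract is truncated before it says what the yield problem is converted into. One way Bayes' theorem can turn yield maximization into a search over a posterior distribution is shown below. It assumes design parameters $x$ whose prior $p(x)$ is uniform over a feasible design region $D$, process variations $\xi$ with density $p(\xi)$, and an event "pass" meaning all performance specifications are met; this notation is introduced here, and the derivation is not necessarily the paper's augmented formulation.

```latex
Y(x) = P(\text{pass} \mid x) = \int \mathbf{1}\left[\text{pass}(x, \xi)\right] p(\xi)\, d\xi

p(x \mid \text{pass}) = \frac{P(\text{pass} \mid x)\, p(x)}{P(\text{pass})} = \frac{Y(x)\, p(x)}{P(\text{pass})}

p(x) = \frac{1}{|D|} \ \text{for } x \in D
\;\Rightarrow\;
p(x \mid \text{pass}) = \frac{Y(x)}{|D|\, P(\text{pass})} \propto Y(x)
\;\Rightarrow\;
\arg\max_{x \in D} Y(x) = \arg\max_{x \in D} p(x \mid \text{pass})
```

Under this assumption the highest-yield design is also the mode of the posterior over passing designs, which is one way to avoid estimating $Y(x)$ accurately at every candidate $x$.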
The Linux kernel's control groups (cgroups) feature is increasingly being adopted for running applications in multi-tenant environments. Many projects (e.g., Docker) rely on cgroups to isolate resources such as CPU and memory. It is critical to ensure high performance for such deployments. At LinkedIn, we have been using cgroups and have investigated their performance. This work presents our findings about...
In this paper, we propose a memory access method for the Parallel Failureless Aho-Corasick (PFAC) algorithm that takes the Graphics Processing Unit (GPU) memory architecture into account to improve throughput. Compared with the Aho-Corasick (AC) algorithm on a Central Processing Unit (CPU) and Data-Parallel Aho-Corasick (DPAC) using Open Multi-Processing (OpenMP), PFAC on the GPU achieves a significant performance improvement...
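The abstract concerns GPU memory access for PFAC, but the matching logic itself is simple enough to sketch. In PFAC, the Aho-Corasick trie is kept without failure links, one thread is assigned to each starting position in the input, and each thread follows goto transitions until none exists, reporting any pattern it reaches. The Python sketch below runs those "threads" one after another; the function names are hypothetical, and the GPU memory placement that the paper actually proposes is not modeled.

```python
from typing import Dict, List, Tuple


def build_trie(patterns: List[str]) -> Tuple[List[Dict[str, int]], Dict[int, str]]:
    """Build only the Aho-Corasick goto function; PFAC drops the failure links."""
    goto: List[Dict[str, int]] = [{}]  # state 0 is the root
    output: Dict[int, str] = {}        # accepting state -> pattern
    for p in patterns:
        state = 0
        for ch in p:
            if ch not in goto[state]:
                goto.append({})
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state] = p
    return goto, output


def pfac_match(text: str, patterns: List[str]) -> List[Tuple[int, str]]:
    """Return (start_offset, pattern) for every occurrence (hypothetical sketch)."""
    goto, output = build_trie(patterns)
    matches = []
    # On a GPU each start position would be handled by its own thread;
    # here the "threads" simply run sequentially.
    for start in range(len(text)):
        state = 0
        pos = start
        while pos < len(text) and text[pos] in goto[state]:
            state = goto[state][text[pos]]
            if state in output:
                matches.append((start, output[state]))
            pos += 1
        # No failure transition: when no goto edge exists, the thread just stops.
    return matches
```

For example, `pfac_match("ushers", ["he", "she", "his", "hers"])` returns `[(1, 'she'), (2, 'he'), (2, 'hers')]`. Because no thread ever follows a failure link back into the trie, the threads never depend on one another, which is what makes the scheme straightforward to map onto GPU threads.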
Digital forensics is a field of computer science that aids in determining what may or may not have occurred during some computer task. The bit-by-bit concept works for computer media, but it does not apply to smartphones. One experiment was designed using three devices: an Android HTC Aria, an Apple iPhone 3G, and a Windows Mobile HTC TouchPro 6850. These experiments compare and contrast the devices by carrier,...
Memory forensics has become indispensable in cyber forensics investigations, as a computer's random access memory (physical memory) holds crucial evidence that is not available on hard disks or other non-volatile storage media. This is because most malware today is memory-resident and leaves no footprint in hard disk storage. In this paper, a novel methodology is described for...
The 21st century is best known as the century of technology, and technological advancement has helped bring people around the world into a close-knit network. With the advent of the internet and social networking sites, connecting with people is just a click away. These advancements have warranted deep research into social networking [1] and its related technologies. Social networking sites store enormous amounts of data, and when a...
Pushing supply voltages into the near-threshold region is today one of the main avenues for minimizing power consumption in digital integrated circuits. This works well for logic units, but memory operations on standard six-transistor static RAM (6T-SRAM) cells become unreliable at low voltages. Standard cell memory (SCM) works fully reliably at near-threshold voltages, but has much lower area density...
Big data platforms like Hadoop and Spark are being widely adopted by both academia and industry. In this paper, we propose a runtime intrusion detection technique that is tailored to the memory properties of such distributed compute platforms. The proposed method is based on runtime analysis of memory access patterns of tasks running on the slave nodes of a distributed compute...
Embedded systems are proliferating as their hardware capabilities grow. Their application areas include the internet of things, cellular devices, network devices, etc. Developing and testing applications natively on such embedded hardware is expensive, time-consuming, and challenging. In such cases, system emulation is a cost-effective alternative. We have extended Quick Emulator (QEMU) to support...
In ultrasound image analysis, speckle tracking methods are widely applied to study the elasticity of body tissue. However, “feature-motion decorrelation” remains a challenge for speckle tracking methods. Recently, a coupled filtering method was proposed to accurately estimate strain values when the tissue deformation is large. The major drawback of the new method is its high computational...
Sliding window convolutional networks (ConvNets) have become a popular approach to computer vision problems such as image segmentation and object detection and localization. Here we consider the parallelization of inference, i.e., the application of a previously trained ConvNet, with emphasis on 3D images. Our goal is to maximize throughput, defined as the number of output voxels computed per unit...
Source code is a frequent target for plagiarism in massive computing courses. Plagiarism detection requires significant effort from the teaching staff, so software tools have been used to detect similar source code. This paper examines the parallelization of source code similarity detection based on the Greedy-String-Tiling and Karp-Rabin algorithms. The CPU implementation is parallelized using Pthreads,...
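The abstract names Greedy-String-Tiling and Karp-Rabin without giving details. In the widely used Running-Karp-Rabin Greedy-String-Tiling (RKR-GST) formulation, Karp-Rabin hashing is used to find candidate matching token windows quickly before tiles are marked. Below is a minimal Python sketch of only that hashing step, assuming each source file has already been tokenized into integer token IDs; the function names, `BASE`, `MOD`, and the window length `k` are hypothetical choices, and neither the tiling phase nor the Pthreads/GPU parallelization is shown.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

BASE = 257            # hypothetical polynomial base
MOD = (1 << 61) - 1   # hypothetical large prime modulus


def window_hashes(tokens: Sequence[int], k: int) -> List[int]:
    """Karp-Rabin hashes of every length-k window, each updated in O(1)."""
    if k <= 0 or len(tokens) < k:
        return []
    high = pow(BASE, k - 1, MOD)  # weight of the token leaving the window
    h = 0
    for t in tokens[:k]:
        h = (h * BASE + t) % MOD
    hashes = [h]
    for i in range(k, len(tokens)):
        # Remove tokens[i - k], shift, and append tokens[i].
        h = ((h - tokens[i - k] * high) * BASE + tokens[i]) % MOD
        hashes.append(h)
    return hashes


def matching_windows(a: Sequence[int], b: Sequence[int], k: int) -> List[Tuple[int, int]]:
    """Return (i, j) pairs where a[i:i+k] == b[j:j+k]."""
    index: Dict[int, List[int]] = defaultdict(list)
    for j, h in enumerate(window_hashes(b, k)):
        index[h].append(j)
    pairs = []
    for i, h in enumerate(window_hashes(a, k)):
        for j in index.get(h, []):
            # Compare tokens directly so a hash collision cannot yield a false match.
            if list(a[i:i + k]) == list(b[j:j + k]):
                pairs.append((i, j))
    return pairs
```

With `a = [1, 2, 3, 4, 5]`, `b = [9, 2, 3, 4, 7]` and `k = 3`, `matching_windows(a, b, 3)` returns `[(1, 1)]`, the shared window `[2, 3, 4]`. Each iteration of the outer loop over the windows of `a` is independent of the others, which makes it one natural unit of work to divide among threads.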