Today, machine learning based on neural networks has become mainstream in many application domains. A small subset of machine learning algorithms, called Convolutional Neural Networks (CNNs), is considered state-of-the-art for many applications (e.g. video/audio classification). The main challenge in implementing CNNs in embedded systems is their large computation, memory, and bandwidth...
Multiview video coding (MVC) has recently received considerable attention. It is proposed as an extension of the H.264/Advanced Video Coding (AVC) standard for compressing multiple video sources. To resolve the extremely high computational complexity of MVC (and in fact other AVC techniques), suitable parallel algorithms need to be developed that are amenable to implementation on low-cost massively parallel...
Chip multiprocessors (CMPs) and heterogeneous architectures have become predominant in all market segments, from embedded to high performance computing. These architectures exacerbate on-chip data requirements, creating additional pressure on the memory subsystem. Consequently, efficient utilization of on-chip memory space becomes critical for data intensive applications. A promising means of addressing...
Lack of efficient and transparent interaction with GPU data in hybrid MPI+GPU environments challenges GPU acceleration of large-scale scientific computations. A particular challenge is the transfer of noncontiguous data to and from GPU memory. MPI implementations currently do not provide an efficient means of utilizing data types for noncontiguous communication of data in GPU memory. To address this...
In this paper, we propose a multilevel parallel intra coding scheme for H.264/AVC based on the compute unified device architecture (CUDA). The proposed parallel algorithm improves the parallelism between 4x4 blocks within a macroblock (MB) by discarding some insignificant prediction modes. By partitioning a frame into multiple slices, the parallelism between MBs can be exploited. In addition, a scalable parallel...
Intra prediction is the most important compute-intensive component in the H.264 intra-frame coder. Its high computational cost puts heavy pressure on most current embedded programmable processors, especially in real-time HD H.264 video encoding. The stream processing model, an emerging parallel processing model supported by GPUs and most programmable processors, bridges the gap between flexible programmable...
The Storm processor is a stream-based prototype processor designed for media processing. It offers good performance and high efficiency for modern media processing and signal processing applications. It exploits the large amounts of parallelism available in many signal processing applications yet achieves high power efficiency by managing data movement directly with an on-chip register-file hierarchy and...
Real-time encoding of high-definition H.264 video is a challenge for current embedded programmable processors. Emerging stream processing methods supported by most GPUs and programmable processors provide a powerful mechanism for achieving surprisingly high performance in media/signal processing, which brings an opportunity to deal with this challenge. However, traditional serial CAVLC has highly input-dependent...
Due to the rapid growth of graphics processing unit (GPU) processing capability, using the GPU as a coprocessor to assist the central processing unit (CPU) in computing massive data has become essential. In this paper, we present an efficient block-level parallel algorithm for variable block size motion estimation (ME) in H.264/AVC with fractional pixel refinement on a compute unified device architecture...