The recent boom in the use of speech recognition technology has made access to potentially large amounts of training data easier. This, however, also poses a challenge in processing such a large, continuously growing amount of information. Here we present a stochastic modification of the traditional iterative training approach which leads to the same or even better accuracy of acoustic models and...
The computational complexity of a problem arising in the context of sparse optimization is considered, namely, the projection onto the set of k-cosparse vectors w.r.t. some given matrix Ω. It is shown that this projection problem is (strongly) NP-hard, even in the special cases in which the matrix Ω contains only ternary or bipolar coefficients. Interestingly, this is in contrast to the projection...
This paper presents a lossless coding solution to reduce the large overhead of external memory communication during the motion estimation process in current video coders. Our solution is called Differential Reference Frame Coder (DRFC), and uses two techniques together to compress the reference frame: a differential coding based on a simplified intra-prediction process to reduce the spatial redundancy...
The coding efficiency of the new video coding standard, High Efficiency Video Coding (HEVC), is strongly associated with better use of spatio-temporal redundancies thanks to an increased number of competing coding modes. However, this competition involves a massive increase in signaling bitrate, which becomes a possible limit for the next generation of encoders. This paper proposes a new coding scheme...
Most modern video compression codecs, like VP9, HEVC and H.264, encode square or rectangular blocks either by inter prediction or intra prediction. A joint inter-intra predictor that combines motion compensation and intra extrapolation by two novel weighting schemes is proposed to improve compression quality. Prior work on joint prediction employs inter-intra weights that only rely on the pixel locations...
We consider a spectrum leasing system in which secondary networks offer offload services to a primary network (PN) in exchange for temporary access to the PN's spectrum. When the SANs collude and coordinate their prices, forming a cartel, the PN experiences cartel overcharge, which in our scenario implies lower transmission rates for the serviced PUs. To protect the spectrum owner's interests and possibly...
Optimal rate allocation is among the most challenging tasks to perform in the context of predictive video coding, because of the dependencies between frames induced by motion compensation. In this paper, we derive an analytical rate-distortion model that explicitly takes into account the dependencies between frames. The proposed approach allows us to formulate the frame-level optimal rate allocation...
Estimation of the quantities of harmful substances emitted into the atmosphere is one of the main challenges in modern environmental sciences. In most cases, this estimation requires solving a linear inverse problem. A key difficulty in evaluating the performance of any algorithm for solving this linear inverse problem is that the ground truth is typically unknown. In this paper we show that the...
This work studies the use of deep neural networks (DNNs) to address automatic language identification (LID). Motivated by their recent success in acoustic modelling, we adapt DNNs to the problem of identifying the language of a given spoken utterance from short-term acoustic features. The proposed approach is compared to state-of-the-art i-vector based acoustic systems on two different datasets: Google...
We perform bivariate statistical analysis and modeling of the joint distributions of spatially adjacent sub-band responses for both luminance/chrominance and range data in natural scenes. In particular, we introduce a multivariate generalized Gaussian distribution and an exponentiated sine function to model the underlying statistics and correlations. The experimental results show that the bivariate...
The recent success of deep neural networks (DNNs) in speech recognition can be attributed largely to their ability to extract a specific form of high-level features from raw acoustic data for subsequent sequence classification or recognition tasks. Among the many possible forms of DNN features, what forms are more useful than others and how effective these DNN features are in connection with the different...
Many features used in speech recognition tasks are hand-crafted and are not always related to the objective at hand, that is, minimizing word error rate. Recently, we showed that replacing a perceptually motivated mel-filter bank with a filter bank layer that is learned jointly with the rest of a deep neural network was promising. In this paper, we extend filter learning to a speaker-adapted, state-of-the-art...
This paper studies the challenging problem of detecting a low radar cross-section target in heavy sea clutter by proposing a physics-based sea clutter generation model. The model includes a process that generates random dynamic sea clutter based on the governing physics of water gravity and capillary waves and a finite-difference time-domain electromagnetic simulation process based on Maxwell's equations...
Current sensory array systems do not fully exploit tactile sensing strategies widely used by vibrissal sensing animals to explore their surroundings. We develop a new tactile fluid-flow imaging technique, which relates rat whisker movements to tomographic imaging to extract fluid-flow characteristics with a robotic whisker array. At high Reynolds numbers, the drag force on a whisker segment is proportional...
We investigate sampling and detection of orthogonal frequency-division multiplexing (OFDM) signals with unknown carriers at sub-Nyquist rates. Efficient acquisition and processing of such broadcast signals is a challenge but constitutes a crucial part of enabling cognitive radios. In order to alleviate both the analog and digital burden when treating wideband signals, we adapt the modulated wideband...
Far-end crosstalk severely degrades upstream rates in mixtures of vectored and non-vectored Very high-speed Digital Subscriber Loops (VDSL). As replacement of non-vectored VDSL systems by vectored VDSL systems is expected to be gradual, a crucial problem is the upstream rate optimization of vectored lines while maintaining the rate targets of non-vectored lines. To address this problem, this paper...
We propose a novel approach to motion detection in scenes captured from a camera onboard an aerial vehicle. In particular, we are interested in detecting small objects such as cars or people that move slowly and independently in the scene. Slow motion detection in an aerial video is challenging because it is difficult to differentiate object motion from camera motion. We adopt an unsupervised learning...
We propose a factorized robust matrix completion (FRMC) algorithm with global motion compensation to solve the video background subtraction problem. The algorithm decomposes a sequence of video frames into the sum of a low rank background component and a sparse motion component. The algorithm alternates between the solution of each component following a Pareto curve trajectory for each subproblem...
Designing a robust algorithm for visual object tracking has been a challenging task for many years. There are trackers in the literature that are reasonably accurate for many tracking scenarios, but most of them are computationally expensive. This narrows their applicability, as many tracking applications demand real-time response. In this paper, we present a tracker based on random ferns. Tracking...
This paper presents an automatic and efficient system for extracting dynamic objects of interest from videos. We take advantage of a saliency map and an optimization-based segmentation algorithm to extract the foreground objects automatically in some key frames. Then, the segmentation results in those key frames are propagated to other frames via an error map-based propagation scheme. Finally, a Bayesian...