Vehicles, as a significant object class in urban surveillance, attract considerable attention in the computer vision field, in tasks such as detection, tracking, and classification. Among these, vehicle re-identification (Re-Id) is an important yet frontier topic, which not only faces the challenges of enormous intra-class and subtle inter-class differences of vehicles across multiple cameras, but also suffers from the complicated...
View synthesis using depth image-based rendering generates virtual viewpoints of a 3D scene based on texture and depth information from a set of available cameras. One of the core components in view synthesis is image inpainting which performs the reconstruction of areas that were occluded in the available cameras but are visible from the virtual viewpoint. Inpainting methods based on Markov random...
A novel region-based Active Contour Model (ACM) for image segmentation is presented that uses local image information to handle images with intensity inhomogeneity. A transcendental (trigonometric) energy functional based on Local Fitted Image (LFI) energy is suggested to extract the local image information. The difference between the original and fitted image is introduced as an angular constraint of the trigonometric...
The paper presents the concept of a practical free-viewpoint television system with purely optical depth estimation. The system consists of camera modules that contain pairs or triples of cameras together with the respective microphones. The camera modules can be sparsely located in arbitrary positions around a scene. Each camera module is equivalent to a video camera with a depth sensor and microphones...
Dynamic Music Emotion Prediction is crucial to the emerging applications of music retrieval and recommendation. Considering the influence of temporal context and hierarchical structure on emotion in music, we propose a Deep Bidirectional Long Short-Term Memory (DBLSTM) based multi-scale regression method. In this method, a post-processing component is utilised for individual DBLSTM output to further...
In this paper, we propose a robust visual tracking method based on a temporal ensemble framework. Unlike conventional ensemble-based trackers, which combine weak classifiers into a strong one using AdaBoost in a spatial fusion manner, our method adopts a powerful and efficient tracker integrated with its snapshots in different temporal windows of the online tracking process to construct a temporal...
Three experiments addressing the assessment of perceived image quality in a patch-based manner are compared for HEVC compression artifacts. It is shown that image patches as small as 128×128 pixels are large enough to evaluate the perceived image quality in a Degradation Category Rating (DCR) setting. Ratings obtained with 128×128 pixel image patches and 512×512 pixel images of...
This paper proposes a novel approach to voice conversion with non-parallel training data. The idea is to bridge between speakers by means of Phonetic PosteriorGrams (PPGs) obtained from a speaker-independent automatic speech recognition (SI-ASR) system. It is assumed that these PPGs can represent articulation of speech sounds in a speaker-normalized space and correspond to spoken content speaker-independently...
In web topic detection, detecting “hot” topics from enormous User-Generated Content (UGC) on web data poses two main difficulties that conventional approaches can barely handle: 1) poor feature representations from noisy images and short texts; and 2) uncertain roles of modalities where visual content is either highly or weakly relevant to textual cues due to less-constrained data. In this paper,...
In the past, much research effort has been invested in the discriminative action recognition task, but the general temporal structure of human actions has been overlooked. In this paper, we focus on a specific yet common structure of human actions: temporal symmetry. The key contribution is that we model the temporal symmetry property of human action and separate this signal out of original action sequences without...
In this paper, we propose a new data-driven transform, called sparse two-dimensional singular value decomposition (S2DSVD). By leveraging the advantages of discrete cosine transform and the conventional 2D SVD, we decompose a set of matrices into transform coefficient matrices with sparse and orthogonal basis functions. Such sparsity characteristic can significantly reduce their overhead, hence being...
Convolutional neural networks have recently been studied and used in many object recognition tasks. In this work, we employ fully convolutional networks (FCNs) to recognize On-Premise Signs (OPS) in real scenes. This technology can be used in many camera-enabled devices, such as smartphones, to develop practical commercial applications. The fully convolutional network technique is used to...
Alignment of human actions in videos is an important task for applications such as action comparison and classification. While well-established algorithms such as dynamic time warping are available for this task, they still heavily rely on basic linear cost models and heuristic parameter tuning. In this paper we propose a novel framework that combines the flexibility of the pair hidden Markov model...
Calligraphy lovers sometimes want to generate a calligraphic plaque in the style of a famous calligrapher, but some characters were never written or have been damaged over the long history of Chinese calligraphy. It would therefore be valuable to use computer-aided synthesis technology to create calligraphic characters in a particular style. Though such kinds of research work have been done, the synthesized...
Color transfer is an image processing technique commonly used to fix images with wrong colors, enhance the lighting conditions, or produce special styles to express specific emotions. With the aid of a reference image, the intended color characteristics can be properly transferred to the source images or videos. As these applications have grown popular, many image color transfer methods have emerged;...
As the number of surveillance cameras in public areas is increasing rapidly, massive numbers of crowd videos are captured and shared, which creates an urgent need to retrieve these videos efficiently and effectively. However, most recent research on crowd video has mainly focused on crowd behavior understanding and abnormality detection. In this study, as the very first attempt, we propose a crowd video retrieval method...
This paper proposes a graph-based Web video search reranking method through consistency analysis using spectral clustering. Graph-based reranking is effective for refining text-based video search results. Generally, this approach constructs a graph where the vertices are videos and the edges reflect their pairwise similarities. Many reranking methods are built on a scheme which regularizes...
Estimating the 3D shape information of a face from a single image is a challenging task, especially when the input image is captured under unconstrained scenarios (e.g., variations of pose, illumination, expression, or even disguise). Previous approaches to this problem typically require careful initialization, registration, or segmentation of the face image regions. With the objective to match the...
Dynamic videos are viewed fundamentally differently from static images. Besides spatial features, the motion feature also plays an important role as a temporal factor. Most existing video saliency models employ optical flow to represent the motion feature. However, optical flow often suffers from the discontinuity problem. We also notice that human fixations in one single video frame are much...