In this paper, we propose patch-based deep Boltzmann shape priors for visual tracking. The shape priors are generated from a deep Boltzmann machine network. The network consists of three layers of hidden and visible units. The generated shapes not only maintain general shapes from a variety of poses, but also entail local modifications with high probability.
This target detection and tracking system is the basis for rescue robots to carry out independent search and rescue operations. To improve their mobility and sensing capability, rescue robots employ the Kinect camera to obtain visual information about the environment. The AKAZE (Accelerated-KAZE) feature matching algorithm is adopted to achieve target detection in video frames, combining...
In this paper, we propose a robust visual tracking system that uses images acquired from a color camera and a thermal camera to track the target in real time. The thermal camera, which can observe the heat emitted by a target such as a human body or a vehicle, can collaborate with the color camera to track the target in cluttered environments or under occlusion. Unlike...
The rapid development of three-dimensional (3D) imaging techniques has significantly increased the demand for high resolution (HR) depth video and images. Significant pixel deficiencies and excessive noise can be seen in depth images, especially those taken with Kinect cameras. For this reason, their usability in several computer vision applications is restricted. In the acquisition of HR depth images, in traditional...
Saliency detection in images attracts much research attention for its usage in numerous multimedia applications. In this paper, we propose a saliency detection method based on optimization for RGBD images. With RGBD images, our method utilizes the depth channel to enhance the identification of background and foreground regions. We first generate a new depth image by using a non-linear transformation...
Although the recent success of convolutional neural networks (CNNs) has advanced state-of-the-art saliency prediction in static images, few works have addressed the problem of predicting attention in videos. On the other hand, we find that the attention of different subjects consistently focuses on a single face in each frame of videos involving multiple faces. Therefore, we propose in this paper a novel...
One characteristic that sets humans apart from modern learning-based computer vision algorithms is the ability to acquire knowledge about the world and use that knowledge to reason about the visual world. Humans can learn about the characteristics of objects and the relationships that occur between them to learn a large variety of visual concepts, often with few examples. This paper investigates the...
Cross-modal retrieval has attracted intensive attention in recent years. Measuring the semantic similarity between heterogeneous data objects is an essential yet challenging problem in cross-modal retrieval. In this paper, we propose an online learning method to learn the similarity function between heterogeneous modalities by preserving the relative similarity in the training data, which is modeled...
We address personalization issues of image captioning, which have not yet been discussed in previous research. For a query image, we aim to generate a descriptive sentence, accounting for prior knowledge such as the user's active vocabularies in previous documents. As applications of personalized image captioning, we tackle two post automation tasks: hashtag prediction and post generation, on our newly...
Many computer vision problems require optimization of binary non-submodular energies. In this context, iterative submodularization techniques based on trust region (LSA-TR) and auxiliary functions (LSA-AUX) have been recently proposed [9]. They achieve state-of-the-art results on a number of computer vision applications. In this paper we extend the LSA-AUX framework in two directions. First, unlike...
Multi-view subspace clustering aims to partition a set of multi-source data into their underlying groups. To boost the performance of multi-view clustering, numerous subspace learning algorithms have been developed in recent years, but they rarely exploit the representation complementarity between different views or the indicator consistency among the representations, let alone considering...
A method based on cosegmentation is applied to change detection to segment the image patches belonging to each image. The image patches are characterized by spatial correspondence across multi-temporal images and by precise boundaries within their own image. By constructing and optimizing an energy function that consists of a change feature term and an image feature term, both spectral and shape changes can successfully...
Several recent works have used deep convolutional networks to generate realistic imagery. These methods sidestep the traditional computer graphics rendering pipeline and instead generate imagery at the pixel level by learning from large collections of photos (e.g. faces or bedrooms). However, these methods are of limited utility because it is difficult for a user to control what the network produces...
In this paper, we present ResNet-based vehicle classification and localization methods using real traffic surveillance recordings. We utilize a MIOvision traffic dataset, which comprises 11 categories including a variety of vehicles, such as bicycle, bus, car, motorcycle, and so on. To improve the classification performance, we exploit a technique called joint fine-tuning (JF). In addition, we propose...
A ubiquitous problem in pattern recognition is that of matching an observed time-evolving pattern (or signal) to a gold standard in order to recognize or characterize the meaning of a dynamic phenomenon. Examples include matching sequences of images in two videos, matching audio signals in speech recognition, or matching framed trajectories in robot action recognition. This paper shows that all of...
Despite the rapid progress of the techniques for image classification, video annotation has remained a challenging task. Automated video annotation would be a breakthrough technology, enabling users to search within the videos. Recently, Google introduced the Cloud Video Intelligence API for video analysis. As per the website, the system can be used to "separate signal from noise, by retrieving...
The past decade has witnessed the popularity of video conferencing, such as FaceTime and Skype. In video conferencing, almost every frame has a human face. Hence, it is necessary to predict attention on face videos by saliency detection, as saliency can be used as guidance for the region-of-interest (ROI) in content-based applications. To this end, this paper proposes a novel approach for saliency...
We propose a method for transferring an arbitrary style to only a specific object in an image. Style transfer is the process of combining the content of one image and the style of another image into a new image. Our results show that the proposed method can realize style transfer to a specific object.
This paper discusses a possible way to integrate knowledge from a probabilistic ontology into the automatic description of images. This combination not only provides the relations existing between the different segments, but also improves the classification accuracy, as the context often gives cues suggesting the correct class of a segment.
This paper proposes a new spatio-temporal appearance feature named Phasic Maximal and Local Maximal Occurrence (PM-LOMO) representation for video-based person re-identification. To perform temporal alignment of the sequence, we select the optimal period of the walking cycle and divide the frames into several phases based on the extreme points of the sequence's Flow Energy Profile (FEP). To describe the...