Most pedestrian detection algorithms provide only the object region rather than the actual body segmentation in video. To reduce the large amount of redundant information and extract clear contour and texture features of an upright person, a superpixel segmentation algorithm with region correlation saliency analysis is proposed that cuts from coarse to fine without any prior information. This...
Neural Style Transfer has shown very exciting results enabling new forms of image manipulation. Here we extend the existing method to introduce control over spatial location, colour information and across spatial scale. We demonstrate how this enhances the method by allowing high-resolution controlled stylisation and helps to alleviate common failure cases such as applying ground textures to sky regions...
Although the recent success of convolutional neural networks (CNNs) has advanced the state of the art in saliency prediction for static images, few works have addressed the problem of predicting attention in videos. On the other hand, we find that the attention of different subjects consistently focuses on a single face in each frame of videos involving multiple faces. Therefore, we propose in this paper a novel...
One characteristic that sets humans apart from modern learning-based computer vision algorithms is the ability to acquire knowledge about the world and use that knowledge to reason about the visual world. Humans can learn about the characteristics of objects and the relationships that occur between them to learn a large variety of visual concepts, often with few examples. This paper investigates the...
Cross-modal retrieval has attracted intensive attention in recent years. Measuring the semantic similarity between heterogeneous data objects is an essential yet challenging problem in cross-modal retrieval. In this paper, we propose an online learning method to learn the similarity function between heterogeneous modalities by preserving the relative similarity in the training data, which is modeled...
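As a rough illustration of learning a cross-modal similarity function online from relative similarity, the sketch below uses a bilinear score S(x, y) = xᵀWy and a hinge loss on triplets. This is a generic textbook-style formulation, not the method of the abstract above (whose details are truncated). The names `similarity` and `online_update`, the learning rate, and the margin are all assumptions made for illustration.

```python
def similarity(W, x, y):
    """Bilinear similarity x^T W y between two feature vectors
    that may come from different modalities (different lengths)."""
    return sum(x[i] * W[i][j] * y[j]
               for i in range(len(x))
               for j in range(len(y)))


def online_update(W, x, y_pos, y_neg, lr=0.1, margin=1.0):
    """One online step on a triplet (x, y_pos, y_neg).

    The hinge loss asks that x be more similar to y_pos than to
    y_neg by at least `margin`. If the constraint is violated,
    W takes a gradient step in place. Returns the loss measured
    before the update.
    """
    loss = max(0.0, margin - similarity(W, x, y_pos) + similarity(W, x, y_neg))
    if loss > 0.0:
        for i in range(len(x)):
            for j in range(len(y_pos)):
                W[i][j] += lr * x[i] * (y_pos[j] - y_neg[j])
    return loss


# Hypothetical toy data: a 2-D "image" feature and 3-D "text" features.
W = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
x = [1.0, 0.0]
y_pos = [1.0, 0.0, 0.0]
y_neg = [0.0, 1.0, 0.0]
first_loss = online_update(W, x, y_pos, y_neg)
```

Each triplet is processed once as it arrives, which is what makes the procedure online: no pass over the full training set is needed before W can be used for ranking.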
We address personalization issues of image captioning, which have not yet been discussed in previous research. For a query image, we aim to generate a descriptive sentence, accounting for prior knowledge such as the user's active vocabulary in previous documents. As applications of personalized image captioning, we tackle two post automation tasks: hashtag prediction and post generation, on our newly...
Many computer vision problems require optimization of binary non-submodular energies. In this context, iterative submodularization techniques based on trust region (LSA-TR) and auxiliary functions (LSA-AUX) have been recently proposed [9]. They achieve state-of-the-art results on a number of computer vision applications. In this paper we extend the LSA-AUX framework in two directions. First, unlike...
Multi-view subspace clustering aims to partition a set of multi-source data into their underlying groups. To boost the performance of multi-view clustering, numerous subspace learning algorithms have been developed in recent years, but they rarely exploit the representation complementarity between different views or the indicator consistency among the representations, let alone considering...
A method based on cosegmentation is applied to change detection to segment the image patches belonging to each image. The image patches are spatially corresponding across the multi-temporal images and have precise boundaries in their own images. By constructing and optimizing an energy function that consists of a change feature term and an image feature term, both spectral and shape changes can successfully...
With traditional algorithms, a single pedestrian feature is insufficient to describe the target accurately. A new re-identification algorithm combining global and local features with different distance metric functions is introduced. First, a weighted color histogram feature for the whole pedestrian is extracted and combined with the Bhattacharyya distance to roughly recognize targets. Then pedestrians’...
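The coarse matching step described above (a weighted histogram compared with the Bhattacharyya distance) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a single intensity channel instead of full color, the weights are supplied by the caller, and the distance is taken in the form sqrt(1 - BC), where BC is the Bhattacharyya coefficient (other texts use -ln BC). The function names are hypothetical.

```python
import math


def weighted_histogram(values, weights, bins=8):
    """Normalized weighted histogram of integer values in [0, 255].

    Each pixel adds its weight (for example, a larger weight near
    the body center) to its bin instead of a count of 1.
    """
    hist = [0.0] * bins
    for value, weight in zip(values, weights):
        index = min(value * bins // 256, bins - 1)
        hist[index] += weight
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else hist


def bhattacharyya_distance(p, q):
    """Distance sqrt(1 - BC) between two normalized histograms.

    BC = sum_i sqrt(p_i * q_i) is 1 for identical histograms and
    0 for histograms with no overlapping bins.
    """
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))
    bc = min(bc, 1.0)  # guard against rounding slightly above 1
    return math.sqrt(1.0 - bc)


# Hypothetical toy "pedestrians": pixel intensities with equal weights.
query = weighted_histogram([0, 10, 200, 255], [1.0, 1.0, 1.0, 1.0], bins=4)
other = weighted_histogram([100, 120], [1.0, 1.0], bins=4)
same_distance = bhattacharyya_distance(query, query)
other_distance = bhattacharyya_distance(query, other)
```

A small distance marks a candidate as a rough match worth passing on to a finer comparison; a distance near 1 means the two histograms barely overlap.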
Several recent works have used deep convolutional networks to generate realistic imagery. These methods sidestep the traditional computer graphics rendering pipeline and instead generate imagery at the pixel level by learning from large collections of photos (e.g. faces or bedrooms). However, these methods are of limited utility because it is difficult for a user to control what the network produces...
Crowd behaviour analysis is a challenging task in computer vision, mainly due to the high complexity of the interactions between groups and individuals. This task is particularly crucial given the magnitude of manual monitoring required for effective crowd management. Within this context, a key challenge is to conceive a highly generic, fine and context-independent characterisation of crowd behaviours...
In this work, the notion of automated risk assessment for 3D scenes is addressed. Using deep learning techniques, smart-enabled homes and domestic robots can be equipped with the functionality to detect, draw attention to, or mitigate hazards in a given scene. We extend an existing risk estimation framework that incorporates physics and shape descriptors by introducing a novel CNN architecture allowing...
In this paper, we present ResNet-based vehicle classification and localization methods using real traffic surveillance recordings. We utilize a MIOvision traffic dataset, which comprises 11 categories including a variety of vehicles, such as bicycle, bus, car, motorcycle, and so on. To improve the classification performance, we exploit a technique called joint fine-tuning (JF). In addition, we propose...
Human bodies and movements exhibit inherent symmetry. However, an important class of everyday movements, such as walking, does not maintain symmetry at every time instance. The symmetry in these movements is a spatiotemporal glide-reflection symmetry. The ability to measure this type of symmetry will provide us opportunities for various computer-aided applications including health monitoring, rehabilitation,...
Often multiple instances of an object occur in the same scene, for example in a warehouse. Unsupervised multi-instance object discovery algorithms are able to detect and identify such objects. We use such an algorithm to provide object proposals to a convolutional neural network (CNN) based classifier. This results in fewer regions to evaluate, compared to traditional region proposal algorithms. Additionally,...
A ubiquitous problem in pattern recognition is that of matching an observed time-evolving pattern (or signal) to a gold standard in order to recognize or characterize the meaning of a dynamic phenomenon. Examples include matching sequences of images in two videos, matching audio signals in speech recognition, or matching framed trajectories in robot action recognition. This paper shows that all of...
Tensors offer a natural representation for many kinds of data frequently encountered in machine learning. Images, for example, are naturally represented as third order tensors, where the modes correspond to height, width, and channels. In particular, tensor decompositions are noted for their ability to discover multi-dimensional dependencies and produce compact low-rank approximations of data. In...
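To make the "compact low-rank approximation" idea concrete, the sketch below builds a third-order tensor (height × width × channels) as the outer product of three vectors, which is the rank-1 case of a CP decomposition. It is only an illustration of the storage saving, not the method of the abstract above; the function names and the toy vectors are assumptions.

```python
def rank_one_tensor(a, b, c):
    """Outer product a ∘ b ∘ c as a nested list T[i][j][k] = a[i]*b[j]*c[k]."""
    return [[[a[i] * b[j] * c[k] for k in range(len(c))]
             for j in range(len(b))]
            for i in range(len(a))]


def storage_counts(a, b, c):
    """Numbers stored by the dense tensor versus its rank-1 factors."""
    dense = len(a) * len(b) * len(c)
    factored = len(a) + len(b) + len(c)
    return dense, factored


# Hypothetical factors for a tiny 2 x 3 "image" with 3 channels.
height_factor = [1.0, 2.0]
width_factor = [1.0, 0.0, 1.0]
channel_factor = [0.5, 1.0, 2.0]

T = rank_one_tensor(height_factor, width_factor, channel_factor)
dense, factored = storage_counts(height_factor, width_factor, channel_factor)
```

A rank-R approximation sums R such terms, so it stores R(H + W + C) numbers instead of H·W·C; the gap widens quickly as the image grows.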
Despite the rapid progress of the techniques for image classification, video annotation has remained a challenging task. Automated video annotation would be a breakthrough technology, enabling users to search within the videos. Recently, Google introduced the Cloud Video Intelligence API for video analysis. As per the website, the system can be used to "separate signal from noise, by retrieving...
The use of surveillance cameras continues to increase, ranging from conventional applications such as law enforcement to newer scenarios with looser requirements such as gathering business intelligence. Humans still play an integral part in using and interpreting the footage from these systems, but are also a significant factor in causing unintentional privacy breaches. As computer vision methods...