We present a method to reconstruct the three-dimensional shape of a moving instance of a known object category in video data. We exploit state-of-the-art semantic segmentation techniques to extract the object's two-dimensional shape in each frame. As a result, our method is robust to occlusion, handles stationary objects, and extends naturally to multiple video sequences. We apply Structure from Motion...
Modern young people (“digital natives”) have grown up in an era dominated by new technologies, where communication happens in near real time and poses no limits on establishing relationships with other people or communities. However, the speed of this evolution does not allow young people to consciously distinguish acceptable behaviors from potentially harmful ones, and a new phenomenon known as cyber...
Reliable object discovery in realistic indoor scenes is a necessity for many computer vision and service robot applications. For such scenes, semantic segmentation methods have made huge advances in recent years. These methods can provide useful prior information for object discovery by removing false positives and by delineating object boundaries. We propose a novel method that combines bottom-up...
With the success of deep learning in the last few years, the object detection community has shifted from exhaustive sliding-window processing to a smaller set of object proposals built on more powerful deep visual representations. Object proposals increase accuracy and speed up the detection process by reducing the search space. In this paper we propose a novel idea of filtering irrelevant edges using...
We propose to learn semantic spatio-temporal embeddings for videos to support high-level video analysis. The first step of the proposed embedding employs a deep architecture consisting of two channels of convolutional neural networks (capturing appearance and local motion) followed by their corresponding Gated Recurrent Unit encoders for capturing longer-term temporal structure of the CNN features...
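The appearance/motion CNN channels followed by GRU encoders described in this abstract can be illustrated with a minimal recurrent-encoding sketch (pure NumPy; the function name `gru_encode`, the single-sequence setting, and the omitted bias terms are simplifications of mine, not the authors' implementation — the per-frame vectors stand in for CNN features):

```python
import numpy as np

def gru_encode(frames, Wz, Uz, Wr, Ur, Wh, Uh):
    """Encode a sequence of per-frame feature vectors with a minimal GRU
    (biases omitted for brevity). Returns the final hidden state, which
    summarizes the longer-term temporal structure of the sequence."""
    d = Uz.shape[0]
    h = np.zeros(d)
    sig = lambda a: 1.0 / (1.0 + np.exp(-a))
    for x in frames:
        z = sig(Wz @ x + Uz @ h)             # update gate
        r = sig(Wr @ x + Ur @ h)             # reset gate
        n = np.tanh(Wh @ x + Uh @ (r * h))   # candidate state
        h = (1 - z) * h + z * n              # gated interpolation
    return h
```

In the full model one such encoder would run over appearance features and another over local-motion features, with their final states combined into the joint embedding.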
We propose mutually incoherent pose bases for action recognition in static images, each of which implicitly represents the co-occurrence of poselets. First, action-specific poselets are trained. To suppress detection ambiguity, we cluster poselet activations by the overlap of each poselet's predicted torso bounds. Then the pose feature of a person performing an action can be extracted as a vector composed...
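The activation-clustering step above relies on the overlap of predicted torso boxes. A minimal sketch of that idea (the IoU overlap measure and a greedy first-fit grouping; the function names and the greedy strategy are illustrative assumptions, not the paper's exact procedure):

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2]-a[0])*(a[3]-a[1]) + (b[2]-b[0])*(b[3]-b[1]) - inter
    return inter / union if union else 0.0

def cluster_by_torso_overlap(torsos, thresh=0.5):
    """Greedy clustering: each torso box joins the first cluster whose
    representative box overlaps it by at least `thresh` IoU; otherwise
    it starts a new cluster. Returns (representative, member_indices) pairs."""
    clusters = []
    for i, t in enumerate(torsos):
        for rep, members in clusters:
            if iou(rep, t) >= thresh:
                members.append(i)
                break
        else:
            clusters.append((t, [i]))
    return clusters
```

Poselet activations whose torso predictions land in the same cluster are then treated as evidence for the same person.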
Content-based indexing is critical for effective access to multimedia data. To this end, visual data is often annotated with textual content to bridge the semantic gap. In this paper, we present a method to generate frame-level fine-grained annotations for a given video clip. Access to frame-level fine-grained annotations leads to rich, dense, and meaningful semantic associations between...
Wireless capsule endoscopy video summarization (WCE-VS) is in high demand for eliminating redundant frames with high similarity. Conventional WCE-VS methods extract various hand-crafted features as image representations. Research shows that such features only reflect the low-level characteristics of single frames and are essentially ineffective at capturing the semantic similarity between WCE frames...
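The core summarization operation — dropping frames that are too similar to what has already been kept — can be sketched with a simple cosine-similarity filter (the function name `summarize`, the last-kept-frame comparison, and the threshold value are illustrative assumptions; the paper is about replacing the hand-crafted features fed into such a step with semantic ones):

```python
import numpy as np

def summarize(features, thresh=0.9):
    """Keep a frame only if its cosine similarity to the most recently
    kept frame falls below `thresh`, i.e. it is sufficiently novel.
    `features` is an (n_frames, dim) array of per-frame descriptors."""
    kept = [0]
    for i in range(1, len(features)):
        a, b = features[kept[-1]], features[i]
        cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
        if cos < thresh:
            kept.append(i)
    return kept
```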
Realistic scene object recognition in computer vision still faces great challenges due to the large intra-class variation of object images caused by factors like object appearance variation and viewpoint change. To address this challenge, we propose to exploit the semantic relationships embedded in object taxonomy for improved object recognition. Specifically, we exploit the relationships in the object...
Attributes are defined as mid-level image characteristics shared among different categories. These characteristics are well suited to classification problems, especially when training data are scarce. In this paper, we design discriminative real-valued attributes by learning nonlinear inductive maps. Our method is based on solving a constrained optimization problem that mixes three criteria;...
Multi-label classification has attracted much attention in various fields, such as text categorization and semantic image annotation. Aiming to classify an instance into multiple labels, various multi-label classification methods have been proposed. However, the existing methods typically build models in the identical feature (sub)space for all labels, which may be inconsistent with real-world problems...
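The shared-feature-space baseline this abstract argues against can be sketched as binary relevance: one independent classifier per label, all trained on the identical feature space (a minimal NumPy logistic-regression sketch of my own for illustration; it is the conventional setup being criticized, not the paper's proposed method):

```python
import numpy as np

def train_binary_relevance(X, Y, lr=0.5, epochs=500):
    """Binary relevance: fit one logistic-regression classifier per label,
    each using the same feature space X. Y is an (n, L) 0/1 label matrix."""
    n, d = X.shape
    L = Y.shape[1]
    W, b = np.zeros((L, d)), np.zeros(L)
    for _ in range(epochs):
        for j in range(L):
            p = 1.0 / (1.0 + np.exp(-(X @ W[j] + b[j])))  # per-label sigmoid
            g = p - Y[:, j]                               # gradient of log-loss
            W[j] -= lr * (X.T @ g) / n
            b[j] -= lr * g.mean()
    return W, b

def predict(X, W, b, thresh=0.5):
    P = 1.0 / (1.0 + np.exp(-(X @ W.T + b)))
    return (P >= thresh).astype(int)
```

Methods like the one proposed here would instead give each label its own feature (sub)space rather than reusing X unchanged for every classifier.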
Correlation trackers have achieved huge success in visual object tracking. However, mainly because such trackers cannot detect the occurrence of appearance changes, tracking based on correlation filters often drifts under unexpected appearance changes caused by occlusion, deformation, and background clutter. In this paper, we propose a new method to detect the case when the tracker encounters...
We consider the problem of joint modeling of videos and their corresponding textual descriptions (e.g. sentences or phrases). Our approach consists of three components: the video representation, the textual representation, and a joint model that links videos and text. Our video representation uses the state-of-the-art deep 3D ConvNet to capture the semantic information in the video. Our textual representation...
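Once videos and text are embedded in the joint space described above, retrieval reduces to nearest-neighbor search under a similarity measure. A minimal sketch of that final step (cosine similarity and the function name `rank_videos` are my assumptions; the abstract does not specify the similarity used):

```python
import numpy as np

def rank_videos(text_vec, video_vecs):
    """Rank video embeddings by cosine similarity to a text embedding,
    assuming both already live in the shared joint space.
    Returns video indices, best match first."""
    t = text_vec / np.linalg.norm(text_vec)
    V = video_vecs / np.linalg.norm(video_vecs, axis=1, keepdims=True)
    sims = V @ t                 # cosine similarity per video
    return np.argsort(-sims)     # descending similarity
```

The same function works in the other direction (ranking sentences for a query video) by swapping the arguments.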
Region-based Image Retrieval (RBIR), which relies on image segmentation rather than global features or key-point-based local features, is a branch of Content-based Image Retrieval. This paper proposes a novel RBIR-oriented image segmentation algorithm named Edge Integrated Minimum Spanning Tree (EI-MST). The difference between EI-MST and traditional MST-based methods is that EI-MST generates...
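As background for EI-MST, a traditional MST-based segmentation can be sketched as Kruskal-style component merging on a 4-connected pixel grid: sort edges by intensity difference and merge only across cheap edges (this is my simplified illustration of the baseline family; EI-MST itself differs precisely in how it generates and weights the tree):

```python
def segment(img, thresh):
    """Count segments produced by merging 4-connected pixels whose
    intensity difference is below `thresh`, processing edges in
    ascending weight order (union-find with path halving)."""
    h, w = len(img), len(img[0])
    parent = list(range(h * w))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    edges = []
    for y in range(h):
        for x in range(w):
            if x + 1 < w:   # horizontal neighbor
                edges.append((abs(img[y][x] - img[y][x+1]), y*w + x, y*w + x + 1))
            if y + 1 < h:   # vertical neighbor
                edges.append((abs(img[y][x] - img[y+1][x]), y*w + x, (y+1)*w + x))
    for wgt, a, b in sorted(edges):
        if wgt < thresh:
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[ra] = rb
    return len({find(i) for i in range(h * w)})
```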
CNN-based semantic segmentation methods generally assume that pixel-wise annotations are available, which are costly to obtain. On the other hand, image-level annotations are much easier to obtain than pixel-level annotations. In this work, we therefore focus on weakly-supervised semantic segmentation, the task of training with only image-level annotations. In this paper, we...
Deep learning-based models have recently proven widely successful, outperforming traditional approaches in several computer vision applications such as image classification, object recognition, and action recognition. However, these models are not naturally designed to learn structural information that can be important for tasks such as human pose estimation and structured semantic interpretation of...
Although good results for automatic text classification can be achieved with the bag-of-words representation, this model is not suitable for all classification problems, and richer text representations may be required. In this paper, we propose two text representation models based on semantic role labels and analyze them in text classification scenarios. We also evaluate the combination of...
Minimization of discrete energy functions with higher-order potentials is a challenging yet important problem. In this work, a three-step procedure is presented and exemplified on a general problem related to dense depth map computation from multi-view configurations: achieving a joint reconstruction of structure and semantics with piecewise planarity constraints. The three steps...
Topic models (e.g., pLSA, LDA, SLDA) have been widely used for segmenting imagery. These models are confined to crisp segmentation. Yet, there are many images in which some regions cannot be assigned a crisp label (e.g., transition regions between a foggy sky and the ground or between sand and water at a beach). In these cases, a visual word is best represented with partial memberships across multiple...
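The partial memberships this abstract motivates are exactly the topic responsibilities that pLSA/LDA-style models compute before any crisp argmax is taken. A minimal sketch of that soft assignment (the function name `memberships` and the matrix shapes are my conventions):

```python
import numpy as np

def memberships(phi, theta):
    """Partial topic memberships p(topic | word) ∝ theta[k] * phi[k, word],
    kept as a distribution instead of collapsed to a crisp label.
    phi:   (K, V) topic-word probabilities
    theta: (K,)   topic proportions for the image/document
    Returns a (V, K) matrix whose rows sum to 1."""
    joint = theta[:, None] * phi           # (K, V) unnormalized posteriors
    return (joint / joint.sum(axis=0)).T   # normalize per word, transpose
```

A pixel in a foggy sky/ground transition region would then carry, say, 0.6 "sky" and 0.4 "ground" membership rather than a single hard label.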
Word embedding models are capable of capturing the semantic content of textual words. The process of extracting a set of word embedding vectors from a text document is similar to the feature extraction step of the Bag-of-Features pipeline, which is often used in computer vision tasks. That gives rise to the Bag-of-Embedded-Words (BoEW) model. In this paper a novel learning technique that...
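The baseline BoEW pipeline the abstract builds on can be sketched end to end: learn a codebook over word embeddings (plain Lloyd's k-means here) and represent each document by its normalized codeword histogram. This is my illustration of the standard Bag-of-Features analogy, not the novel learning technique the paper proposes:

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd's k-means: returns a (k, dim) codebook for embeddings X."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - C[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):        # skip empty clusters
                C[j] = X[labels == j].mean(axis=0)
    return C

def boew_histogram(word_vecs, codebook):
    """Quantize each word embedding to its nearest codeword and return
    the L1-normalized codeword histogram as the document representation."""
    d2 = ((word_vecs[:, None] - codebook[None]) ** 2).sum(-1)
    labels = np.argmin(d2, axis=1)
    hist = np.bincount(labels, minlength=len(codebook)).astype(float)
    return hist / hist.sum()
```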