Recognizing fine-grained categories (e.g., bird species) highly relies on discriminative part localization and part-based fine-grained feature learning. Existing approaches predominantly solve these challenges independently, while neglecting the fact that part localization (e.g., head of a bird) and fine-grained feature learning (e.g., head shape) are mutually correlated. In this paper, we propose...
This paper addresses the problem of joint detection and recounting of abnormal events in videos. Recounting of abnormal events, i.e., explaining why they are judged to be abnormal, is an unexplored but critical task in video surveillance, because it helps human observers quickly judge if they are false alarms or not. To describe the events in the human-understandable form for event recounting, learning...
Convolutional Neural Networks (CNNs) have been regarded as a powerful class of models for image recognition problems. Nevertheless, it is not trivial to utilize a CNN for learning spatio-temporal video representations. A few studies have shown that performing 3D convolutions is a rewarding approach to capture both spatial and temporal dimensions in videos. However, the development of a very deep...
Automatically describing an image with a natural language has been an emerging challenge in both fields of computer vision and natural language processing. In this paper, we present Long Short-Term Memory with Attributes (LSTM-A) - a novel architecture that integrates attributes into the successful Convolutional Neural Networks (CNNs) plus Recurrent Neural Networks (RNNs) image captioning framework,...
Recognizing fine-grained categories (e.g., bird species) is difficult due to the challenges of discriminative region localization and fine-grained feature learning. Existing approaches predominantly solve these challenges independently, while neglecting the fact that region detection and fine-grained feature learning are mutually correlated and thus can reinforce each other. In this paper, we propose...
Inspired by the recent success of text-based question answering, visual question answering (VQA) is proposed to automatically answer natural language questions with reference to a given image. Compared with text-based QA, VQA is more challenging because reasoning in the visual domain needs both effective semantic embedding and fine-grained visual understanding. Existing approaches predominantly...
Deep convolutional neural networks (CNNs) have proven highly effective for visual recognition, where learning a universal representation from the activations of a convolutional layer is a fundamental problem. In this paper, we present Fisher Vector encoding with Variational Auto-Encoder (FV-VAE), a novel deep architecture that quantizes the local activations of a convolutional layer in a deep generative...
Image captioning often requires a large set of training image-sentence pairs. In practice, however, acquiring sufficient training pairs is always expensive, making the recent captioning models limited in their ability to describe objects outside of training corpora (i.e., novel objects). In this paper, we present Long Short-Term Memory with Copying Mechanism (LSTM-C) — a new architecture...
Automatically generating natural language descriptions of videos is a fundamental challenge for the computer vision community. Most recent progress in this problem has been achieved through employing 2-D and/or 3-D Convolutional Neural Networks (CNNs) to encode video content and Recurrent Neural Networks (RNNs) to decode a sentence. In this paper, we present Long Short-Term Memory with Transferred...
This paper presents the design, analyses, and fabrication of a tank-like wall-climbing robot using gecko-inspired dry adhesives. The robot uses customized timing adhesive belts, which are flexible and patterned using MEMS techniques. The Kendall strip tape model is modified, considering features of the timing belt, to analyze the peeling process of the viscoelastic tread. The relationship between...
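For context, the classical (unmodified) Kendall strip peel relation that the paper adapts for the viscoelastic timing-belt tread balances an elastic term, a peel-angle work term, and the adhesion energy; a commonly cited form is

$$\left(\frac{F}{b}\right)^{2}\frac{1}{2dE} \;+\; \frac{F}{b}\left(1-\cos\theta\right) \;-\; R \;=\; 0,$$

where $F$ is the peel force, $b$ the strip width, $d$ the film thickness, $E$ the elastic modulus, $\theta$ the peel angle, and $R$ the adhesion energy per unit area. The paper's modified version is not reproduced in the truncated abstract.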
This paper proposes the design and experiment of a bioinspired wall-climbing robot with spiny arrays. Inspired by the tarsal system of Serica orientalis Motschulsky, a spiny structure is designed, and a robot foot with two grippers based on this structure is developed. An inchworm-like gait is employed and its trajectory is planned. The robot's foot as well as the whole prototype is fabricated...
This paper presents a novel representative-based framework for parsing and summarizing events in long surveillance videos. The proposed framework first extracts object blob sequences and utilizes them to represent events in a surveillance video. Then, a sequence filtering strategy is introduced which detects and eliminates noisy blob sequences based on their spatial and temporal characteristics. After...
Digital storytelling applications are playing an increasingly important role in people's daily life. In contemporary storytelling applications such as PowerPoint presentation and macro/micro blogs, good presentation images are always highly desired by content creators to boost their presentation in an intuitive and attractive way. Existing studies, however, have not yet addressed the challenging problem...
Traditional bridge crack detection methods are costly and risky. We propose a bridge crack detection and classification method based on a climbing robot, using image analysis with a miniature camera mounted on the robot to collect images. First, the motion blur of the acquired image is removed by the Wiener filtering method. Second, the wavelet transform is used to enhance the crack features in the...
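The deblurring and enhancement steps can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes a known blur kernel (PSF) for the Wiener step, and uses a one-level Haar transform in place of whichever wavelet basis the authors used.

```python
import numpy as np

def wiener_deblur(blurred, psf, k=0.01):
    """Frequency-domain Wiener deconvolution.

    k is the (assumed constant) noise-to-signal power ratio; the filter
    is H* / (|H|^2 + k), which reduces to inverse filtering as k -> 0.
    """
    H = np.fft.fft2(psf, s=blurred.shape)   # zero-pad PSF to image size
    G = np.fft.fft2(blurred)
    F_hat = np.conj(H) / (np.abs(H) ** 2 + k) * G
    return np.real(np.fft.ifft2(F_hat))

def haar_enhance(img, gain=2.0):
    """One-level Haar wavelet transform; amplify the detail bands (where
    thin crack edges live) by `gain`, then reconstruct.

    Assumes img has even height and width.
    """
    a = img[0::2, 0::2]; b = img[0::2, 1::2]
    c = img[1::2, 0::2]; d = img[1::2, 1::2]
    LL = (a + b + c + d) / 4.0              # approximation band
    LH = (a - b + c - d) / 4.0              # detail bands
    HL = (a + b - c - d) / 4.0
    HH = (a - b - c + d) / 4.0
    LH, HL, HH = gain * LH, gain * HL, gain * HH
    out = np.empty_like(img, dtype=float)   # inverse Haar transform
    out[0::2, 0::2] = LL + LH + HL + HH
    out[0::2, 1::2] = LL - LH + HL - HH
    out[1::2, 0::2] = LL + LH - HL - HH
    out[1::2, 1::2] = LL - LH - HL + HH
    return out
```

With `gain=1.0` the Haar step is a perfect-reconstruction identity; `gain>1` sharpens fine structure at the cost of amplifying noise, which is why the Wiener step comes first.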
Automatically describing video content with natural language is a fundamental challenge of computer vision. Recurrent Neural Networks (RNNs), which model sequence dynamics, have attracted increasing attention for visual interpretation. However, most existing approaches generate a word locally with the given previous words and the visual content, while the relationship between sentence semantics and...
The emergence of wearable devices such as portable cameras and smart glasses makes it possible to record life-logging first-person videos. Browsing such long unstructured videos is time-consuming and tedious. This paper studies the discovery of moments of the user's major or special interest (i.e., highlights) in a video, for generating the summarization of first-person videos. Specifically, we propose...
Video concept learning often requires a large set of training samples. In practice, however, acquiring noise-free training labels with sufficient positive examples is very expensive. A plausible solution for training data collection is by sampling from the vast quantities of images and videos on the Web. Such a solution is motivated by the assumption that the retrieved images or videos are highly correlated...
While there has been increasing interest in the task of describing video with natural language, current computer vision algorithms are still severely limited in terms of the variability and complexity of the videos and their associated language that they can recognize. This is in part due to the simplicity of current benchmarks, which mostly focus on specific fine-grained domains with limited videos...
Video is a kind of structured data with multi-layer (ML) information; e.g., a video is composed of three layers: shot, key-frame, and region. Moreover, a multi-instance (MI) relation is embedded along the consecutive layers. Both the ML structure and the MI relation are essential for video concept detection. Previous work [5] dealt with the ML structure and MI relation by constructing a MLMI kernel...
This paper proposes a control method for a quadruped robot based on a Central Pattern Generator (CPG) and fuzzy neural networks (FNN). Quadruped robot control commonly follows one of two approaches: CPG-based control, which is inspired by bionics, and dynamic control, which is based on a model of the quadruped robot. The control result of the CPG is determined by the gait data of the...
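As a rough illustration of the CPG idea (not the paper's formulation, which additionally involves the fuzzy neural network), a CPG is often realized with nonlinear limit-cycle oscillators; a single Hopf oscillator below produces the rhythmic signal that would drive one joint, with amplitude converging to sqrt(mu) regardless of the starting state.

```python
import math

def hopf_cpg(mu=1.0, omega=2.0 * math.pi, dt=0.001, steps=10000):
    """Euler-integrate a Hopf oscillator.

    x(t) settles onto a limit cycle of radius sqrt(mu) rotating at
    angular frequency omega, giving a smooth rhythmic joint command.
    """
    x, y = 0.1, 0.0  # small perturbation off the unstable origin
    signal = []
    for _ in range(steps):
        r2 = x * x + y * y
        dx = (mu - r2) * x - omega * y   # radial convergence + rotation
        dy = (mu - r2) * y + omega * x
        x, y = x + dx * dt, y + dy * dt
        signal.append(x)
    return signal
```

In a full CPG network, several such oscillators are phase-coupled so the four legs hold a fixed gait pattern; a learned layer such as the paper's FNN would then shape or adapt these raw signals.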