Scene classification is a key problem in the interpretation of high-resolution remote sensing imagery. State-of-the-art methods, e.g. the bag-of-visual-words model and its various extensions, as well as topic models, share similar procedures: patch sampling, feature description/learning, and classification. Patch sampling is the first and key procedure and has a great influence on the results...
Scene classification for high-resolution remotely sensed imagery has been widely investigated in recent years. However, there are few public, widely accepted, and large-scale datasets for benchmarking different methods. This paper presents a new, large dataset consisting of 5000 high-resolution remote sensing images, manually labeled into 20 semantic classes for scene classification. Each class...
An approach to creating fixation density maps (FDM) for stereoscopic images is proposed in this paper, overcoming the shortcomings of current methods. A new representation of stereoscopic images, similar to Computed Tomography (CT), is used, which can show more information such as depth and the discomfort zone. In addition, we follow the eye tracker's built-in 2D calibration with a 3D offline calibration to gain...
Hand-gesture-based input has quickly emerged as an alternative for human-3DTV interaction. However, it limits the user experience, and the problem becomes severe when gesture recognition in an uncontrolled TV room is not accurate or robust enough and a large variety and number of gestures are required. In this paper, we present a simple and fast human-3DTV interaction method that combines the advantages of touchless...
Falls are a leading cause of accidental injury deaths and a key cause of significant health problems, especially for elderly people who live alone. To help such people seek assistance after a fall and to keep records of key daily movements, we propose a simple yet effective system that monitors daily activities and in-house locations using a smartphone. We also test the system for the optimum arrangement...
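The abstract above does not specify the detection algorithm; a common baseline for smartphone fall monitoring is thresholding the accelerometer magnitude for a near-free-fall dip followed by an impact spike. The sketch below illustrates that baseline only; the threshold values and the `detect_fall` function are illustrative assumptions, not the paper's method.

```python
import math

def detect_fall(samples, g=9.81, free_fall_thresh=0.5, impact_thresh=2.5):
    """Flag a fall when a near-free-fall dip is followed shortly by an impact spike.

    samples: list of (ax, ay, az) accelerometer readings in m/s^2.
    Thresholds are in multiples of gravity and are illustrative only.
    """
    mags = [math.sqrt(ax**2 + ay**2 + az**2) / g for ax, ay, az in samples]
    for i, m in enumerate(mags):
        if m < free_fall_thresh:  # device briefly in near free fall
            # look for an impact spike within the next few samples
            if any(m2 > impact_thresh for m2 in mags[i + 1:i + 10]):
                return True
    return False
```

A stationary trace (magnitude near 1 g throughout) is not flagged, while a dip-then-spike trace is.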
The precision of visual matching and the trade-off between accuracy and time efficiency have long been bottlenecks of image search systems. This work addresses the two problems simultaneously by introducing the coupled Multi-Index (cMI) structure. First, by combining SIFT and color features at the indexing level, the discriminative power of visual words is greatly enhanced. Second, by reducing the...
This study explores the eye movement patterns and underlying cognitive processes of optical reasoning in science major and non-science major students who have different prior knowledge of optical concepts. Thirty-three science major and 33 non-science major undergraduate students were involved in this study. The results showed that the science major students and non-science major students have improved...
The bag of visual words is a well established representation in diverse computer vision problems. Taking inspiration from the fields of text mining and retrieval, this representation has proved to be very effective in a large number of domains. In most cases, a standard term-frequency weighting scheme is considered for representing images and videos in computer vision. This is somewhat surprising,...
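The standard term-frequency weighting the abstract refers to, and the tf-idf alternative borrowed from text retrieval, can be sketched as a re-weighting of the image-by-visual-word count matrix. The `tfidf_weight` function and its smoothed idf formula below are one common variant, given here as an illustration rather than the paper's scheme.

```python
import numpy as np

def tfidf_weight(counts):
    """Re-weight an image-by-visual-word count matrix with tf-idf.

    counts: (n_images, n_words) array of raw visual-word counts.
    Returns L2-normalized tf-idf vectors, one row per image.
    """
    tf = counts / np.maximum(counts.sum(axis=1, keepdims=True), 1)
    df = (counts > 0).sum(axis=0)                       # images containing each word
    idf = np.log((1 + counts.shape[0]) / (1 + df)) + 1  # smoothed idf
    w = tf * idf
    norms = np.linalg.norm(w, axis=1, keepdims=True)
    return w / np.maximum(norms, 1e-12)
```

Words occurring in every image receive the minimum idf, so they contribute less to matching than rare, discriminative words.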
This paper presents a subject centric group feature for person re-identification. Our approach is inspired by the observation that people often tend to walk alongside others or in a group. We argue that co-travelers' information, including geometry and visual cues, can reduce the re-identification ambiguity and lead to better accuracy, compared to approaches that rely only on visual cues. We introduce...
The performance of different action recognition techniques has recently been studied by several computer vision researchers. However, the potential improvement in classification through classifier fusion by ensemble-based methods has remained unattended. In this work, we evaluate the performance of an ensemble of action learning techniques, each performing the recognition task from a different perspective...
Large amounts of available training data and increasing computing power have led to the recent success of deep convolutional neural networks (CNN) on a large number of applications. In this paper, we propose an effective semantic pixel labelling using CNN features, hand-crafted features and Conditional Random Fields (CRFs). Both CNN and hand-crafted features are applied to dense image patches to produce...
Learning to count is a learning strategy that has been recently proposed in the literature for dealing with problems where estimating the number of object instances in a scene is the final objective. In this framework, the task of learning to detect and localize individual object instances is seen as a harder task that can be avoided by casting the problem as that of computing a regression value from...
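The regression formulation described above can be sketched in its simplest global form: map an image-level feature vector directly to a count with ridge regression. The feature representation and the helper names (`fit_count_regressor`, `predict_count`) are toy assumptions for illustration; real counting-by-regression methods typically regress local density maps.

```python
import numpy as np

def fit_count_regressor(features, counts, ridge=1e-3):
    """Ridge least-squares map from global image features to object counts.

    features: (n, d) array; counts: (n,) ground-truth counts.
    Returns a weight vector w such that features @ w approximates counts.
    """
    X, y = np.asarray(features, float), np.asarray(counts, float)
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + ridge * np.eye(d), X.T @ y)

def predict_count(w, feature):
    """Predict a (non-negative) count for one feature vector."""
    return max(0.0, float(np.asarray(feature, float) @ w))
```

No detector or localizer is ever trained: the count is read off directly from the regressed value.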
Compared to image representations based on low-level local descriptors, deep neural activations of Convolutional Neural Networks (CNNs) are richer in mid-level representation, but poorer in geometric invariance properties. In this paper, we present a straightforward framework for better image representation by combining the two approaches. To take advantage of both representations, we extract a fair...
In this paper, we evaluate the generalization power of deep features (ConvNets) in two new scenarios: aerial and remote sensing image classification. We evaluate experimentally ConvNets trained for recognizing everyday objects for the classification of aerial and remote sensing images. ConvNets obtained the best results for aerial images, while for remote sensing, they performed well but were outperformed...
Controlling absolute magnitudes of fingertip force is an important skill in many haptic interactions such as surgical operations and mechanical assembly. A fundamental question in force control is how quickly humans can output a target force with the expected accuracy. In this paper, humans' capability to control absolute magnitudes of fingertip force under audio or visual feedback was observed through...
In this paper we introduce a new video description framework that replaces traditional Bag-of-Words with a combination of Fisher Kernels (FK) and Vector of Locally Aggregated Descriptors (VLAD). The main contributions are: (i) a fast algorithm to densely extract global frame features, easier and faster to compute than spatio-temporal local features; (ii) replacing the traditional k-means based vocabulary...
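VLAD, one of the two encodings named above, aggregates the residuals of local descriptors to their nearest vocabulary centroid. The sketch below is a minimal NumPy version of that standard encoding (with the usual power- and L2-normalization), not the paper's full video pipeline.

```python
import numpy as np

def vlad_encode(descriptors, centroids):
    """Vector of Locally Aggregated Descriptors.

    For each local descriptor, accumulate its residual to the nearest
    centroid; concatenate the per-centroid sums, then apply power- and
    L2-normalization. descriptors: (n, d); centroids: (k, d).
    Returns a (k*d,) encoding vector.
    """
    X = np.asarray(descriptors, float)
    C = np.asarray(centroids, float)
    # squared distances to every centroid, then hard assignment
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1)
    assign = d2.argmin(axis=1)
    v = np.zeros_like(C)
    for i, a in enumerate(assign):
        v[a] += X[i] - C[a]           # residual to assigned centroid
    v = v.ravel()
    v = np.sign(v) * np.sqrt(np.abs(v))  # power normalization
    n = np.linalg.norm(v)
    return v / n if n > 0 else v
```

Unlike a bag-of-words histogram, the encoding keeps the direction of each residual, which is what makes a small vocabulary sufficient.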
This paper deals with automatic systems for image recipe recognition. For this purpose, we compare and evaluate leading vision-based and text-based technologies on a new, very large multimodal dataset (UPMC Food-101) containing about 100,000 recipes for a total of 101 food categories. Each item in this dataset is represented by one image plus textual information. We present extensive experiments on recipe...
Human action recognition is at the core of computer vision and has great application value in intelligent human-computer interaction. On the basis of Bag-of-Words (BoW), this work presents a framework for action recognition that combines Huffman coding with an Implicit Action Model (IAM). Specifically, Huffman coding, which outperforms the naïve Bayesian method, is a robust estimation of visual words' conditional...
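For reference, standard Huffman coding builds prefix-free codes from symbol frequencies, giving frequent symbols shorter codes. The sketch below is the textbook construction over a frequency table (e.g. visual-word counts); how the cited paper derives conditional probabilities from it is not shown in the abstract, so this is the generic algorithm only.

```python
import heapq
from itertools import count

def huffman_codes(freqs):
    """Build prefix-free codes from symbol frequencies (standard Huffman).

    freqs: dict symbol -> frequency. Returns dict symbol -> bit string.
    Frequent symbols receive shorter codes.
    """
    if len(freqs) == 1:
        return {next(iter(freqs)): "0"}
    tick = count()  # tie-breaker so the heap never compares dicts
    heap = [(f, next(tick), {s: ""}) for s, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)   # two least-frequent subtrees
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (f1 + f2, next(tick), merged))
    return heap[0][2]
```

The resulting code lengths approximate the negative log-probabilities of the symbols, which is the usual link between Huffman trees and probability estimation.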
This paper demonstrates a high-performance brain-computer interface (BCI) that allows users to dial phone numbers. The system is based on Canonical Correlation Analysis (CCA) and Steady-State Visual Evoked Potentials (SSVEP). Through six buttons (9 Hz, 10 Hz, 11 Hz, 12 Hz, 13 Hz, 14 Hz) displayed on the screen, subjects can choose a number by gazing at the computer interface. This proposed EEG (Electroencephalography)...
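The CCA-based SSVEP detection named above can be sketched as follows: for each candidate frequency, correlate the multi-channel EEG with a sine/cosine reference and pick the frequency with the largest canonical correlation. The sketch uses the fundamental frequency only (real decoders usually add harmonics), and the function names are assumptions for illustration.

```python
import numpy as np

def max_canonical_corr(X, Y):
    """Largest canonical correlation between data matrices X (T, p) and Y (T, q)."""
    Qx, _ = np.linalg.qr(X - X.mean(0))
    Qy, _ = np.linalg.qr(Y - Y.mean(0))
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return float(s[0])

def detect_ssvep_freq(eeg, fs, candidates):
    """Pick the stimulus frequency whose sin/cos reference best matches the EEG.

    eeg: (T, n_channels) array; fs: sampling rate in Hz;
    candidates: list of stimulus frequencies in Hz.
    """
    t = np.arange(eeg.shape[0]) / fs
    scores = {}
    for f in candidates:
        ref = np.column_stack([np.sin(2 * np.pi * f * t),
                               np.cos(2 * np.pi * f * t)])
        scores[f] = max_canonical_corr(eeg, ref)
    return max(scores, key=scores.get)
```

Because CCA finds the channel combination that best aligns with the reference, no per-subject training is needed, which is a large part of SSVEP-CCA's appeal.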
We present PET- the Pascal animal classes Eye Tracking database. Our database comprises eye movement recordings compiled from forty users for the bird, cat, cow, dog, horse and sheep trainval sets from the VOC 2012 image set. Different from recent eye-tracking databases such as [1, 2], a salient aspect of PET is that it contains eye movements recorded for both the free-viewing and visual search task...