Despite significant recent progress, the best available visual saliency models still lag behind human performance in predicting eye fixations in free-viewing of natural scenes. The majority of models are based on low-level visual features, and the importance of top-down factors has not yet been fully explored or modeled. Here, we combine low-level features such as orientation, color, intensity, saliency...
Fine-grained categorization refers to the task of classifying objects that belong to the same basic-level class (e.g. different bird species) and share similar shape or visual appearances. Most of the state-of-the-art basic-level object classification algorithms have difficulties in this challenging problem. One reason for this can be attributed to the popular codebook-based image representation,...
Pedestrian detection from images is an important yet challenging task. Conventional methods usually identify human figures using image features inside local regions. In this paper we show that, besides the local features, context cues in the neighborhood provide important constraints that are not yet well utilized. We propose a framework to incorporate the context constraints for detection...
We present a video summarization approach for egocentric or “wearable” camera data. Given hours of video, the proposed method produces a compact storyboard summary of the camera wearer's day. In contrast to traditional keyframe selection techniques, the resulting summary focuses on the most important objects and people with which the camera wearer interacts. To accomplish this, we develop region cues...
We introduce a saliency model based on two key ideas. The first one is considering local and global image patch rarities as two complementary processes. The second one is based on our observation that for different images, one of the RGB and Lab color spaces outperforms the other in saliency detection. We propose a framework that measures patch rarities in each color space and combines them in a final...
Many works in computer vision attempt to solve different tasks such as object detection, scene recognition or attribute detection, either separately or as a joint problem. In recent years, there has been a growing interest in combining the results from these different tasks in order to provide a textual description of the scene. However, when describing a scene, there are many items that can be mentioned...
Supervoxel segmentation has strong potential to be incorporated into early video analysis as superpixel segmentation has in image analysis. However, there are many plausible supervoxel methods and little understanding as to when and where each is most appropriate. Indeed, we are not aware of a single comparative study on supervoxel segmentation. To that end, we study five supervoxel algorithms in...
Many cues have been proposed for contour detection or image segmentation. These range from low-level image gradients to high-level information such as the identity of the objects in the scene or 3D depth understanding. While state-of-the-art approaches have been incorporating more cues, the relative importance of the cues is unclear. In this paper, we examine the relative importance of low-, mid- and...
Curve fragments, as opposed to unorganized edge elements, are of interest and use in a large number of applications such as multiview reconstructions, tracking, motion-based segmentation, and object recognition. A large number of contour grouping algorithms have been developed, but progress in this area has been hampered by the fact that current evaluation methodologies are mainly edge-based, thus...
Currently, most authentication systems require users to answer a CAPTCHA (Completely Automated Public Turing Test to Tell Computers and Humans Apart) before gaining access to the system. CAPTCHA is a standard security technology for automatically distinguishing between humans and computer programs. The problem with using CAPTCHA is the difficulty of reading the text-based presentation, or interpreting the image-based...
We present an approach to automatically learn the visual appearance of an environment in terms of object classes. The procedure is totally unsupervised, incremental, and can be executed in real time. The traversability property of an unseen object is also learnt without human supervision by the interaction between the robot and the environment. An incremental version of affinity propagation, a state-of-the-art...
Obtaining human hand information is crucial for hand gesture recognition tasks. However, at present it is still not possible to obtain a perfect hand segmentation or to localize the hand accurately, especially under complex conditions. Therefore, it is necessary to develop robust and effective methods for detecting human hands accurately. In this paper, we propose a new method for hand detection. We present an extended...
In this paper, a novel sparse feature representation method for object tracking is proposed. The method is based on the observation that a tracked object can be dynamically and compactly represented by a few features (sparse representation) from a large feature set (the improved histogram of oriented gradient and color, HOGC). Based on the HOGC features, the sparse representation can be learned online from...
This paper presents a unified probabilistic framework to tackle two closely related visual tasks: pedestrian segmentation and pose tracking along monocular videos. Although the two tasks are complementary in nature, most previous approaches focus on them individually. Here, we resolve the two problems simultaneously by building and inferring a single body model. More specifically, pedestrian segmentation...
Accurately detecting pedestrians in natural scenes has an important impact on intelligent video surveillance. In this paper, we combine motion information, human skin color information, human shape information and variation of ambient lighting to detect pedestrians for the application of automated video surveillance. The moving objects in the video sequence images are extracted using the multi-frame...
In this paper, we describe a NAO H25 humanoid robot painter assisted by a human. The aim of this study is to reproduce the whole painting process by a humanoid robot with a vision system and fingers. The novelty of the study lies in using a human assistant in interaction with the robot and filling regions in the picture. The painting process is performed by the humanoid robot in three phases: obtaining...
In this paper, we discuss the RoboCup@Home league as a benchmark for service robot systems in everyday environments. The competition requires skills in mobile manipulation and human-robot interaction. Specifically, we detail the contributions of our team NimbRo, which won the RoboCup@Home competition in 2011. We demonstrated novel capabilities in the league such as real-time table-top segmentation,...
This paper is an attempt to explore a human element that is not easily addressed in the image processing communities. The problem statement is vague but important to address. What is a good image? More specifically, if a low-contrast image is presented, what level of enhancement is good enough for a human observer? This of course depends on diverse elements, e.g., personal preference, emotional state, physical...
This paper presents a new approach to enhancing the text readability of the quad RGBW color electrophoretic display (EPD). In the color EPD, text characters are jagged due to its low resolution and the jaggedness degrades the readability of the text. However, text characters are usually black-and-white, and for the black-and-white character, it is possible to improve readability by relocating the...
The main purpose of this paper is to enable table tennis robots to perform the ball-hitting action by imitating human behavior. The main strategy is to record a video of people playing table tennis and then analyze the racket trajectory in the video. The racket is extracted by image processing as each frame of the video is captured. Then three-dimensional coordinates...