This paper focuses on real-time pedestrian detection on Field Programmable Gate Arrays (FPGAs) using the Histograms of Oriented Gradients (HOG) descriptor combined with a Support Vector Machine (SVM) classifier as the baseline method. We propose to process image data at twice the pixel frequency and to normalize blocks with the L1-Sqrt norm, resulting in efficient resource utilization....
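The L1-Sqrt block normalization mentioned in this abstract is a standard HOG variant: divide the block vector by its L1 norm (plus a small constant for stability), then take an element-wise square root. A minimal sketch in NumPy (the function name and epsilon value are illustrative, not from the paper):

```python
import numpy as np

def l1_sqrt_normalize(block, eps=1e-6):
    """L1-Sqrt normalization of a HOG block:
    v -> sqrt(v / (||v||_1 + eps))."""
    v = np.asarray(block, dtype=np.float64).ravel()
    return np.sqrt(v / (np.abs(v).sum() + eps))

# Example: a 4-bin orientation histogram for one block.
hist = np.array([4.0, 0.0, 1.0, 3.0])
normalized = l1_sqrt_normalize(hist)
```

On hardware, this variant is attractive because the L1 norm needs only additions (no squaring or square-root accumulation as in L2 norms), which is one reason it maps well to FPGA resources.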
In recent years, with the advent of cheap and accurate RGBD (RGB plus Depth) active sensors like the Microsoft Kinect and devices based on time-of-flight (ToF) technology, there has been increasing interest in 3D-based applications. At the same time, several effective improvements to passive stereo vision algorithms have been proposed in the literature. Despite these facts and the frequent deployment...
Although graph cuts (GC) is popularly used in many computer vision problems, its slow execution time, due to high computational complexity, hinders wider adoption. A manycore solution using a Graphics Processing Unit (GPU) may solve this problem. However, conventional GC implementations do not fully exploit the GPU's computing power. To address this issue, a new GC algorithm suited to the GPU environment is presented...
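For background on the complexity the abstract refers to: graph cuts reduce labeling problems to an s-t min-cut, which by max-flow/min-cut duality equals the maximum flow from source to sink. The paper's GPU-tailored algorithm is not reproduced here; as a reference point, a minimal sequential Edmonds-Karp max-flow sketch (graph representation and names are ours) looks like this:

```python
from collections import deque

def max_flow(capacity, s, t):
    """Edmonds-Karp: repeatedly find the shortest augmenting path
    by BFS and push flow along it. `capacity` is a dict-of-dicts
    mapping node -> {neighbor: edge capacity}."""
    # Build the residual graph, adding zero-capacity reverse edges.
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u in capacity:
        for v in capacity[u]:
            residual.setdefault(v, {}).setdefault(u, 0)
    flow = 0
    while True:
        # BFS for the shortest augmenting path from s to t.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow  # no augmenting path left: flow is maximal
        # Recover the path and its bottleneck capacity.
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[u][v] for u, v in path)
        # Push flow: decrease forward edges, increase reverse edges.
        for u, v in path:
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
        flow += bottleneck

cap = {'s': {'a': 3, 'b': 2}, 'a': {'t': 2, 'b': 1}, 'b': {'t': 3}, 't': {}}
print(max_flow(cap, 's', 't'))  # → 5
```

The sequential dependency between augmenting paths is exactly what makes this formulation hard to parallelize on a GPU, motivating restructured algorithms such as the one the abstract proposes.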
In this paper we present a new approach for the evaluation of event-based Silicon Retina stereo matching results. The evaluation of stereo matching algorithm results is a necessary task for the development, comparison, and improvement of depth-generating camera systems. In contrast to conventional frame-based cameras, silicon retina sensors deliver asynchronous events instead of synchronous intensity...
We aim at designing interactive playgrounds that automatically analyze the behavior of children while playing games, in order to adapt the gameplay and make the games more engaging. In this paper, we focus on recognizing roles in tag games, where children are taggers or runners. We start by tracking the location and motion of individual players, and subsequently recognize pairwise interactions: approach,...
A typical gaming scenario involves a player interacting with a game through a specialized input device, such as a joystick, a mouse, or a keyboard. Recent technological advances have enabled the introduction of more elaborate approaches in which the player is able to interact with the game using their body pose, facial expressions, actions, and even their physiological signals (heart beat rate, encephalogram,...
Action recognition is an important component in human-machine interactive systems and video analysis. Besides low-level actions, temporal relationships are also important for many actions, yet they have not been fully studied for action recognition. We model the temporal structure of low-level actions based on dense trajectory groups. Trajectory groups are a higher-level and more meaningful representation...
The detection and classification of human movements, as a joint field of Computer Vision and Pattern Recognition, is increasingly used in applications designed to describe human activity. Such applications require efficient methods and tools for the automatic analysis and classification of motion capture data, which constitute an active field of research. To facilitate the development and...
The use of 3D technologies to represent elements and interact with them is an open and interesting research area. In this article we discuss a novel human-computer interaction method that integrates mobile computing and 3D visualization techniques, with applications in free viewpoint visualization and 3D rendering for interactive and realistic environments. In particular, this approach focuses on augmented...
Training vision-based pedestrian detectors using synthetic datasets (virtual worlds) is a useful technique to automatically collect training examples with their pixel-wise ground truth. However, as is often the case, these detectors must operate on real-world images, where they experience a significant drop in performance. In fact, this effect also occurs among different real-world datasets, i...
In this paper we present a flash game that aims to easily generate ground truth for testing object detection algorithms. Flash the Fish is an online game where the user is shown videos from underwater environments and has to take photos of fish by clicking on them. The initial ground truth is provided by object detection algorithms and, subsequently, cluster analysis and user evaluation techniques,...
In this paper we present and begin analyzing the iCub World dataset, an object recognition dataset that we acquired using a Human-Robot Interaction (HRI) scheme and the iCub humanoid robot platform. Our setup allows for rapid acquisition and annotation of data with corresponding ground truth. While more constrained in its scope -- the iCub world is essentially a robotics research lab -- we demonstrate...
Among the components of a pedestrian detector, its trained pedestrian classifier is crucial for achieving the desired performance. The initial task of the training process consists in collecting samples of pedestrians and background, which involves tiresome manual annotation of pedestrian bounding boxes (BBs). Thus, recent works have assessed the use of automatically collected samples from photo-realistic...
This paper investigates the use of synthetic 3D scenes to generate ground truth for pedestrian segmentation in 2D crowd video data. Manual segmentation of objects in videos is indeed one of the most time-consuming types of assisted labeling. This lack of temporally dense and precise segmentation ground truth for large video samples leaves a significant gap in computer vision research. Such...
We have been researching three dimensional (3D) ground-truth systems for performance evaluation of vision and perception systems in the fields of smart manufacturing and robot safety. In this paper we first present an overview of different systems that have been used to provide ground-truth (GT) measurements and then we discuss the advantages of physically-sensed ground-truth systems for our applications...
The development of vehicles that perceive their environment, in particular those using computer vision, indispensably requires large databases of sensor recordings obtained from real cars driven in realistic traffic situations. These datasets should be time-stamped to enable synchronization of sensor data from different sources. Furthermore, full surround environment perception requires high frame...
Evaluating multi-target tracking based on ground truth data is a surprisingly challenging task. Erroneous or ambiguous ground truth annotations, numerous evaluation protocols, and the lack of standardized benchmarks make a direct quantitative comparison of different tracking approaches rather difficult. The goal of this paper is to raise awareness of common pitfalls related to objective ground truth...
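The "numerous evaluation protocols" this abstract warns about can be made concrete with one widely used example: the CLEAR-MOT Multiple Object Tracking Accuracy (MOTA) score, which aggregates misses (FN), false positives (FP), and identity switches (IDSW) over all frames relative to the total number of ground-truth objects. A minimal sketch (the function and argument names are ours, not from the paper):

```python
def mota(num_misses, num_false_positives, num_switches, num_gt_objects):
    """CLEAR-MOT Multiple Object Tracking Accuracy:
    MOTA = 1 - (FN + FP + IDSW) / total ground-truth objects,
    with all counts summed over every frame of the sequence."""
    return 1.0 - (num_misses + num_false_positives + num_switches) / num_gt_objects

# Example: 10 misses, 5 false positives, 2 ID switches over 100 GT objects.
score = mota(10, 5, 2, 100)  # → 0.83
```

Note that even this single metric depends on upstream choices (the matching distance threshold, how ambiguous annotations are counted), which is precisely the kind of pitfall the paper discusses.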
We present a new approach to the collection and labeling of ground truth data for annotation of temporal events in ad-hoc videos taken by active operators recording interactions and activities in the field. We present experimental data and related research from experimental psychology which indicate that the conventional methodology based on asking annotators to pick a single instance in time for...
People are often seen together. We use this simple observation to provide crucial additional information and increase the robustness of a video tracker. The goal of this paper is to show how, in situations where offline training data is not available, a social behavior model (SBM) can be inferred online and then integrated within the tracking algorithm. We start with tracklets (short term confident...
We propose a learning-based method for detecting carried objects that generates candidate image regions from protrusion, color contrast, and occlusion boundary cues, and uses a classifier to filter out the regions unlikely to be carried objects. The method achieves higher accuracy than the state of the art, which can only detect protrusions from the human shape, and the discriminative model it builds for...