Augmented reality is becoming increasingly popular thanks to its many practical applications. A key element is understanding the scene and the human activities involved, so that the system can offer rich interaction with the world via virtual actions and elements. For this purpose, a new vision-based human-action recognition module has been developed to be integrated with the new generation...
We propose a method to extract user attributes from the pictures posted in social media feeds, specifically gender information. While traditional approaches rely on text analysis or exploit visual information only from the user profile picture or colors, we propose to look at the distribution of semantics in the pictures coming from the whole feed of a person to estimate gender. In order to compute...
In future robotic applications, robots will need not only to recognize human actions but also to learn them incrementally and quickly. We therefore propose an incremental action learning system to meet this requirement. The proposed system can continuously and quickly learn new actions with robust performance and less effort.
The ability to apply a weak force with the expected accuracy is an important motor skill in surgical operations. Acquiring such a skill is challenging for novices. In this paper, we studied how the accuracy of force control could be enhanced through repetitive training. Twelve participants were divided into two groups. They were trained to apply a target force of 0.25 N with ±20% accuracy under...
In this paper, we propose a framework to recognize complex human interactions. First, we adopt trajectories to represent human motion in a video. Then, the extracted trajectories are clustered into different groups (named as local motion patterns) using the coherent filtering algorithm. As trajectories within the same group exhibit similar motion properties (i.e., velocity, direction), we adopt the...
Interacting with the environment using mobile eye-tracking is accompanied by challenges in providing non-visual feedback related to gaze events and in monitoring the quality of gaze vector estimation. Recent studies point to haptic stimulation as a promising feedback channel in this context. In this work we focused on applying haptic stimulation to inform users of pointing inaccuracies by cuing their...
The Document Object Model (DOM) provides a tree structure, called the DOM tree, for representing HTML documents as objects. Many researchers have considered using the leaf nodes of the DOM tree as basic objects when extracting information from web pages. However, web pages are better described as collections of information blocks, each with a consistent visual format, than as individual DOM tree nodes. And those information blocks do not...
In this paper, we tackle the task of recognizing types of identity documents, some of which are very similar, using state-of-the-art visual recognition approaches. Given a scanned document, the goal is to identify the country of issue, the type of document, and its version. Whereas recognizing the individual parts of a document with known standardized layout can be done reliably, identifying the type of a document...
This paper presents a novel framework, based on visual analysis, for automated planogram compliance checking in retail stores. Our framework provides an efficient and convenient solution for ensuring planogram compliance through real-time analysis of a shelf image acquired in a freehand manner. We present a novel application of the Hausdorff metric for occupancy computation in product shelf images. Subsequently, we...
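The abstract above names the Hausdorff metric as the basis for occupancy computation but, being truncated, does not say how it is applied. As a minimal sketch of the metric itself only (not the authors' pipeline), the symmetric Hausdorff distance between two finite 2D point sets could be computed as follows. The function names and the sample point sets are illustrative assumptions.

```python
import math

def directed_hausdorff(a, b):
    """Largest distance from any point in a to its nearest point in b."""
    return max(min(math.dist(p, q) for q in b) for p in a)

def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))

# Hypothetical contour samples, e.g. an expected and an observed outline.
expected_outline = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
observed_outline = [(0.0, 1.0), (1.0, 1.0), (2.0, 3.0)]

print(hausdorff(expected_outline, observed_outline))
```

The directed distance is not symmetric, which is why the metric takes the larger of the two directions: a single outlying point in either set dominates the result.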
We present a novel audiovisual emotion recognition solution using multimodal information fusion based on entropy estimation. Considering the limitations of existing methods, we propose a new dual-level fusion framework which consists of a feature-level fusion module based on kernel entropy component analysis and a score-level fusion module based on the maximum correntropy criterion. In our system, audio and...
Fashion is a major segment in e-commerce with growing importance and a steadily increasing number of products. Since manual annotation of apparel items is very tedious, the product databases need to be organized automatically, e.g. by image classification. Common image classification approaches are based on features engineered for general purposes which perform poorly on specific images of apparel...
In this paper, we propose a method to extend the field of view (FoV) of cameras mounted on Micro Aerial Vehicles (MAVs). The idea is to stitch together appropriate sections of the panorama to the camera frame. The proposed system efficiently performs view extension by fusing fast tracking and feature descriptor matching into the stitching algorithm. The quality of the extended view is further improved...
Classification of digital images into photographs and various kinds of non-photographic images has not been sufficiently studied but has many applications such as retrieval of real scene photographs from web sites and image databases. In this paper, we show that the combination of Bag of Visual Words of SURF features and histograms of LBPs for HSV and Luminance components (SURF+LBP(HSVL)) is simple,...
The ability to automatically detect eye center locations in video images allows for estimating gaze direction. This, in turn, facilitates the study of human-computer interaction and behavioral analyses of social interactions. We propose an improved eye center localization method based on the Hough transform, called Circle-based Eye Center Localization (CECL) that is simple, robust, and achieves accuracy...
As robots continue to create long-term maps, the amount of information that they need to handle increases over time. In terms of place recognition, this implies that the number of images being considered may increase until exceeding the computational resources of the robot. In this paper we consider a scenario where, given multiple independent large maps, possibly from different cities or locations,...
This work presents an automatic calibration method for a vision-based external underwater ground-truth positioning system. Such systems are a relevant tool for benchmarking and assessing the quality of research in underwater robotics applications. In suitable environments, such as test tanks or clear water conditions, a stereo vision system can provide accurate position with low cost and flexible...
Autonomous docking for airborne energy transfer is an important unmanned aerial vehicle capability that has yet to be accomplished. The implications of this technology are far reaching because vehicle endurance can be significantly extended without requiring additional onboard energy storage or environmental energy collection. A major barrier to docking with a drogue is reliable and accurate knowledge...
This work presents a method for increasing the accuracy of standard visual inertial odometry (VIO) by effectively removing the angular drift that naturally occurs in feature-based VIO. In order to eliminate such drift, we propose to leverage the predominance of parallel lines in man-made environments by using the intersection of their image projections, known as vanishing points (VPs). First, an efficient...
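The abstract above defines a vanishing point as the intersection of the image projections of parallel lines. As a minimal sketch of that geometric step only (not the authors' VIO method), two image line segments can be intersected in homogeneous coordinates using cross products. The function names, the `eps` threshold, and the sample segments are illustrative assumptions.

```python
def cross(u, v):
    """Cross product of two 3-vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def line_through(p, q):
    """Homogeneous line through two image points (x, y)."""
    return cross((p[0], p[1], 1.0), (q[0], q[1], 1.0))

def vanishing_point(seg_a, seg_b, eps=1e-12):
    """Intersection of the lines supporting two segments.

    Returns None when the lines are also parallel in the image,
    i.e. the intersection lies at infinity.
    """
    x, y, w = cross(line_through(*seg_a), line_through(*seg_b))
    if abs(w) < eps:
        return None
    return (x / w, y / w)

# Two hypothetical image segments that converge toward a common point.
seg_a = ((0.0, 0.0), (4.0, 2.0))
seg_b = ((0.0, 6.0), (4.0, 4.0))
print(vanishing_point(seg_a, seg_b))
```

With noisy detections, more than two segments would normally be combined (for example by least squares or a voting scheme), which this sketch does not attempt.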
In this paper, we introduce an Iterative Kalman Smoother (IKS) for tracking the 3D motion of a mobile device in real-time using visual and inertial measurements. In contrast to existing Extended Kalman Filter (EKF)-based approaches, smoothing can better approximate the underlying nonlinear system and measurement models by re-linearizing them. Additionally, by iteratively optimizing over all measurements...
The location information of interest points is an important cue for action recognition. In order to model the spatio-temporal distribution, we propose a novel position feature which is constructed from normalized pairwise relative positions of points. Promising performance has been achieved by the Vector of Locally Aggregated Descriptors (VLAD), which gathers the differences between descriptors and visual...
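The abstract above refers to VLAD aggregation. As a minimal sketch of the commonly used VLAD formulation (not the authors' position feature, whose details are cut off), each descriptor is assigned to its nearest codebook centroid, the residuals are summed per centroid, and the concatenated result is L2-normalized. The function name, the toy codebook, and the toy descriptors are illustrative assumptions.

```python
import math

def vlad(descriptors, centroids):
    """Encode descriptors as a VLAD vector over the given centroids."""
    k, d = len(centroids), len(centroids[0])
    acc = [[0.0] * d for _ in range(k)]
    for x in descriptors:
        # Hard-assign the descriptor to its nearest centroid.
        nearest = min(range(k), key=lambda i: math.dist(x, centroids[i]))
        # Accumulate the residual (descriptor minus centroid).
        for j in range(d):
            acc[nearest][j] += x[j] - centroids[nearest][j]
    flat = [v for row in acc for v in row]
    norm = math.sqrt(sum(v * v for v in flat))
    # L2-normalize; an all-zero vector is returned unchanged.
    return [v / norm for v in flat] if norm > 0 else flat

# Toy 2D codebook and descriptors, for illustration only.
codebook = [(0.0, 0.0), (10.0, 10.0)]
local_descriptors = [(1.0, 0.0), (0.0, 1.0), (9.0, 10.0)]
print(vlad(local_descriptors, codebook))
```

The output length is the number of centroids times the descriptor dimension, so the encoding has a fixed size regardless of how many descriptors a video yields.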