In this paper, a system is proposed to aid the visually impaired by providing contextual information about their surroundings using a 360° view camera combined with deep learning. The system uses a 360° view camera with a mobile device to capture surrounding scene information and provide contextual information to the user in the form of audio. The scene information from the spherical camera feed is classified...
Nowadays, visual features play a key role, as they can provide a concise representation of visual data that is efficient for multiple tasks, notably content retrieval and object recognition. In parallel, visual sensors have been improving, targeting richer acquisitions of the light in a visual scene. In this context, the so-called light field cameras, which have recently emerged, are able to go beyond...
An inertial-aided visual servo control approach for fully-actuated Autonomous Underwater Vehicles (AUVs) without relying on linear velocity measurements is proposed. The homography obtained from corresponding images of a locally planar scene is directly exploited as feedback information. A cascade inner-outer loop control architecture is adopted that facilitates both control implementation and gain...
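To illustrate why a homography from a locally planar scene carries usable feedback information, here is a minimal sketch of the standard textbook relation between a Euclidean homography and the relative camera pose. It is not the control law of the paper above. The function name `planar_homography`, the sign convention (plane written as `n . X_ref = d` in the reference frame, motion written as `X_cur = R @ X_ref + t`), and all numeric values are assumptions chosen for illustration.

```python
import numpy as np


def planar_homography(R, t, n, d):
    """Euclidean homography H such that X_cur = H @ X_ref for every 3D
    point X_ref lying on the plane n . X_ref = d (reference frame).

    Assumed convention: X_cur = R @ X_ref + t.
    """
    return R + np.outer(t, n) / d


# Hypothetical relative pose: small yaw rotation plus a translation.
theta = 0.1
R = np.array([
    [np.cos(theta), -np.sin(theta), 0.0],
    [np.sin(theta),  np.cos(theta), 0.0],
    [0.0,            0.0,           1.0],
])
t = np.array([0.2, -0.1, 0.05])

# Hypothetical plane: normal along the optical axis, 2 units away.
n = np.array([0.0, 0.0, 1.0])
d = 2.0

H = planar_homography(R, t, n, d)

# A point on the plane (its z-coordinate equals d, so n . X_ref = d).
X_ref = np.array([0.3, -0.4, 2.0])
X_cur = R @ X_ref + t

# For on-plane points the homography reproduces the rigid motion exactly.
assert np.allclose(H @ X_ref, X_cur)
```

In normalized image coordinates the same matrix maps corresponding points only up to scale, which is why rotation and scaled translation (`t / d`) can be recovered from image correspondences without a direct linear velocity measurement.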
In this paper, a temporally iterative Gaussian Mixture Model (GMM) of Dynamic Texture (DT) is proposed for target detection using a moving PTZ camera. Camera movement in a PTZ sensor causes motion-based target detection techniques to fail for the periods affected by the scene change. This is because the whole scene is considered a representation of the target motion. When the camera is in motion,...
In this paper, a robust visual tracking system with occlusion handling is proposed to track a target with real-time performance. A thermal camera, which can observe heat originating from a target such as a human body or vehicle, can collaborate with a color camera to track the target in a cluttered environment or under occlusion. Unlike the general tracking by using the color camera...
A visual servo system is usually subject to time delay, likely caused by long image processing and data transmission. The visual servo system of our robot is subject to two main limitations stemming from the specific commercial mobile manipulator: one is the large time delay due to image transmission, and the other is the inability to directly command each joint velocity as...
Tangible User Interfaces (TUI) have garnered significant interest in the past years as a potential solution to embed smarter technologies for education. The intrinsic ability of this technology to engage and intrigue students in active learning pedagogies has recently been successfully proven across all ages using various techniques. Predominant among the effective technologies has been the development...
Monocular visual odometry algorithms have been widely used to estimate the pose of aerial robots in GPS-denied environments. However, a purely visual system usually has poor robustness in large-scale environments. This paper presents a pose estimation algorithm which fuses monocular visual and inertial data using the monocular ORB-SLAM algorithm as the visual framework. Firstly, the scale estimation...
Person Re-Identification (person re-id) is a crucial task owing to its applications in visual surveillance and human-computer interaction. In this work, we present a novel joint Spatial and Temporal Attention Pooling Network (ASTPN) for video-based person re-identification, which enables the feature extractor to be aware of the current input video sequences, in a way that interdependency from the matching...
In this paper, we develop deep spatio-temporal neural networks to sequentially count vehicles from low quality videos captured by city cameras (citycams). Citycam videos have low resolution, low frame rate, high occlusion and large perspective, making most existing methods lose their efficacy. To overcome limitations of existing methods and incorporate the temporal information of traffic video, we...
Detecting logo frequency and duration in sports videos gives sponsors an effective way to evaluate their advertising efforts. However, general-purpose object detection methods cannot address all the challenges in sports videos. In this paper, we propose a mutual-enhanced approach that can improve the detection of a logo through the information obtained from other simultaneously occurring logos...
In this paper, we propose the first higher frame rate video dataset (called Need for Speed - NfS) and benchmark for visual object tracking. The dataset consists of 100 videos (380K frames) captured with now commonly available higher frame rate (240 FPS) cameras from real world scenarios. All frames are annotated with axis aligned bounding boxes and all sequences are manually labelled with nine visual...
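As a quick sanity check on the figures quoted in the NfS abstract above, the following sketch converts the stated totals (100 videos, 380K frames, 240 FPS) into real-time durations. It assumes every video was captured at exactly 240 FPS and treats "380K" as exactly 380,000 frames; both are simplifications, and the function and variable names are illustrative only.

```python
def footage_seconds(total_frames: float, fps: float) -> float:
    """Real-time duration in seconds of footage with the given frame count."""
    return total_frames / fps


NUM_VIDEOS = 100
TOTAL_FRAMES = 380_000  # assumption: "380K" taken as an exact count
FPS = 240               # assumption: uniform capture rate for all videos

total_s = footage_seconds(TOTAL_FRAMES, FPS)
mean_frames = TOTAL_FRAMES / NUM_VIDEOS
mean_clip_s = footage_seconds(mean_frames, FPS)

print(f"total footage: {total_s:.1f} s ({total_s / 60:.1f} min)")
print(f"mean sequence: {mean_frames:.0f} frames ({mean_clip_s:.1f} s)")
```

Under these assumptions the dataset amounts to roughly 26 minutes of real-time footage, or about 16 seconds per sequence on average.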
Anticipating human intention by observing one’s actions has many applications. For instance, picking up a cellphone, then a charger (actions) implies that one wants to charge the cellphone (intention) (Fig. 1). By anticipating the intention, an intelligent system can guide the user to the closest power outlet. We propose an on-wrist motion triggered sensing system for anticipating daily intentions,...
In view of the special conditions of gas drainage boreholes in coal mines, an automatic monitoring system for gas drainage boreholes is designed, which consists of operation control, image acquisition, image transmission, record demonstration, electricity supply and explosion protection. According to the voltage level in coal mines, a three-stage transformer with wide voltage range and stable operation...
UAVs (Unmanned Aerial Vehicles) have been widely used in power line inspections, but their low autonomous cruise capacity imposes strict requirements on operators and landing sites during inspections. This paper presents an autonomous landing control technique, based on a vision positioning method, for UAVs charging at electric towers. The proposed system consists of three...
This paper presents a novel strategy addressing visual SLAM with an enhanced data association method. Hypergraph theory and transformation were incorporated within cooperative visual SLAM. The research presented a synthetic approach to fulfill a cooperative data association and fusion strategy for multiple UAVs equipped with stereo vision cameras that encounter indistinct imaging, where conventional...
We present an approach to estimating constrained egomotion on a Pixel Processor Array (PPA). These devices embed processing and data storage capability into the pixels of the image sensor, allowing for fast and low-power parallel computation directly on the image plane. Rather than the standard visual pipeline whereby whole images are transferred to an external general processing unit, our approach...
Object-to-camera motion produces a variety of apparent motion patterns that significantly affect performance of short-term visual trackers. Despite being crucial for designing robust trackers, their influence is poorly explored in standard benchmarks due to weakly defined, biased and overlapping attribute annotations. In this paper we propose to go beyond pre-recorded benchmarks with post-hoc annotations...
Given an image of a street scene in a city, this paper develops a new method that can quickly and precisely pinpoint at which location (as well as viewing direction) the image was taken, against a pre-stored large-scale 3D point-cloud map of the city. We adopt the recently developed 2D-3D direct feature matching framework for this task [23,31,32,42–44]. This is a challenging task especially for large-scale...
This paper presents a method to assess a basketball player's performance from his/her first-person video. A key challenge lies in the fact that the evaluation metric is highly subjective and specific to a particular evaluator. We leverage the first-person camera to address this challenge. The spatiotemporal visual semantics provided by a first-person view allow us to reason about the camera wearer's...