This paper presents a novel approach to iris dissimilarity computation based on Machine Learning paradigms and Computer Vision transformations. Based on the training dataset provided by the MICHE II Challenge organizers, a set of classifiers has been constructed and tested, aimed at classifying a single image.
Diabetic retinopathy is damage to the retina caused by diabetes, and it affects up to 80 percent of all patients who have had diabetes for 10 years or more. The expertise and equipment required are often lacking in the areas where diabetic retinopathy detection is most needed. Most work in the field of diabetic retinopathy has been based on disease detection or manual extraction of features,...
In this paper, we address the problem of recognizing unfinished human activity from partially observed videos. Specifically, we propose a novel human activity descriptor, which compactly represents pairwise relationships among human activities using pre-trained Convolutional Neural Networks (CNNs) to capture the discriminative sub-volume. The potentially important relationship among...
In this paper, we focus on training a classifier from large-scale data with incompletely assigned labels. In other words, we treat samples with the following properties: (1) assigned labels are definitely positive, (2) absent labels are not necessarily negative, and (3) samples may take more than one label. These properties are frequently found in various kinds of computer vision tasks, including...
We consider the problem of joint modeling of videos and their corresponding textual descriptions (e.g. sentences or phrases). Our approach consists of three components: the video representation, the textual representation, and a joint model that links videos and text. Our video representation uses the state-of-the-art deep 3D ConvNet to capture the semantic information in the video. Our textual representation...
The development of automatic nutrition diaries, which would make it possible to objectively keep track of everything we eat, could open up a whole new world of possibilities for people concerned about their nutrition patterns. With this purpose, in this paper we propose the first method for simultaneous food localization and recognition. Our method is based on two main steps, the first of which is to produce...
In the past years, deep convolutional neural networks (CNNs) have become extremely popular in the computer vision and pattern recognition community. The computational power of modern processors, efficient stochastic optimization algorithms, and large amounts of training data have made it possible to learn complex task-specific features directly from the data in an end-to-end fashion, as opposed to the traditional...
We present an approach to automatically generating verbal commentaries for tennis games. We introduce a novel application that requires a combination of techniques from computer vision, natural language processing and machine learning. A video sequence is first analysed using state-of-the-art computer vision methods to track the ball, fit the detected edges to the court model, track the players, and...
Current best local descriptors are learned on a large dataset of matching and non-matching keypoint pairs. However, data of this kind is not always available since detailed keypoint correspondences can be hard to establish. On the other hand, we can often obtain labels for pairs of keypoint bags. For example, keypoint bags extracted from two images of the same object under different views form a matching...
In this paper we propose a publicly available static hand pose database called OUHANDS and protocols for training and evaluating hand pose classification and hand detection methods. A comparison between the OUHANDS database and existing databases is given. Baseline results for both of the protocols are presented.
In this paper, we propose a mutual framework that combines two state-of-the-art visual object tracking algorithms. Both trackers benefit from each other's advantages, leading to an efficient visual tracking approach. Many state-of-the-art trackers perform poorly in real-world scenarios due to rain, fog or occlusion. Often, objects get lost after several frames, leading only to a short-term...
Optical flow is one of the key problems in computer vision. Since the seminal work of Horn and Schunck [1], numerous advanced algorithms have been proposed. Many state-of-the-art optical flow estimation algorithms optimize the data and regularization terms to solve this ill-posed problem. However, despite their major advances over the last decade, conventional optical flow methods...
Convolutional Neural Networks (CNNs) have been used successfully to solve different computer vision tasks such as classification, detection, and segmentation. This paper addresses the problem of estimating object depth from a single RGB image. While stereo depth estimation is a straightforward task, predicting the depth map of an object from a single RGB image is more challenging due to the lack...
Human motion capture (MOCAP) techniques are widely applied in many areas such as computer vision, computer animation, digital effects and virtual reality. Even with a professional MOCAP system, the acquired motion data almost always contains noise and outliers, which highlights the need for motion refinement methods. In recent years, many approaches to motion refinement have been developed,...
In this paper we are interested in knowing which features provide useful information for recognizing a gesture or an action, and how the set of selected characteristics impacts the accuracy of detection. We then define a large set of possible features, namely angles calculated from the joints of the skeleton provided by the Kinect device. Our contribution is to propose an algorithm: Reduction of...
In this paper we evaluate the performance of CNNs for face recognition in real-world applications. In recent years, many high-performance deep neural networks have been proposed for face recognition. These deep networks were trained on images collected from the internet, which are commonly of good quality, with facial expressions and poses that are not particularly complex. However,...
Pedestrian detection is an active problem in computer vision research, with applications in robotics, self-driving cars and surveillance. It involves generating bounding boxes to indicate the location of every pedestrian in an input image. This paper proposes a method to augment a basic pedestrian detector with a Convolutional Neural Network. An implementation of the proposed algorithm was trained...
With the development of image processing and computer vision technology, using gestures to communicate with machines will no longer appear only in science-fiction movies or as merely conceptual products. Gesture recognition is a topic in computer science and language technology with the goal of interpreting human gestures via mathematical algorithms. With such technology, our lives can become more convenient. Therefore, our...
State-of-the-art image classification methods require an intensive learning stage and a considerable amount of training images. Recently, with the introduction of deep learning models (and in particular the convolutional neural network (CNN)), it has come to be believed that the best way to achieve high performance on scene classification is to learn deep scene features using a CNN. While this can be...
We present a novel and real-time method to detect object affordances from RGB-D images. Our method trains a deep Convolutional Neural Network (CNN) to learn deep features from the input data in an end-to-end manner. The CNN has an encoder-decoder architecture in order to obtain smooth label predictions. The input data are represented as multiple modalities to let the network learn the features more...