Semantic parsing of large-scale 3D point clouds is an important research topic in the computer vision and remote sensing fields. Most existing approaches utilize hand-crafted features for each modality independently and combine them in a heuristic manner. They often fail to adequately consider the consistency and complementary information among features, which makes it difficult for them to capture high-level...
Images are usually taken to express some emotion or purpose, such as love or celebrating Christmas. Combining an image with a relevant song amplifies that expression, an approach that has recently drawn much attention on social networks. Automatic song selection is therefore desirable. In this paper, we propose to retrieve semantically relevant songs just by...
Virtual face beautification (or markup) has become a common operation in camera and image processing apps, and it is in effect deceptive. In this paper, we propose the task of restoring a portrait image from this process. As the first attempt along this line, we assume unknown global operations on human faces and aim to tackle the two issues of skin smoothing and skin color change. These two tasks, intriguingly,...
The ability to ask questions is a powerful tool to gather information in order to learn about the world and resolve ambiguities. In this paper, we explore a novel problem of generating discriminative questions to help disambiguate visual instances. Our work can be seen as a complement and new extension to the rich research studies on image captioning and question answering. We introduce the first...
In this paper we present a novel approach for depth map enhancement from an RGB-D video sequence. The basic idea is to exploit the photometric information in the color sequence. Instead of making any assumption about surface albedo or controlled object motion and lighting, we use the lighting variations introduced by casual object movement. We are effectively calculating photometric stereo from a...
Understanding where people look in images is an important problem in computer vision. Despite significant research, it remains unclear to what extent human fixations can be predicted by low-level (contrast) compared to high-level (presence of objects) image features. Here we address this problem by introducing two novel models that use different feature spaces but the same readout architecture. The...
Person re-identification is an important task in video surveillance systems. It can be formally defined as establishing the correspondence between images of a person taken from different cameras at different times. In this paper, we present a two-stream convolutional neural network where each stream is a Siamese network. This architecture can learn spatial and temporal information separately. We also...
Skeleton-based human action recognition has recently attracted increasing attention due to the popularity of 3D skeleton data. One main challenge lies in the large view variations in captured human actions. We propose a novel view adaptation scheme to automatically regulate observation viewpoints during the occurrence of an action. Rather than re-positioning the skeletons based on a human defined...
Given a video and a description sentence with one missing word (the “source sentence”), the Video-Fill-In-the-Blank (VFIB) problem is to find the missing word automatically. The contextual information of the sentence, as well as visual cues from the video, are important to infer the missing word accurately. Since the source sentence is broken into two fragments: the sentence’s left fragment (before the blank)...
We introduce a novel semi-supervised video segmentation approach based on an efficient video representation called “super-trajectory”. Each super-trajectory corresponds to a group of compact trajectories that exhibit consistent motion patterns, similar appearance and close spatiotemporal relationships. We generate trajectories using a probabilistic model, which handles occlusions and drifts in...
This paper proposes a robust color constancy method for changing illuminants, using the local chromaticity distribution and an analysis of illuminant influence for each hue angle. First, the change in chromaticity distribution direction for each color with respect to various illuminants is analyzed using principal component analysis. Next, the change in standard deviation of the chromaticity distribution with respect...
This work presents image encoding and decoding using the theory of conformal mapping. Conformal mapping changes the domain of a problem without modifying physical characteristics between the domains. Images are transported between domains using transformation functions that act like encryption keys. The developed method was shown to preserve the original image characteristics...
Blind image quality assessment (BIQA) methods aim to estimate the quality of a given test image without referring to the corresponding reference (original) image. Most BIQA methods use visual sensitivity models, which take into consideration intrinsic image characteristics (e.g. contrast, luminance, and texture) to identify degradations and estimate quality. For example, texture-based BIQA methods...
The popularity of applications using Augmented Reality, especially due to the dissemination of smartphones with high processing power, introduces the need for Fiducial Markers that can be detected quickly, with good accuracy and can deal with partial occlusion. Fiducial Markers can have different shapes, sizes, structure and colors, and are inserted into a scene to facilitate the detection and consequent...
This paper proposes a method of embedding AR (augmented reality) markers in a high-speed video sequence so that they are imperceptible to human eyes. The embedded markers appear for very short periods and keep changing their positions at lightning speed. By carefully designing the timings of marker display, a camera with a sufficiently short exposure time running at any frame rate is able to detect...
One of the important goals of medical augmented reality is to reveal the hidden anatomy, such as a tumor in an organ. However, conveying a hidden tumor's depth to the user effortlessly and precisely is still an unsolved problem. This is especially difficult in monocular laparoscopy. First, the number of available depth cues is in practice limited to only two: occlusion and relative size. Second, exploiting...
With the extensive use of smartphones, technology improving secure communication between smartphones is a growing field of research. As a form of Visible Light Communication, a color video barcode system creates a smartphone-to-smartphone communication channel. This color video barcode system, effectively an evolved form of QR codes, provides a secure alternative to WiFi, Bluetooth, and Near Field...
Face detection is already incorporated in many biometrics and surveillance applications. Therefore, the reduction of false detections is a priority in those systems. However, face detection is still challenging. Many factors, such as pose variation and complex backgrounds, contribute to false detections. Besides, the fidelity of a true detection, measured by precision rate, is a concern in content-based...
Leukemia is a worldwide disease. In this paper we demonstrate that it is possible to build an automated, efficient and rapid leukemia diagnosis system. We demonstrate that it is possible to improve the precision of current techniques from the literature using the description power of well-known Convolutional Neural Networks (CNNs). We extract features from a blood smear image using pre-trained CNNs...
This paper presents a new technique to solve the single image super resolution reconstruction problem based on multiple extreme learning machine regressors, called here MELM. The MELM employs a feature space of low resolution images, divided into subspaces, and one regressor is trained for each one. In the training task, we employ a color dataset containing 91 images, with approximately 5.3 million...