Classic models of visual attention dramatically fail at predicting eye positions on visual scenes involving faces. While some recent models combine faces with low-level features, none of them consider sound as an input. Yet sound is crucial in conversation or meeting scenes. In this paper, we describe and refine an audiovisual saliency model for conversation scenes. This model includes a speaker diarization...
We demonstrate that polarization-sensitive optical coherence tomography (PS-OCT) can identify the cavernous nerve in the human and rat prostate ex vivo based on its birefringence. PS-OCT may be useful for nerve preservation during radical prostatectomy.
The widespread use of social networking services allows users to share and quickly spread an enormous amount of digital content. Currently, a low level of security and trustworthiness is applied to such information, whose reliability cannot be taken for granted given the wide availability of image editing software, which allows any user to easily manipulate digital content. This has a huge impact on...
Photos are an excellent means of keeping and refreshing memories. Digital photography, however, imposes new challenges for keeping photos accessible in the long run due to threats such as hard disk crashes, format changes, or storage medium decay. Safe long-term preservation, ensuring the longevity of photos, comes at a cost, suggesting that this investment be restricted to the most valuable photos. Therefore,...
Increasingly large amounts of video data raise the question of whether large-scale face retrieval is feasible. To find fast and accurate matching strategies, a corresponding face track descriptor is constructed using local features, extended by an encoding of the respective measurement conditions. The feature encoding allows all features of one face track to be collected together in a single feature set, where...
The immense amount of available video data poses novel requirements for video representation approaches: they should focus on the central and relevant aspects of the underlying story and facilitate an efficient overview and assessment of the content. In general, assessing content relevance and significance is a high-level task that usually requires human intervention. However, some filming...
In this paper, we present a novel method that can produce a visual description of a landmark by choosing the most diverse pictures that best describe all the details of the queried location from community-contributed datasets. The main idea of this method is to filter out non-relevant images at a first stage and then cluster the images according to textual descriptors first, and then to visual descriptors...
This article addresses the issue of social image search result diversification. We propose a novel perspective for the diversification problem via Relevance Feedback (RF). Traditional RF introduces the user in the processing loop by harvesting feedback about the relevance of the search results. This information is used for recomputing a better representation of the data needed. The novelty of our...
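The abstract describes traditional relevance feedback only in general terms (user judgments are used to recompute a better query representation). One classic instance of that idea is the Rocchio update, sketched below. This is not the diversification method proposed in the article; the vector representation and the weights `alpha`, `beta`, `gamma` (commonly quoted defaults) are assumptions for illustration.

```python
from typing import List, Sequence

Vector = List[float]


def mean_vector(vectors: Sequence[Sequence[float]], dim: int) -> Vector:
    """Component-wise mean of the vectors; a zero vector if none are given."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def rocchio_update(
    query: Sequence[float],
    relevant: Sequence[Sequence[float]],
    non_relevant: Sequence[Sequence[float]],
    alpha: float = 1.0,
    beta: float = 0.75,
    gamma: float = 0.15,
) -> Vector:
    """Shift the query toward results marked relevant and away from non-relevant ones."""
    dim = len(query)
    rel = mean_vector(relevant, dim)
    non = mean_vector(non_relevant, dim)
    return [alpha * q + beta * r - gamma * n for q, r, n in zip(query, rel, non)]


# Usage: the user marked one result relevant and one non-relevant.
new_query = rocchio_update([1.0, 0.0], relevant=[[0.0, 1.0]], non_relevant=[[1.0, 1.0]])
print(new_query)
```

The updated query is then used to re-rank the collection; with no feedback at all, the query is returned unchanged (scaled by `alpha`).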
Previous work has demonstrated that integrating top-down features into bottom-up saliency methods can improve saliency prediction accuracy. Therefore, for face images, this paper proposes a saliency detection method based on a Gaussian mixture model (GMM), which learns the distribution of saliency over face regions as the top-down feature. Specifically, we verify that fixations tend to cluster around...
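To illustrate how a Gaussian mixture can serve as a top-down saliency prior, the sketch below evaluates an isotropic 2-D mixture at every pixel of a normalized face grid and rescales the result to a peak of 1. It does not learn the mixture (that step, typically done with EM, is omitted), and the two components below are hypothetical placeholders, not parameters from the paper.

```python
import math
from typing import List, Sequence, Tuple

# (weight, (mean_x, mean_y), sigma) in normalized face coordinates [0, 1]^2.
Component = Tuple[float, Tuple[float, float], float]


def gmm_density(x: float, y: float, components: Sequence[Component]) -> float:
    """Density of an isotropic 2-D Gaussian mixture at (x, y)."""
    total = 0.0
    for weight, (mx, my), sigma in components:
        d2 = (x - mx) ** 2 + (y - my) ** 2
        total += weight * math.exp(-d2 / (2 * sigma**2)) / (2 * math.pi * sigma**2)
    return total


def top_down_map(width: int, height: int, components: Sequence[Component]) -> List[List[float]]:
    """Evaluate the mixture at each pixel centre and rescale so the peak is 1."""
    grid = [
        [gmm_density((c + 0.5) / width, (r + 0.5) / height, components) for c in range(width)]
        for r in range(height)
    ]
    peak = max(max(row) for row in grid)
    return [[v / peak for v in row] for row in grid] if peak > 0 else grid


# Usage with a hypothetical two-component mixture on a 10 x 10 grid.
mixture = [(0.6, (0.25, 0.25), 0.05), (0.4, (0.75, 0.75), 0.05)]
saliency = top_down_map(10, 10, mixture)
```

Such a map could then be combined with a bottom-up saliency map, for example by a weighted sum; how the paper fuses the two is not stated in the abstract.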
Multiview autostereoscopic displays are considered the future of 3DTV. However, these displays suffer from a high level of crosstalk, which negatively impacts quality of experience (QoE). In this paper, we propose a system to improve 3D QoE on multiview autostereoscopic displays. First, the display is characterized in terms of luminance distribution. Then, the luminance profiles are modeled using...
We propose a method to extract user attributes from the pictures posted in social media feeds, specifically gender information. While traditional approaches rely on text analysis or exploit visual information only from the user profile picture or colors, we propose to look at the distribution of semantics in the pictures coming from the whole feed of a person to estimate gender. In order to compute...
In the contemporary world, new forms of interaction for digital games have been increasingly developed. In this work, we present an infinite racing game inspired by the Brick Game racing game. Through the application's interaction entities, called Interacts, and Haar cascades in OpenCV, we extend the possibilities for controlling this game. Thus, beyond keyboard and mouse, it is also possible...
Assessing the quality of segmentation algorithms with regard to user perception is an important problem in computer vision. For this purpose, a metric must take into account the impact of the different types of errors displayed to users. In this work, we developed a new objective metric to assess the quality achieved by bilayer segmentation algorithms when they are used in Augmented Reality applications...
Despite their impact on computer vision and face recognition, the inner workings of deep convolutional neural networks (CNNs) have traditionally been regarded as uninterpretable. We demonstrate this to be false by proposing prediction gradients to understand how neural networks encode concepts into individual units. In contrast, existing efforts to understand convolutional nets focus on visualizing...
The ability to automatically detect eye center locations in video images allows for estimating gaze direction. This, in turn, facilitates the study of human-computer interaction and behavioral analyses of social interactions. We propose an improved eye center localization method based on the Hough transform, called Circle-based Eye Center Localization (CECL), which is simple, robust, and achieves accuracy...
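The core of a Hough-based eye center locator is circle voting: each edge point votes for every candidate center lying one radius away, and the most-voted cell is taken as the center. The sketch below is a plain fixed-radius circle Hough transform on synthetic edge points; the known radius, the angular sampling `n_angles`, and the integer accumulator grid are assumptions, and CECL's actual refinements are not described in the truncated abstract.

```python
import math
from collections import Counter
from typing import Iterable, Tuple


def hough_circle_center(
    edge_points: Iterable[Tuple[float, float]], radius: float, n_angles: int = 360
) -> Tuple[int, int]:
    """Fixed-radius circle Hough transform: each edge point votes for the integer
    centre cells at distance `radius` from it; the most-voted cell is returned."""
    votes: Counter = Counter()
    for x, y in edge_points:
        cells = set()  # one edge point votes at most once per cell
        for k in range(n_angles):
            theta = 2 * math.pi * k / n_angles
            cells.add((round(x - radius * math.cos(theta)), round(y - radius * math.sin(theta))))
        votes.update(cells)
    if not votes:
        raise ValueError("no edge points given")
    center, _ = votes.most_common(1)[0]
    return center


# Usage: synthetic edge points on an idealized iris boundary centred at (20, 15).
points = [
    (20 + 5 * math.cos(2 * math.pi * k / 72), 15 + 5 * math.sin(2 * math.pi * k / 72))
    for k in range(72)
]
print(hough_circle_center(points, radius=5))  # (20, 15)
```

In real images the edge points would come from an edge detector on the eye region, and an unknown radius would require a third accumulator dimension or a search over a small radius range.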
Recent research has demonstrated that computer vision algorithms can understand individual face images fairly well. However, one major challenge in computer vision is to go beyond that and investigate bi- or tri-relationships among multiple visual entities, answering questions such as whether a child in a photo belongs to given parents. Indeed, the parent-child relationship plays a core role in...
High-quality face image acquisition from the huge volume of video data obtained in a visual sensor network is of great significance in face-processing applications such as face recognition and reconstruction. This paper proposes an optimal face image acquisition method for visual sensor networks, based on collaborative face frame acquisition and heterogeneous feature fusion-based face quality...
We present a novel method, Foveated Manifold Sensing, for the adaptive and efficient sensing of the visual world. The method is based on algorithms that learn manifolds of increasing but low dimensionality for representative data. As opposed to Manifold Sensing, the new foveated version senses only the most salient areas of a scene. This leads to an efficient sensing strategy that requires only a...
In this paper, we introduce a novel flow visualization technique for arbitrary surfaces. This new technique utilizes the closest point embedding to represent the surface, which allows accurate particle advection on the surface and supports the unsteady flow line integral convolution (UFLIC) technique on the surface. This global approach is faster than previous parameterization techniques...
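A closest point embedding represents a surface by its closest point function cp(x), defined in the surrounding 3-D space. One simple way to advect a particle with it is to take an ordinary step in 3-D and then project back with cp, as sketched below. This is an illustration under assumptions, not the paper's scheme: the surface is a unit sphere (chosen because cp(x) = x / |x| is known in closed form), the integrator is forward Euler, and the velocity field is a hypothetical rigid rotation about the z-axis.

```python
import math
from typing import Callable, Tuple

Vec3 = Tuple[float, float, float]


def closest_point_unit_sphere(p: Vec3) -> Vec3:
    """Closest point on the unit sphere to p (undefined at the origin)."""
    norm = math.sqrt(p[0] ** 2 + p[1] ** 2 + p[2] ** 2)
    return (p[0] / norm, p[1] / norm, p[2] / norm)


def advect_on_surface(
    p: Vec3,
    velocity: Callable[[Vec3], Vec3],
    closest_point: Callable[[Vec3], Vec3],
    dt: float,
    steps: int,
) -> Vec3:
    """Forward-Euler step in the 3-D embedding space, then project back with cp()."""
    for _ in range(steps):
        v = velocity(closest_point(p))  # sample the field on the surface
        p = closest_point((p[0] + dt * v[0], p[1] + dt * v[1], p[2] + dt * v[2]))
    return p


def rotation_about_z(p: Vec3) -> Vec3:
    """A tangent field on the sphere: rigid rotation about the z-axis."""
    return (-p[1], p[0], 0.0)


# Usage: a particle starting at (1, 0, 0) advected for a quarter turn (time pi/2).
dt = 1e-3
end = advect_on_surface(
    (1.0, 0.0, 0.0), rotation_about_z, closest_point_unit_sphere, dt, round((math.pi / 2) / dt)
)
print(end)  # approximately (0, 1, 0)
```

The projection keeps the particle exactly on the surface at every step, which is the property that makes the embedding convenient; accuracy along the surface still depends on the step size and the integrator.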
Long short-term memory (LSTM) is a specific recurrent neural network (RNN) architecture that is designed to model temporal sequences and their long-range dependencies more accurately than conventional RNNs. In this paper, we propose to use deep bidirectional LSTM (BLSTM) for audio/visual modeling in our photo-real talking head system. An audio/visual database of a subject talking is first recorded...