The task of visual relationship recognition (VRR) is recognizing multiple objects and their relationships in an image. A fundamental difficulty of this task is class-number scalability, since the number of possible relationships we need to consider causes combinatorial explosion. Another difficulty of this task is modeling how to avoid outputting semantically redundant relationships. To overcome these...
In recent years, various approaches to blind image quality assessment (IQA) with high accuracy and low complexity have been investigated. In this paper we develop a pre-saliency-map-based blind IQA method, which exploits saliency information prior to quality prediction to enhance performance, in two steps. 1) We split the image into patches and design a convolution neural network...
Artificial neural networks are widely used computing systems applied to a wide variety of tasks and problems. A common application of such networks is classification. However, a significant amount of this research focuses on one- and two-dimensional information, such as vectorized data and images. There is limited research performed on three-dimensional media such as video clips. This...
Recently, deep learning has enjoyed a great deal of success in computer vision problems due to its capability to model highly complex tasks, such as image classification, object detection, and face recognition, among many others. Although these neural networks are nowadays very powerful, there is a huge number of parameters (i.e. the model) that need to be learned and require considerable storage space...
Thermal plasma spraying is an important manufacturing technique that creates a thermal barrier coating to protect the surface underneath from wear, erosion, oxidation and corrosion. In this paper, we develop a new microstructure classification and quantification (MCQ) module that could fully automatically classify and quantify two types of microstructures, globular and interlamellar, in the top coat...
The attention function has been classified into (i) sustained attention, (ii) selective attention, (iii) controlled attention, (iv) distributed attention, and (v) capacity for attention. Ordinarily, the digital cancellation test (D-CAT) or the trail making test (TMT) is employed to evaluate the attention function. However, these evaluations are in the form of paper tests, and cannot effectively...
A projector is usually coupled with a dedicated projection surface to properly display visual information. This prevents the application of projection in places where a dedicated projection surface is not readily available. This paper presents a method for automatically detecting a good surface in a daily living and working space to support improvisatory projection without a pre-installed projection...
Text is the easiest means to record information but need not always be the best means for understanding a concept. In psychological theories, it is argued that when information is presented visually, it provides a better means to understand a concept. While techniques exist for generating text from a given image, the inverse problem that is to automatically fetch coherent images to represent a given...
Users' Quality of Experience (QoE) in Interactive 3D Tele-Immersion (i3DTI) systems is influenced by several factors such as the quality of the "live" 3D avatars of the users, network latency, rendering methodology (head mounted display or regular TV type of display), etc. Hence, it becomes important to answer the question: "Is Visual Quality (VQ) the only factor to be considered or...
Current live eLearning systems enable remote students to view the teaching environment, comprising several information sources such as the teacher and the teaching aids. These information sources are presented as individual video and audio elements. As a result, spatial connections between these elements, such as the teacher using hand gestures to point to an area on the screen, become meaningless...
In this work, we propose to derive an attribute-specific similarity score for a pair of images using an existing parent deep model. As an example, given two facial images, we derive a similarity score for attributes like gender and complexion using an existing face recognition model. It is not always feasible to train a new model for each attribute, as training of deep neural network based model...
Current motion-capture technologies produce continuous streams of 3D human joint trajectories. One of the challenges is to automatically annotate such streams of complex spatio-temporal data in real time. In this paper, we propose an efficient approach to label motion stream data in real time with a limited usage of main memory. Based on a set of user-defined motion profiles, each of them specified...
Human action recognition with depth sensors has drawn wide attention in the computer vision and multimedia processing areas. In contrast to simple periodic actions, irrelevant actions or sub-actions shared between different classes of two-person non-periodic interactions make this task challenging. This paper presents heterogeneous features fusion with Collaborative Representation (CR) to address this...
In this paper, we deal with the challenging task of recovering the 3D human pose from a single monocular image, which may be a synthetic image or a real internet image. Both the retrieval and the reconstruction of the articulated 3D pose are prerequisites for the analysis of people in images/videos. We address both tasks together and propose an efficient framework for search & retrieval...
We introduce Kara1k, a new musical dataset composed of 2,000 analyzed songs, made possible by a partnership with a karaoke company. The dataset is divided into 1,000 cover songs provided by the Recisio Karafun application, and the corresponding 1,000 songs by the original artists. Kara1k is mainly dedicated to cover song identification and singing voice analysis. For both tasks, it offers novel approaches,...
Films seek to elicit emotions in viewers by infusing the story they tell with an affective character or tone - in a word, a mood. In content-based multimedia analysis, considerable effort has been made to develop methods to estimate film affect computationally. However, results have been hampered by a tendency to classify film scenes either by genre or not at all, while other potentially helpful classification...
Since news videos are valuable sources of multimedia information on real-world events, there is a demand for viewing them efficiently. However, summarization methods based on auditory contents do not take the visual contents into account. In the case of news videos, due to their presentation style where audio contents and visual contents do not necessarily come from the same...
To deal with the rigid template matching problem in real-world scenarios, we propose a novel iterative feature-pair updating framework which is also robust to high levels of outliers, such as background changes, complex nonrigid deformation and partial occlusion. Given a template image and a target image, we first extract a set of corresponding feature-pairs as candidates. Then, we propose...
Incorporating user characteristics and contextual information has been shown to be essential when it comes to personalized music retrieval and recommendation. To this end, the current location of a user is often exploited. However, relying solely on GPS coordinates neglects the cultural background of users, which does not necessarily coincide with political borders. In this paper, we analyze culture-specific...
The domain of minimally invasive surgery has recently attracted attention from the Multimedia community because systematic video documentation is on the rise in this medical field. The rapidly growing volumes of video archives demand effective and efficient techniques to retrieve specific information from large video collections with visually very homogeneous content. One specific...