In this work, we propose to derive an attribute-specific similarity score for a pair of images using an existing parent deep model. As an example, given two facial images, we derive a similarity score for attributes like gender and complexion using an existing face recognition model. It is not always feasible to train a new model for each attribute, as training of deep neural network based model...
In this paper, we deal with the challenging task of recovering the 3D human pose from just a single monocular image, which may be a synthetic image or a real internet image. Retrieval and reconstruction of the articulated 3D pose are both prerequisites for the analysis of people in images and videos. We address both tasks together and propose an efficient framework for search & retrieval...
To deal with the rigid template matching problem in real-world scenarios, we propose a novel iterative feature-pair updating framework which is also robust to high levels of outliers, such as background changes, complex nonrigid deformation and partial occlusion. Given a template image and a target image, we first extract a set of corresponding feature-pairs as candidates. Then, we propose...
Nowadays, with the development of digital photographic technology, video files are growing rapidly, and there is great demand for automatic video semantic analysis in many scenarios, such as video semantic understanding, content-based analysis, and video retrieval. Shot boundary detection is a key basic technology and the first step in video analysis. However, recent methods are time-consuming and perform poorly in the...
Kotenseki is a collection of classical and ancient Japanese literature. It comprises image books that express Japanese stories by using comic drawings of different characters, such as humans, nature, and animals. To effectively store them for posterity, a search system is important. We propose an efficient CBIR system to assist users in easily accessing the information and have an enjoyable...
Accurate Human Epithelial-2 (HEp-2) cell image classification plays an important role in the diagnosis of many autoimmune diseases. However, the traditional approach requires experienced experts to manually identify cell patterns, which greatly increases the workload and suffers from the subjective opinion of the physician. To address this, we propose a very deep residual network (ResNet) based framework...
In this paper, a new method of hand gesture recognition is proposed. First, the hand region is separated based on the depth information. Then the wavelet feature is calculated by enforcing the wavelet invariant moments of the hand region, and the distance feature is extracted by calculating the distance from fingers to hand centroid. Next, a feature vector which is composed of wavelet invariant moments...
With the rapid advances in digital technology, multimedia documents have proliferated. The analysis of this huge repository of multimedia documents requires efficient organization of documents. Multimedia document clustering organizes the multimedia documents with common multimedia topics. An important step of multimedia document clustering is computing the similarity between multimedia...
The task of object tracking in rectangular videos has been addressed in recent years by many researchers, where each method tries to propose a solution for a special challenge. Handling the variety of challenging situations in object tracking for 360-degree videos is still an unsolved problem and needs more attention. In the real world, the challenging situations include moving camera, high-resolution...
This paper proposes an adaptive sparse learning (ASL) framework to solve the multi-classification problem for neurodegenerative disease analysis. Specifically, we integrate the idea of feature selection and subspace learning to construct a least-squares regression model. The principles of Fisher's linear discriminant analysis (LDA) and locality preserving projection (LPP) are incorporated to utilize...
Incorporating user characteristics and contextual information has been shown to be essential when it comes to personalized music retrieval and recommendation. To this end, the current location of a user is often exploited. However, relying solely on GPS coordinates neglects the cultural background of users, which does not necessarily coincide with political borders. In this paper, we analyze culture-specific...
Computer-aided analyses of motion capture data require an effective and efficient concept of motion similarity. Traditional methods generally compare motion sequences by applying time-warping techniques to high-dimensional trajectories of joints. The increasing effectiveness of machine-learning techniques, such as deep convolutional neural networks, brings new possibilities for similarity comparison...
Given the significant industrial growth of demand for virtual reality (VR), 360° video streaming is one of the most important VR applications that require cost-optimal solutions to achieve widespread proliferation of VR technology. Because of its inherent variability of data-intensive content types and its tile-based encoding and streaming, 360° video requires new encoding ladders in adaptive streaming...
Current live eLearning systems enable remote students to view the teaching environment comprising several information sources such as the teacher and the teaching aids. These information sources are presented as individual video and audio elements. As a result, spatial connections between these elements, such as the teacher using hand gestures to point to an area on the screen, become meaningless...
Current motion-capture technologies produce continuous streams of 3D human joint trajectories. One of the challenges is to automatically annotate such streams of complex spatio-temporal data in real time. In this paper, we propose an efficient approach to label motion stream data in real time with a limited usage of main memory. Based on a set of user-defined motion profiles, each of them specified...
We introduce Kara1k, a new musical dataset composed of 2,000 songs analyzed thanks to a partnership with a karaoke company. The dataset is divided into 1,000 cover songs provided by the Recisio Karafun application, and the corresponding 1,000 songs by the original artists. Kara1k is mainly dedicated to cover song identification and singing voice analysis. For both tasks, it offers novel approaches,...
This paper presents a study that evaluates the performance of multi-view human activity recognition with videos having degraded quality. For the activity recognition models, a support vector machine-based approach using spatiotemporal features and a deep learning-based approach using convolutional and recurrent layers are built. We investigate the recognition performance of the two models with respect...
In the context of smart cities and the Internet of Things (IoT), much trending content on social networks reflects the picture of a community or its interests. In this paper, we propose a model that automatically collects and analyzes trending social data. The model explores trending contents, overall attitude of textual contents and the relationships among...
Video shot boundary detection is a fundamental step towards video information processing in e-learning scenarios. In the field of shot boundary detection, it remains difficult to choose suitable thresholds for different videos, and empirical thresholds usually lead to low precision. Thus, we propose an original method to generate a video-based threshold which is calculated from the video itself...
Keypoint matching between images is an important technique for computer vision applications such as image retrieval. Although binary feature descriptors such as BRIEF enable fast measurement of distance, exhaustive search is still time-consuming. Hashing methods such as Locality Sensitive Hashing (LSH), while effective at accelerating search, result in large memory consumption and thus are...