In this paper, we focus on describing the method we designed for automatic perceived personality prediction. We present a simple model that uses three different sets of features: nonverbal audio cues, visual cues from video, and facial landmark points. The model uses a random decision forest to do regression from the extracted features. As we discuss in Section 4, this multimodal model performs relatively...
This paper deals with identifying a writer from his/her offline handwriting. In a multilingual country where a writer can write in multiple scripts, writer identification becomes challenging when we have individual handwriting data in one script but need to verify/identify a writer from handwriting in another script. In this paper, this issue is addressed with two scripts: English and Bengali...
We consider the problem of joint modeling of videos and their corresponding textual descriptions (e.g. sentences or phrases). Our approach consists of three components: the video representation, the textual representation, and a joint model that links videos and text. Our video representation uses the state-of-the-art deep 3D ConvNet to capture the semantic information in the video. Our textual representation...
We present a method for learning discriminative filters using a shallow Convolutional Neural Network (CNN). We encode rotation invariance directly in the model by tying the weights of groups of filters to several rotated versions of the canonical filter in the group. These filters can be used to extract rotation invariant features well-suited for image classification. We test this learning procedure...
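The weight-tying idea described above can be illustrated with a minimal sketch. It assumes rotations restricted to multiples of 90 degrees (so no interpolation is needed) and max-pooling over the rotated copies; the abstract does not state the rotation angles or the pooling used, and the function names `rot90`, `rotated_bank`, and `invariant_response` are hypothetical.

```python
def rot90(grid):
    """Rotate a square 2D list counterclockwise by 90 degrees."""
    return [list(row) for row in zip(*grid)][::-1]


def rotated_bank(canonical):
    """Build a group of filters that all share the canonical filter's weights.

    Assumption: only the four 90-degree rotations are used.
    """
    bank = [canonical]
    for _ in range(3):
        bank.append(rot90(bank[-1]))
    return bank


def response(patch, filt):
    """Correlation of a patch with a filter of the same size."""
    return sum(p * f for prow, frow in zip(patch, filt) for p, f in zip(prow, frow))


def invariant_response(patch, canonical):
    """Max over the rotated copies, so rotating the patch by 90 degrees
    leaves the value unchanged (assumed pooling choice)."""
    return max(response(patch, f) for f in rotated_bank(canonical))


# A vertical-edge filter and a patch with a bright right-hand column.
canonical = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
patch = [[0, 0, 9], [0, 0, 9], [0, 0, 9]]
print(invariant_response(patch, canonical))
print(invariant_response(rot90(patch), canonical))
```

Both calls print the same value, because rotating the patch only permutes the set of responses over which the maximum is taken.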
The encoding method is an important factor in an action recognition pipeline. One of the key points of the encoding method is the assignment step. A very widely used super-vector encoding method is the vector of locally aggregated descriptors (VLAD), with very competitive results in many tasks. However, it considers only hard assignment and the criterion for the assignment is performed only from...
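For reference, the hard assignment used by standard VLAD can be sketched as follows. This is a minimal illustration of the baseline encoding only, not of the method the paper proposes; the codebook is assumed to be given (in practice it is usually learned with k-means), and the function name `vlad_encode` is hypothetical.

```python
from math import sqrt


def vlad_encode(descriptors, centroids):
    """Encode local descriptors as an L2-normalized VLAD vector."""
    dim = len(centroids[0])
    residual_sums = [[0.0] * dim for _ in centroids]
    for d in descriptors:
        # Hard assignment: each descriptor goes to its single nearest
        # centroid, measured by squared Euclidean distance.
        nearest = min(
            range(len(centroids)),
            key=lambda k: sum((x - c) ** 2 for x, c in zip(d, centroids[k])),
        )
        # Accumulate the residual (descriptor minus centroid).
        for i in range(dim):
            residual_sums[nearest][i] += d[i] - centroids[nearest][i]
    # Concatenate the per-centroid residual sums and L2-normalize.
    flat = [v for row in residual_sums for v in row]
    norm = sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat


centroids = [[0.0, 0.0], [10.0, 10.0]]
descriptors = [[1.0, 0.0], [0.0, 1.0], [11.0, 10.0]]
encoding = vlad_encode(descriptors, centroids)
print(len(encoding))  # one block of `dim` values per centroid
```

Because each descriptor contributes to exactly one centroid, information about how close it was to the other centroids is discarded, which is the limitation the abstract points to.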
The ability to automatically detect the extent of agreement or disagreement a person expresses is an important indicator of inter-personal relations and emotion expression. Most existing methods for automated analysis of human agreement from audio-visual data perform agreement detection using either the audio or the visual modality of human interactions. However, this is suboptimal as expression of different...
Micro-expression recognition is a challenging task in the computer vision field due to the repressed facial appearance and short duration. Previous work on micro-expression recognition has used hand-crafted features such as LBP-TOP, Gabor filters and optical flow. This paper is the first work to explore the possible use of deep learning for the micro-expression recognition task. Due to the lack of data for...
We present an algorithm for learning a feature representation for video segmentation. Standard video segmentation algorithms utilize similarity measurements in order to group related pixels. The contribution of our paper is an unsupervised method for learning the feature representation used for this similarity. The feature representation is defined over video supervoxels. An embedding framework learns...
Given a query image, retrieving images depicting the same object in a large-scale database is becoming an urgent and challenging task. Recently, Compact Descriptors for Visual Search (CDVS) was drafted by the ISO/IEC Moving Picture Experts Group (MPEG) to support image retrieval applications, and it has been published as an international standard. Unfortunately, with regard to applications with hugely...
Image quality assessment has gained greater interest due to the development of digital imaging and storage. In that field, the structural similarity (SSIM) index has been shown to agree favorably with human perceptual assessment, significantly outperforming the mean squared error, i.e., the L2 distance. The similarity measure function in SSIM which compares a target (distorted) image with its reference...
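The contrast between the two measures can be illustrated with a simplified sketch. It assumes a single global window over a flattened grayscale image (the standard SSIM index is computed over local windows and averaged) and the commonly used constants K1 = 0.01 and K2 = 0.03; the function names are hypothetical, and this is not the modified similarity function the paper studies.

```python
def mse(x, y):
    """Mean squared error between two equally sized pixel lists."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


def global_ssim(x, y, dynamic_range=255.0):
    """SSIM computed over one global window (simplifying assumption)."""
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    # Stabilizing constants with the commonly used K1 = 0.01, K2 = 0.03.
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


reference = [10.0, 50.0, 90.0, 130.0]
reversed_image = reference[::-1]  # same pixel values, structure flipped
print(mse(reference, reference), global_ssim(reference, reference))
print(mse(reference, reversed_image), global_ssim(reference, reversed_image))
```

The reversed image has the same mean and variance as the reference but negative covariance with it, so its SSIM drops well below 1 even though both images share the same pixel statistics.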
In this paper, we focus on the text/non-text classification problem: distinguishing images that contain text from a large number of natural images. To this end, we propose a novel neural network architecture, termed Convolutional Multi-Dimensional Recurrent Neural Network (CMDRNN), which distinguishes text/non-text images by classifying local image blocks, taking both region pixels and dependencies among blocks...
Blur is a common artifact in video, which adds complexity to text detection and recognition. To achieve good accuracy for text detection and recognition, this paper suggests a new method for classifying blurred and non-blurred frames in video. We explore quality metrics, namely, BRISQUE, NRIQA, GPC and SI, in a new way for classification. We estimate the values of these metrics with the help...