Image classification is a method that distinguishes between categories of targets based on differences in their image features. A common problem is that the way targets are modeled as features strongly affects recognition robustness. To address this problem, a correlation-based method is presented to optimize the bag-of-visual-words (BOVW) model by reducing the dictionary size. The...
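For context, here is a minimal sketch of the standard BOVW encoding step that such a dictionary feeds into. It assumes local descriptors have already been extracted and the visual words (the dictionary) have already been learned. Each descriptor is assigned to its nearest visual word, and the image is represented by the normalized word histogram. Because there is one bin per word, a smaller dictionary directly gives a shorter image representation. The function names and toy data are hypothetical, and this does not show the correlation-based dictionary reduction itself.

```python
import math


def nearest_word(descriptor, dictionary):
    """Return the index of the visual word closest to `descriptor` (squared Euclidean distance)."""
    best_index, best_dist = 0, math.inf
    for index, word in enumerate(dictionary):
        dist = sum((d - w) ** 2 for d, w in zip(descriptor, word))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def bovw_histogram(descriptors, dictionary):
    """L1-normalised visual-word histogram; it has exactly one bin per dictionary word."""
    counts = [0] * len(dictionary)
    for descriptor in descriptors:
        counts[nearest_word(descriptor, dictionary)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(dictionary)


# Hypothetical toy data: 2-D "descriptors" and a 3-word dictionary.
dictionary = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
descriptors = [[0.1, 0.2], [0.9, 1.1], [1.2, 0.8], [4.8, 5.1]]
print(bovw_histogram(descriptors, dictionary))  # → [0.25, 0.5, 0.25]
```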
Research into computational jigsaw puzzle solving, an emerging theoretical problem with numerous applications, has in recent years focused on puzzles consisting of square pieces only. In this paper we wish to extend the scientific scope of appearance-based puzzle solving and consider “brick wall” jigsaw puzzles – rectangular pieces that may have different sizes, and could be placed next to each...
Bilinear models provide an appealing framework for mixing and merging information in Visual Question Answering (VQA) tasks. They help to learn high level associations between question meaning and visual concepts in the image, but they suffer from huge dimensionality issues. We introduce MUTAN, a multimodal tensor-based Tucker decomposition to efficiently parametrize bilinear interactions between...
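A bilinear model scores each output with an interaction tensor T over question features q and image features v: y_c = Σ_a Σ_b T[a][b][c] q_a v_b. Storing T explicitly is what makes the dimensionality explode. A Tucker decomposition instead writes T as a small core tensor multiplied along each mode by a factor matrix, so the fusion can be computed in the small core space. The plain-Python sketch below uses linear projections only. MUTAN's actual model (nonlinearities, rank constraints on the core, learned weights) is not reproduced, and all names and sizes are hypothetical. The example checks that the factored computation matches the explicit-tensor computation.

```python
import random


def project(x, W):
    """Return x^T W, where W is stored as a list of len(x) rows."""
    return [sum(x[r] * W[r][c] for r in range(len(x))) for c in range(len(W[0]))]


def tucker_fusion(q, v, W_q, W_v, core, W_o):
    """Bilinear fusion through a Tucker-factored tensor (no nonlinearities, no rank constraint).

    Shapes: q (d_q), v (d_v), W_q (d_q x t_q), W_v (d_v x t_v),
    core (t_q x t_v x t_o), W_o (t_o x n_out).
    """
    q_t, v_t = project(q, W_q), project(v, W_v)  # factor-matrix projections
    t_o = len(core[0][0])
    # z[k] = sum_i sum_j core[i][j][k] * q_t[i] * v_t[j]: the bilinear step in the small core space
    z = [sum(core[i][j][k] * q_t[i] * v_t[j]
             for i in range(len(q_t)) for j in range(len(v_t)))
         for k in range(t_o)]
    return project(z, W_o)  # map the t_o fused dimensions to n_out outputs


def full_bilinear(q, v, T):
    """Reference: y[c] = sum_a sum_b T[a][b][c] * q[a] * v[b] with an explicit tensor T."""
    return [sum(T[a][b][c] * q[a] * v[b] for a in range(len(q)) for b in range(len(v)))
            for c in range(len(T[0][0]))]


def reconstruct(core, W_q, W_v, W_o):
    """Rebuild T = core x1 W_q x2 W_v x3 W_o explicitly (only feasible for toy sizes)."""
    t_q, t_v, t_o = len(core), len(core[0]), len(core[0][0])
    return [[[sum(core[i][j][k] * W_q[a][i] * W_v[b][j] * W_o[k][c]
                  for i in range(t_q) for j in range(t_v) for k in range(t_o))
              for c in range(len(W_o[0]))]
             for b in range(len(W_v))]
            for a in range(len(W_q))]


def rand_matrix(rows, cols):
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
d_q, d_v, t_q, t_v, t_o, n_out = 3, 4, 2, 2, 2, 5  # hypothetical toy sizes
q = [random.uniform(-1, 1) for _ in range(d_q)]
v = [random.uniform(-1, 1) for _ in range(d_v)]
W_q, W_v, W_o = rand_matrix(d_q, t_q), rand_matrix(d_v, t_v), rand_matrix(t_o, n_out)
core = [rand_matrix(t_v, t_o) for _ in range(t_q)]  # core[i][j][k]

y_factored = tucker_fusion(q, v, W_q, W_v, core, W_o)
y_full = full_bilinear(q, v, reconstruct(core, W_q, W_v, W_o))
print(max(abs(a - b) for a, b in zip(y_factored, y_full)))  # tiny: rounding error only
```

With hypothetical sizes d_q = d_v = 2048 and 3,000 outputs, an explicit tensor would hold 2048 × 2048 × 3000 ≈ 1.26 × 10^10 weights, while the factored form only stores the three factor matrices and the small core.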
The fully convolutional network (FCN) has been successfully applied to semantic segmentation of scenes represented as RGB images. Images augmented with a depth channel provide a better understanding of the scene's geometry. The question is how best to exploit this additional information to improve segmentation performance. In this paper, we present a neural network with...
In this paper, we propose the first higher-frame-rate video dataset (called Need for Speed - NfS) and benchmark for visual object tracking. The dataset consists of 100 videos (380K frames) captured in real-world scenarios with now commonly available higher-frame-rate (240 FPS) cameras. All frames are annotated with axis-aligned bounding boxes and all sequences are manually labelled with nine visual...
We study large-scale multi-label classification (MLC) on two recently released datasets, Youtube-8M and Open Images, which contain millions of data instances and thousands of classes. The unprecedented problem scale poses great challenges for MLC. First, finding the correct label subset among exponentially many choices incurs substantial ambiguity and uncertainty. Second, the large data-size and...
Aesthetic quality assessment plays an important role in how people organize large image collections. Many studies on aesthetic quality assessment are based on the design of hand-crafted features, without considering whether the attributes conveyed by images actually affect image aesthetics. This paper presents an aesthetic quality assessment method which uses new visual features. The proposed method utilizes...
Having been intensively studied, visual tracking has recently seen great advances in either speed (e.g., with correlation filters) or accuracy (e.g., with deep features). Real-time, high-accuracy tracking algorithms, however, remain scarce. In this paper we study the problem from a new perspective and present a novel parallel tracking and verifying (PTAV) framework, by taking advantage of the ubiquity...
The inherent dependencies between visual elements and aural elements are crucial for affective video content analyses, yet have not been successfully exploited. Therefore, we propose a multimodal deep regression Bayesian network (MMDRBN) to capture the dependencies between visual elements and aural elements for affective video content analyses. The regression Bayesian network (RBN) is a directed graphical...
A major challenge in matching vision and language is that they typically have completely different features and representations. In this work, we introduce a novel bridge between the modality-specific representations by creating a co-embedding space based on a recurrent residual fusion (RRF) block. Specifically, RRF adapts the recurrent mechanism to residual learning, so that it can recursively...
Two types of information exist in a stereo pair: correlation (matching) and decorrelation (half-occlusion). Vision science has shown that both types of information are used in the visual cortex, and that people can perceive depth even when correlation cues are absent or very weak, a capability that remains absent from most computational stereo systems. As a step toward stereo algorithms that are more...
Effectively learning the temporal variation of target appearance and excluding interference from cluttered backgrounds, while maintaining real-time response, is an essential problem in visual object tracking. Recently, Siamese networks have shown the great potential of matching-based trackers to achieve balanced accuracy and beyond-real-time speed. However, they still have a big gap to classification...
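Matching-based Siamese trackers embed an exemplar (template) of the target and a larger search region with the same network, then cross-correlate the two; the position of the highest response is taken as the target's new location. Here is a minimal sketch of that matching step, assuming raw single-channel values stand in for the learned embeddings. The function names and toy data are hypothetical, and the temporal appearance learning the abstract is concerned with is not modelled.

```python
def cross_correlation_map(search, template):
    """Score every valid placement of `template` inside `search` by the sum of element-wise products."""
    th, tw = len(template), len(template[0])
    sh, sw = len(search), len(search[0])
    return [[sum(search[y + i][x + j] * template[i][j]
                  for i in range(th) for j in range(tw))
             for x in range(sw - tw + 1)]
            for y in range(sh - th + 1)]


def locate(search, template):
    """Return the (row, col) of the highest response, i.e. the predicted target position."""
    scores = cross_correlation_map(search, template)
    _, y, x = max((score, y, x) for y, row in enumerate(scores) for x, score in enumerate(row))
    return y, x


# Toy single-channel "feature maps": a 2x2 bright target inside a 5x5 search region.
search = [[0] * 5 for _ in range(5)]
for y in (2, 3):
    for x in (1, 2):
        search[y][x] = 1
template = [[1, 1], [1, 1]]
print(locate(search, template))  # → (2, 1)
```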
Experiential attributes are a possible way of explaining users' experiences during interaction. A recently presented set of 23 aesthetic categories of interaction was established to explain users' aesthetic experiences. This recent work focused on touch devices, such as smartphones and tablets, and concluded that the goodness of the established categories needs further study. The study,...
Computer programs used for three-dimensional visualization of phenomena occurring within the electromagnetic field lack built-in diagnostic tools for verifying the quality of the obtained view. The image presented during the simulation may contain a variety of artifacts. The anomaly most often identified in three-dimensional images is vertical disparity. Detection of the above-mentioned...
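In a rectified (parallel-axis) stereo pair, corresponding points lie on the same image row. Their horizontal offset carries the depth cue, so any vertical offset is an artifact. Here is a minimal check, assuming point correspondences between the left and right views are already available (for example, from feature matching). The 1-pixel threshold and function names are hypothetical, and this is not the detection method proposed in the paper.

```python
def vertical_disparities(matches):
    """Row offsets y_left - y_right for ((x_l, y_l), (x_r, y_r)) correspondences."""
    return [y_l - y_r for (_, y_l), (_, y_r) in matches]


def has_vertical_disparity(matches, threshold_px=1.0):
    """Flag the pair if the mean absolute row offset exceeds a hypothetical pixel threshold."""
    offsets = vertical_disparities(matches)
    mean_abs = sum(abs(d) for d in offsets) / len(offsets)
    return mean_abs > threshold_px


# Hypothetical correspondences: horizontal shifts (depth cues) with near-zero row offsets.
good = [((10, 20), (7, 20)), ((40, 55), (35, 55.5)), ((80, 12), (76, 12))]
# The same points with the right view shifted down by 3 px.
bad = [((xl, yl), (xr, yr + 3)) for (xl, yl), (xr, yr) in good]
print(has_vertical_disparity(good), has_vertical_disparity(bad))  # → False True
```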
Access rates are a key indicator of web page popularity. High access rates are extremely important for web pages, especially for news web pages, online shopping sites, and search engines. We analyzed the influence of visual fluency and cognitive fluency on the access rates of Chinese web pages. First, we conducted a web page scoring experiment. Twenty-five subjects were...
Visual object tracking is one of the basic units in the construction of smart cities; it focuses on establishing a dynamic appearance model to represent and recognize the target in complex scenarios. In this paper, we treat visual object tracking as a multiple local patch matching problem and design an online tracker based on a correlation filter and binary descriptors. We integrate binary descriptors...
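Binary descriptors are compared using the Hamming distance (the number of set bits in an XOR), and this low cost makes them attractive for online tracking. Here is a minimal sketch of matching local-patch descriptors this way, assuming descriptors are already computed and packed into Python ints. The names, threshold, and data are hypothetical, and the way the paper combines this matching with the correlation filter is not shown.

```python
def hamming(a, b):
    """Number of differing bits between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")


def match_patches(template_descs, candidate_descs, max_distance):
    """For each template patch, the index of the nearest candidate, or None if it is too far."""
    matches = []
    for desc in template_descs:
        distances = [hamming(desc, cand) for cand in candidate_descs]
        best = min(range(len(distances)), key=distances.__getitem__)
        matches.append(best if distances[best] <= max_distance else None)
    return matches


# Hypothetical 8-bit descriptors (real ones are longer, e.g. 256 bits for ORB).
template = [0b10110010, 0b01010101]
candidates = [0b10110011, 0b11111111, 0b00000000]
print(match_patches(template, candidates, max_distance=2))  # → [0, None]
```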
In this paper we describe a cyberspace of scientific papers, in which the most cited and significant documents are represented by a large size and the distance between documents is proportional to their semantic similarity. A new measure of semantic similarity of documents is proposed that is determined by the maximum correlation between explicit and implicit connectivity of the documents. A new science...
Clustering techniques have gained great popularity in neuroscience data analysis, especially for analysing data from complex experimental paradigms where traditional model-based methods are hard to apply. However, when employing clustering analysis, many clustering algorithms are available nowadays, and even with an individual clustering algorithm, choices like parameter settings and distance metrics are...
This work applies the Gaussian Mixture Probability Hypothesis Density (GMPHD) Filter to multi-object tracking in video data. In order to take advantage of additional visual information, Kernelized Correlation Filters (KCF) are evaluated as a possible extension of the GMPHD tracking-by-detection scheme to enhance its performance. The baseline GMPHD filter and its extension are evaluated on the UA-DETRAC...
Exploiting correlations in the audio, several past works have demonstrated the ability to automatically match and synchronize User Generated Recordings (UGRs) of the same event. Considering a small number of synchronized UGRs, in this paper we formulate simple linear audio mixing approaches to combine the available audio content. We use data from two different public events to perform a comparative...
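Once recordings are synchronized, a linear mix is a per-sample weighted sum of the aligned signals. Here is a minimal sketch, assuming the recordings are already aligned, trimmed to the same length, and share a sample rate. The uniform and 3:1 weightings are hypothetical examples, not the specific mixing approaches compared in the paper.

```python
def linear_mix(recordings, weights=None):
    """Sample-wise weighted sum of synchronised, equal-length signals; weights are normalised to sum to 1."""
    if weights is None:
        weights = [1.0] * len(recordings)  # uniform weights -> plain average
    total = sum(weights)
    weights = [w / total for w in weights]
    length = len(recordings[0])
    if any(len(r) != length for r in recordings):
        raise ValueError("recordings must be trimmed to a common, aligned length")
    return [sum(w * r[t] for w, r in zip(weights, recordings)) for t in range(length)]


# Hypothetical, already synchronised sample buffers from two recordings.
a = [0.5, 0.25, -0.5]
b = [0.25, 0.0, -0.25]
print(linear_mix([a, b]))                  # → [0.375, 0.125, -0.375]
print(linear_mix([a, b], weights=[3, 1]))  # → [0.4375, 0.1875, -0.4375]
```

With non-negative weights, normalising them makes the mix a convex combination. If every input stays within [-1, 1], the output does too, which avoids clipping.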