The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
The deep convolutional neural network (CNN) is the state-of-the-art solution for large-scale visual recognition. Following some basic principles such as increasing network depth and constructing highway connections, researchers have manually designed a lot of fixed network architectures and verified their effectiveness.,,In this paper, we discuss the possibility of learning deep network structures...
For large-scale visual search, highly compressed yet meaningful representations of images are essential. Structured vector quantizers based on product quantization and its variants are usually employed to achieve such compression while minimizing the loss of accuracy. Yet, unlike binary hashing schemes, these unsupervised methods have not yet benefited from the supervision, end-to-end learning and...
Most recent CNN architectures use average pooling as a final feature encoding step. In the field of fine-grained recognition, however, recent global representations like bilinear pooling offer improved performance. In this paper, we generalize average and bilinear pooling to “α-pooling”, allowing for learning the pooling strategy during training. In addition, we present a novel way to visualize decisions...
Modeling the activity of an ensemble of neurons can provide critical insights into the workings of the brain. In this work we examine if learning based signal modeling can contribute to a high quality modeling of neuronal signal data. To that end, we employ the sparse coding and dictionary learning schemes for capturing the behavior of neuronal responses into a small number of representative prototypical...
Zero-shot learning, a special case of unsupervised domain adaptation where the source and target domains have disjoint label spaces, has become increasingly popular in the computer vision community. In this paper, we propose a novel zero-shot learning method based on discriminative sparse non-negative matrix factorization. The proposed approach aims to identify a set of common high-level semantic...
Deep convolutional neural networks (CNNs) have proven highly effective for visual recognition, where learning a universal representation from activations of convolutional layer plays a fundamental problem. In this paper, we present Fisher Vector encoding with Variational Auto-Encoder (FV-VAE), a novel deep architecture that quantizes the local activations of convolutional layer in a deep generative...
The latest High Efficiency Video Coding (HEVC) has been increasingly used to generate video streams over Internet. However, the decoded HEVC video streams may incur severe quality degradation, especially at low bit-rates. Thus, it is necessary to enhance visual quality of HEVC videos at the decoder side. To this end, we propose in this paper a Decoder-side Scalable Convolutional Neural Network (DS-CNN)...
We present VidedWhisfer, a novel approach for unsupervised video representation learning, in which video sequence is treated as a self-supervision entity based on the observation that the sequence encodes video temporal dynamics (e.g., object movement and event evolution). Specifically, for each video sequence, we use a pre-learned visual dictionary to generate a sequence of high-level semantics,...
This paper proposes a method based on the bag-of-words (BoW) and the softmax regression for microscopic image classification. Essentially, the locality-constrained linear coding (LLC) is adopted for local feature encoding. Compared with the traditionally adopted vector quantization (VQ) in the BoW framework, the LLC encodes local structures of microscopic images with lower quantization errors and...
Multi-target stimulus coding plays an important role in a steady-state visual evoked potential (SSVEP)-based brain-computer interface (BCI). In conventional SSVEP-based BCIs, a large interval between two neighboring stimulus frequencies is often used to improve classification accuracy. Although recent progresses in stimulus coding and target identification methods that have significantly improved...
We present an approach for unsupervised computation of local shape descriptors, which relies on the use of linear autoencoders for characterizing local regions of complex shapes. The proposed approach responds to the need for a robust scheme to index binary images using local descriptors, which arises when only few examples of the complete images are available for training, thus making inaccurate...
Bag of visual words (BoVW) remains a very competitive representation in the domain of scene classification. In this framework, extracting SIFT descriptors on a dense grid of pixels has shown to lead to a better performance. However, due to the nature of SIFT as an edge-based descriptor, computing SIFT on homogeneous regions might result in non-stable region descriptors. The suggested solution in the...
Sketches and other forms of graphical communication are central to both the practice and learning of engineering. Visual representations play a critical role in helping students learn engineering concepts, socialize them into the engineering discipline, and facilitate or hinder the design process. Despite the importance of graphical communication and visual representations, our understanding of how...
Sparse Coding is a widely used method to represent an image. However, sparse coding and its improved algorithms have the problem of complex computation and long running time and so on. For these problems, we propose an image classification method based on hash codes and space pyramid, which encodes local feature points with hash codes instead of sparse coding. Firstly, extract the local feature points...
A novel proposed approach, collaborative representation-based classification, has been developed for face recognition and recently used in image classification task owing to its simplicity and effectiveness. The major drawback of this method is the neglect of the spatial structure among the image representations. Inspired by the success of this technique and motivated by the power of spatial information...
Often, videos are composed of multiple concepts or even genres. For instance, news videos may contain sports, action, nature, etc. Therefore, encoding the distribution of such concepts/genres in a compact and effective representation is a challenging task. In this sense, we propose the Bag of Genres representation, which is based on a visual dictionary defined by a genre classifier. Each visual word...
Visual question answering (VQA) comes as a result of great development in computer vision and natural language processing, which requires deep understanding of images and questions and effective integration of them. Current works on VQA simply concatenated visual and textual features or compared them via dot product, which were unable to eliminate the semantic difference between them. We argue to...
Intra-frame prediction in the High Efficiency Video Coding (HEVC) standard can be empirically improved by applying sets of recursive two-dimensional filters to the predicted values. However, this approach does not allow (or complicates significantly) the parallel computation of pixel predictions. In this work we analyze why the recursive filters are effective, and use the results to derive sets of...
Brain-computer interfacing (BCI) based on steady-state visual evoked potentials (SSVEPs) is one of the most practical BCIs because of its high recognition accuracies and little training of a user. Mixed frequency and phase coding which can implement a number of commands and achieve a high information transfer rate (ITR) has recently been gaining much attention. In order to implement mixed-coded SSVEP-BCI...
In the fine-grained categories, images have lager diversity in their intra categories. Meanwhile, they have more similarity in their inter categories. Therefore, images are difficultly distinguish during fine-grained visual classification(FGVC). This paper proposes a deep sparse coding framework to implement the fine-grained visual categorization. In our framework, deep layer structures with sparse...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.