The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
Hairstyle recognition is a challenging task since hairstyles span a diverse range of appearances in real-world. However, it is possible to start from recognizing the most basic hairstyles then dealing with more complex hairstyles. In this paper, we present a novel hairstyle pattern recognition system based on CNNs. We first give the definitions of four basic hairstyles: straight hairstyle, curly hairstyle,...
Building a modern Optical Character Recognition (OCR) system for Chinese is hard due to the large Chinese vocabulary list. Training images for rare Chinese characters are extremely expensive to obtain. Radical-based OCR systems tackle this problem by first extracting and recognizing basic graphical components (i.e., radicals) of a Chinese character. However, how to reliably recognize radicals still...
The contribution of this paper is to bridge the gap on understanding the mathematical structure and the computational implementation of a convolutional neural network using a minimal model. The proposed minimal convolutional neural network is presented using a layering approach. This approach provides a clear understanding of the main mathematical operations in a convolutional neural network. Hence,...
Gender recognition from face images is a challenging problem with applications in various knowledge domains, such as biometrics, security and surveillance, human-computer interaction, among others. In this work, we propose and evaluate a novel method for gender recognition based on a geometric descriptor constructed from a pre-defined face shape model. The proposed approach, tested on four different...
Breast cancer (BC) is a deadly disease, killing millions of people every year. Developing automated malignant BC detection system applied on patient's imagery can help dealing with this problem more efficiently, making diagnosis more scalable and less prone to errors. Not less importantly, such kind of research can be extended to other types of cancer, making even more impact to help saving lives...
Deep learning has brought a series of breakthroughs in image processing. Specifically, there are significant improvements in the application of food image classification using deep learning techniques. However, very little work has been studied for the classification of food ingredients. Therefore, this paper proposes a new framework, called DeepFood which not only extracts rich and effective features...
An important goal of computer vision is to build systems that learn visual representations over time that can be applied to many tasks. In this paper, we investigate a vision-language embedding as a core representation and show that it leads to better cross-task transfer than standard multitask learning. In particular, the task of visual recognition is aligned to the task of visual question answering...
A method for scene text localization and recognition is proposed. The novelties include: training of both text detection and recognition in a single end-to-end pass, the structure of the recognition CNN and the geometry of its input layer that preserves the aspect of the text and adapts its resolution to the data.,,The proposed method achieves state-of-the-art accuracy in the end-to-end text recognition...
This work proposes Recurrent Neural Network (RNN) models to predict structured ‘image situations’ – actions and noun entities fulfilling semantic roles related to the action. In contrast to prior work relying on Conditional Random Fields (CRFs), we use a specialized action prediction network followed by an RNN for noun prediction. Our system obtains state-of-the-art accuracy on the challenging recent...
Residual network(ResNet) is an effective instance and a significant extension of deep convolutional neural network. ResNet utilizes skip-connection between input layers and output layers to solve the vanishing gradient problem. Due to the powerfulness of skip-connection, the gradient can flow directly through the identity function from later layers to the earlier layers. However, skip-connection makes...
Human actions captured in video sequences are threedimensional signals characterizing visual appearance and motion dynamics. To learn action patterns, existing methods adopt Convolutional and/or Recurrent Neural Networks (CNNs and RNNs). CNN based methods are effective in learning spatial appearances, but are limited in modeling long-term motion dynamics. RNNs, especially Long Short- Term Memory (LSTM),...
Real-world image recognition systems need to recognize tens of thousands of classes that constitute a plethora of visual concepts. The traditional approach of annotating thousands of images per class for training is infeasible in such a scenario, prompting the use of webly supervised data. This paper explores the training of image-recognition systems on large numbers of images and associated user...
While fine-grained object recognition is an important problem in computer vision, current models are unlikely to accurately classify objects in the wild. These fully supervised models need additional annotated images to classify objects in every new scenario, a task that is infeasible. However, sources such as e-commerce websites and field guides provide annotated images for many classes. In this...
We propose an attentive local feature descriptor suitable for large-scale image retrieval, referred to as DELE (DEep Local Feature). The new feature is based on convolutional neural networks, which are trained only with image-level annotations on a landmark image dataset. To identify semantically useful local features for image retrieval, we also propose an attention mechanism for key point selection,...
In this work we propose a novel framework named Dual-Net aiming at learning more accurate representation for image recognition. Here two parallel neural networks are coordinated to learn complementary features and thus a wider network is constructed. Specifically, we logically divide an end-to-end deep convolutional neural network into two functional parts, i.e., feature extractor and image classifier...
Recognizing how objects interact with each other is a crucial task in visual recognition. If we define the context of the interaction to be the objects involved, then most current methods can be categorized as either: (i) training a single classifier on the combination of the interaction and its context; or (ii) aiming to recognize the interaction independently of its explicit context. Both methods...
This paper addresses the problem of online tracking and classification of multiple objects in an image sequence. Our proposed solution is to first track all objects in the scene without relying on object-specific prior knowledge, which in other systems can take the form of hand-crafted features or user-based track initialization. We then classify the tracked objects with a fast-learning image classifier,...
To successfully move a robot into the building, the elevator button and elevator floor number detection and recognition can play an important role. It can help a robot move in the building, just as it also can help a visually impaired person who wants to move another floor in the building. Due to vision-based approach, the difference in lighting condition and the complex background are the main obstacles...
This paper describes a pipelined stochastic gradient descent (SGD) algorithm and its hardware architecture with a memory distributed structure. In the proposed architecture, a pipeline stage takes charge of multiple layers: a “layer block.” The layer-block-wise pipeline has much less weight parameters for network training than conventional multithreading because weight memory is distributed to workers...
In this paper, we present a novel approach for real-time object identification on a mobile platform. First, our system detects keypoints within a scaled pyramid-based FAST detector and then descriptors of the object of interest are computed using an Analytical Fourier-Mellin transform. The Fourier-Mellin is used in similarity studies due to its invariance property and discrimination power. In this...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.