Breast cancer (BC) is a deadly disease, killing millions of people every year. Developing an automated malignant BC detection system applied to patients' imagery can help deal with this problem more efficiently, making diagnosis more scalable and less prone to errors. No less importantly, this kind of research can be extended to other types of cancer, making an even greater impact in helping to save lives...
Deep learning has brought a series of breakthroughs in image processing. Specifically, there are significant improvements in the application of food image classification using deep learning techniques. However, very little work has addressed the classification of food ingredients. Therefore, this paper proposes a new framework, called DeepFood, which not only extracts rich and effective features...
An important goal of computer vision is to build systems that learn visual representations over time that can be applied to many tasks. In this paper, we investigate a vision-language embedding as a core representation and show that it leads to better cross-task transfer than standard multitask learning. In particular, the task of visual recognition is aligned to the task of visual question answering...
A method for scene text localization and recognition is proposed. The novelties include: training of both text detection and recognition in a single end-to-end pass, the structure of the recognition CNN and the geometry of its input layer that preserves the aspect of the text and adapts its resolution to the data. The proposed method achieves state-of-the-art accuracy in the end-to-end text recognition...
This work proposes Recurrent Neural Network (RNN) models to predict structured ‘image situations’ – actions and noun entities fulfilling semantic roles related to the action. In contrast to prior work relying on Conditional Random Fields (CRFs), we use a specialized action prediction network followed by an RNN for noun prediction. Our system obtains state-of-the-art accuracy on the challenging recent...
Residual network (ResNet) is an effective instance and a significant extension of deep convolutional neural networks. ResNet utilizes skip connections between input layers and output layers to solve the vanishing gradient problem. Thanks to skip connections, the gradient can flow directly through the identity function from later layers to earlier layers. However, skip-connection makes...
Human actions captured in video sequences are three-dimensional signals characterizing visual appearance and motion dynamics. To learn action patterns, existing methods adopt Convolutional and/or Recurrent Neural Networks (CNNs and RNNs). CNN-based methods are effective in learning spatial appearances, but are limited in modeling long-term motion dynamics. RNNs, especially Long Short-Term Memory (LSTM),...
Real-world image recognition systems need to recognize tens of thousands of classes that constitute a plethora of visual concepts. The traditional approach of annotating thousands of images per class for training is infeasible in such a scenario, prompting the use of webly supervised data. This paper explores the training of image-recognition systems on large numbers of images and associated user...
While fine-grained object recognition is an important problem in computer vision, current models are unlikely to accurately classify objects in the wild. These fully supervised models need additional annotated images to classify objects in every new scenario, a task that is infeasible. However, sources such as e-commerce websites and field guides provide annotated images for many classes. In this...
We propose an attentive local feature descriptor suitable for large-scale image retrieval, referred to as DELF (DEep Local Feature). The new feature is based on convolutional neural networks, which are trained only with image-level annotations on a landmark image dataset. To identify semantically useful local features for image retrieval, we also propose an attention mechanism for keypoint selection,...
In this work we propose a novel framework named Dual-Net, which aims to learn a more accurate representation for image recognition. Here two parallel neural networks are coordinated to learn complementary features, and thus a wider network is constructed. Specifically, we logically divide an end-to-end deep convolutional neural network into two functional parts, i.e., feature extractor and image classifier...
Recognizing how objects interact with each other is a crucial task in visual recognition. If we define the context of the interaction to be the objects involved, then most current methods can be categorized as either: (i) training a single classifier on the combination of the interaction and its context; or (ii) aiming to recognize the interaction independently of its explicit context. Both methods...
Detecting and recognizing elevator buttons and floor numbers can play an important role in moving a robot through a building. It can help a robot navigate the building, just as it can help a visually impaired person who wants to move to another floor. Because the approach is vision-based, differences in lighting conditions and complex backgrounds are the main obstacles...
This paper describes a pipelined stochastic gradient descent (SGD) algorithm and its hardware architecture with a memory-distributed structure. In the proposed architecture, a pipeline stage takes charge of multiple layers: a “layer block.” The layer-block-wise pipeline has far fewer weight parameters for network training than conventional multithreading because weight memory is distributed to workers...
In this paper, we present a novel approach for real-time object identification on a mobile platform. First, our system detects keypoints with a scaled pyramid-based FAST detector, and then descriptors of the object of interest are computed using an Analytical Fourier-Mellin transform. The Fourier-Mellin transform is used in similarity studies due to its invariance property and discrimination power. In this...
Automatic License Plate Recognition (ALPR) has been employed in many developed countries for traffic management, automatic speed control, tracking stolen cars and also in automatic toll systems for improving the traffic control. ALPR is a surveillance system that extracts the information from the vehicle license plate by capturing the images. Human intervention to recognize the license plates results...
This work focuses on the recognition of license plates in low-resolution and low-quality images. We present a methodology for collecting a real-world (non-synthetic) dataset of low-quality license plate images with ground truth transcriptions. Our approach to license plate recognition is based on a Convolutional Neural Network which holistically processes the whole image, avoiding segmentation...
The aim is to develop an efficient method which uses a custom image to train the classifier. The OCR system extracts distinct features from the input image to classify its contents as characters, specifically letters and digits. The input to the system is digital images containing the patterns to be classified. The analysis and recognition of the patterns in images are becoming more complex, yet easy with...
Recent work on the recognition of naturalistic expressions, also known as spontaneous facial expression recognition, has attracted researchers' attention due to its importance in different behavioural and clinical applications. The main design challenges in the area of emotion computing for automatic recognition of spontaneous facial expressions are the face pose, capture distance, illumination...
Surveillance cameras today often capture NIR (near infrared) images in low-light environments. However, most face datasets accessible for training and verification are only collected in the VIS (visible light) spectrum. It remains a challenging problem to match NIR to VIS face images due to the different light spectrum. Recently, breakthroughs have been made for VIS face recognition by applying deep...