The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
Fine-grained vehicle classiflcation is a challenging task due to the subtle differences between vehicle classes. Several successful approaches to fine-grained image classification rely on part-based models, where the image is classified according to discriminative object parts. Such approaches require however that parts in the training images be manually annotated, a laborintensive process. We propose...
In this paper, we develop an end-to-end framework for text region proposal generation and text detection in the natural scene, which takes advantage of the efficient training strategies of the neural network. We make some technical changes over the SSD architectures, so that our method can handle the Chinese and English bilingual situation. Experiments show that our method exceeds the state-of-the-art...
The contribution of this paper is to bridge the gap on understanding the mathematical structure and the computational implementation of a convolutional neural network using a minimal model. The proposed minimal convolutional neural network is presented using a layering approach. This approach provides a clear understanding of the main mathematical operations in a convolutional neural network. Hence,...
Information and communication technologies (ICTs) are changing the way people develop their lives. This change is not indifferent to activities involved in education. For this reason, several learning techniques, based on ICTs, appear to improve the efficiency of the educational process. However, the problem of a maintained individualized learning arises, which does not focus on needs of the student,...
With the advent of new technologies in the field of medicine, there is rising awareness of biomechanisms, and we are better able to treat ailments than we could earlier. Deep learning has helped a lot in this endeavor. This paper deals with the application of deep learning in brain tumor segmentation. Brain tumors are difficult to segment automatically given the high variability in the shapes and...
Crowd counting on still images is very challenging due to heavy occlusions and scale variations. In this paper, we aim to develop a method that can accurately estimate the crowd count from a still image. Recently, convolutional neural networks have been shown effective in many computer vision tasks including crowd counting. To this end, we propose a fully convolutional network (FCN) architecture to...
The deployment of deep convolutional neural networks (CNNs) in many real world applications is largely hindered by their high computational cost. In this paper, we propose a novel learning scheme for CNNs to simultaneously 1) reduce the model size; 2) decrease the run-time memory footprint; and 3) lower the number of computing operations, without compromising accuracy. This is achieved by enforcing...
Future frame prediction in videos is a promising avenue for unsupervised video representation learning. Video frames are naturally generated by the inherent pixel flows from preceding frames based on the appearance and motion dynamics in the video. However, existing methods focus on directly hallucinating pixel values, resulting in blurry predictions. In this paper, we develop a dual motion Generative...
Deep Convolutional Neural Networks (CNNs) achieve substantial improvements in face detection in the wild. Classical CNN-based face detection methods simply stack successive layers of filters where an input sample should pass through all layers before reaching a face/non-face decision. Inspired by the fact that for face detection, filters in deeper layers can discriminate between difficult face/non-face...
Human actions captured in video sequences are threedimensional signals characterizing visual appearance and motion dynamics. To learn action patterns, existing methods adopt Convolutional and/or Recurrent Neural Networks (CNNs and RNNs). CNN based methods are effective in learning spatial appearances, but are limited in modeling long-term motion dynamics. RNNs, especially Long Short- Term Memory (LSTM),...
As handheld video cameras are now commonplace and available in every smartphone, images and videos can be recorded almost everywhere at anytime. However, taking a quick shot frequently yields a blurry result due to unwanted camera shake during recording or moving objects in the scene. Removing these artifacts from the blurry recordings is a highly ill-posed problem as neither the sharp image nor the...
Demand for highly energy-efficient coprocessor for the inference computation of deep neural networks is increasing. We propose the time-domain neural network (TDNN), which employs time-domain analog and digital mixed-signal processing (TDAMS) that uses delay time as the analog signal. TDNN not only exploits energy-efficient analog computing, but also enables fully spatially unrolled architecture by...
Dropout is a very effective way of regularizing neural networks. Stochastically “dropping out” units with a certain probability discourages over-specific co-adaptations of feature detectors, preventing overfitting and improving network generalization. Besides, Dropout can be interpreted as an approximate model aggregation technique, where an exponential number of smaller networks are averaged in order...
Training deep neural networks is difficult for the pathological curvature problem. Re-parameterization is an effective way to relieve the problem by learning the curvature approximately or constraining the solutions of weights with good properties for optimization. This paper proposes to reparameterize the input weight of each neuron in deep neural networks by normalizing it with zero-mean and unit-norm,...
Dominant approaches to action detection can only provide sub-optimal solutions to the problem, as they rely on seeking frame-level detections, to later compose them into ‘action tubes’ in a post-processing step. With this paper we radically depart from current practice, and take a first step towards the design and implementation of a deep network architecture able to classify and regress whole video...
This work introduces a novel Convolutional Network architecture (ConvNet) for the task of human pose estimation, that is the localization of body joints in a single static image. We propose a coarse to fine architecture that addresses shortcomings of the baseline architecture in [26] that stem from the fact that large inaccuracies of its coarse ConvNet cannot be corrected by the refinement ConvNet...
General human action recognition requires understanding of various visual cues. In this paper, we propose a network architecture that computes and integrates the most important visual cues for action recognition: pose, motion, and the raw images. For the integration, we introduce a Markov chain model which adds cues successively. The resulting approach is efficient and applicable to action classification...
In this work we propose a novel framework named Dual-Net aiming at learning more accurate representation for image recognition. Here two parallel neural networks are coordinated to learn complementary features and thus a wider network is constructed. Specifically, we logically divide an end-to-end deep convolutional neural network into two functional parts, i.e., feature extractor and image classifier...
Deep neural networks enjoy high interest and have become the state-of-art methods in many fields of machine learning recently. Still, there is no easy way for a choice of network architecture. However, the choice of architecture can significantly influence the network performance. This work is the first step towards an automatic architecture design. We propose a genetic algorithm for an optimization...
This work targets people identification in video based on the way they walk (i.e. gait) by using deep learning architectures. We explore the use of convolutional neural networks (CNN) for learning high-level descriptors from low-level motion features (i.e. optical flow components). The low number of training samples for each subject and the use of a test set containing subjects different from the...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.