Dropout is a very effective way of regularizing neural networks. Stochastically “dropping out” units with a certain probability discourages over-specific co-adaptations of feature detectors, preventing overfitting and improving network generalization. Besides, Dropout can be interpreted as an approximate model aggregation technique, where an exponential number of smaller networks are averaged in order...
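The unit-dropping step described in this abstract can be sketched as follows. This is only an illustration, assuming the common "inverted dropout" variant in which surviving units are rescaled by 1/(1 - p) so that the expected activation is unchanged. The names `dropout` and `p_drop` are hypothetical and do not come from the paper.

```python
import random


def dropout(activations, p_drop, rng=None, training=True):
    """Zero each unit with probability p_drop and rescale the survivors.

    Assumption: the "inverted" variant, where kept units are divided by
    the keep probability, so no rescaling is needed at test time.
    """
    if not 0.0 <= p_drop < 1.0:
        raise ValueError("p_drop must be in [0, 1)")
    # At test time (or with p_drop == 0) the layer is the identity.
    if not training or p_drop == 0.0:
        return list(activations)
    rng = rng or random.Random()
    keep = 1.0 - p_drop
    # Each unit is kept independently with probability `keep`.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]
```

Each training call samples a different mask, which is what gives rise to the "many smaller networks" reading mentioned in the abstract; with `training=False` the input is returned unchanged.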
Training deep neural networks is difficult because of the pathological curvature problem. Re-parameterization is an effective way to relieve the problem, either by learning the curvature approximately or by constraining the weight solutions to have good properties for optimization. This paper proposes to reparameterize the input weight of each neuron in deep neural networks by normalizing it to zero mean and unit norm,...
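The zero-mean, unit-norm normalization of a neuron's input weight vector mentioned in this abstract can be sketched as below. This covers only the normalization itself, under the assumption that the mean is subtracted first and the result is then divided by its Euclidean norm. The name `center_and_normalize` is hypothetical, and the paper's full reparameterization (truncated above) may involve more than this.

```python
import math


def center_and_normalize(v):
    """Map a raw weight vector v to one with zero mean and unit L2 norm."""
    mean = sum(v) / len(v)
    centered = [x - mean for x in v]
    norm = math.sqrt(sum(c * c for c in centered))
    if norm == 0.0:
        # A constant vector has no direction left after centering.
        raise ValueError("cannot normalize a constant weight vector")
    return [c / norm for c in centered]
```

For example, `center_and_normalize([1.0, 2.0, 4.0])` returns a vector whose entries sum to zero and whose squared entries sum to one, up to floating-point error.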
Dominant approaches to action detection can only provide sub-optimal solutions to the problem, as they rely on seeking frame-level detections, to later compose them into ‘action tubes’ in a post-processing step. With this paper we radically depart from current practice, and take a first step towards the design and implementation of a deep network architecture able to classify and regress whole video...
This work introduces a novel Convolutional Network architecture (ConvNet) for the task of human pose estimation, that is the localization of body joints in a single static image. We propose a coarse to fine architecture that addresses shortcomings of the baseline architecture in [26] that stem from the fact that large inaccuracies of its coarse ConvNet cannot be corrected by the refinement ConvNet...
General human action recognition requires understanding of various visual cues. In this paper, we propose a network architecture that computes and integrates the most important visual cues for action recognition: pose, motion, and the raw images. For the integration, we introduce a Markov chain model which adds cues successively. The resulting approach is efficient and applicable to action classification...
In this work we propose a novel framework named Dual-Net aiming at learning more accurate representation for image recognition. Here two parallel neural networks are coordinated to learn complementary features and thus a wider network is constructed. Specifically, we logically divide an end-to-end deep convolutional neural network into two functional parts, i.e., feature extractor and image classifier...
Deep neural networks enjoy high interest and have recently become the state-of-the-art methods in many fields of machine learning. Still, there is no easy way to choose a network architecture, even though the choice of architecture can significantly influence network performance. This work is the first step towards automatic architecture design. We propose a genetic algorithm for an optimization...
This work targets people identification in video based on the way they walk (i.e. gait) by using deep learning architectures. We explore the use of convolutional neural networks (CNN) for learning high-level descriptors from low-level motion features (i.e. optical flow components). The low number of training samples for each subject and the use of a test set containing subjects different from the...
The development of a deep (stacked) convolutional auto-encoder in the Caffe deep learning framework is presented in this paper. We describe simple principles which we used to create this model in Caffe. The proposed model of convolutional auto-encoder does not have pooling/unpooling layers yet. The results of our experimental research show comparable accuracy of dimensionality reduction in comparison...
With the advent of low-cost RGBD sensors, many solutions have been proposed for extraction and fusion of colour and depth information. In this paper, we propose several new approaches to fusing these multimodal sources for people detection. We are especially concerned with a scenario in which a robot operates in a changing environment. We extend the use of the Faster RCNN framework proposed by Girshick...
New trends in neural computation, now dealing with distributed learning on pervasive sensor networks and multiple sources of big data, make necessary the use of computationally efficient techniques to be implemented on simple and cheap hardware architectures. In this paper, a nonuniform quantization at the input layer of neural networks is introduced, in order to optimize their implementation on hardware...
In the last few years, several studies have demonstrated that Human Activity Recognition (HAR) can be performed using smartphone sensor data, enabling a new generation of context-aware mobile applications. Smartphone-based HAR systems can exploit the full set of embedded sensors besides the accelerometer in order to increase the accuracy of the detection process. At the same...
Nowadays face recognition plays a central role in surveillance, biometrics and security. In this paper a Field-Programmable Gate Array (FPGA) based low-cost real-time architecture for face recognition is presented. The face recognition module receives the detected faces from a video stream and processes the data with the widely used Eigenfaces, also known as the Principal Component Analysis (PCA)...
We propose a neural network architecture for depth map inference from monocular stabilized videos, with application to UAV videos in rigid scenes. Training is based on a novel synthetic dataset for navigation that mimics aerial footage from a gimbal-stabilized monocular camera in rigid scenes. Based on this network, we propose a multi-range architecture for unconstrained UAV flight, leveraging flight...
This work presents an embedded hardware architecture for real-time ultrasonic NDE applications that incorporate Hidden Markov Model (HMM) based statistical signal methods. HMM has been successfully used in applications like audio segment retrieval, speech/language recognition and image processing applications. Recently, we proposed a new Hidden Markov Model (HMM) based ultrasonic flaw detection algorithm...
The study of critical transitions and early warning measures is of great importance for dealing with any complex system. Manually selected statistical features with handpicked parameters have been used in a wide variety of fields for this purpose. We envision the use of deep learning architectures like simple feed forward networks (FFN), convolutional neural networks (CNN) and long short-term memory...
Extreme Learning Machine (ELM) is a neural network architecture based on a Single Layer Feed-forward Neural Network (SLFN). For meaningful results, the structure of ELM has to be optimized through the inclusion of regularization, and ℓ2-norm based regularization is the most commonly used. ℓ2-norm based regularization achieves better performance than the traditional ELM. The estimate of the regularization parameter...
Recent work has shown that convolutional neural networks (CNNs) trained in a supervised fashion for speaker identification are able to extract features from spectrograms which can be used for speaker clustering. These features are represented by the activations of a certain hidden layer and are called embeddings. However, previous approaches require plenty of additional speaker data to learn the embedding,...
The paper proposes the ScatterNet Hybrid Deep Learning (SHDL) network that extracts invariant and discriminative image representations for object recognition. SHDL framework is constructed with a multi-layer ScatterNet front-end, an unsupervised learning middle, and a supervised learning back-end module. Each layer of the SHDL network is automatically designed as an explicit optimization problem leading...
Area V5, or the Middle Temporal (MT) area of the primate brain, is said to be involved in visual motion perception. Physiological studies indicate that the neurons in MT respond selectively to the direction of moving stimuli. However, in response to complex stimuli containing multiple oriented components, one set of MT neurons is selective to the direction of the component motion, whereas the other set...