With systems performing Simultaneous Localization And Mapping (SLAM) from a single robot reaching considerable maturity, the possibility of employing a team of robots to collaboratively perform a task has been attracting increasing interest. Promising great impact in a plethora of tasks ranging from industrial inspection to digitization of archaeological structures, collaborative scene perception...
Visual Question Answering is a complex problem that fuses natural language and image processing to answer a question based on information from the image. The basic architecture for accomplishing this uses a CNN to extract features from the image and an RNN for the language processing, then combines the two in an MLP to produce an answer. These architectures perform well at identifying content,...
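The CNN-plus-RNN fusion pattern this abstract describes can be sketched in a few lines. Everything below (the 512/256 feature dimensions, the random stand-in features, the single linear layer over 10 candidate answers) is an illustrative assumption, not the paper's actual model:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_fusion(img_feat, q_feat, W, b):
    """Concatenate image and question features, then score candidate
    answers with a single linear layer followed by a softmax."""
    fused = np.concatenate([img_feat, q_feat])
    logits = W @ fused + b
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

# Hypothetical dimensions: 512-d CNN image feature, 256-d RNN question
# encoding, 10 candidate answers.
img_feat = rng.standard_normal(512)   # stand-in for a CNN output
q_feat = rng.standard_normal(256)     # stand-in for an RNN hidden state
W = rng.standard_normal((10, 768)) * 0.01
b = np.zeros(10)

probs = mlp_fusion(img_feat, q_feat, W, b)
print(probs.shape)  # one probability per candidate answer
```

In a trained system W and b would be learned jointly with the CNN and RNN; here they only demonstrate the fusion step.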
Image aesthetics assessment has been challenging due to its subjective nature. Inspired by Chatterjee's visual neuroscience model, we design the Deep Chatterjee's Machine (DCM), tailored for this task. DCM first learns attributes through parallel supervised pathways, on a variety of selected feature dimensions. A high-level synthesis network is then trained to associate and transform those attributes...
Although the introduction of deep learning has led to significant performance improvements in many machine learning applications, several recent studies have revealed that deep feedforward models are easily fooled. Fooling, in effect, results from the overgeneralization of neural networks over regions far from the training data. To circumvent this problem, this paper proposes a novel elaboration of standard...
In this article, we propose a new optimized embedded architecture, based on soft-core processors, oriented toward visual-attention-based object recognition applications. Our recognition approach relies mainly on two specific modules for online, real-time processing of acquired images: a novel saliency-based feature detector/descriptor module followed by an object classifier module. To deal with such parallel/pipeline...
Accurate prediction of vehicle ego-motion in real time is crucial for an autonomous driving system. In this paper, we formulate the problem of ego-motion classification as video event detection, and we propose an end-to-end deep model to address this problem. In this model, we utilize Convolutional Neural Networks (CNNs) to extract semantic visual features from each video frame, and employ a Long Short...
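The frame-features-into-recurrence pattern behind such models can be illustrated with a single hand-written LSTM step. The dimensions and random weights below are made up for illustration, and a real CNN would supply the per-frame features:

```python
import numpy as np

def lstm_step(x, h, c, W, U, b):
    """One LSTM time step: x is the current frame's feature vector,
    (h, c) are the hidden and cell states carried between frames."""
    n = h.shape[0]
    z = W @ x + U @ h + b                 # all four gates computed at once
    i = 1 / (1 + np.exp(-z[:n]))          # input gate
    f = 1 / (1 + np.exp(-z[n:2*n]))       # forget gate
    o = 1 / (1 + np.exp(-z[2*n:3*n]))     # output gate
    g = np.tanh(z[3*n:])                  # candidate cell state
    c = f * c + i * g
    return np.tanh(c) * o, c

rng = np.random.default_rng(0)
feat_dim, hidden = 64, 16                 # hypothetical sizes
W = rng.standard_normal((4 * hidden, feat_dim)) * 0.1
U = rng.standard_normal((4 * hidden, hidden)) * 0.1
b = np.zeros(4 * hidden)

h = c = np.zeros(hidden)
for _ in range(8):                        # 8 "frames" of stand-in CNN features
    frame_feat = rng.standard_normal(feat_dim)
    h, c = lstm_step(frame_feat, h, c, W, U, b)
print(h.shape)  # final hidden state summarizing the clip
```

A classifier head over the final hidden state (or over all of them) would then produce the ego-motion label.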
Programming tasks on personal service robots in multi-disciplinary teams is challenging. The goal of this research is to enable roboticists and non-programmer domain experts to co-develop robot service scenarios in real world environments using a visual programming environment called RoboStudio. The first key contribution of this paper is presenting the implementation architecture of RoboStudio. This...
This paper documents a pilot study evaluating a simple approach allowing users to eat real food while exploring a virtual environment (VE) through a head-mounted display (HMD). Two cameras mounted on the HMD allowed for video-based stereoscopic see-through when the user’s head orientation pointed toward the food, and the VE would appear when the user turned elsewhere. The pilot study revealed that...
We present an approach to model the deployment costs, including compute and IO costs, of microservice-based applications deployed to a public cloud. Our model, which we dub CostHat, supports both microservices deployed on traditional IaaS or PaaS clouds and services that make use of novel cloud programming paradigms, such as AWS Lambda. CostHat is based on a network model and allows for what-if...
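The kind of what-if analysis such a network cost model enables can be sketched with a toy call-graph calculation. The service names, per-call costs, and fan-out counts below are invented for illustration, and the traversal assumes an acyclic call graph:

```python
# Hypothetical microservice call graph: each service has a per-call
# compute and I/O cost (in arbitrary cost units) and a map of downstream
# services it invokes per incoming call.
services = {
    "frontend": {"compute": 2.0, "io": 0.5, "calls": {"catalog": 1, "cart": 1}},
    "catalog":  {"compute": 1.0, "io": 1.5, "calls": {}},
    "cart":     {"compute": 0.8, "io": 1.0, "calls": {"catalog": 2}},
}

def total_cost(entry, n_requests, services):
    """Propagate n_requests through the call graph and sum the per-call
    compute + I/O cost at every service reached."""
    cost, frontier = 0.0, [(entry, n_requests)]
    while frontier:
        name, n = frontier.pop()
        svc = services[name]
        cost += n * (svc["compute"] + svc["io"])
        for callee, per_call in svc["calls"].items():
            frontier.append((callee, n * per_call))
    return cost

base = total_cost("frontend", 1000, services)
# What-if: cart stops re-fetching the catalog on every call.
services["cart"]["calls"]["catalog"] = 1
print(base, total_cost("frontend", 1000, services))
```

Changing a single edge weight and re-evaluating the model is exactly the style of what-if question (e.g. "what if this service caches?") that a cost model over a service network makes cheap to answer.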
One of the fundamental functionalities for the autonomous navigation of Unmanned Aerial Vehicles (UAVs) is the hovering capability. State-of-the-art techniques for implementing hovering on standard-size UAVs process the camera stream to determine position and orientation (visual odometry). Similar techniques are considered unaffordable in the context of nano-scale UAVs (i.e., a few centimeters in diameter),...
We design an Enriched Deep Recurrent Visual Attention Model (EDRAM), an improved attention-based architecture for multiple object recognition. The proposed model is a fully differentiable unit that can be optimized end-to-end using Stochastic Gradient Descent (SGD). A Spatial Transformer (ST) is employed as the visual attention mechanism, which allows the model to learn the geometric transformation of objects...
Visual odometry is a challenging task, related to simultaneous localization and mapping, that aims to generate a map of the traveled environment from a visual data stream. Based on one or two cameras, motion is estimated from features and pixel differences between frames. Because of the frame rate of the cameras, there are generally small, incremental changes between subsequent frames, where optical flow can be assumed...
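The small-motion assumption can be made concrete in one dimension: under brightness constancy, a small shift d satisfies I_t ≈ -d·I_x, which yields the least-squares estimate below. This is a toy sketch of the idea, not the pipeline of any particular system:

```python
import math

def estimate_shift(f1, f2):
    """Estimate a small 1-D shift between two signals via the optical
    flow constraint: d = -sum(I_x * I_t) / sum(I_x^2)."""
    num = den = 0.0
    for i in range(1, len(f1) - 1):
        ix = (f1[i + 1] - f1[i - 1]) / 2.0   # spatial gradient (central diff)
        it = f2[i] - f1[i]                   # temporal difference
        num += ix * it
        den += ix * ix
    return -num / den

# Synthetic "frames": the second is the first shifted by 0.5 pixels.
frame1 = [math.sin(i / 10.0) for i in range(100)]
frame2 = [math.sin((i - 0.5) / 10.0) for i in range(100)]
print(round(estimate_shift(frame1, frame2), 2))
```

The linearization only holds for sub-pixel motion, which is why high frame rates (small inter-frame displacement) make the assumption reasonable.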
This paper presents the integration of diverse modules for fallen-person detection by a mobile service robot. The integration has been achieved in the ROS middleware (Robot Operating System). The proposed implementation is arranged over a modular architecture of three layers: Hardware, Processing, and Decision. The implemented modules reside in the processing layer. The first module uses...
An architecture for hybrid language systems is presented. A hybrid language has features of both textual languages and visual languages. Textual languages are computer-oriented and are geared toward storage, syntax analysis, and editing. On the other hand, visual languages are human-oriented and are geared toward expressive power, understandability, direct manipulation, and learning cost. Although...
Research on Offline Handwritten Signature Verification has explored a large variety of handcrafted feature extractors, ranging from graphology and texture descriptors to interest points. In spite of advancements in the last decades, the performance of such systems is still far from optimal when tested against skilled forgeries - signature forgeries that target a particular individual. In previous...
Most research in image classification has focused on applications such as face, object, scene and character recognition. This paper presents a comparative study of deep convolutional neural networks (CNNs) and bag-of-visual-words (BOW) variants for recognizing animals. We develop two variants of the bag of visual words (BOW and HOG-BOW) and examine the use of gray and color information as well...
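The core bag-of-visual-words step, assigning each local descriptor to its nearest codeword and histogramming the assignments, can be sketched with a toy 1-D codebook. The codewords and descriptor values below are invented for illustration; real systems use high-dimensional descriptors (e.g. HOG patches) and a codebook learned by clustering:

```python
def bow_histogram(descriptors, codebook):
    """Represent an image as the normalized histogram of nearest-codeword
    assignments of its local descriptors."""
    hist = [0] * len(codebook)
    for d in descriptors:
        # nearest codeword by squared distance
        nearest = min(range(len(codebook)), key=lambda k: (d - codebook[k]) ** 2)
        hist[nearest] += 1
    total = sum(hist)
    return [h / total for h in hist]

codebook = [0.0, 5.0, 10.0]             # toy 1-D "visual words"
descriptors = [0.2, 0.1, 4.9, 5.3, 9.8, 10.1, 9.9, 4.8]
print(bow_histogram(descriptors, codebook))  # → [0.25, 0.375, 0.375]
```

The resulting fixed-length histogram is what a classifier (e.g. an SVM) consumes, regardless of how many descriptors the image produced.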
Currently, supervised deep neural networks (DNNs) have been successfully applied in several image classification tasks. However, how to extract powerful data representations and discover semantic concepts from unlabeled data is a more practical issue. Unsupervised feature learning methods aim at extracting abstract representations from unlabeled data. A large body of research illustrates...
Analyzing and visualizing the large datasets generated by real-time spatio-temporal activities (e.g. vehicle mobility or large crowd movement) is a very challenging task. Recursive delays, both at the middleware and at front-end applications, limit the usefulness of real-time analysis. In this paper, we present "Spatial-Crowd", a framework that first handles spatio-temporal data acquisition and processing...
These days, some robots are given an emotional state (expression and recognition) to improve Human-Robot Interaction (HRI) and Robot-Robot Interaction (RRI). In this article we analyze what it means for a robot to have emotion, distinguishing an emotional state used for communication from an emotional state that serves as a mechanism for organizing its behavior with humans and robots, using a convolutional neural network...
Computing systems increasingly comprise large numbers of heterogeneous subsystems, each with its own local perspective and goals, connected in dynamic networks, and interacting with each other and with humans in ways that are difficult to predict. Nevertheless, users engaging with different parts of the system still expect high performance, reliability, security and other qualities, provided in a way...