The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
Developing a technique for the automatic analysis of surveillance videos in order to identify the presence of violence is of broad interest. In this work, we propose a deep neural network for the purpose of recognizing violent videos. A convolutional neural network is used to extract frame level features from a video. The frame level features are then aggregated using a variant of the long short term...
We address the problem of temporal action localization in videos. We pose action localization as a structured prediction over arbitrary-length temporal windows, where each window is scored as the sum of frame-wise classification scores. Additionally, our model classifies the start, middle, and end of each action as separate components, allowing our system to explicitly model each actions temporal...
In recent years, both online retail and video hosting service have been exponentially grown. In this paper, a novel deep neural network, called AsymNet, is proposed to explore a new cross-domain task, Video2Shop, targeting for matching clothes appeared in videos to the exactly same items in online shops. For the image side, well-established methods are used to detect and extract features for clothing...
Object detection in videos has drawn increasing attention recently with the introduction of the large-scale ImageNet VID dataset. Different from object detection in static images, temporal information in videos is vital for object detection. To fully utilize temporal information, state-of-the-art methods [15, 14] are based on spatiotemporal tubelets, which are essentially sequences of associated bounding...
Recent progress in sports analytics has been driven by the availability of spatio-temporal and high level data. Video-based action recognition in sports can significantly contribute to these advances. Good progress has been made in the field of action recognition but its application to sports mainly focuses in detecting which sport is being played. In order for action recognition to be useful in sports...
In this work we present algorithms which are applied in such task as text recognition on images and video. Proposed algorithm is based on the combination of discrete cosine transform and convolutional neural networks. Description of the applying features of discrete cosine transform for text detection is provided. We list the main advantages and disadvantages of CNN and DCT combination. Also in this...
Describing the contents of images is a challenging task for machines to achieve. It requires not only accurate recognition of objects and humans, but also their attributes and relationships as well as scene information. It would be even more challenging to extend this process to identify falls and hazardous objects to aid elderly or users in need of care. This research makes initial attempts to deal...
This paper studies monocular visual odometry (VO) problem. Most of existing VO algorithms are developed under a standard pipeline including feature extraction, feature matching, motion estimation, local optimisation, etc. Although some of them have demonstrated superior performance, they usually need to be carefully designed and specifically fine-tuned to work well in different environments. Some...
In order to ensure the safety measures, the detection of traffic rule violators is a highly desirable but challenging task due to various difficulties such as occlusion, illumination, poor quality of surveillance video, varying whether conditions, etc. In this paper, we present a framework for automatic detection of motorcyclists driving without helmets in surveillance videos. In the proposed approach,...
Since pedestrians in videos have a wide range of appearances such as body poses, occlusions, and complex backgrounds, pedestrian detection is a challengeable task. In this paper, we propose part-level fully convolutional networks (FCN) for pedestrian detection. We adopt deep learning to deal with the proposal shifting problem in pedestrian detection. First, we combine convolutional neural networks...
Varying types of shots is a fundamental element in the language of film, commonly used by a visual storytelling director to convey the emotion, ideas, and art. To classify such types of shots from images, we present a new framework that facilitates the intriguing task by addressing two key issues. We first focus on learning more effective features by fusing the layer-wise outputs extracted from a...
YouTube draws large number of users who contribute actively by uploading videos or commenting on existing videos. However, being a crowd sourced and large content pushed onto it, there is limited control over the content. This makes malicious users push content (videos and comments) which is inappropriate (unsafe), particularly when such content is placed around cartoon videos which are typically...
In this work we explore Hidden Markov models as an approach for modeling and recognizing dynamic hand gestures for the interface of in-vehicle infotainment systems. We train the HMMs on more complex shape descriptors such as HOG and CNN features, unlike typical HMM based approaches. An analysis of the optimal hyperparameters of the HMM for the task has been carried out. Also, dimensionality reduction...
In recent years, a dramatically increasing number of surveillance cameras have been installed to monitor private and public spaces and areas. Video surveillance is seen as an effective way to ensure our security. Therefore, modeling activity patterns and human behaviors for detection or recognition of peculiar event is a critical technology which has attracted remarkable research interest in the last...
In this paper, we consider the animal object detection and segmentation from wildlife monitoring videos captured by motion-triggered cameras, called camera-traps. For these types of videos, existing approaches often suffer from low detection rates due to low contrast between the foreground animals and the cluttered background, as well as high false positive rates due to the dynamic background. To...
This paper is devoted to describe a preliminary draft of our approach that aims to identify and track learners' learning styles based on their behavior and actions during a MOOC then to provide them with personalized recommendations based on their learning styles. Massive Open Online Courses are attracting a debate in the research community about their influence in online education. Indeed, with their...
Matrix completion is the task of predicting unknown or missing entries in a data matrix. The estimation of the missing entries is based on the assumption that the underlying matrix is a low rank one. Deep learning has evolved as an efficient tool for feature extraction in many large-scale image based applications. Exploiting the techniques from both domains, we propose a novel solution to the problem...
Human action recognition is a challenging vision task due to the complex action patterns in the real-world videos. In this work, we propose a DeepAction Kernel Gaussian Process, which takes advantage of Gaussian process (GP) and deep learning, to capture the distinctive action characteristics. Specifically, we design a unified, deep and non-adjacent kernel structure within Gaussian process to classify...
This paper proposes a novel method using deep spatial-temporal neural networks based on deep Convolutional Neural Network (CNN) for multimedia event detection. To sufficiently take advantage of the motion and appearance information of events from videos, our networks contain two branches: a temporal neural network and a spatial neural network. The temporal neural network captures motion information...
Identifying different audio segments in videos is the first step for many important tasks such as event detection and speech transcription. Approaches using Mel-Frequency Cepstral coefficients (MFCCs) with Gaussian mixture models (GMMs) and hidden Markov models (HMMs) perform reasonably well in stationary conditions but do not scale to a broad range of environmental conditions. This paper focuses...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.