The support vector machine (SVM) algorithm has received much attention in voiceprint recognition research, especially for small sample datasets. However, as the recognition number and the number of speech features increase, the speed of model training and recognition drops significantly. To solve this problem, a new weighted clustering algorithm is proposed, which uses a “one to one” SVM model...
Traffic behavioral monitoring within urban intersections is an essential issue in Intelligent Transportation Systems (ITS) for a smart city. This paper investigates gathering traffic information within an urban intersection where accidents frequently occur. In this paper, traffic pattern modeling, trajectory classification and a real-time vehicle tracker within the urban intersection are proposed...
With the advent of the incorporation of GPS receivers and then GPS-enabled smartphones in transportation data collection, many studies have looked at how to infer meaningful information from this data. Research in this field has concentrated on the use of heuristics and supervised machine learning methods to detect: trip ends, trip itineraries, travel mode and trip purpose. All the methods used until...
There are many challenges in single-channel multi-person mixed speech separation, such as modeling the temporal continuity of the speech signals and improving the frame separation performance simultaneously. In this paper, a separation method based on Deep Clustering with local optimization by the improved Non-Negative Matrix Factorization (NMF) combined with Factorial Conditional Random Fields (FCRF)...
In order to train neural networks (NN) for text-to-speech synthesis (TTS), phonetic segmentation must be performed. The most accurate segmentation is performed manually, but the process of creating manual alignments is costly and time-consuming, so automatic procedures are preferable. In this paper, a simple alignment method based on models trained during hidden Markov Model (HMM) based TTS system...
A human action recognition method is introduced that detects a set of actions in videos by a temporal expansion with hidden Markov models of a pose detection with an artificial neural network. The method was set up and tested using eleven actions from the MOCAP motion capture database comprising 3,947 frames. A pose alphabet of fourteen relevant poses was defined to be learned by an artificial neural...
This work presents an embedded hardware architecture for real-time ultrasonic NDE applications that incorporate Hidden Markov Model (HMM) based statistical signal methods. HMM has been successfully used in applications like audio segment retrieval, speech/language recognition and image processing applications. Recently, we proposed a new Hidden Markov Model (HMM) based ultrasonic flaw detection algorithm...
This work presents an embedded hardware architecture for real-time ultrasonic NDE applications that incorporate Hidden Markov Model (HMM) based statistical signal methods. The proposed algorithm is a combination of Discrete Wavelet Transform (DWT) for pre-processing A-scan signals and HMM for classification of the flaw presence. For this study, a MicroZed FPGA with Xilinx Zynq-7020 System-on-Chip (SoC)...
In this study, we will present a rule-based fuzzy gesture recognition system where a user will interact with a spherical robot through hand gestures performed with a smartphone, and the droid will respond by imitating these movements. In this context, we will take up the Gesture Recognition, Fuzzy Logic and Internet of Things (IoT) frameworks to construct such a Human-Machine Interface (HMI). In the proposed...
This paper provides a voice transformation model that uses pitch data and Feed-forward Neural Networks on Line Spectral Frequency. The aim of this work is to achieve the transformation of a speech signal produced by a source speaker by modifying voice individuality parameters such that it appears to be spoken by a chosen target speaker, without modifying the message contents. Most of the previous...
We consider a training data collection mechanism wherein, instead of annotating each training instance with a class label, additional features drawn from a known class-conditional distribution are acquired concurrently. Considering true labels as latent variables, a maximum likelihood approach is proposed to train a classifier based on these unlabeled training data. Furthermore, the case of correlated...
With the incredible growth of OSNs (online social networks), users have numerous choices every moment. However, because time and resources are limited, users remain social and active on only a small number of OSNs. The dynamic changes in users' interests entail user migration. Understanding user migration behavior is important to improve business intelligence and retain users. In this...
Cardiac arrhythmia is a condition in which the heart beats improperly. The heartbeat may be too fast or too slow. This paper proposes a scheme for detecting or predicting the type of cardiac arrhythmia. It uses a clustering approach and a regression methodology. The clustering approach used is DBSCAN and, for regression, multiclass logistic regression is...
In this paper, we present a new method for detecting professional skills (as noun phrases) in resumes written in natural language. The proposed method uses an ontology of skills, the Wikipedia encyclopedia, and a set of standard multi-word part-of-speech patterns in order to detect the professional skills. First, the method checks to see if there are, in the text of the resumes, skills that are...
Keystroke dynamics is a biometric characteristic that depends on the typing style of users. In the past thirty years, dozens of classifiers have been proposed for distinguishing people using keystroke dynamics; many have obtained excellent results in evaluation. However, a more common case is that only normal instances are available and none of the rare classes are observed. This leads us to use...
Part-of-speech tagging, the problem of assigning a part-of-speech tag to each word of a text, can be approached with several methods or techniques. In this paper, we conducted part-of-speech tagging experiments for Bahasa Indonesia using statistical approaches (Unigram, Hidden Markov Models) and Brill's tagger. In this study, we used a supervised POS tagging approach requiring a large number of...
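As an illustration of the unigram approach named in the abstract above, here is a minimal sketch of a unigram tagger: each known word receives the tag it carried most often in training, and unknown words fall back to the most frequent tag overall. The toy corpus, the tag names (`PRON`, `VERB`, `NOUN`) and the function names are assumptions made for this example; they are not taken from the paper, whose data and tagset are not shown in this listing.

```python
from collections import Counter, defaultdict

def train_unigram_tagger(tagged_sentences):
    """Learn the most frequent tag per word, plus a fallback tag for unknown words."""
    counts = defaultdict(Counter)   # word -> Counter of tags seen with that word
    tag_totals = Counter()          # overall tag frequencies
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
            tag_totals[tag] += 1
    default_tag = tag_totals.most_common(1)[0][0]
    model = {word: c.most_common(1)[0][0] for word, c in counts.items()}
    return model, default_tag

def tag(words, model, default_tag):
    """Tag each word independently; unknown words get the fallback tag."""
    return [(w, model.get(w.lower(), default_tag)) for w in words]

# Hypothetical toy training data (not from the paper).
corpus = [
    [("saya", "PRON"), ("makan", "VERB"), ("nasi", "NOUN")],
    [("dia", "PRON"), ("minum", "VERB"), ("air", "NOUN")],
    [("saya", "PRON"), ("minum", "VERB"), ("kopi", "NOUN")],
    [("ibu", "NOUN"), ("minum", "VERB"), ("teh", "NOUN")],
]
model, default_tag = train_unigram_tagger(corpus)
print(tag(["dia", "makan", "roti"], model, default_tag))
```

Because a unigram tagger ignores context, it cannot disambiguate a word that takes different tags in different sentences; that limitation is what sequence models such as HMMs are meant to address.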
Text-to-speech (TTS) systems are often used as part of the user interface in wearable devices. Due to limited memory and computational/battery power in wearable devices, it could be useful to have a TTS system which requires less memory and is less computationally intensive. Conventional speech synthesis systems have separate modeling for pitch (F0-model) and spectral representation, namely Mel generalized...
This paper presents an automatic system for detection of bird species in field recordings. A sinusoidal detection algorithm is employed to segment the acoustic scene into isolated spectro-temporal segments. Each segment is represented as a temporal sequence of frequencies of the detected sinusoid, referred to as a frequency track. Each bird species is represented by a set of hidden Markov models (HMMs),...
There is a common observation that audio event classification is easier to deal with than detection. So far, this observation has been accepted as a fact, but a careful analysis is lacking. In this paper, we examine the rationale behind this fact and, more importantly, leverage it to benefit the audio event detection task. We present an improved detection pipeline in which a verification step is appended...
Gesture recognition using a training set of limited size for a large vocabulary of gestures is a challenging problem in computer vision. With few examples per gesture class, researchers often employ state-of-the-art exemplar-based methods such as Dynamic Time Warping (DTW). This paper makes two contributions in the area of exemplar-based gesture recognition. As an alternative to DTW, we first introduce...
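For readers unfamiliar with the DTW baseline mentioned in the abstract above, the following is a minimal sketch of classic Dynamic Time Warping used for nearest-exemplar classification. It illustrates the baseline only, not the alternative the paper introduces (which is truncated in this listing). The one-dimensional sequences, the labels `wave` and `swipe`, and the function names are assumptions made for this example.

```python
from math import inf

def dtw_distance(a, b):
    """Classic DTW distance between two 1-D numeric sequences."""
    n, m = len(a), len(b)
    # cost[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def classify(query, exemplars):
    """Return the label of the exemplar with the smallest DTW distance to the query."""
    return min(exemplars, key=lambda item: dtw_distance(query, item[1]))[0]

# Hypothetical one-example-per-class training set.
exemplars = [
    ("wave", [0, 1, 0, 1, 0]),
    ("swipe", [0, 1, 2, 3, 4]),
]
# A slower execution of the "swipe" shape: some samples are repeated.
print(classify([0, 0, 1, 2, 3, 3, 4], exemplars))
```

The warping lets a query performed at a different speed align with its exemplar at low cost, which is why this kind of matching is usable with very few examples per class. The table costs O(n·m) time and memory per comparison.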