Search results

chapter

Applications of deep learning in supervised speech separation

Shuangran Bai, Yungang Liu, Ting Zhang, Fengzhong Li

2017 Chinese Automation Congress (CAC) > 6539 - 6544

Recently, deep learning has been shown to have a strong ability to learn and express complex features, which has brought significant research achievements in signal processing. As a challenging task in speech signal processing, monaural speech separation has long been a focus of research. From the usage of traditional signal processing methods and shallow models...

chapter

Implementation of accent recognition methods subsystem for eLearning systems

Eugen Tverdokhleb, Hennadii Dobrovolskyi, Nataliya Keberle, Natalia Myronova

2017 9th IEEE International Conference on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications (IDAACS) > 2 > 1037 - 1041

The results of implementing an external accent recognition system and integrating it into the massive open online course platform Moodle are reported. Accent recognition is important in foreign language learning because it gives a student feedback on the presence of a certain unwanted accent in their foreign language pronunciation. Implementation of several accent recognition methods and their...

chapter

SPEED: Open-Source Framework to Accelerate Speech Recognition on Embedded GPUs

Syed Mohammad Asad Hassan Jafri, Ahmed Hemani, Leonardo Intesa

2017 Euromicro Conference on Digital System Design (DSD) > 94 - 101

Owing to their high accuracy, inherent redundancy, and embarrassingly parallel nature, neural networks are fast becoming mainstream machine learning algorithms. However, these advantages come at the cost of high memory and processing requirements (which can be met by GPUs, FPGAs, or ASICs). For embedded systems, the requirements are particularly challenging because of tight power and timing budgets...

chapter

Investigative study of various activation functions for speech recognition

Hari Krishna Vydana, Anil Kumar Vuppala

2017 Twenty-third National Conference on Communications (NCC) > 1 - 5

Significant developments in deep learning methods have been achieved with the capability to train deeper networks. The performance of speech recognition systems has been greatly improved by the use of deep learning techniques. Most of the developments in deep learning are associated with the development of new activation functions and the corresponding initializations. The development of Rectified...

chapter

Deep attractor network for single-microphone speaker separation

Zhuo Chen, Yi Luo, Nima Mesgarani

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 246 - 250

Despite the overwhelming success of deep learning in various speech processing tasks, the problem of separating simultaneous speakers in a mixture remains challenging. Two major difficulties in such systems are the arbitrary source permutation and the unknown number of sources in the mixture. We propose a novel deep learning framework for single-channel speech separation by creating attractor points in...

chapter

A first look into a Convolutional Neural Network for speech emotion detection

Dario Bertero, Pascale Fung

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5115 - 5119

We propose a real-time Convolutional Neural Network model for speech emotion detection. Our model is trained from raw audio on a small dataset of TED talk speech data, manually annotated into three emotion classes: “Angry”, “Happy” and “Sad”. It achieves an average accuracy of 66.1%, 5% higher than a feature-based SVM baseline, with an evaluation time of a few hundred milliseconds. We also provide...

chapter

Using regional saliency for speech emotion recognition

Zakaria Aldeneh, Emily Mower Provost

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 2741 - 2745

In this paper, we show that convolutional neural networks can be directly applied to temporal low-level acoustic features to identify emotionally salient regions without the need for defining or applying utterance-level statistics. We show how a convolutional neural network can be applied to minimally hand-engineered features to obtain competitive results on the IEMOCAP and MSP-IMPROV datasets. In...

chapter

Ranking emotional attributes with deep neural networks

Srinivas Parthasarathy, Reza Lotfian, Carlos Busso

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 4995 - 4999

Studies have shown that ranking emotional attributes through preference learning methods has significant advantages over conventional emotional classification/regression frameworks. Preference learning is particularly appealing for retrieval tasks, where the goal is to identify speech conveying target emotional behaviors (e.g., positive samples with low arousal). With recent advances in deep neural...

article

Deep Convolutional Neural Networks for Predominant Instrument Recognition in Polyphonic Music

Yoonchang Han, Jaehun Kim, Kyogu Lee

IEEE/ACM Transactions on Audio, Speech, and Language Processing > 2017 > 25 > 1 > 208 - 221

Identifying musical instruments in polyphonic music recordings is a challenging but important problem in the field of music information retrieval. It enables music search by instrument, helps recognize musical genres, or can make music transcription easier and more accurate. In this paper, we present a convolutional neural network framework for predominant instrument recognition in real-world polyphonic...

chapter

Efficient audio segmentation in soccer videos

M A Raghuram, Nikhil R. Chavan, Shashidhar G. Koolagudi, Pravin B. Ramteke

2016 IEEE Canadian Conference on Electrical and Computer Engineering (CCECE) > 1 - 4

Identifying different audio segments in videos is the first step for many important tasks such as event detection and speech transcription. Approaches using mel-frequency cepstral coefficients (MFCCs) with Gaussian mixture models (GMMs) and hidden Markov models (HMMs) perform reasonably well in stationary conditions but do not scale to a broad range of environmental conditions. This paper focuses...

chapter

Deep learning in acoustic modeling for Automatic Speech Recognition and Understanding - an overview -

Inge Gavat, Diana Militaru

2015 International Conference on Speech Technology and Human-Computer Dialogue (SpeD) > 1 - 8

This paper discusses the progress made in Automatic Speech Recognition and Understanding (ASRU) by applying Deep Learning (DL) to acoustic modeling. After explaining the concept of DL, specific algorithms such as the Restricted Boltzmann Machine (RBM), Convolutional Neural Network (CNN), Autoencoder (AE), and Deep Belief Network (DBN) will be presented and evaluated. Experiments in the academic...

chapter

A learning-based approach to direction of arrival estimation in noisy and reverberant environments

Xiong Xiao, Shengkui Zhao, Xionghu Zhong, Douglas L. Jones, et al.

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 2814 - 2818

ICASSP 2015 - 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

This paper presents a learning-based approach to the task of direction of arrival (DOA) estimation from microphone array input. Traditional signal processing methods such as the classic least squares (LS) method rely on strong assumptions about signal models and on accurate estimates of the time delay of arrival (TDOA). They work well only in relatively clean conditions and suffer from noise and reverberation...

chapter

A syllable-based Turkish speech recognition system by using time delay neural networks (TDNNs)

Burcu Can, Harun Artuner

2013 International Conference on Soft Computing and Pattern Recognition (SoCPaR) > 219 - 224

2013 International Conference of Soft Computing and Pattern Recognition (SoCPaR)

In this paper, we present a model for Turkish speech recognition. The model is syllable-based: recognition is performed using syllables as the speech recognition units. The main goal of the model is to recognize as much as possible of a given stretch of continuous speech by identifying only a small set of syllables in the language. For that purpose, only the syllable types with a higher frequency are...

chapter

Photonic reservoir computing: A new approach to optical information processing

K Vandoorne, M Fiers, D Verstraeten, B Schrauwen, et al.

2010 12th International Conference on Transparent Optical Networks > 1 - 4

2010 12th International Conference on Transparent Optical Networks (ICTON 2010)

Despite ever-increasing computational power, recognition and classification problems remain challenging to solve. Recently, advances have been made through the introduction of the new concept of reservoir computing. This methodology comes from the field of machine learning and neural networks and has been successfully used in several pattern classification problems, like speech and image recognition...

INFONA - science communication portal
