In this paper, we present a novel multi-photo-based framework to solve a self-portrait enhancement problem we call the “1+2 problem”, in which a self-portrait photo is enhanced with the help of two additional photos that share the same scene and a similar shooting time. The key idea is to exploit the extra information in these two photos to overcome the limited field of view and poor illumination of the target...
This paper proposes an approach to accent adaptation that uses accent-dependent bottleneck (BN) features to improve the performance of a multi-accent Mandarin speech recognition system. The adaptation architecture uses two neural networks. First, a deep neural network (DNN) acoustic model acts as a feature extractor used to extract accent-dependent BN (BN-DNN) features. The input...
In fingerprint recognition systems, minutiae-based matching algorithms are the most intensively researched. However, in most minutiae-based methods, the similarity score is based mainly on the score of the matched minutiae, and the boosted information is not effectively used in the final similarity score computation. Based on this observation, we extract several features as supplementary scores. And...
This paper proposes a cascading deep neural network (DNN) structure for a speech synthesis system that consists of text-to-bottleneck (TTB) and bottleneck-to-speech (BTS) models. Unlike a conventional single structure, which requires a large database to find the complicated mapping rules between linguistic and acoustic features, the proposed structure is very effective even if the available training database...
Recently, a deep beamforming (BF) network was proposed to predict BF weights from phase-carrying features, such as generalized cross correlation (GCC). The BF network is trained jointly with the acoustic model to minimize the automatic speech recognition (ASR) cost function. In this paper, we propose to replace GCC with features derived from the input signals' spatial covariance matrices (SCM), which contain...
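The abstract above builds on spatial covariance matrices of multichannel input. As a minimal sketch (not the paper's implementation), the per-frequency SCM of a multichannel STFT is the time-averaged outer product of the channel vectors; the array shapes and the `spatial_covariance` helper below are assumptions made for illustration.

```python
import numpy as np

def spatial_covariance(stft):
    """Per-frequency spatial covariance matrices from a multichannel STFT.

    stft: complex array of shape (channels, frames, freq_bins)
    returns: complex array of shape (freq_bins, channels, channels)
    """
    C, T, F = stft.shape
    scm = np.empty((F, C, C), dtype=complex)
    for f in range(F):
        X = stft[:, :, f]               # (channels, frames) at frequency f
        scm[f] = (X @ X.conj().T) / T   # time-averaged outer product
    return scm

# Toy input: 2-channel complex white noise, 100 frames, 4 frequency bins.
rng = np.random.default_rng(0)
stft = rng.normal(size=(2, 100, 4)) + 1j * rng.normal(size=(2, 100, 4))
scm = spatial_covariance(stft)
```

Each resulting matrix is Hermitian by construction, which is what makes SCMs a compact carrier of inter-channel phase information.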
In this paper, we propose a framework that fuses textual and visual features of user-generated social media data to mine the distribution of user interests. The proposed framework consists of three steps: feature extraction, model training, and user interest mining. We choose boards from popular users on Pinterest to collect training and test data. For each pin we extract the term-document matrices...
We propose a voice conversion framework that maps the speech features of a source speaker to those of a target speaker based on deep neural networks (DNNs). Due to the limited availability of the parallel data needed for a pair of source and target speakers, speech synthesis and dynamic time warping are utilized to construct a large parallel corpus for DNN training. With a small corpus to train the DNNs, a lower log...
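The abstract above relies on dynamic time warping to align source and target utterances into parallel frame pairs. A minimal numpy sketch of DTW with path backtracking (the `dtw_path` helper and the toy 1-D sequences are illustrative, not from the paper):

```python
import numpy as np

def dtw_path(a, b):
    """Dynamic time warping between two feature sequences.

    a: (n, d) source features; b: (m, d) target features.
    Returns the accumulated alignment cost and the warping path.
    """
    n, m = len(a), len(b)
    dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = dist[i - 1, j - 1] + min(acc[i - 1, j],
                                                 acc[i, j - 1],
                                                 acc[i - 1, j - 1])
    # Backtrack the optimal path from the end.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return acc[n, m], path[::-1]

# Toy 1-D "feature" sequences: tgt repeats the first frame of src.
src = np.array([[0.0], [1.0], [2.0], [3.0]])
tgt = np.array([[0.0], [0.0], [1.0], [2.0], [3.0]])
cost, path = dtw_path(src, tgt)  # cost 0.0: perfect warped match
```

In a voice-conversion pipeline, the returned path indexes which source frame pairs with which target frame when building the parallel training corpus.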
We propose a sudden-noise suppression method for speech recognition using a phase linearity feature for noise detection. Our investigation of sound data recorded in actual retail stores shows that short, sudden noises are dominant in such environments. We also confirm the negative effect of such noises on speech recognition performance. Our method addresses this problem by focusing on sudden noises...
In this study, we investigate the learning behaviors of DNNs via explicit feature transformations. As a demonstration, linear and logarithmic transformations, corresponding to amplitude spectra and log-power spectra, are compared under the same minimum mean squared error (MMSE) objective function for optimizing the DNN parameters. Based on the experimental analysis of the DNN learning behaviors, we...
In this paper, we propose a semi-global matching method based on image segmentation. We apply a k-means clustering algorithm to the left image only for segmentation. Then, to improve the segmentation result, we merge adjacent small labels along object edges. After that, we extract feature points to estimate the disparity range in each label, and add weights to the disparity range...
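The segmentation step above is standard k-means on pixel features. A self-contained numpy sketch (the `kmeans` helper, initialization scheme, and toy two-region "image" are assumptions for illustration, not the paper's code):

```python
import numpy as np

def kmeans(pixels, k, iters=20, seed=0):
    """Plain k-means on flattened pixel features (e.g. intensity or RGB)."""
    rng = np.random.default_rng(seed)
    centers = pixels[rng.choice(len(pixels), size=k, replace=False)]
    for _ in range(iters):
        # Assign each pixel to its nearest center.
        d = np.linalg.norm(pixels[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Recompute centers; an empty cluster keeps its old center.
        for c in range(k):
            if np.any(labels == c):
                centers[c] = pixels[labels == c].mean(axis=0)
    return labels, centers

# Toy "left image": two flat gray regions with slight noise.
rng = np.random.default_rng(1)
img = np.concatenate([rng.normal(0.1, 0.01, 50), rng.normal(0.9, 0.01, 50)])
labels, centers = kmeans(img.reshape(-1, 1), k=2)
```

With well-separated intensities the two regions fall into distinct labels, which is the label map the disparity-range estimation would then operate on.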
In this paper, we develop an algorithm for depth image super-resolution from RGB-D images, which are acquired under different imaging conditions so that we can combine them to improve the image quality with precise 3D registration. We focus on how to increase the resolution and quality of depth images by combining multiple RGB-D images and using the deep learning technique. In the proposed solution,...
Recently, deep neural networks (DNNs) have been successfully adopted in the voice activity detection (VAD) area. However, the performance of DNN-based VAD is still unsatisfactory in noisy environments where the feature subspaces of the training database and the test environment do not match. In this paper, we propose a local feature shift technique that normalizes the feature...
In this paper, a novel Dynamic Convolutional Neural Network (D-CNN) is proposed that uses sensor data for activity recognition. Sensor data collected for activity recognition is usually not well aligned; it may also contain noise and variations across different persons. To overcome these challenges, Gaussian Mixture Models (GMMs) are exploited to capture the distribution of each activity. Then, sensor data...
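The abstract above fits one distribution per activity. As a simplified stand-in for a full GMM, the sketch below fits a single diagonal-covariance Gaussian per activity and scores an unseen sample by log-likelihood; the helpers and the synthetic "walking"/"running" data are assumptions for illustration.

```python
import numpy as np

def fit_gaussian(samples):
    """Fit a diagonal-covariance Gaussian to one activity's sensor data."""
    mu = samples.mean(axis=0)
    var = samples.var(axis=0) + 1e-6   # variance floor avoids division by zero
    return mu, var

def log_likelihood(x, mu, var):
    """Diagonal-Gaussian log-density of a single sample x."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)

# Synthetic 2-D sensor features for two activities.
rng = np.random.default_rng(0)
walk = rng.normal([0.0, 1.0], 0.1, size=(200, 2))
run = rng.normal([2.0, 3.0], 0.1, size=(200, 2))
models = {"walk": fit_gaussian(walk), "run": fit_gaussian(run)}

x = np.array([1.95, 3.05])  # unseen sample near the "run" cluster
pred = max(models, key=lambda k: log_likelihood(x, *models[k]))
```

A full GMM replaces each single Gaussian with a weighted sum of components fitted by EM, which better captures multi-modal activity distributions.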
Using speech or text to predict articulatory movements can benefit speech-related applications. Many approaches have been proposed to solve the acoustic-to-articulatory inversion problem, which has been explored far more than predicting articulatory movements from text. In this paper, we investigate the feasibility of using a deep neural network (DNN) for articulatory movement...
Given the increasing attention paid to speech emotion classification in recent years, this work presents a novel speech emotion classification approach based on the multiple kernel Gaussian process. Two major aspects of a classification problem that play an important role in classification accuracy are addressed, i.e. feature extraction and classification. Prosodic features and other features widely...
Recently, deep and/or recurrent neural networks (DNNs/RNNs) have been employed for voice conversion, and have significantly improved the performance of converted speech. However, DNNs/RNNs generally require a large amount of parallel training data (e.g., hundreds of utterances) from source and target speakers. It is expensive to collect such a large amount of data, and impossible in some applications,...
We propose to detect mispronunciations in a language learner's speech via a discriminatively trained DNN in the phonetic space. The posterior probabilities of “senones” populated in a decision tree are trained and predicted speaker-independently. Acoustic features of each input segment (with preceding and succeeding contexts of several frames) are mapped onto the whole set of senones in their corresponding...
Speaker verification suffers from serious performance degradation under speaking-rate mismatch conditions. This degradation can be largely attributed to the spectrum distortion caused by different speaking rates. This paper proposes a feature transform approach that projects speech features at slow speaking rates to features at normal speaking rates. The feature space maximum likelihood linear regression...
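The abstract above mentions a feature-space maximum likelihood linear regression transform. As a generic illustration of the underlying idea (not fMLLR itself), the sketch below estimates an affine projection from paired slow/normal frames by least squares; the helper names and synthetic data are assumptions.

```python
import numpy as np

def estimate_transform(slow, normal):
    """Least-squares affine map W such that [slow, 1] @ W ~= normal.

    slow, normal: time-aligned feature matrices of shape (frames, dims).
    """
    X = np.hstack([slow, np.ones((len(slow), 1))])  # append bias column
    W, *_ = np.linalg.lstsq(X, normal, rcond=None)
    return W

def apply_transform(feats, W):
    X = np.hstack([feats, np.ones((len(feats), 1))])
    return X @ W

# Synthetic paired frames: "normal" = 2 * "slow" + 0.5 plus small noise.
rng = np.random.default_rng(0)
slow = rng.normal(size=(500, 3))
normal = 2.0 * slow + 0.5 + rng.normal(scale=0.01, size=slow.shape)
W = estimate_transform(slow, normal)
projected = apply_transform(slow, W)
```

At test time, the estimated `W` is applied to every slow-rate utterance before verification, so that slow and normal features live in the same space.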
Speech emotion recognition is still a challenging problem despite having been investigated over the last couple of decades. Conventional speech emotion recognition performance is low, but it may be improved by considering new features and a new annotation method. In this paper, we first use glottal features for speech emotion recognition to improve its performance, because emotions are related...
Virtual military training systems have received considerable attention as a possible substitute for conventional real-world military training. In our previous work, a human action recognition system using multiple Kinects (HARS-MK) was implemented as a prototype of a virtual military training simulator. However, the classification accuracy of HARS-MK is not sufficient for virtual military training...