This paper presents a novel speaker recognition framework that handles duration mismatch between registered and test utterances. The i-vectors extracted from short utterances exhibit high variance due to phoneme imbalance, which causes performance degradation in the duration mismatch condition. Most conventional methods attempt to decrease the variance by offsetting i-vectors or speaker similarity...
The present study examined the citation patterns of Mandarin tones in prelingually deaf adults with cochlear implants or hearing aids. The results showed that the participants tried to build up tonal patterns by exploiting phonetic features such as creaky voice and tonal duration. The results also indicated that although the participants had problems distinguishing T2 from T3, T2 was harder than T3 for...
For effective pronunciation error detection for second language learners, we investigate articulatory models based on deep neural networks (DNNs). Articulatory attributes are defined for manner and place of articulation. In order to efficiently train these models of non-native speech without using such data, which is difficult to collect on a large scale, we propose a multi-lingual learning method, in which...
Residue Number Systems (RNSs) have been widely used in digital signal processing (DSP) systems and in applications that require fast computing, parallelism, and fault tolerance because of their carry-free property. However, the comparison operation in an RNS is quite difficult and computationally costly, which significantly limits the use of RNSs for division, scaling, and overflow detection. Reverse conversions...
We present the AP16-OL7 database, which was released as the training and test data for the oriental language recognition (OLR) challenge at APSIPA 2016. Using the database, a baseline system was constructed on the basis of the i-vector model. We report the baseline results evaluated with various metrics defined by the AP16-OLR evaluation plan and demonstrate that AP16-OL7 is a reasonable data resource...
We present a scene depth map generation method based on light field cameras. From the plenoptic function, the angular information about each image point under different sizes of aperture is extracted, which could be used for confocal stereo. Considering confocal constancy and gradient constancy, we take into account two constraints: (1) When a pixel is in focus, its relative intensities across aperture...
Plenoptic cameras, which can capture both spatial and angular light information in one shot, have attracted great interest. Compared with an image from a traditional camera, a plenoptic image has a large resolution and an enormous number of microlens images. Due to the huge volume of data, an efficient plenoptic image compression method for transmission and storage is required. In this paper, a novel plenoptic image compression...
An HEVC format-compliant joint selective encryption and data embedding technique is proposed. The proposed technique is separable, where the decryption and data extraction processes are independent, with minimal parsing overhead. Specifically, elements in the HEVC coding structure are divided into two groups, where one group is manipulated to perceptually mask the video content, while another is modified...
Due to the trade-off between spatial and angular resolution, the effective spatial resolution of a light field image is usually less than one percent of the number of pixels on the photo sensor. In this paper, we propose a prototype algorithm to upsample a light field image. Because the boundary edges of 3D objects result in lines on epipolar plane images (EPIs), the main idea of our method...
An analysis of the functional link artificial neural network (FLANN) structure based on the filtered-s least mean square (FSLMS) algorithm, which is commonly used in nonlinear active noise control (NANC) systems, shows that the controller coefficients of the nonlinear parts are interrelated. This causes an imbalance in the calculation of these coefficients and restrains the performance...
In this paper, we propose a semi-global matching method based on image segmentation. We segment the image by running a k-means clustering algorithm on the left image only. Then, to improve the segmentation result, we integrate small adjacent labels along the edges of objects. After that, we extract feature points to estimate the disparity range in each label, and add weights to the disparity range...
To avoid a multi-dimensional spectral peak search when estimating multiple parameters, a hybrid algorithm based on the beamspace transformation is proposed for a uniform circular array (UCA) of electromagnetic vector sensors. In the beamspace, the azimuth angle can be separated from the other parameters and estimated without a spectral peak search. Then, the elevation estimation can be...
In this paper, we develop an algorithm for depth image super-resolution from RGB-D images, which are acquired under different imaging conditions so that we can combine them to improve the image quality with precise 3D registration. We focus on how to increase the resolution and quality of depth images by combining multiple RGB-D images and using the deep learning technique. In the proposed solution,...
This paper reports on the construction of a multi-modal Mandarin-Tibetan speech database collected from native speakers of WeiZang dialect. The Mandarin-Tibetan corpus contains 41 Tibetan sentences, 27 Chinese sentences, 30 Tibetan consonants, 4 Tibetan vowels, and 25 Tibetan monosyllables. A multi-modal data collection system was established, which comprises an ultrasound scanner, high-speed camera,...
Because blocking or ghost artifacts tend to appear at low bitrates in conventional compression approaches such as the JPEG and JPEG2000 standards, the concept of object-oriented image compression has been proposed. Methods of this kind retain the structural boundaries of the image and therefore have relatively good visual quality even at high compression ratios. In this paper, we propose...
Time-of-Flight (ToF) cameras are now easily accessible. They capture real distances of objects in a controlled environment. Yet, the ToF image may include disconnected boundaries between objects. In addition, certain objects, such as black hair, are not capable of reflecting the infrared rays. Such problems are caused by the physics of ToF. This paper proposes a method to compensate for such errors...
Traditional theories and methods in 3D reconstruction were all proposed under the implicit assumption of an in-air environment. However, the underwater environment differs in many respects. The absorption and scattering effects caused by particles suspended in the water attenuate the image signal, which disqualifies the traditional reconstruction algorithms. In this paper, we propose a novel method to reconstruct...
In this paper, we propose a simple calibration method for an ad hoc microphone array that utilizes sound emissions. Assuming that each device has a function to record sound while emitting another sound, the location, the recording time offset, and the sampling frequency mismatch of each device are estimated from the time of arrival (TOA) of the sound emitted by each device. The accurate estimation...
To meet a wide range of needs for video applications such as remote desktop, video conferencing, distance education, and cloud gaming, the ISO/ITU Joint Collaborative Team on Video Coding (JCT-VC) committee has recently been specifying the Screen Content Coding (SCC) standard as one of the extensions of High Efficiency Video Coding (HEVC). In this paper, the hash search method of the standard adopted Intra...
We develop a brain-machine interface for the hand-motor rehabilitation of stroke patients. The interface provides both visual and proprioceptive feedback to the user based upon the successful generation of cortical motor commands. We discuss the details of the proposed system and provide a summary of the preliminary experiment. The experiment investigates the importance of simultaneous visual and...