Speech recognition is one of the important processes in information technology. It plays a key role in many systems, such as voice control, IP telephony, personal identification, recognition of individual words and phrases, accepting applications for reference services, and search systems. Many research companies in this area are developing and improving methods, algorithms...
Pronunciation training aids using media tools such as mobile apps and online web-based systems are widely used nowadays. These tools often provide audio-based samples and phonetic-style texts that learners can use to train their pronunciation without language teachers. However, learners still have difficulty in the learning process, because they find it hard to detect and...
Recently, deep learning has been shown to possess a strong ability to learn and express complex features, which has brought significant research achievements in signal processing. As a challenging task in speech signal processing, monaural speech separation has long been a research focus. From the usage of traditional signal processing methods and shallow models...
The paper considers the task of improving the efficiency of voice control of robots on the basis of adaptive procedures. This problem is considered in the context of noise-resistant processing of speech signals for voice recognition subsystems. A solution to this problem is found in classes of adaptive algorithms for filtering voice signals based on sequential filtering. This model makes it possible to improve...
Speech analysis can be used for healthcare tasks such as pathology detection. Conventionally, a speech-language pathologist is trained to detect anomalies in speech. Speech disorders result from a variety of causes such as brain injury, stroke, hearing loss, developmental delay or emotional alteration. The content of the speech is often not of interest for pathology detection, but characteristics...
In this paper, we propose a public-address system for broadcasting speech that is easy to understand. The system converts broadcast speech into text using a speech recognizer, and it converts the text into simple text using a semantic parser. Then, having obtained text with a simple meaning, the system broadcasts it using a speech synthesizer. The proposed semantic parser can use dependency analysis to appropriately...
With increasing stress in work and study, mental health has become a major topic in current social research. Generally, researchers can analyze psychological health states by using social perception behavior. The speech signal is an important research direction in this domain. It objectively assesses the mental health of social groups through the extraction and fusion of speech features...
This report discusses the implementation of a computerized algorithm specifically designed to measure the syllables-per-minute rate of abnormal speech typically produced by persons suffering from an articulatory disorder known as dysarthria. This speech rate measurement application — which can also serve as a diagnostic tool in itself — has been integrated into the computerised Frenchay Dysarthria...
Noise reduction algorithms for head-mounted assistive listening devices are crucial to improve speech quality and intelligibility in background noise. For binaural hearing devices with one microphone per device, the noise power spectral density (PSD) is commonly estimated using various assumptions about the acoustic scenario. Since these methods lack robustness if the underlying assumptions are not...
Speech emotion recognition has been widely used in human-computer interaction and its applications. This paper classifies emotion into two classes: happy and angry. All speech signals are taken from a Malay spoken speech database and preprocessed. Emotional information is obtained by applying two well-established acoustic features, namely Mel Frequency Cepstral Coefficients (MFCC) and Short Time Energy (STE)...
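As an illustration of the Short Time Energy (STE) feature named in the abstract above, the following is a minimal sketch, not the authors' implementation. It assumes the common definition of STE as the sum of squared samples in each frame. The function name `short_time_energy` and the default frame length and hop size are hypothetical choices made for this example.

```python
import numpy as np

def short_time_energy(signal, frame_len=400, hop=160):
    """Return the energy (sum of squared samples) of each frame.

    frame_len and hop are illustrative defaults (25 ms and 10 ms at
    16 kHz); the abstract does not state the values actually used.
    """
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_len:
        return np.array([])
    # Number of full frames that fit in the signal.
    n_frames = 1 + (len(signal) - frame_len) // hop
    return np.array([
        np.sum(signal[i * hop : i * hop + frame_len] ** 2)
        for i in range(n_frames)
    ])
```

Frames with high energy typically correspond to voiced or loud speech, which is one reason STE is often paired with spectral features such as MFCC.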
This paper presents a new technique using animated texts as the speech features' visualization medium for checking and detecting language learners' pronunciation. The proposed visualization tool will transform learners' speech features such as pitch, tempo or rhythm into animated text form, and the mispronounced parts can be located by comparing them with the correct sample. In our previous experiments,...
Interest in Compressed Sensing (CS) comes from its ability to provide sampling, compression, enhancement, and encryption of the source information simultaneously. These advantages have led to CS being researched and applied in numerous speech-processing applications. In this paper, we compare ℓ1-minimization and Iteratively Reweighted Least Squares (IRLS)-ℓp-minimization algorithms...
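To make the IRLS idea mentioned in the abstract above concrete, here is a minimal sketch of one common textbook formulation for the equality-constrained problem: minimize the ℓp norm of x subject to Ax = b. It is not the paper's implementation. The function name `irls_lp`, the iteration count, and the schedule for the smoothing term `eps` are assumptions made for this example.

```python
import numpy as np

def irls_lp(A, b, p=1.0, n_iter=50, eps=1.0):
    """Approximate argmin ||x||_p subject to A x = b via IRLS.

    Each iteration solves a weighted least-squares problem in closed
    form: x = Q A^T (A Q A^T)^{-1} b, with Q = diag((x^2 + eps)^(1 - p/2)).
    """
    # Start from the minimum-ℓ2-norm solution of the underdetermined system.
    x = np.linalg.lstsq(A, b, rcond=None)[0]
    for _ in range(n_iter):
        q = (x ** 2 + eps) ** (1.0 - p / 2.0)  # inverse weights (diagonal of Q)
        AQ = A * q                             # equals A @ diag(q) by broadcasting
        x = q * (A.T @ np.linalg.solve(AQ @ A.T, b))
        # Shrink the smoothing term gradually (illustrative schedule with a floor).
        eps = max(eps / 10.0, 1e-8)
    return x
```

With p = 1 the iteration targets the same objective as ℓ1-minimization; choosing p < 1 gives a non-convex objective that favors sparser solutions, though convergence to the global minimum is then not guaranteed.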
Keyword extraction is widely used for information indexing, compressing, summarizing, etc. Existing keyword extraction techniques apply various text-based algorithms and metrics to locate the keywords. At the same time, some types of audio and audiovisual content, e.g. lectures, talks, interviews and other speech-oriented information, allow keyword search by prosodic accents made by a...
The results of implementing an external accent recognition system and integrating it into the massive open online course platform Moodle are reported. Accent recognition is important in foreign language learning for giving a student feedback on the presence of an unwanted accent in their foreign-language pronunciation. Implementation of several accent recognition methods and their...
There are many challenges in single-channel multi-person mixed speech separation, such as modeling the temporal continuity of the speech signals and improving the frame separation performance simultaneously. In this paper, a separation method based on Deep Clustering with local optimization by the improved Non-Negative Matrix Factorization (NMF) combined with Factorial Conditional Random Fields (FCRF)...
This research presents speech signal recognition and classification using the compressive sensing method to reduce the data from speech signal cross-correlation and to compare the similarities and differences between the reference speech signal (the researcher's signal) and comparative signals. The speech signal reconstruction method is based on the solution to the underdetermined linear inverse...
Metric learning, one of the main topics of machine learning, is used to bring similar data closer together and to increase the distance between unrelated data in an existing space. When seeking the best solution for today's problems, choosing a good metric has a positive impact on performance. Metric learning makes use of a transformation matrix. When we examine the studies in...
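The role of a transformation matrix in metric learning, as mentioned in the abstract above, can be illustrated with a minimal sketch. It assumes the standard Mahalanobis-style formulation, in which the learned distance is the Euclidean distance after a linear map L. The function name `transformed_distance` is hypothetical, and the matrix L is supplied by hand here rather than learned.

```python
import numpy as np

def transformed_distance(x, y, L):
    """Distance between x and y after applying the linear map L.

    Equivalent to sqrt((x - y)^T M (x - y)) with M = L^T L.
    """
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.linalg.norm(np.asarray(L, dtype=float) @ diff))
```

With L set to the identity matrix this reduces to the ordinary Euclidean distance. A metric learning algorithm would instead fit L so that similar pairs end up close and dissimilar pairs end up far apart.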
An accurate dialect identification technique helps improve the speech recognition systems found in most present-day electronic devices and is also expected to help provide new services in e-health and telemedicine, which is especially important for older and homebound people. The accuracy of a dialect identification system is highly dependent on its speech corpora. Therefore,...
We discuss a method of signal analysis, empirical mode decomposition, and its modification, complementary ensemble empirical mode decomposition. Both methods are used to study the reconstruction of a speech signal by means of the intrinsic mode functions obtained during the decomposition. The studies were performed using two English databases of speech signals which contain speech...
Spoken term detection (STD) is a fundamental part of some speech processing applications. One STD method uses a phoneme representation of words from spoken content and a text query. The paper presents a new grapheme-to-phoneme conversion method based on a high-order Markov chain. The method is applied to the retrieval of spoken documents in Russian. The aim of this research is to evaluate the effectiveness...