We present a topic mixture language modeling approach making use of the soft classification notion of topic models. Given a text document set, we first perform document soft classification by applying a topic modeling process such as probabilistic latent semantic analysis (PLSA) or latent Dirichlet allocation (LDA) to the dataset. Then we can derive topic-specific n-gram counts from the classified...
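The core step described above — turning soft topic assignments into topic-specific n-gram counts — can be sketched as follows. This is an illustrative reading, not the paper's implementation; the function name and the assumption that per-document topic posteriors are already available (e.g. from an LDA/PLSA E-step) are mine.

```python
from collections import defaultdict

def topic_ngram_counts(docs, topic_posteriors, n=2):
    """Accumulate topic-specific n-gram counts by weighting each
    document's n-grams with its soft topic assignment P(topic | doc).

    docs              -- list of token lists
    topic_posteriors  -- topic_posteriors[d][k] = P(topic k | doc d),
                         each row summing to 1 (assumed given)
    """
    num_topics = len(topic_posteriors[0])
    counts = [defaultdict(float) for _ in range(num_topics)]
    for tokens, posterior in zip(docs, topic_posteriors):
        for i in range(len(tokens) - n + 1):
            gram = tuple(tokens[i:i + n])
            for k in range(num_topics):
                counts[k][gram] += posterior[k]
    return counts

# Toy example: two documents, two topics.
docs = [["the", "cat", "sat"], ["stock", "prices", "rose"]]
posteriors = [[0.9, 0.1], [0.2, 0.8]]  # e.g. from an LDA/PLSA E-step
counts = topic_ngram_counts(docs, posteriors)
```

Each topic then gets its own fractional count table, from which per-topic n-gram models can be estimated with ordinary smoothing.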
We present a semi-supervised learning (SSL) method for building domain-specific language models (LMs) from general-domain data using probabilistic latent semantic analysis (PLSA). The proposed technique first performs topic decomposition (TD) on the combined dataset of domain-specific and general-domain data. It then derives the latent topic distribution of the domain of interest, and derives domain-specific...
A natural language interface is an important research topic in the area of natural language processing (NLP). Natural language interaction could be the most natural and efficient way to communicate with a robot. In order to build a speech-enabled natural language interface for robots, our research goal is to study the problems in this area and develop technologies that can potentially improve human-robot interaction. In...
We present a semi-supervised learning method for building domain-specific language models (LMs) from general-domain data. This method aims to use a small amount of domain-specific data as seeds to tap the domain-specific resources residing in a larger amount of general-domain data, with the help of topic modeling technologies. The proposed algorithm first performs topic decomposition (TD) on the combined...
In this paper, we propose a method to extend the use of latent topics to higher-order n-gram models. In training, the parameters of higher-order n-gram models are estimated using discounted average counts derived from the application of probabilistic latent semantic analysis (PLSA) models to n-gram counts in the training corpus. In decoding, a simple yet efficient topic prediction method is introduced...
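At decode time, a topic-mixture n-gram probability is typically formed by interpolating the topic-specific models with the predicted topic weights. A minimal sketch of that interpolation step, assuming the per-topic probabilities P_k(w|h) and the predicted weights gamma_k are already available (both names are mine, not from the abstract):

```python
def mixture_prob(topic_probs, topic_weights):
    """Topic-mixture interpolation: P(w|h) = sum_k gamma_k * P_k(w|h).

    topic_probs   -- P_k(w|h) for each topic k (assumed precomputed)
    topic_weights -- predicted topic weights gamma_k, summing to 1
    """
    return sum(g * p for g, p in zip(topic_weights, topic_probs))

# Two topics: the history strongly favors topic 0.
p = mixture_prob(topic_probs=[0.2, 0.5], topic_weights=[0.7, 0.3])
```

The weights gamma_k would come from whatever topic prediction method the decoder uses on the recent history.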
N-gram language model adaptation is typically formulated using deleted interpolation under the maximum likelihood estimation framework. This paper proposes a Bayesian learning framework for n-gram statistical language model training and adaptation. By introducing a Dirichlet conjugate prior on the n-gram parameters, we formulate the deleted interpolation under the maximum a posteriori criterion with...