Short Message Service (SMS) spam is a serious problem in Vietnam because very cheap prepaid SMS packages are widely available. Several systems exist to detect and filter spam messages in English, most of which use machine learning techniques to analyze message content and classify it. For Vietnamese, there has been some research on spam email filtering, but none has focused on SMS. In this work,...
In this paper, we study language models based on recurrent neural networks on three corpora in two languages. We implement basic recurrent neural networks (RNNs) and refined RNNs with long short-term memory (LSTM) cells. We use the Penn Treebank (PTB) and AMI corpora in English and the Academia Sinica Balanced Corpus (ASBC) in Chinese. On ASBC, we investigate word-based and character-based language...
We have investigated the effect of normalizing Japanese orthographical variants into a uniform orthography on statistical machine translation (SMT) between Japanese and English. Reportedly, 10% of Japanese words have more than one orthographical variant, which suggests that normalizing these variants could improve translation quality. However, the results show that SMT...
Sentiment analysis is an important task in natural language processing and computational linguistics. Automatic sentiment analysis has been widely applied to opinion reviews and social media for a variety of applications, such as marketing and customer service. The dimensional approach can provide more fine-grained sentiment analysis, in which each word is assigned two continuous numerical values...
Reading ability is one of the most important skills for language learners. A grade-level reading corpus can target the improvement of learners' reading abilities more precisely. Based on the Corpus of Teaching Chinese as a Second Language (CTC), this paper presents a grade standard for constructing a grade-level reading corpus. The corpus is tagged with linguistic information, and it can be used as a language...
This paper proposes an approach to estimating the DSAW-defined valence of words. The valence is converted from the sentiment polarity and strength of each word in the CSWN, according to the training data provided by DSAW. In addition, this paper proposes a method to estimate the arousal of each word based on the observation that increasingly positive or negative sentiment suggests stronger arousal...