In natural language processing, applying machine learning techniques to customer or movie reviews for sentiment analysis has been attempted frequently. While methods such as support vector machines (SVM) were much favored in the 2000s, a steadily rising share of recent implementations rely on vector representations and artificial neural networks. In this article we present an approach that uses word embeddings to conduct sentiment analysis on movie reviews from a renowned bulletin board system forum in Taiwan. After computing the log-likelihood ratio (LLR) on the corpus and selecting the 10,000 most strongly associated keywords as representative vectors for the different sentiments, we use these vectors as the sentiment classifier for the testing set. Our results are not only comparable to those of traditional methods such as Naïve Bayes and SVM, but also outperform Latent Dirichlet Allocation, TF-IDF and its variant. The approach also outperforms the original LLR by a substantial margin.
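The keyword selection step can be illustrated with a minimal sketch. It assumes Dunning's G² formulation of the log-likelihood ratio computed over document-level counts, which the text does not specify; the function names `llr` and `top_keywords` are hypothetical, and the documents are assumed to be tokenized already. The word embedding and classification steps are not shown.

```python
import math
from collections import Counter


def llr(k11, k12, k21, k22):
    """Dunning's G^2 statistic for a 2x2 contingency table.

    k11: target-class documents containing the word
    k12: other documents containing the word
    k21: target-class documents without the word
    k22: other documents without the word
    """
    total = k11 + k12 + k21 + k22
    cells = (
        (k11, k11 + k12, k11 + k21),
        (k12, k11 + k12, k12 + k22),
        (k21, k21 + k22, k11 + k21),
        (k22, k21 + k22, k12 + k22),
    )
    score = 0.0
    for observed, row_sum, col_sum in cells:
        if observed > 0:  # an empty cell contributes nothing to the sum
            expected = row_sum * col_sum / total
            score += observed * math.log(observed / expected)
    return 2.0 * score


def top_keywords(docs, labels, target, k):
    """Return the k words most strongly associated with the target label."""
    in_df, out_df = Counter(), Counter()
    n_in = n_out = 0
    for tokens, label in zip(docs, labels):
        if label == target:
            n_in += 1
            in_df.update(set(tokens))  # document frequency, not term frequency
        else:
            n_out += 1
            out_df.update(set(tokens))
    scores = {}
    for word in set(in_df) | set(out_df):
        a, b = in_df[word], out_df[word]
        # G^2 is symmetric, so keep only words over-represented in the target class.
        if a * n_out > b * n_in:
            scores[word] = llr(a, b, n_in - a, n_out - b)
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


# Toy data, invented for illustration only.
docs = [["great", "film"], ["great", "acting"], ["boring", "film"], ["boring", "plot"]]
labels = ["pos", "pos", "neg", "neg"]
print(top_keywords(docs, labels, "pos", 2))
```

On the real corpus, `k` would be set to 10,000 and the function called once per sentiment class.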
Financed by the National Centre for Research and Development under grant No. SP/I/1/77065/10, within the strategic scientific research and experimental development program SYNAT: “Interdisciplinary System for Interactive Scientific and Scientific-Technical Information”.