Sentiment analysis is the task of identifying the polarity and subjectivity of documents using a combination of machine learning, information retrieval, and natural language processing techniques. The problem is studied within the scope of statistical machine learning. Different feature selection methods, dimensionality reduction algorithms, and classification techniques are investigated and compared. The main focus of this work is on finding the factors that affect the accuracy of learnt models. Extensive statistical analysis is performed to identify the best algorithmic configurations. Moreover, a novel approach is introduced, based on long short‐term memory (LSTM) recurrent neural network language models, that does not require any special preprocessing or feature selection. Finally, benchmark results are presented on seven well‐known datasets from different domains.
WIREs Data Mining Knowl Discov 2015, 5:255–263. doi: 10.1002/widm.1159
This article is categorized under:
- Algorithmic Development > Text Mining