With the rapid development of community-based question answering services such as Sina Ask, Baidu Zhidao and Yahoo! Answers, Community-based Question Answering has become a new knowledge-sharing model characterized by interactivity and openness. These community sites provide high-quality service to meet users' needs and encourage their active participation. In order to accurately...
Individuals, criminals or even terrorist organizations can use web communication for criminal purposes, and to avoid prosecution they try to hide their identity. To increase the level of safety on the Web, we have to improve author (or web-user) identification and authentication procedures. In the field of web author identification, imbalanced data sets occur rather frequently, when the number...
With the development of weblogs and social networks, many news providers share their news headlines on different websites and weblogs. One of the main text mining topics is how to classify news into different groups. This study aims to classify news into various groups so that users can identify the most popular news group in the desired country at any given time. Based on Term Frequency-Inverse Document...
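The truncated abstract appears to refer to Term Frequency-Inverse Document Frequency (TF-IDF) weighting. As a rough illustration only, here is a minimal sketch of one common TF-IDF variant (raw term count multiplied by log(N/df)). The function name `tf_idf` and the sample headlines are hypothetical, and the study's actual weighting scheme is not stated in the snippet.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Hypothetical helper. TF is the raw count of a term in a document and
    IDF is log(N / df), where N is the number of documents and df the
    number of documents containing the term. This is only one common
    variant; many systems add smoothing or length normalization.
    """
    n_docs = len(docs)
    # Document frequency: count each term at most once per document.
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return weights


# Hypothetical sample headlines, already lower-cased and split into tokens.
headlines = [
    "election results announced".split(),
    "football results tonight".split(),
    "election debate tonight".split(),
]
for weights in tf_idf(headlines):
    print(weights)
```

A term that appears in only one headline, such as "announced", gets the largest IDF (log 3, about 1.10). A term present in every document would get a weight of 0.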
Document classification can be defined as the task of automatically categorizing collections of electronic documents into their annotated classes, based on their contents. It is an important problem in data mining. Due to the exponential growth of documents on the Internet and the emerging need to organize them, developing an efficient document classification method to automatically manipulate web...
Classification systems adapt many machine learning techniques to achieve good performance in data classification. Neural networks have unique characteristics that allow them to handle high-dimensional features and documents with noisy and contradictory data. Classification is important for assigning input text to the appropriate domains. This paper presents an approach towards classification...
Text classification is one of the most important research issues in the field of data mining. The main idea of using stemming is to reduce the number of features that can be extracted from a document. Furthermore, stemming aims to enhance the accuracy of the classifier. This paper studies the effectiveness of stemming techniques. The paper will use two popular word extraction approaches:...
In the current era, there is a high demand for accurate text identification and categorization methods for N-lingual non-scanned and scanned machine-printed documents, where N denotes mono-, bi-, tri- or multilingual mode. In this paper, a technical study and analysis of N-lingual document classification is presented for normal text, printed and handwritten documents. Text classification for normal...
The significant growth of online textual information has increased the demand for effective content-based Arabic text categorization methods. The categorization of Arabic texts poses challenges that need to be addressed, especially when using stemming. In the literature, we found a debate among researchers about the benefits of using stemming in Arabic text categorization. Hence, we performed a study...
Social media such as Twitter provide a space for expressing thoughts and opinions on various topics and events, and millions of users share their ideas on this microblog. Twitter has therefore become a source for information exploration, decision making and sentiment analysis. Every text carries some meaning, but it is more important to provide strategies for obtaining suitable...
Text categorization with machine learning algorithms usually assumes a flat set of categories. Such classifiers are highly domain-specific and cannot be reused for other, more generic text classification tasks. A hierarchically structured set of categories may well have a greater impact on how classifiers are built and used. As presented in this document, the list of most common...
In today's world, many real-world problems involve multi-label classification, in which a single document may belong to several class labels simultaneously. Ranking, i.e. the strict ordering of class labels, is of great concern here. We use the concept of quantifiers to rank class labels and propose eight new quantifiers, which calculate the degree of membership of class labels...
Students' cognitive ability should be identified in order to know how well they understand what has been taught. The identification result serves as the benchmark for choosing the basis of assessment. Cognitive ability can be identified by giving questions at certain difficulty levels. The appropriateness of the difficulty levels can be determined based on Bloom's taxonomy introduced...
Feature selection plays an important role in text categorization and contributes directly to the accuracy of the categorization. Because the traditional expected cross entropy algorithm does not take document frequency into account during feature selection, we first improve the traditional expected cross entropy formula and then propose an improved text feature selection...
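The abstract does not spell out the traditional expected cross entropy (ECE) formula it starts from. A commonly cited form is ECE(t) = P(t) * sum_i P(c_i | t) * log(P(c_i | t) / P(c_i)). The sketch below computes only that traditional baseline, not the paper's improved version, assuming probabilities are estimated from document counts. `expected_cross_entropy` and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def expected_cross_entropy(docs, labels, term):
    """Traditional ECE(t) = P(t) * sum_i P(c_i|t) * log(P(c_i|t) / P(c_i)).

    Hypothetical helper. `docs` is a list of token sets, `labels` the class
    of each document; all probabilities are document-count estimates.
    """
    n = len(docs)
    class_counts = Counter(labels)
    # Class distribution of the documents that contain the term.
    with_term = Counter(c for d, c in zip(docs, labels) if term in d)
    n_t = sum(with_term.values())
    if n_t == 0:
        return 0.0  # term never occurs, so P(t) = 0
    score = 0.0
    # Classes with P(c|t) = 0 contribute 0 (0 * log 0 is taken as 0),
    # so only classes that actually co-occur with the term are summed.
    for c, k in with_term.items():
        p_c_given_t = k / n_t
        p_c = class_counts[c] / n
        score += p_c_given_t * math.log(p_c_given_t / p_c)
    return (n_t / n) * score


docs = [{"goal", "match"}, {"goal"}, {"vote"}, {"vote", "goal"}]
labels = ["sport", "sport", "politics", "politics"]
for term in ("goal", "match", "vote"):
    print(term, round(expected_cross_entropy(docs, labels, term), 4))
```

This is only the baseline the abstract says it improves on. The improved formula is not given in the snippet.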
Text feature selection is a key technology in text classification and text information retrieval. Information gain is a feature selection method that is widely applied in text categorization. This paper theoretically analyzes the deficiencies of information gain as a feature selection method and then introduces two improvement factors, LDFWF (Limiting Document Frequency's Word Frequency)...
The feature selection algorithm has a great influence on the accuracy of text categorization. The traditional information gain (IG) feature selection algorithm usually selects features that rarely appear in the specified categories but frequently appear in other categories. To overcome this drawback, on the basis of an in-depth analysis of the related algorithms, an improved IG feature selection method...
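For reference, the traditional IG score of a term t over the class variable C is usually written IG(t) = H(C) - P(t) * H(C | t) - P(not t) * H(C | not t). Here is a minimal sketch that assumes document-count probability estimates and base-2 logarithms. `information_gain` and `entropy` are hypothetical helper names, and the code shows the baseline, not the improved method the abstract proposes.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (bits) of a class distribution given as raw counts."""
    total = sum(counts)
    return -sum((k / total) * math.log2(k / total) for k in counts if k > 0)


def information_gain(docs, labels, term):
    """Traditional IG(t) = H(C) - P(t) H(C|t) - P(not t) H(C|not t).

    Hypothetical helper: `docs` is a list of token sets, `labels` the class
    of each document, and probabilities are document-count estimates.
    """
    n = len(docs)
    present = Counter(c for d, c in zip(docs, labels) if term in d)
    absent = Counter(c for d, c in zip(docs, labels) if term not in d)
    n_t = sum(present.values())
    ig = entropy(list(Counter(labels).values()))
    if n_t > 0:
        ig -= (n_t / n) * entropy(list(present.values()))
    if n_t < n:
        # The absence term: documents without t also carry class information.
        ig -= ((n - n_t) / n) * entropy(list(absent.values()))
    return ig


docs = [{"goal"}, {"goal", "match"}, {"vote"}, {"vote", "the"}]
labels = ["sport", "sport", "politics", "politics"]
for term in ("goal", "match", "the"):
    print(term, round(information_gain(docs, labels, term), 4))
```

The P(not t) term rewards terms whose absence is informative. This means a term can score well mainly by being rare in one category, which is one way to read the drawback described above.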
Text classification is one of the key methods used in text mining. Generally, traditional classification algorithms from the machine learning field are used in text classification. These algorithms are primarily designed for structured data. In this paper, we propose a new classifier for textual data, called the Supervised Meaning Classifier (SMC). The new SMC classifier uses a meaning measure, which is based...
Feature selection is a strategy that aims at making text classifiers more efficient and accurate. In this paper, we propose a novel feature selection method based on Tibetan grammar for Tibetan text classification. The Tibetan language expresses grammatical meaning through function words and word order, and function words make up a large proportion of the text. By analyzing Tibetan grammar and the distribution of part...
Classification is commonly performed with supervised learning algorithms, which build classifiers by learning from labeled training samples. In practice, however, acquiring class-labeled samples is very costly, because manually labeling documents requires a great deal of time and effort from experts. This cost greatly restricts text classification. To solve the...
Understanding Web users' search intent, as expressed by their queries, is essential for a search engine to provide appropriate answers. Web query classification (QC) algorithms have been widely studied to improve accuracy and meet users' demands. Some QC algorithms convert queries into vectors and use an SVM or CRF model as the classifier. However, with the volume of data increasing, the time consumed...
This paper describes a modern approach to sentiment analysis of movie reviews using deep recurrent neural networks and decision trees. These methods are based on statistical models, which, in a nutshell, underlie machine learning algorithms. A fertile area of research is the application of Google's Word2Vec algorithm, presented by Tomas Mikolov, Kai Chen, Greg Corrado and Jeffrey...