Twitter is one of the world's most famous social media platforms. Many kinds of statements are expressed on Twitter, such as happiness, sadness, public information, etc. Unfortunately, people may get angry at each other and write it down as a tweet on Twitter. Some tweets may contain Indonesian swear words. This is a serious problem because many Indonesians may not tolerate swear words. Some Indonesian swear words may have multiple...
This paper shows a simple approach for fake news detection using a naive Bayes classifier. This approach was implemented as a software system and tested against a data set of Facebook news posts. We achieved a classification accuracy of approximately 74% on the test set, which is a decent result considering the relative simplicity of the model. These results may be improved in several ways, which are described...
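To make the idea of a naive Bayes text classifier concrete, the following is a minimal sketch of a multinomial naive Bayes model with add-one (Laplace) smoothing. It is not the paper's implementation: the whitespace tokenizer, the function names `train_naive_bayes` and `predict`, and the four toy headlines are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(documents, labels):
    """Count documents per class and words per class (hypothetical helper)."""
    class_doc_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # label -> word -> count
    vocabulary = set()
    for text, label in zip(documents, labels):
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocabulary.add(word)
    return class_doc_counts, word_counts, vocabulary


def predict(model, text):
    """Return the label with the highest posterior log-probability."""
    class_doc_counts, word_counts, vocabulary = model
    total_docs = sum(class_doc_counts.values())
    best_label, best_score = None, -math.inf
    for label, doc_count in class_doc_counts.items():
        score = math.log(doc_count / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # ignore words never seen in training
            count = word_counts[label][word]
            # Add-one smoothing avoids zero probabilities for unseen pairs.
            score += math.log((count + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy, made-up examples purely for illustration (not the paper's data set).
docs = [
    "shocking miracle cure doctors hate",
    "you will not believe this shocking secret",
    "city council approves new budget",
    "central bank holds interest rates steady",
]
labels = ["fake", "fake", "real", "real"]
model = train_naive_bayes(docs, labels)
print(predict(model, "shocking secret cure"))
print(predict(model, "council budget vote"))
```

Working in log space keeps the product of many small probabilities from underflowing, which matters once documents are longer than a few words.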
Education can be utilized as a tool to face many problems and overcome many hurdles in life. The knowledge obtained from education helps to enhance opportunities in one's employment development. To extract useful information from the knowledge obtained, Educational Data Mining is widely used. Educational data mining applies different data mining tools and techniques to analyze...
Correct and fast sentiment analysis of continuously generated data such as Twitter messages is very important for providing real-time customized service to users. While the Naive Bayes Classifier (NBC) is the most popular classifier employed for sentiment analysis, the existing studies on it have been based on a single-server environment. Consequently, they are not adequate for handling real-time stream...
The advent of social media, email services and other internet facilities has been helpful for a wide range of users. But some users are interested in finding loopholes in such web-based services to hinder the normal activities of common users. Spam emails are among the most disturbing activities in social networks. In this context there is a need for efficient spam filters and most of...
In data mining, classification plays a prominent role in predicting outcomes. One of the best supervised classification techniques in data mining is Naive Bayes classification. Naive Bayes classification is good at predicting outcomes and often outperforms other classification techniques. One of the reasons behind the strong performance of Naive Bayes classification is the assumption of conditional...
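For reference, the standard naive Bayes decision rule can be written as follows. This is the textbook formulation, not something taken from the truncated abstract: it assumes the features x_1, ..., x_n are conditionally independent given the class y, so the joint likelihood factorizes into per-feature terms.

```latex
P(y \mid x_1, \dots, x_n) \propto P(y) \prod_{i=1}^{n} P(x_i \mid y),
\qquad
\hat{y} = \arg\max_{y} \; P(y) \prod_{i=1}^{n} P(x_i \mid y)
```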
Diabetes mellitus is considered a severe health issue caused by an elevated amount of plasma glucose in the blood. A number of decision support systems have been introduced to help medical experts analyze the different factors that cause diabetes. Here a computerized information system is designed using Stacked Generalization for predicting diabetes. The classifiers under...
Text categorization plays an important role in the fields of information retrieval, machine learning, natural language processing, data mining and others. With the development of computer and information technology, many classification algorithms have emerged. Each text classification algorithm produces results at differing speeds and efficiency due to the various features of the test text. It has been...
In recent years, RESTful Web services have been rapidly developed and deployed because of advantages such as being lightweight, flexible and extensible. However, most RESTful services are described in heterogeneous, ordinary HTML pages, which makes them really difficult to identify and crawl automatically from the Internet. In this paper we propose a hybrid classifier framework called...
The Web is gigantic and is constantly being updated. Bangla news on the web has grown rapidly in the information age, and each news site has its own layout and categorization for grouping news. This heterogeneity of layout and categorization cannot always satisfy an individual user's needs. Removing this heterogeneity and classifying the news articles according to user preference is a formidable...
Geometric dilution of precision (GDOP) is a powerful, simple and widely used measure for assessing the effectiveness of potential measurements to specify the precision and accuracy of the data received from global positioning system (GPS) satellites. The most correct method to classify or approximate the GPS GDOP is to use matrix inversion on all the combinations and choose the lowest one, but inverting...
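The exhaustive method mentioned above can be sketched as follows. This is an illustrative sketch, not the paper's code: it assumes the usual definition GDOP = sqrt(trace((HᵀH)⁻¹)), where each row of H is a unit line-of-sight vector to a satellite followed by 1. The helper names (`invert`, `gdop`, `best_subset`, `unit`) and the satellite directions are hypothetical.

```python
import itertools
import math


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment the matrix with the identity on the right.
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular geometry matrix")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def gdop(unit_vectors):
    """GDOP = sqrt(trace((H^T H)^-1)), with rows of H = [ex, ey, ez, 1]."""
    h = [list(u) + [1.0] for u in unit_vectors]
    hth = [[sum(row[i] * row[j] for row in h) for j in range(4)]
           for i in range(4)]
    inverse = invert(hth)
    return math.sqrt(sum(inverse[i][i] for i in range(4)))


def best_subset(unit_vectors, size=4):
    """Evaluate every subset of `size` satellites and keep the lowest GDOP."""
    best = None
    for combo in itertools.combinations(range(len(unit_vectors)), size):
        try:
            value = gdop([unit_vectors[i] for i in combo])
        except ValueError:
            continue  # skip degenerate (singular) geometries
        if best is None or value < best[0]:
            best = (value, combo)
    return best


def unit(azimuth_deg, elevation_deg):
    """Unit line-of-sight vector (east, north, up) from azimuth and elevation."""
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    return (math.cos(el) * math.sin(az), math.cos(el) * math.cos(az), math.sin(el))


# Hypothetical satellite directions (azimuth, elevation in degrees).
sats = [unit(0, 90), unit(0, 20), unit(120, 20),
        unit(240, 20), unit(30, 60), unit(200, 45)]
best_value, best_combo = best_subset(sats)
print(best_combo, round(best_value, 3))
```

With n visible satellites the search inverts one 4×4 matrix for each of the C(n, 4) subsets, so the cost of the exhaustive approach grows quickly with n.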
A new statistical pattern classifying system is proposed to solve the problem of the "peaking phenomenon". In this phenomenon, the accuracy of a pattern classifier peaks as the number of features increases under a fixed number of training samples. Instead of estimating the distribution of class objects, the system generates a region on the feature space, in which a certain rate of class objects is included...
In this paper, we describe a physical activity classification system using a body sensor network (BSN) consisting of cost-sensitive tri-axial accelerometers. We focus on workspace activities (different motions and sitting postures). We use a Naive Bayes classifier and show that we can train the system simply and systematically. For each task, we find a set of features that separate the corresponding...
In the text mining field, KNN (K Nearest Neighbors) is one of the oldest and simplest methods of text classification. But it is known to be sensitive to the distance (or similarity) function used in classifying a test instance. This disadvantage can cause low classification accuracy and limit the KNN classifier's use in text classification. In this paper, we introduce Mahalanobis...
As an integral part of reliable communication in wireless networks, effective link estimation is essential for routing protocols. However, due to the dynamic nature of wireless channels, accurate link quality estimation remains a challenging task. In this paper, we propose 4C, a novel link estimator that applies link quality prediction along with link estimation. Our approach is data-driven and consists...
One of the important stages of an optical character recognition system is the segmentation of text components from non-text components of input images. In this paper a machine learning technique based on a naive Bayes classifier is developed for text component segmentation. In the training stage, a simple procedure is used to generate a large collection of training data sets for learning the classifier. A collection...
The Naïve Bayes classifier has proved to be one of the most effective classifiers and is widely used. It applies statistical theory to text classification. This paper researches and implements a Chinese text classifier in Java based on the Naïve Bayes method. First of all, this paper describes the text classification system, including the representation and extraction of text information and the method of Chinese...
Port-based or payload-based analysis is becoming difficult for accurate traffic identification when many applications use dynamic port numbers and encryption to avoid detection. In this paper we present an approach for online traffic classification relying on the observation of the first n packets of a flow. The packet size and inter-arrival times of the individual packets, rather than the statistic...
The class imbalance problem often occurs in real applications. Class imbalance means that one class may have far fewer examples than another in the training set. Under-sampling is a very popular approach to deal with this problem. The under-sampling approach is very efficient because it uses only a subset of the majority class. The drawback of under-sampling is that it throws away many potentially...
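Plain random under-sampling, the baseline approach described above, can be sketched as follows. This is a minimal illustration and not the method the paper goes on to propose: the function name `random_undersample`, the fixed seed, and the 90/10 toy data are assumptions.

```python
import random
from collections import defaultdict


def random_undersample(samples, labels, seed=0):
    """Randomly drop examples so every class has as many as the rarest class."""
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)
    target = min(len(items) for items in by_class.values())
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    balanced = []
    for label, items in by_class.items():
        # Sampling without replacement keeps `target` examples per class.
        for sample in rng.sample(items, target):
            balanced.append((sample, label))
    rng.shuffle(balanced)
    return balanced


# Toy imbalanced data: 90 majority examples and 10 minority examples.
samples = list(range(100))
labels = ["majority"] * 90 + ["minority"] * 10
balanced = random_undersample(samples, labels)
kept = {label: sum(1 for _, l in balanced if l == label) for label in set(labels)}
print(kept)
```

In this toy run 80 of the 90 majority examples are discarded, which illustrates the drawback the abstract points to: the classifier never sees most of the majority-class data.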
Many real-world text classification tasks involve imbalanced training examples. Categories with fewer examples are under-represented and their classifiers often perform far below a satisfactory level. We propose a new approach using a probability distribution to assign the feature weight and apply it to the Naive Bayes classifier. The method is evaluated in our experiments on the FuDan Chinese Corpus. The experimental...