The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
Classification is the one of the most important techniques in Datamining for data analysis. In Datamining, different Classification Techniques are available to predict outcome for a given dataset. There are many classification techniques for predicting and estimating accuracy, one such famous technique is Naïve Bayes Classifier. Naïve Bayes is very popular as it is easy to build, not so complex and...
In Data Mining classification plays prominent role in predicting outcomes. One of the best supervised classification techniques in Data Mining is Naive Bayes Classification. Naive Bayes Classification is good at predicting outcomes and often outperforms other classification techniques. One of the reasons behind the strong performance of Naive Bayes Classification is due to the assumption of conditional...
As a product of Web2.0, micro-blog is developing rapidly these years. More and more information spread on the micro-blog because of its high speed and convenience, social hotspots and news events included. As a result, discovering, extraction and analyzing information become researching hotspots. By studying micro-blog text and long text cluster, this article draws a conclusion that traditional cluster...
Due to the proliferation of mobile applications(abbreviated as Apps) on smart phones, users can install many Apps to facilitate their life. Usually, users browse their Appsby swiping touch screen on smart phones, and are likely to spend much time on browsing Apps. In this paper, we design an AppNow widget that is able to predict users' Apps usage. Therefore, users could simply execute Apps from the...
The role of Intrusion Detection System (IDS) has been inevitable in the area of Information and Network Security - specially for building a good network defense infrastructure. Anomaly based intrusion detection technique is one of the building blocks of such a foundation. In this paper, the attempt has been made to apply hybrid learning approach by combining k-Medoids based clustering technique followed...
It is necessary for a researcher to know historical transition in researchers and research topics. Although Web search engines can be used for obtaining such information, collecting the information across a long time period is difficult and laborious. Thus, we proposed a method for automatically extracting historical transition in researchers and research topics by using co-occurrence information...
Very Fast Decision Tree (VFDT) is an exemplar of classification techniques in data stream mining where models are built by incremental learning from continuously arriving data instead of batches. Many variations and modifications were made upon VFDT since it was first introduced in year 2000. Novel contributions were mainly made in two aspects of VFDT, tree induction process and prediction process,...
The aim of the work is to extract Kazakh phrase and basic noun phrase from corpus. For the phrase extraction, N-gram model methods were used, specifically bigram and trigram methods were applied. For basic noun phrase extraction, rule-based methods were used. We started from the grammar structure of basic noun phrase structure model, established a set of rules using the part-of-speech tag and the...
Quantification is the name given to a novel machine learning task which deals with correctly estimating the number of elements of one class in a set of examples. The output of a quantifier is a real value, since training instances are the same as a classification problem, a natural approach is to train a classifier and to derive a quantifier from it. Some previous works have shown that just classifying...
Data uncertainty is common in real-world applications. Various reasons lead to data uncertainty, including imprecise measurements, network latency, outdated sources and sampling errors. These kinds of uncertainties have to be handled cautiously, or else the data mining results could be unreliable or wrong. In this demo, we will show uRule, a new rule-based classification and prediction system for...
We propose an automatic method of extracting bibliographies for academic articles scanned with OCR markup. The method uses conditional random fields (CRF) for labeling serially OCR-ed text lines on an article's title page as appropriate names for bibliographic elements. Although we achieved excellent extraction accuracies for some Japanese academic journals, we needed a substantial amount of training...
There is an important issue that text summarization has to embody personal information need and provide indicative message to user. In this paper, a method of acquiring relevant documents based on user-feedback information and transductive inference SVM machine learning is presented. This method can well avoid the subjectivity of deciding relevant documents empirically. Furthermore, a sentence selection...
In order to improve the predictive accuracy of inductive learning, a heavy analysis about the demerit of C4.5 in dealing with numeric attribute is given. By the method of estimating the probability distribution of the training samples, a new and simple method of dealing with numeric attribute is proposed in this paper. Experimental results of UCI data sets show that the proposed method has an excellent...
This work proposes a hybrid model for text document classification for information retrieval using Naive Bayes and Rough set theory. Rough set theory is used for feature reduction and Naive Bayes theorem is used for classification of documents into the predefined categories by means of the probabilistic values. The deployment of the proposed model is planned through an enhanced method of the utilization...
With increasing Internet popularity, network security has become a serious problem recently. Therefore, a variety of algorithms have been devoted to this challenge. Genetic Network Programming is a newly developed evolutionary algorithm with directed graph gene structures, which has been applied to data mining for intrusion detection systems and has shown that it provides good performances in intrusion...
There is a growing concern about the increasing vulnerability of future computing systems to errors in the underlying hardware. Traditional redundancy techniques are expensive for designing energy-efficient systems that are resilient to high error rates. We present Error Resilient System Architecture (ERSA), a low-cost robust system architecture for emerging killer probabilistic applications such...
This paper presents a probabilistic cutting plane technique for solving a robust feasibility problem which is to find a solution satisfying a parameter-dependent convex constraint for all possible parameter values. The proposed algorithm employs random samples of the parameter and maximum volume ellipsoid centers of candidates of the solution set. It is shown that the numbers of updates and random...
Firstly, by preprocessing classification rule, we account distinct outlier attributes subspace of the rules about classification rules attributes, then it uses attribute weight vector to calculate weighted distance; secondly, it analyzes subspace outlier influence factor of weighted neighborhood area; finally, we creates frequent matching Sub-Set by comparing with subspace outlier influence factor...
Gene selection aims at detecting biologically relevant genes to assist biologists' research. The cDNA microarray data used in gene selection is usually "wide". With more than ten thousand genes, but only less than a hundred of samples, many biologically irrelevant genes can gain their statistical relevance by sheer randomness. Moreover, even for genes that are biologically relevant, biologists...
Traditional machine learning algorithms assume that data are exact or precise. However, this assumption may not hold in some situations because of data uncertainty arising from measurement errors, data staleness, and repeated measurements, etc. With uncertainty, the value of each data item is represented by a probability distribution function (pdf). In this paper, we propose a novel naive Bayes classification...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.