The financial market is volatile, and investors face the difficult task of tracking and predicting its swings so that their strategies yield better financial returns. Using Big Data and Bayesian statistics, which draw on prior knowledge and training examples to determine the likelihood of a hypothesis, financial news can be tracked continuously and affecting...
Anomalous payloads in network packets are a potential source of intrusion in computer networks. In this paper we propose an efficient machine learning approach to detect anomalous payloads. The approach uses n-gram preprocessing to extract the words contained in the payload. Bayesian inference is used to learn normal and anomalous traffic patterns from the words extracted during training. During...
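The two steps named in this abstract, n-gram extraction and Bayesian learning of normal and anomalous patterns, can be illustrated with a minimal sketch. This is not the authors' implementation: the choice of byte trigrams, the multinomial naive Bayes model with Laplace smoothing, the class name `NaiveBayesPayloadClassifier`, and the toy training payloads are all assumptions made for illustration.

```python
import math
from collections import Counter


def ngrams(payload: bytes, n: int = 3):
    """Return all overlapping byte n-grams of the payload."""
    return [payload[i:i + n] for i in range(len(payload) - n + 1)]


class NaiveBayesPayloadClassifier:
    """Hypothetical multinomial naive Bayes over byte n-grams."""

    def __init__(self, n: int = 3, alpha: float = 1.0):
        self.n = n                    # n-gram length (assumed value)
        self.alpha = alpha            # Laplace smoothing constant
        self.counts = {}              # label -> Counter of n-grams
        self.doc_counts = Counter()   # label -> number of training payloads
        self.vocab = set()            # all n-grams seen in training

    def fit(self, payloads, labels):
        for payload, label in zip(payloads, labels):
            grams = ngrams(payload, self.n)
            self.counts.setdefault(label, Counter()).update(grams)
            self.doc_counts[label] += 1
            self.vocab.update(grams)
        return self

    def log_posteriors(self, payload: bytes):
        """Unnormalized log posterior of each label for the payload."""
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for label, gram_counts in self.counts.items():
            total = sum(gram_counts.values())
            # Log prior from the share of training payloads with this label.
            score = math.log(self.doc_counts[label] / total_docs)
            for gram in ngrams(payload, self.n):
                # Smoothed likelihood so unseen n-grams do not zero the score.
                score += math.log(
                    (gram_counts[gram] + self.alpha)
                    / (total + self.alpha * vocab_size)
                )
            scores[label] = score
        return scores

    def predict(self, payload: bytes):
        scores = self.log_posteriors(payload)
        return max(scores, key=scores.get)


# Toy training data, invented purely for illustration.
train_payloads = [
    b"GET /index.html HTTP/1.1",
    b"GET /about.html HTTP/1.1",
    b"\x90\x90\x90\x90/bin/sh",
    b"\x90\x90\x90\x90\x90\x90cmd.exe",
]
train_labels = ["normal", "normal", "anomalous", "anomalous"]

clf = NaiveBayesPayloadClassifier(n=3).fit(train_payloads, train_labels)
```

Calling `clf.predict(payload)` on an unseen payload returns whichever label has the higher smoothed log posterior. A real detector would need far more training traffic and a tuned n-gram length.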
The Web is gigantic and constantly updated. Bangla news on the Web has grown rapidly in the information age, and each news site has its own layout and categorization for grouping news. This heterogeneity of layout and categorization cannot always satisfy an individual user's needs. Removing this heterogeneity and classifying the news articles according to user preference is a formidable...
Denial of Service (DoS) and Distributed Denial of Service (DDoS) attacks can result in a huge loss of data and make resources unavailable to legitimate users. With the continuous growth of Internet users and traffic, the importance of Intrusion Detection Systems (IDS) for detecting DoS/DDoS network attacks has also grown. Different techniques such as data mining and pattern recognition are being used...
Detecting anomalous traffic on the Internet has remained a concern for the security research community over the years. Advances in computing performance, in terms of processing power and storage, have allowed resource-intensive intelligent algorithms to be used to detect intrusive activities in a timely manner. Naïve Bayes is a statistical inference learning algorithm with promise...
This paper presents a simulation-based empirical study of the performance profile of random subsample ensembles with a hybrid mix of base learners in high-dimensional feature spaces. The performance of a hybrid random subsample ensemble that uses a combination of C4.5, k-nearest neighbor (kNN) and naïve Bayes base learners is assessed through statistical testing in comparison to those...
Web text classification is the process of automatically determining text types under a given classification scheme, according to the text content. A Web text categorization system applies machine learning, knowledge engineering and other related fields: it accesses text on the web and, after text preprocessing, Chinese word segmentation and classifier training, using classification algorithm...
Really Simple Syndication (RSS) is widely used in daily life, but RSS does not always collect interesting articles, so users have to sift through every subscription for articles they like. Ranking unread RSS articles has the potential to relieve users of this heavy burden. Although user preferences can be learned from explicit feedback such as rating or tagging, implicit feedback...
In this paper we compare four machine learning techniques for filtering spam in blog comments. The techniques are Naïve Bayes, k-nearest neighbor, neural networks and support vector machines. For this comparative study we used a blog comment corpus that has been affected by spam, which serves as our case study in this work. We classify the comments of this blog comments corpus, which...
This paper mainly focuses on the effect of the feature selection method on the performance of the Traditional Focused Crawler (TFC) and the Accelerated Focused Crawler (AFC). Information retrieval methods such as querying a search engine, using a web catalog and browsing may not satisfy the information needs of all users. When the information requirement concerns a specific topic, focused crawlers will complement...
With the rapid development of Internet services and the rapid increase in intrusion problems, traditional intrusion detection methods cannot cope with increasingly complicated intrusions. Introducing machine learning into intrusion detection systems to improve performance has therefore become one of the major concerns in intrusion detection research. Intrusion detection systems...
The exponential growth of information on the World Wide Web makes it increasingly difficult to discover relevant data about a specific topic. As a result, interest is growing in the focused crawler, a program that traverses the Internet by choosing pages relevant to a predefined topic and ignoring those that are off-topic. A new focused crawler based on a Naive Bayes classifier was proposed here,...
As more and more multimedia data become available on the Web, mining those data is playing an increasingly important role in Web applications. In this paper, we investigate the interplay between multimedia data mining and text data mining. Specifically, in an approach we call text-aided image classification (TAIC), we address the problem of image classification with a very limited amount of labeled...
The problem of inferring geographical information associated with web pages and identifying the geographic scope of their content is gaining increasing attention. However, geographic scope is a concept that can be interpreted in many different ways, ranging from the expected target scope of a specific content to the country where the content originated. The latter, in particular, albeit difficult to...
The paper reports a study on information categorization based on highly efficient feature selection and a comprehensive semi-supervised learning algorithm. Feature selection and conversion are performed using maximum mutual information, including linear and non-linear feature conversions. Entropy is used and extended to find suitable features with a machine learning method. Fuzzy partition...
The spread of spam has caused considerable interference for Internet users and wasted network bandwidth. Anti-spam technology has developed into its third generation, behavior recognition technology. Many studies have focused on detecting abnormal behavior during the SMTP conversation. In this paper, a mathematical model of the user's typical sending behavior is considered. To find the...
Dynamic analysis of the current network status is critical for detecting large-scale intrusions and ensuring that networks continue to function. Collecting and analyzing traffic in real time and reporting the current status promptly provides a feasible way to do so. In this paper we used a refined naive Bayes method, naive Bayes kernel estimator (NBKE), to identify flooding attacks and port scans from normal...
Content-based spam filtering is one of the important topics in Internet security research, and the Bayesian classification method has performed well in anti-spam filtering. An improved method for classifying spam based on Bayesian filtering is proposed in this paper. The experimental results show that the new method improves spam recall and spam precision.
With the rapid development of the Internet, the growing volume of spam mail has become harmful to users. Content-based spam filtering technologies have so far become the mainstream anti-spam methods. Support vector machine (SVM), Bayes, windows and KNN are among the best of these technologies, each with its own advantages and disadvantages. The common shortcoming of content-based methods...
Ambiguous words are words that have several meanings, such as apple, window, etc. In text classification they are usually removed by feature reduction methods like information gain. Sometimes there are so many ambiguous words in the corpus that we cannot simply discard them, especially when classifying documents from the Web. In this paper we look for a method to classify titled documents...