Focused crawling is a means of acquiring raw big-data materials from the web. This paper proposes a focused crawler for discovering Arabic poetry resources based on the Apache Nutch crawler. The crawler identifies poetry-relevant resources using an SVM classifier and a list of Arabic poetry-related keywords. The crawler is able to collect relevant webpages with a precision of 94%.
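The keyword half of such a relevance test can be sketched as follows. This is a minimal illustration only, not the paper's implementation: the keyword list, the `keyword_score` and `is_relevant` helpers, and the 0.1 threshold are all assumptions made for the example, and the SVM classifier stage is omitted.

```python
# Hypothetical keyword list; the paper's actual list is not given in the abstract.
POETRY_KEYWORDS = {"شعر", "قصيدة", "ديوان", "شاعر"}


def keyword_score(text, keywords=POETRY_KEYWORDS):
    """Return the fraction of whitespace-separated tokens that are keywords."""
    tokens = text.split()
    if not tokens:
        return 0.0
    hits = sum(1 for tok in tokens if tok in keywords)
    return hits / len(tokens)


def is_relevant(text, threshold=0.1):
    """Treat a page as poetry-relevant when its keyword density reaches
    the threshold (an assumed value, chosen only for illustration)."""
    return keyword_score(text) >= threshold
```

In a real crawler a score like this would more plausibly be one input alongside the classifier's decision rather than the sole filter.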
Automatic classification of news articles is a relevant problem because of the large amount of news generated every day, so classifying these articles is crucial for giving users quick and effective access to information of interest. On the one hand, traditional classification systems represent documents as a bag-of-words (BoW), which is oblivious to two problems of language: synonymy and...
Recent years have brought a surge in the volume of opinionated text shared across the internet. Every day, a tremendous number of comments and reviews on different aspects of our lives are generated through social networks and other websites. A large portion of this data is written in Arabic, which is the fifth most used language on the internet [1] and is one of the six official languages of the United...
The number of internet users in Indonesia is increasing rapidly. This makes it possible to use and analyze data gathered from internet users to show the overall picture of conditions in a given region. Currently, there is some research on analyzing public mood, such as happiness, sadness, anger, fear, and neutrality. The analysis itself consists of mood classification...
Worldwide network traffic is estimated to double every year. To keep pace with and profit efficiently from this increased volume of flows, and to offer new services, efficient techniques are needed. New applications are invented day by day; they are heterogeneous in the network environment, and communication between these new devices is also critical. Improving the network...
Electronic mail is one of today's most important ways to communicate and transfer information. Because of its fast delivery and ease of access, it is used in almost every aspect of communication in work and life. However, the increase in email users has resulted in a dramatic increase in spam emails during the past few years. In this paper, we propose an email-filtering approach that is based on supervised...
With the rapid development and popularity of Internet technology, more and more people like to share their feelings and experiences on the Internet. The growing wealth of online resources, such as personal blogs and online reviews, provides us with new opportunities and challenges. Opinion mining concerns how to use information technology to mine and understand the views of others...
Changes in the network topology such as large-scale power outages or Internet worm attacks are events that may induce routing information updates. Border Gateway Protocol (BGP) is used by Autonomous Systems (ASes) to address these changes. Network reachability information, contained in BGP update messages, is stored in the Routing Information Base (RIB). Recent BGP anomaly detection systems employ machine...
Cyberbullying has become an intensive field of research because of its major impact on society. Most researchers analyze the causes and consequences of cyberbullying; however, only a few try to improve software to reduce or stop cyberbullying and make the Internet a safer place. This article presents a current review of efforts in cyberbullying detection using web content mining techniques.
Researchers have started looking for Internet traffic recognition techniques that do not depend on ‘well known’ TCP or UDP port numbers or on interpreting the contents of packet payloads. Newer approaches classify traffic by recognizing statistical patterns in externally observable attributes of the traffic (such as typical packet lengths and inter-arrival times). The main goal is to cluster or...
The purpose of the present work is to create an intelligent system for retrieving desired documents in the Marathi language. The system also focuses on providing personalized Marathi documents to end users based on their interests, as identified from their browsing history. This paper presents the automatic categorization of Marathi documents and a literature survey of the related work done...
Research on sentence orientation aims to obtain useful orientation information, and it has become a research focus in natural language processing, especially for micro-blogs. Based on the existing HowNet semantic similarity, this paper presents a sentence orientation identification method that takes advantage of an improved algorithm for calculating the semantic orientation value of Chinese terms. Firstly, this...
The security issue cannot be ignored in Mobile Internet applications. Present mechanisms have great difficulty securing the complex and sensitive data in these applications. A multi-classifier is built using support vector machines; taking the cost of classification into account, different sensitivity is given to different types of sensitive...
As a new form of malicious software, phishing websites have appeared frequently in recent years, causing great harm to online financial services and data security. In this paper, we design and implement an intelligent model for detecting phishing websites. In this model, we extract 10 different types of features, such as title, keyword and link text information, to represent the website. Heterogeneous...
This paper studies classification methods, comparing SVM and Naïve Bayes analysis as applied to medical data mining for viral disease. The objective of this study is to explore the possibility of applying machine learning techniques such as the SVM and Naïve Bayes classification algorithms to predict susceptibility to a complex disease, dengue. Both of these algorithms were chosen for their simple, amazing...
Effective classification of web pages can improve the quality of information retrieval. Traditional classification algorithms are mostly based on analysis of web content, but web page content is complicated and filled with a large amount of false or erroneous information, which seriously affects the accuracy of web information classification. To solve this problem, this...
Depending on the question, various answering methods and answer sources can be used. In this paper, we build a distributed QA system to handle different types of questions and web sources. When a user question is entered, the broker distributes the question over multiple sub-QAs according to question type. The selected sub-QAs find locally optimal candidate answers, which are then collected into the...
Grooming attack recognition is a complex issue that is difficult to address using simple word matching to identify potential hazards for minors. In this paper, the use of document classification to create patterns from real dialogs is proposed. Furthermore, a decision-making method that generates proper warning signals based on the classification results is introduced...
Knowledge discovery from the Web is a cyclic process. In this paper we focus on the important part of transforming unstructured information from Web pages into structured relations. Relation extraction systems capture information from natural language text on Web pages, called Web text. However, extraction is quite costly and time consuming. Worse, many Web pages may not contain a textual representation...
Community Question Answering (CQA) has become a popular and effective means of seeking information on the Web. It is now possible and effective to post a question in natural language on a popular community Question Answering (QA) portal and to rely on other users to provide answers. These online collaborative services are attracting users and questions at an explosive rate, while how to correctly...