Search results

Items 1 to 20 of 21 results

chapter

Using a thesaurus-based approach for the categorisation of web sites

Sameerchand Pudaruth, Youven Ankiah, Keshav Sembhoo

2014 Seventh International Conference on Contemporary Computing (IC3), pp. 624-628

With the increasing number of Mauritian-owned websites on the internet, the need for classification is becoming increasingly important. Our objective in this research is to classify a list of websites into seven broad categories, namely education, entertainment, government, health, tourism, sports and shopping. The homepages of three hundred and nineteen websites have been used in this study. We have exploited...
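The abstract is cut off before the method is described, so the following is only a minimal sketch of one way thesaurus-based categorisation could work, assuming each of the seven categories is represented by a small hand-built word list. The THESAURUS entries and the categorise function are hypothetical, not the authors' thesaurus: a homepage is simply assigned to the category whose terms it mentions most often.

```python
import re
from collections import Counter
from typing import Optional

# Hypothetical thesaurus: each category maps to a few indicative terms.
# A real thesaurus would be far larger; these entries are illustrative only.
THESAURUS = {
    "education": {"school", "university", "college", "course", "student"},
    "entertainment": {"music", "movie", "cinema", "concert", "game"},
    "government": {"ministry", "government", "parliament", "minister", "public"},
    "health": {"hospital", "clinic", "doctor", "health", "medical"},
    "tourism": {"hotel", "beach", "resort", "holiday", "travel"},
    "sports": {"football", "match", "league", "team", "athlete"},
    "shopping": {"shop", "price", "cart", "sale", "buy"},
}


def categorise(homepage_text: str) -> Optional[str]:
    """Return the category whose thesaurus terms occur most often, or None if none match."""
    tokens = re.findall(r"[a-z]+", homepage_text.lower())
    scores = Counter()
    for token in tokens:
        for category, terms in THESAURUS.items():
            if token in terms:
                scores[category] += 1
    if not scores:
        return None
    return scores.most_common(1)[0][0]
```

For example, a homepage reading "Book your beach resort holiday at our hotel" scores four tourism terms and is assigned to tourism.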

chapter

Comparing classification methods for link context based focused crawlers

Kamil Caliskan, Rifat Ozcan

2013 International Conference on Electronics, Computer and Computation (ICECCO), pp. 143-146

Focused crawlers aim to fetch only the pages related to a specific subject area from the millions of web pages on the Internet. The essential task in a focused crawler is to predict whether a page is related to the target subject area without actually fetching the page content itself. Link context based focused crawlers focus on the surrounding text around each link to classify the page pointed by...
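As a rough illustration of what "link context" means, the sketch below collects, for every hyperlink, its anchor text plus a fixed window of words on either side; that word list is what a classifier would then score before the target page is fetched. The LinkContextParser class and the window size are assumptions made for illustration, not the paper's exact context definition. It uses only the standard-library html.parser module.

```python
from html.parser import HTMLParser


class LinkContextParser(HTMLParser):
    """Collect, for every <a href>, its anchor text plus a window of surrounding words."""

    def __init__(self, window: int = 5):
        super().__init__()
        self.window = window    # hypothetical: number of words kept on each side
        self.tokens = []        # all words on the page, in document order
        self.links = []         # (href, start_index, end_index) into self.tokens
        self._open = None       # (href, start_index) while inside an <a> element

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._open = (href, len(self.tokens))

    def handle_endtag(self, tag):
        if tag == "a" and self._open:
            href, start = self._open
            self.links.append((href, start, len(self.tokens)))
            self._open = None

    def handle_data(self, data):
        self.tokens.extend(data.split())

    def contexts(self):
        """Yield (href, context_words): anchor text plus `window` words on each side."""
        for href, start, end in self.links:
            lo = max(0, start - self.window)
            hi = min(len(self.tokens), end + self.window)
            yield href, self.tokens[lo:hi]
```

With window=2, the fragment `Read our latest <a href="/fb">football results</a> and league tables today` yields the context our, latest, football, results, and, league for the link /fb.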

chapter

Ontological based webpage classification

Wui Kheun Ong, Jer Lang Hong, Fariza Fauzi, Ee Xion Tan

2012 International Conference on Information Retrieval & Knowledge Management (CAMP), pp. 224-228

Current classification techniques use word matching and clustering to classify webpages. These techniques take an ad hoc approach, checking and matching all the keywords in a webpage for classification. These methods are efficient but not without problems. In general, they suffer from the following problems: 1) as they use brute-force matching over the entire document, they tend to be slow...

chapter

A genetic algorithm based optimal feature selection for Web page classification

Selma Ayse Ozel

2011 International Symposium on Innovations in Intelligent Systems and Applications (INISTA), pp. 282-286

In this study we propose a genetic algorithm that selects the best features for the Web page classification problem in order to improve the accuracy and run-time performance of the classifiers. The growing amount of information on the Web has created a need for accurate automated Web page classifiers to maintain Web directories and to improve search engine performance. To determine whether a Web page belongs...
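The general shape of genetic-algorithm feature selection can be sketched as follows, assuming a bit-mask chromosome (1 = keep the feature), truncation selection, one-point crossover and bit-flip mutation. These operators and parameter values are illustrative choices, not necessarily those of the paper, and the fitness function is supplied by the caller.

```python
import random


def genetic_feature_selection(n_features, fitness, pop_size=20, generations=40,
                              mutation_rate=0.05, seed=0):
    """Search for a good feature subset encoded as a bit mask (1 = keep feature)."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]

    for _ in range(generations):
        # Rank by fitness and keep the better half unchanged as parents (elitism).
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)          # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation: each gene flips with probability mutation_rate.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit for bit in child]
            children.append(child)
        population = parents + children

    return max(population, key=fitness)
```

In a real setting the fitness would be something like the cross-validated accuracy of a classifier trained on the selected columns, perhaps minus a penalty on subset size to reward faster classifiers. Because the parents survive each generation, the best mask found never gets worse from one generation to the next.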

chapter

Web spam detection based on discriminative content and link features

M Mahmoudi, A Yari, S Khadivi

2010 5th International Symposium on Telecommunications (IST), pp. 542-546

Spam detection is a crucial task in web information retrieval systems. The dynamic nature of information resources, as well as the continuous changes in users' information demands, makes web spam detection a challenging topic. So far, researchers from many different backgrounds have proposed a variety of methods to tackle the problem of spam web pages. In...

chapter

Crawling Result Pages for Data Extraction Based on URL Classification

Tiezheng Nie, Zhenhua Wang, Yue Kou, Rui Zhang

2010 7th Web Information Systems and Applications Conference (WISA 2010). Workshop on Semantic Web and Ontology (SWON2010). Workshop on Electronic Government Technology and Application (EGTA 2010), pp. 79-84

In Web database integration, crawling data pages is important for data extraction. The fact that data are spread across multiple result pages makes it harder to access the data for integration. It is therefore necessary to crawl query result pages from Web databases accurately and automatically. To address this problem, we propose a novel approach based on URL classification to effectively identify...

chapter

Improvement of Feature Extraction in Web Page Classification

Jiao Lijuan, Feng Liping

2010 2nd International Conference on E-business and Information System Security (EBISS 2010), pp. 1-3

In this paper, the mutual information formula is improved by using a hyperlink factor. Experiments show that introducing the hyperlink elements of web pages can improve classification accuracy in feature selection methods based on mutual information and correlation, especially for those of strong. The improvement is therefore effective for web page classification.
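The abstract does not give the modified formula, so the hyperlink factor is not reproduced here. For reference, the sketch below computes the standard expected mutual information between a term and a class from a 2x2 table of document counts, which is the usual baseline for mutual-information feature selection: the sum over the four cells of P(t, c) log2(P(t, c) / (P(t) P(c))).

```python
import math


def mutual_information(n11, n10, n01, n00):
    """Expected mutual information (in bits) between a term and a class.

    n11: docs in the class that contain the term
    n10: docs outside the class that contain the term
    n01: docs in the class that lack the term
    n00: docs outside the class that lack the term
    """
    n = n11 + n10 + n01 + n00
    # Each cell: (joint count, matching term-marginal count, matching class-marginal count).
    cells = [
        (n11, n11 + n10, n11 + n01),
        (n10, n11 + n10, n10 + n00),
        (n01, n01 + n00, n11 + n01),
        (n00, n01 + n00, n10 + n00),
    ]
    mi = 0.0
    for joint, term_total, class_total in cells:
        if joint > 0:  # 0 * log(0) is taken as 0
            mi += (joint / n) * math.log2(n * joint / (term_total * class_total))
    return mi
```

A term spread evenly over the class and its complement scores 0 bits, while a term that occurs in exactly the class's documents and nowhere else (with a 50/50 class split) scores 1 bit.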

chapter

Web page categorization based on Maximum Entropy Model

Jiao Lijuan, Feng Liping

2010 2nd IEEE International Conference on Information Management and Engineering (ICIME 2010), pp. 551-553

Web page categorization is becoming a pivotal technology for processing and organizing large volumes of documents and data. Features are selected by taking the hyperlink factor into account, improving text-processing techniques within the Maximum Entropy Model. Experiments show that the method is more effective: it not only obtains the most consistent distribution but also ensures accuracy and universality in web page classification...

chapter

Comparison of Attribute Selection Methods for Web Texts Categorization

Rizauddin Saian, Ku Ruhana Ku-Mahamud

2010 Second International Conference on Computer and Network Technology (ICCNT 2010), pp. 115-118

This paper presents a study of the performance of attribute selection methods used with the Ant-Miner algorithm for web text categorization. The new data set generated by each attribute selection method was classified with Ant-Miner to measure performance in terms of predictive accuracy and the number of rules generated. The classification results were also compared with those of the C4.5 algorithm.

chapter

Using Search Engine for Classification: Does It Still Work?

S. Govaerts, N. Corthaut, E. Duval

2009 11th IEEE International Symposium on Multimedia (ISM 2009), pp. 483-488

Genre classification is a key aspect of music description. In 2006, Schedl et al. presented a method for genre classification through web-based co-occurrence analysis. We evaluate whether this method is still valid given the evolution of web search technologies. We identify some issues with page count as the main parameter for the analysis in relation to the genre taxonomies used, choice of...
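A simplified version of page-count co-occurrence scoring might look like the sketch below: each genre is scored by the number of hits for the artist and genre together, divided by the hits for the artist alone, and the highest-scoring genre wins. This normalisation is an illustrative assumption rather than the exact formula of Schedl et al., and page_count is a hypothetical function injected by the caller so that no particular search engine API is implied.

```python
def genre_scores(artist, genres, page_count):
    """Score each genre by co-occurrence page counts normalised by the artist's own count.

    page_count(query) is a hypothetical callable returning the hit count a search
    engine reports for the query string.
    """
    artist_hits = page_count(f'"{artist}"')
    if artist_hits == 0:
        return {g: 0.0 for g in genres}
    return {g: page_count(f'"{artist}" "{g}"') / artist_hits for g in genres}


def predict_genre(artist, genres, page_count):
    """Return the genre with the highest normalised co-occurrence score."""
    scores = genre_scores(artist, genres, page_count)
    return max(scores, key=scores.get)
```

With made-up counts of 1000 hits for "Artist A", 600 for "Artist A" "jazz" and 20 for "Artist A" "metal", the jazz score is 0.6 and jazz is predicted. The paper's concern is exactly that such counts have become less reliable as search technology has changed.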

chapter

True Positive Cost Curve: A Cost-Based Evaluation Method for High-Interaction Client Honeypots

C. Seifert, P. Komisarczuk, I. Welch

2009 Third International Conference on Emerging Security Information, Systems and Technologies (SECURWARE), pp. 63-69

Client honeypots are security devices designed to find servers that attack clients. High-interaction client honeypots (HICHPs) classify potentially malicious Web pages by driving a dedicated vulnerable Web browser to retrieve and classify these pages. Considering the size of the Internet, the ability to identify many malicious Web pages is crucial. HICHPs, however, present challenges: They...

chapter

PCI: Plants Classification & Identification Classification of Web Pages for Constructing Plants Web Directory

M. Khalilian, H. Abolhassani, A. Alijamaat, F.Z. Boroujeni

2009 Sixth International Conference on Information Technology: New Generations (ITNG 2009), pp. 1373-1377

Despite the growth of the Web in recent years, some portions of the Web remain largely underdeveloped, as shown by a lack of high-quality content. An example is the botany-specific Web directory, where the lack of well-structured Web directories has limited users' ability to browse the necessary information. In this research we propose an improved framework for constructing a specific Web directory...

chapter

News Contents Recommendation Model Based on Feedback of Web Usage

Ping Ni, Jianxin Liao, Xiaomin Zhu, Keyan Ren

2009 WRI World Congress on Computer Science and Information Engineering (CSIE), vol. 4, pp. 431-435

In this paper, the current classification is refined by K-means reclassification based on feedback from Web usage mining, in order to improve the accuracy of news recommendation and the convergence of classification. Using this Web usage feedback, the method can extract the most relevant keywords and eliminate the disturbance caused by multi-vocal (polysemous) words within a category. The reclassification of news...
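For readers unfamiliar with the clustering step, here is a plain k-means sketch on dense vectors (for example keyword-weight vectors of news items). The usage-feedback weighting described in the abstract is not reproduced, and initialising the centroids with the first k points is a simplifying assumption chosen to keep the example deterministic.

```python
def kmeans(points, k, iterations=20):
    """Plain k-means on dense vectors; the first k points serve as initial centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

On the points (0, 0), (0, 1), (10, 10) and (10, 11) with k = 2, the two nearby pairs end up in separate clusters after two iterations.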

chapter

Folksonomy for the Blogosphere: Blog Identification and Classification

Rujiang Bai, Xiaoyue Wang, Junhua Liao

2009 WRI World Congress on Computer Science and Information Engineering (CSIE), vol. 3, pp. 631-635

Traditional automatic classifiers often produce misclassifications. Folksonomy, a new manual classification scheme based on users tagging items with freely chosen keywords, can effectively resolve this problem. Even though folksonomy scales much better than other manual classification schemes, the method cannot deal with a tremendous number of items such as whole Weblog articles...

chapter

Association based classification for relational data and its use in web mining

V. Bartik

2009 IEEE Symposium on Computational Intelligence and Data Mining, pp. 252-258

Classification based on mining association rules is a method that offers good accuracy and a human-readable classification model. The aim of this paper is to propose a modification of the basic association-based classification method that can be used for data extracted from Web pages. The paper describes the modifications of the method and the necessary discretization of numeric attributes. Next,...
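Association-rule miners work on categorical items, so numeric attributes must first be discretized. The abstract does not say which discretization the paper uses, so the sketch below shows equal-width binning, one common choice, and turns each value into an item string. The to_items helper and the item naming scheme are hypothetical.

```python
def equal_width_bins(values, n_bins):
    """Return a bin index (0..n_bins-1) for each numeric value using equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:  # all values identical: put everything in bin 0
        return [0] * len(values)
    # The maximum value would land at index n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def to_items(attribute, values, n_bins):
    """Turn numeric values into categorical items such as 'price=bin1' for rule mining."""
    return [f"{attribute}=bin{b}" for b in equal_width_bins(values, n_bins)]
```

Equal-width bins are simple but sensitive to outliers; equal-frequency or entropy-based (supervised) discretization are common alternatives when class labels are available.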

chapter

Leveraging Web 2.0 Sources for Web Content Classification

S. Banerjee, M. Scholz

2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, vol. 1, pp. 300-306

This paper addresses practical aspects of Web page classification not captured by the classical text mining framework. Classifiers are supposed to perform well on a broad variety of pages. We argue that constructing training corpora is a bottleneck for building such classifiers, and that care has to be taken if the goal is to generalize to previously unseen kinds of pages on the Web. We study techniques...

chapter

An Analysis of Visual and Presentation Factors Influencing the Design of E-commerce Web Sites

B. Soiraya, A. Mingkhwan, C. Haruechaiyasak

2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, vol. 3, pp. 525-528

Two important factors that indirectly influence Internet shoppers to make online purchases are the visual layout and the presentation of the web page. In this paper, we propose an approach to web page layout analysis in order to assess the design of e-commerce Web sites. First, our proposed method segments each web page into five different blocks: top, left, center, right and bottom. We study...
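The five-block segmentation could be approximated as follows, assuming each page element comes with a rendered bounding box. The element is assigned by the centre of its box, with the top and bottom strips taking precedence over the side columns. The band and side fractions are invented thresholds, not values from the paper.

```python
def layout_block(x, y, width, height, page_w, page_h, band=0.15, side=0.2):
    """Assign an element (by the centre of its bounding box) to one of five blocks.

    band: assumed fraction of page height treated as the top and bottom strips
    side: assumed fraction of page width treated as the left and right columns
    """
    cx, cy = x + width / 2, y + height / 2
    if cy < band * page_h:
        return "top"
    if cy > (1 - band) * page_h:
        return "bottom"
    if cx < side * page_w:
        return "left"
    if cx > (1 - side) * page_w:
        return "right"
    return "center"
```

On a 1000 x 2000 pixel page, a full-width banner at y = 0 falls in the top block, and a 150-pixel-wide navigation column starting at x = 0 halfway down the page falls in the left block.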

chapter

A Classifier-CMAC Neural Network Model for Web Mining

S. Dehghan, A.M. Rahmani

2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, vol. 1, pp. 427-431

The rapid growth of the Web has made it a huge source of information, and its data would be easier and more efficient to access if its content were well organized. Automatic classification of Web pages is one of the major methods in Web content mining (WCM) and can be of great value in the development and maintenance of Web directories. Based on the analysis done, the CMAC neural network showed...

chapter

Study on Semantic Representation of Web Information Based on Repeating Patterns

Kening Gao, Bin Zhang, Yin Zhang, Hongru Wei, et al.

2008 Fifth International Conference on Fuzzy Systems and Knowledge Discovery (FSKD), vol. 4, pp. 482-486

Using the repeating information that appears in Web pages to represent semantic meaning can improve the accuracy of Web page classification. This paper analyses and improves the traditional repeating-pattern representation methods, and further proposes a new semantic representation of Web information based on repeating patterns. First, the repeating patterns are formal...

chapter

A Comparison Study: Web Pages Categorization with Bayesian Classifiers

Zengmei Fu, Chuanliang Chen, Yunchao Gong, Rongfang Bie

2008 10th IEEE International Conference on High Performance Computing and Communications (HPCC), pp. 789-794

In recent years, with the development of the Internet, web mining has become a hotspot of data mining. Web page classification is one of the essential techniques for web mining, since classifying web pages of an interesting class is often the first step of mining the web. The high-dimensional text vocabulary space is one of the main challenges posed by web pages. In this paper, we study the capabilities...
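As a point of reference for the Bayesian classifiers being compared, the sketch below implements multinomial naive Bayes with add-one (Laplace) smoothing over page tokens. It is a generic textbook version, not the specific set of classifiers or preprocessing studied in the paper.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs):
    """docs: list of (tokens, label). Returns class priors, per-class word counts, vocabulary."""
    class_docs = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    for tokens, label in docs:
        word_counts[label].update(tokens)
    vocab = {w for counts in word_counts.values() for w in counts}
    priors = {c: n / len(docs) for c, n in class_docs.items()}
    return priors, word_counts, vocab


def predict_nb(model, tokens):
    """Pick the class maximising log P(c) + sum of log P(w|c), with add-one smoothing."""
    priors, word_counts, vocab = model
    best, best_score = None, -math.inf
    for c, prior in priors.items():
        total = sum(word_counts[c].values())
        score = math.log(prior)
        for w in tokens:
            if w in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[c][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best, best_score = c, score
    return best
```

Working in log space avoids numerical underflow when a page contains many words. The size of the vocabulary in the smoothing denominator is one concrete place where the high-dimensional vocabulary mentioned in the abstract shows up.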

Data set:
ieee
Keywords:
ACCURACY
CLASSIFICATION
WEB PAGES


Keywords

CLASSIFICATION ALGORITHMS (13)
INTERNET (12)
DATA MINING (11)
WEB SITES (7)
FEATURE EXTRACTION (5)
WEB MINING (5)
WEB PAGE CLASSIFICATION (5)
FEATURE SELECTION (4)
INFORMATION RETRIEVAL (4)
SEARCH ENGINES (4)
SUPPORT VECTOR MACHINES (4)
HTML (3)
TEXT ANALYSIS (3)
TEXT CATEGORIZATION (3)
TRAINING (3)
VISUALIZATION (3)
WEB PAGE CATEGORIZATION (3)
ARTIFICIAL NEURAL NETWORKS (2)
ASSOCIATION RULES (2)
CLUSTERING ALGORITHMS (2)
CRAWLERS (2)
DATA EXTRACTION (2)
DATABASES (2)
DISTANCE MEASUREMENT (2)
LEARNING (ARTIFICIAL INTELLIGENCE) (2)
MACHINE LEARNING (2)
MATHEMATICAL MODEL (2)
ONLINE FRONT-ENDS (2)
ONTOLOGY (2)
PATTERN CLASSIFICATION (2)
SEARCH ENGINE (2)
SECURITY OF DATA (2)
WEB DIRECTORY (2)
AGGREGATING ONE-DEPENDENCE ESTIMATORS (1)
ALGORITHM DESIGN AND ANALYSIS (1)
ANT-MINER ALGORITHM (1)
APPROXIMATE MATCHING (1)
ASSOCIATION BASED CLASSIFICATION (1)
ASSOCIATION RULES MINING (1)
ATTRIBUTE SELECTION (1)
ATTRIBUTE SELECTION METHODS (1)
BAYES METHODS (1)
BAYESIAN CLASSIFIER (1)
BAYESIAN CLASSIFIERS (1)
BAYESIAN METHODS (1)
BIOLOGICAL CELLS (1)
BLOG CLASSIFICATION (1)
BLOG IDENTIFICATION (1)
BLOGOSPHERE (1)
BOTANY (1)
C4.5 ALGORITHM (1)
CATEGORY MAP (1)
CEREBELLAR MODEL ARITHMETIC COMPUTER (1)
CLASSIFICATION ALGORITHM (1)
CLASSIFICATION RULES (1)
CLASSIFIER-CMAC NEURAL NETWORK MODEL (1)
CLIENT HONEYPOT (1)
CLIENT-SERVER SYSTEMS (1)
CMAC NEURAL NETWORK (1)
COMPONENT (1)
COMPUTATIONAL MODELING (1)
CONTENT BASED WEB PAGE CLASSIFICATION (1)
CONTEXT (1)
CONTROLLED VOCABULARY (1)
CORPUS CONSTRUCTION (1)
CORRELATIVE MATRIX (1)
COST-BASED EVALUATION METHOD (1)
CRAWLING DATA PAGES (1)
CRAWLING RESULT PAGES (1)
DECAY CONCEPT (1)
DECISION TREES (1)
DEDICATED VULNERABLE WEB BROWSER (1)
DISCRIMINATIVE CONTENT (1)
DOCUMENT HANDLING (1)
DOCUMENT SIMILARITY (1)
DYNAMIC THRESHOLD (1)
E-COMMERCE WEB SITES (1)
EDUCATION (1)
ELECTRONIC COMMERCE (1)
ELECTRONIC PUBLISHING (1)
ENCYCLOPEDIAS (1)
ENTROPY (1)
EQUATIONS (1)
EVALUATION (1)
FEATURE (1)
FEATURE SELECTION TECHNIQUE (1)
FOCUSED CRAWLER (1)
FOCUSED CRAWLING (1)
FOLKSONOMY-AND-SUPPORT VECTOR MACHINE CLASSIFIER (1)
FOLKSONOMY (1)
FSVMC (1)
GAIN (1)
GENETIC ALGORITHMS (1)
GENETICS (1)
GOVERNMENT (1)
HICHP (1)
HIDDEN NAIVE BAYES (1)

INFONA - science communication portal
