Web archives preserve the fast-changing Web, yet they are highly incomplete due to crawling restrictions, crawling depth and frequency, or restrictive selection policies; most of the Web is unarchived and therefore lost to posterity. In this paper, we propose an approach to recover significant parts of the unarchived Web by reconstructing descriptions of these pages based on links and anchors in the set...
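As a rough illustration of the general idea only (not the paper's actual method), the Python sketch below aggregates anchor text from archived pages that link to an unarchived URL into a simple term-frequency surrogate of that page; the link records, field names, and weighting are assumptions made for the example.

```python
from collections import Counter

# Hypothetical archived link data: (source_url, target_url, anchor_text).
# In a real archive these would be extracted from parsed HTML/WARC records.
archived_links = [
    ("http://archived.example/a", "http://gone.example/page", "open data portal"),
    ("http://archived.example/b", "http://gone.example/page", "city open data"),
    ("http://archived.example/c", "http://gone.example/page", "open data"),
]

def build_surrogate(target_url, links):
    """Aggregate anchor words pointing at an unarchived URL into a
    term-frequency 'description' of that page."""
    terms = Counter()
    sources = set()
    for src, tgt, anchor in links:
        if tgt == target_url:
            sources.add(src)
            terms.update(anchor.lower().split())
    return {"url": target_url,
            "indegree": len(sources),
            "top_terms": terms.most_common(5)}

print(build_surrogate("http://gone.example/page", archived_links))
```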
Since the World Wide Web contains a large amount of data in different languages, retrieving language-specific information creates a new challenge in information retrieval called language-specific crawling. In this paper, a new approach is proposed for language-specific crawling in which a combination of selected content and context features of web documents is applied. This approach has been implemented...
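The abstract does not specify which content and context features are used, so the sketch below is only a hedged illustration: it combines a crude content feature (stop-word overlap with the target language) and a crude context feature (the host's top-level domain) into a single keep-or-skip decision during a crawl. The feature choices, weights, and threshold are all assumptions.

```python
import re
from urllib.parse import urlparse

# Crude content feature: common English stop words (a stand-in for a real
# language identifier or richer content features).
ENGLISH_STOPWORDS = {"the", "and", "of", "to", "in", "is", "for", "that", "with"}

def content_score(text):
    """Fraction of tokens that are common English stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(t in ENGLISH_STOPWORDS for t in tokens) / len(tokens) if tokens else 0.0

def context_score(url):
    """Crude context feature: does the host's TLD hint at the target language?"""
    host = urlparse(url).netloc
    return 1.0 if host.endswith((".uk", ".us", ".au")) else 0.0

def is_target_language(url, text, w_content=0.8, w_context=0.2, threshold=0.1):
    """Keep the page (and follow its links) only if the combined score passes."""
    return w_content * content_score(text) + w_context * context_score(url) >= threshold

print(is_target_language("http://example.co.uk/page",
                         "The crawler fetches pages in the target language."))
```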
Web pages are conventionally represented by the words found within their contents for classification purposes. However, word-based web page representation suffers from several limitations such as synonymy and homonymy. Motivated by the limitations of word-based representation, we explore the potential of representing web pages using information extraction patterns, in addition to words that are identified...
A focused Web crawler collects relevant Web pages on topics of interest from the Internet. Most researchers have studied strategies based on an initial model to gather as many relevant Web pages as possible in focused Web crawling. However, Web information continually changes over time, and an initial model representing outdated information can't reflect the user's topics of interest correctly. In this...
With the rapid growth of e-commerce, discovering specific information about buyers, sellers, products, and so on for online business users has become a key issue for information search engines. Focused crawling is proposed to selectively seek out pages that are relevant to a predefined set of topics without downloading all of the Web. We present a novel approach for...
In order to evaluate information retrieval algorithms, it is imperative to use a dataset as a test database. However, access to such datasets is often difficult and expensive, since building them is a time-consuming and costly task. This paper presents a collaborative approach to dataset creation that uses a data quality evaluation technique based on fuzzy theory to assist users in selecting suitable...
At present, using a focused crawler has become a common way to seek needed information. The main characteristic of a focused web crawler is to select and retrieve only relevant web pages in each crawling process. In this paper, we propose a learnable algorithm that combines link analysis with web content in order to retrieve specific web documents, and that can predict the next URL to crawl through learning. The algorithm...
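The snippet below is not the learnable algorithm summarized above; it is only a generic best-first focused-crawler sketch showing how a content-relevance score and a link/anchor signal can be blended to decide which URL to fetch next. The topic terms, weights, and toy fetch function are assumptions made for the example.

```python
import heapq

TOPIC_TERMS = {"focused", "crawler", "topic", "relevance"}

def content_relevance(text):
    """Content score: fraction of topic terms that occur in the page text."""
    return len(TOPIC_TERMS & set(text.lower().split())) / len(TOPIC_TERMS)

def link_score(parent_relevance, anchor_text):
    """Blend a link signal (the linking page's relevance) with the
    topical overlap of the anchor text itself."""
    return 0.6 * parent_relevance + 0.4 * content_relevance(anchor_text)

def crawl(seeds, fetch, max_pages=10):
    """Best-first crawl: always expand the highest-scoring frontier URL.
    `fetch(url)` must return (page_text, [(out_url, anchor_text), ...])."""
    frontier = [(-1.0, url) for url in seeds]      # negate scores for a max-heap
    heapq.heapify(frontier)
    seen, visited = set(seeds), []
    while frontier and len(visited) < max_pages:
        _, url = heapq.heappop(frontier)
        text, out_links = fetch(url)
        relevance = content_relevance(text)
        visited.append((url, round(relevance, 2)))
        for out_url, anchor in out_links:
            if out_url not in seen:
                seen.add(out_url)
                heapq.heappush(frontier, (-link_score(relevance, anchor), out_url))
    return visited

# Tiny in-memory "web" standing in for real HTTP fetching.
PAGES = {
    "seed": ("a focused crawler follows topic relevance",
             [("a", "focused crawler survey"), ("b", "cooking recipes")]),
    "a": ("more on focused crawler relevance", []),
    "b": ("unrelated page about cooking", []),
}

print(crawl(["seed"], lambda u: PAGES[u]))
```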