In this paper, we focus on improving the multi-label learning task with ensemble learning. Compared to traditional single-algorithm methods, ensemble methods are recognized to achieve much better performance than any constituent learned model, especially when the different classifiers are conditionally independent. Existing multi-label ensemble algorithms mainly focus on creating diverse...
In this paper, Multi-Task Linear Dependency Modeling is proposed to identify drug-related webpages that contain many images and much text. Linear Dependency Modeling exploits semantic relations between image features and text features, and Multi-Task Learning takes advantage of webpage metadata. Meaningful information from webpages can thus be fully exploited to improve classification accuracy. Experimental...
This paper addresses the issue of how to provide an overview of the knowledge associated with a given query keyword. In particular, we focus on the concerns of users who search for Web pages with a given query keyword. The Web search information needs for a given query keyword are collected through search-engine suggestions. Given a query keyword, we collect up to around 1,000 suggestions, many of which are redundant. We cluster...
Unlike general-purpose search engines, a vertical search engine needs to collect and index only a specific knowledge domain, and can therefore provide more professional search services to users. In this paper, we propose a novel library-resource vertical search engine based on ontology technology. In the vertical search engine, the information that the crawler collects from the Internet should be further...
The steady growth and popularization of the Web has led spammers to develop techniques to circumvent search engines, aiming at good visibility for their web pages in search results. Spammers are responsible for serious problems such as user dissatisfaction, irritation, exposure to unpleasant or malicious content, and financial loss. Although different machine learning approaches have been used to detect web spam,...
Spatial analysis in many fields requires effective address extraction from text reports. This problem is of particular importance in social science, where news reports contain information about socially relevant incidents. Previous address extraction work focuses on web pages where addresses are separated from other text; news reports, however, contain addresses embedded in running text. Hence the need for...
In order to manage and organize information on the web, we propose a novel web page classification strategy integrating a topic model and an SVM. We use the topic model to harness the implicit information on web pages for feature extraction. The strategy achieves an accuracy of 84.15%, 2.23% higher than the traditional classification strategy based on CHI.
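The abstract above combines topic-model features with an SVM classifier. A minimal sketch of that pipeline, assuming scikit-learn with LDA for the topic model and a linear SVM; the corpus, labels, and hyperparameters below are invented for illustration and are not the paper's data:

```python
# Sketch: topic-model features feeding an SVM, per the abstract's idea.
# Assumptions: LDA as the topic model, LinearSVC as the SVM; the tiny
# corpus and labels below are made up for illustration only.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.svm import LinearSVC

pages = [
    "football match score goal league",
    "election vote government policy minister",
    "goal striker league football season",
    "minister policy parliament vote election",
]
labels = ["sport", "politics", "sport", "politics"]

counts = CountVectorizer().fit_transform(pages)        # raw term counts
lda = LatentDirichletAllocation(n_components=2, random_state=0)
topic_features = lda.fit_transform(counts)             # per-page topic proportions

clf = LinearSVC().fit(topic_features, labels)          # SVM on topic features
print(clf.predict(topic_features))
```

The low-dimensional topic proportions stand in for the raw bag-of-words features a CHI-based strategy would select from, which is the comparison the abstract draws.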
The incentive for this work originates from the need to retrieve useful web news pages from an Indian news-website corpus. News web pages differ from other web pages, so it is particularly important to recognize web news accurately for precise classification. Our goal is a simple yet efficient technique to mine news articles from a web corpus. To accomplish this task, the automatic recognition method...
In this paper we present an overview of our proposed algorithms for classifying regions of web pages based on content and visual properties. We show how hidden Markov trees can be effective for this classification, and how this may ultimately offer improved experiences to users viewing web pages.
Detecting explicit user actions, i.e., requests for web pages such as hyperlink clicks, from passive traces is fundamental for many applications, such as network forensics or content popularity estimation. Every URL explicitly visited by a user usually triggers further automatic URL requests to obtain all objects that compose the web page. HTTP traces provide a summary of all URLs requested by users,...
This work addresses the problem of URL topic classification by making use of the text of Uniform Resource Locators (URLs). We introduce a method for classifying web pages into topics by extending the Jaccard distance measure and using an n-gram approach. We also compare our method with the best-performing known distance measures for Boolean data in the literature, i.e., Jaccard, Dice...
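The core ingredients named above, character n-grams of a URL plus the Jaccard distance, can be sketched as follows. This is a plain baseline under our own assumptions (nearest-neighbour assignment, trigrams), not the paper's extended measure; the URLs and labels are invented:

```python
# Sketch: URL topic classification via n-gram sets and Jaccard distance.
# Assumptions: character trigrams, nearest-neighbour label assignment;
# the example URLs/labels are hypothetical.
def ngrams(url, n=3):
    """Set of character n-grams of a lowercased URL."""
    url = url.lower()
    return {url[i:i + n] for i in range(len(url) - n + 1)}

def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| over two n-gram sets."""
    union = a | b
    return 1.0 if not union else 1.0 - len(a & b) / len(union)

def classify(url, labelled):
    """Return the label of the labelled URL nearest in Jaccard distance."""
    grams = ngrams(url)
    nearest = min(labelled, key=lambda pair: jaccard_distance(grams, ngrams(pair[0])))
    return nearest[1]

labelled = [
    ("www.bbc.com/sport/football", "sport"),
    ("www.bbc.com/news/politics", "news"),
]
print(classify("www.espn.com/football/scores", labelled))  # → sport
```

Because URLs are short, set-based n-gram measures like this are a natural fit, which is presumably why the paper benchmarks against the family of Boolean-data distances.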
Co-training is a semi-supervised learning paradigm that trains multiple classifiers and lets them label unlabelled instances for each other during the learning process. One challenge of co-training-style algorithms is training an initial weakly useful predictor when the number of labeled instances is very limited. In this paper, we use a Teaching-to-Learn and Learning-to-Teach strategy, in which each...
This paper focuses on optimizing advertising and other additional costs for small-business e-commerce web sites. The aim was to propose a dynamic-neural-network-based algorithm to predict the number of clicks on a particular advertising link across three web pages of three different small companies operating in the same business segment. The dynamic-neural-network-based algorithm was...
This paper presents TweeVist, a geo-tweet visualization system that helps users grasp how events unfold over time and space from tweets while they browse web pages, based on spatio-temporal analysis. TweeVist presents tag clouds of tweets from different time periods that are associated with web pages based on detected events. To detect events, the system extracts normal events (e.g., crowded restaurants,...
Web spam pollutes search engine results and decreases the usefulness of search engines. Web spam can be classified according to the methods used to raise a web page's ranking by subverting the algorithms web search engines use to rank results. The main types are content spam, link spam, and cloaking spam. There has been little or no work on automatically classifying web spam...
The Internet and search engines are increasingly prominent in modern life. Search engines like Google, Bing, and Yahoo are perhaps the largest sources of information that anyone can access at any time. People have different interests when using the Internet. Advanced users may be interested in automatically extracting information from pages for later processing and web mining,...
In this paper, we examine an algorithm for updating an n-gram word dictionary (thesaurus) and evaluate its effectiveness on a binary classification problem. The thesaurus is used as a reference to generate the numerical feature attributes of web pages. Generally, the n-gram word dictionary is built once from a set of training data and its content is never updated. Hence the content is static and its coverage...
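A minimal sketch of the setup described above: a word-n-gram dictionary built once from training pages, binary feature vectors generated against it, and an update step that adds newly seen n-grams so coverage grows. This is our reading of the idea under stated assumptions (word bigrams, binary features), not the paper's exact algorithm, and the example pages are invented:

```python
# Sketch: n-gram word dictionary as a feature reference, with an update
# step. Assumptions: word bigrams, binary presence features; the sample
# pages are hypothetical.
def word_ngrams(text, n=2):
    """Set of word n-grams of a text."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def build_dictionary(pages, n=2):
    """Build the dictionary once from training pages (the static baseline)."""
    vocab = set()
    for page in pages:
        vocab |= word_ngrams(page, n)
    return vocab

def update_dictionary(vocab, page, n=2):
    """Add unseen n-grams from a new page; returns how many were added."""
    new = word_ngrams(page, n) - vocab
    vocab |= new
    return len(new)

def features(page, vocab, n=2):
    """Binary feature vector over a sorted view of the dictionary."""
    grams = word_ngrams(page, n)
    return [1 if g in grams else 0 for g in sorted(vocab)]

vocab = build_dictionary(["buy cheap pills now", "read latest news today"])
added = update_dictionary(vocab, "buy cheap watches now")
print(added, len(vocab))  # → 2 8
```

The contrast the abstract draws is exactly between the static `build_dictionary` baseline and dictionaries that keep growing via `update_dictionary`.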
Support Vector Machine (SVM) is a powerful classifier widely used in textual and web classification. It finds a hyperplane that separates positive and negative data while maximizing the margin. SVM is a kernel-based classifier, and the choice of kernel is critical. In this paper we propose an implicit-links-based Gaussian kernel that uses an implicit-links-based distance. This kernel helps...
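Building a Gaussian kernel from a distance function, as the abstract proposes, follows the standard construction K(x, y) = exp(-d(x, y)² / (2σ²)). A minimal sketch of that construction, using a plain Euclidean distance as a stand-in since the paper's implicit-links-based distance is not fully specified here:

```python
# Sketch: turning an arbitrary distance into a Gaussian (RBF-style) kernel.
# Assumption: Euclidean distance stands in for the paper's
# implicit-links-based distance, which is not detailed in this abstract.
import math

def gaussian_kernel(d, sigma=1.0):
    """Return k(x, y) = exp(-d(x, y)^2 / (2 * sigma^2))."""
    def k(x, y):
        return math.exp(-d(x, y) ** 2 / (2 * sigma ** 2))
    return k

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

k = gaussian_kernel(euclidean, sigma=1.0)
print(k((0.0, 0.0), (0.0, 0.0)))  # → 1.0 (identical points, maximal similarity)
```

Note that for an arbitrary distance this construction is not guaranteed to yield a positive semi-definite kernel, which is one reason the abstract stresses that the kernel choice is critical.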
Classification and extraction of web pages find applications in the semantic web, search, and information extraction. The first part of the paper addresses the problem of classifying web pages according to their content. The paper then presents a methodology to classify web pages hierarchically, to achieve topic-wise modeling of websites, using a multi-label tree classifier, a variant of classification where...
The explosive growth in the number of web pages has brought up some problems in the search process. One of these problems is that general-purpose search engines often return too many irrelevant results when users search for specific information on a given topic. Another is the massive increase in the number of pages to be indexed by Web search systems. In this research, two steps...