Being able to identify locations associated with a Web resource is essential for providing location-based Web applications. However, geographical information in Web documents is rarely supplied in a machine-readable way and is therefore not easily discoverable. As a consequence, it is necessary to extract geographical keywords from Web documents and to associate locations with them. This method is called...
New e-services come online each year at an exponential rate. Most of them need to analyze and interpret enormous quantities of data, yet many do not take the emotions and sentiments expressed in Web pages into account in their analysis. Thus, in this work, we propose a novel system to obtain data of interest from a Web search engine by analyzing the emotional and sentimental content...
Software requirements documents (SRDs) are often authored in general-purpose rich-text editors, such as MS Word. SRDs contain instances of logical structures, such as use cases, business rules, and functional requirements. Automated recognition and extraction of these instances enables advanced requirements management features, such as automated traceability, template conformance checking, guided editing,...
A multi-agent Web mining model is designed to improve the efficiency of keyword-based search engines. The model divides the mining task among several parallel agents that work in coordination, greatly improving mining efficiency. Evolving from HITS, an algorithm named Grabber removes link-farm pages during expansion of the root set and makes anchor text similarity...
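For context, the classic HITS hub/authority iteration that Grabber evolves from can be sketched in a few lines. This is a sketch of plain HITS only; Grabber's link-farm removal and anchor-text scoring are not detailed in the snippet above, and the toy link graph is an illustrative assumption:

```python
# Minimal HITS iteration over a small link graph (illustrative only;
# the Grabber variant described above is not reproduced here).

def hits(links, iterations=50):
    """links: dict mapping each page to the list of pages it links to."""
    pages = set(links) | {p for targets in links.values() for p in targets}
    auth = {p: 1.0 for p in pages}
    hub = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority score: sum of hub scores of the pages linking to it.
        auth = {p: sum(hub[q] for q in links if p in links.get(q, []))
                for p in pages}
        norm = sum(v * v for v in auth.values()) ** 0.5 or 1.0
        auth = {p: v / norm for p, v in auth.items()}
        # Hub score: sum of authority scores of the pages it links to.
        hub = {p: sum(auth[q] for q in links.get(p, [])) for p in pages}
        norm = sum(v * v for v in hub.values()) ** 0.5 or 1.0
        hub = {p: v / norm for p, v in hub.items()}
    return auth, hub

links = {"a": ["b", "c"], "b": ["c"], "c": []}
auth, hub = hits(links)
# "c" is linked to by both "a" and "b", so it gets the top authority score,
# while "a" links to the most authorities and gets the top hub score.
```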
In this work, we developed a self-organizing map (SOM) technique that uses web-based text analysis to forecast when a group is undergoing a phase change. By "phase change", we mean that an organization has fundamentally shifted its attitudes or behaviors. For instance, when ice melts into water, the characteristics of the substance change. A formerly peaceful group may suddenly adopt violence, or a violent...
The large number of software source code projects available on the Internet or within companies is creating new information retrieval challenges. Present-day source code search engines, such as Google Code Search, tend to treat source code as pure text, as they do web pages. However, source code files differ from web pages and pure text files in that each file may contain certain blocks expressing...
Web text mining is a growing research area within data mining. Interestingly, existing Web text mining algorithms have concentrated on finding frequent patterns while discarding the less frequent ones, which may contain outliers. In addition, the domain knowledge of one industry is partly different from that of others; yet whatever domain they belong to, web texts are analyzed using the same dictionary. This...
Communication through the Web is becoming increasingly popular thanks to wireless and cellular networks. As this awareness spreads far and wide across different countries, significant complexities arise in terms of language and communication means for extracting information on the Web. This is particularly true in India, where texts appear in more than fifteen officially recognized languages and many more variations in local...
Web page classification plays an essential role in facilitating more efficient information retrieval and information processing. Conventionally, web text documents are represented by a term frequency matrix for classification purposes. However, considering the limitations of representing documents using terms or keywords, we propose to represent web pages using information extraction patterns that are...
Several approaches to educational web-based content enrichment have been devised. Annotations, in the form of comments and other types of remarks, naturally support these approaches. Annotations enrich educational materials mainly by retaining key information or comments; they can also support visual search and collaboration. In this paper we present a method for the acquisition of new educational...
Language Model (LM) constitutes one of the key components in Keyword Spotting (KWS). The rapid development of the World Wide Web (WWW) makes it an extremely large and valuable data source for LM training, but it is not optimal to use the raw transcripts from WWW due to the mismatch of content between the web corpus and the test data. This paper proposes a novel two-step data selection method based...
In this paper, we propose high-speed, accurate algorithms for detecting hazardous Web pages. Our algorithms automatically choose strings that appear predominantly in the HTML elements of hazardous Web pages. We use these strings in combination as features for SVMs (support vector machines) to detect hazardous Web pages. Since our algorithms do not rely on the text parts of Web pages, they can detect Web...
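The feature construction step described above, binary indicators for characteristic strings found in a page's raw HTML, can be sketched as follows. The candidate strings here are made up for illustration (they are not the strings the paper's algorithms would select), and the SVM training step is omitted:

```python
# Binary feature vectors from the presence of characteristic strings in raw
# HTML. The candidate strings below are illustrative assumptions, not the
# automatically chosen strings from the abstract above.
CANDIDATE_STRINGS = ["onmouseover=", 'http-equiv="refresh"', "<iframe"]

def html_features(html):
    """Return one 0/1 feature per candidate string (case-insensitive)."""
    lowered = html.lower()
    return [1 if s in lowered else 0 for s in CANDIDATE_STRINGS]

page = '<html><body><iframe src="x"></iframe></body></html>'
print(html_features(page))  # -> [0, 0, 1]
```

Vectors of this form would then be fed to an off-the-shelf SVM trainer; the point of the sketch is only that the features come from markup, not from the visible text.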
Conference web pages display their topic information in different ways, and conferences in different domains accept papers on different topics. Automatic extraction of topic information from conference web pages is thus a difficult task and has not received much attention from the research community. In this paper, we propose a method for extracting topic information that uses a web page segmentation...
This paper addresses issues in generating responses by extracting sentences from the Web for spoken decision-making dialogue systems. Various decision criteria are usually involved when selecting an alternative from a given set of alternatives. Such a dialogue system is required to explain the alternatives in terms of each decision criterion, focusing on why an alternative is recommended. Preparation...
A focused crawler traverses the web, selecting relevant pages according to a predefined topic. While browsing the internet, it is difficult to identify relevant pages and to predict which links lead to high-quality pages. In this paper, we propose a crawler system that uses a genetic algorithm to improve its crawling performance. Apart from estimating the best path to follow, our system also expands its...
Parallel corpora are a valuable resource for several important applications of natural language processing, such as statistical machine translation, dictionary construction, and cross-language information retrieval. The Web is a huge resource of knowledge, which partly contains bilingual information in various kinds of web pages. It currently attracts many studies on building parallel corpora based on the...
In this paper, we propose a framework to answer questions of the opinion type. The data source is the set of web pages returned by a search engine. Using a Bayes classifier, the main texts on the pages are classified at the sentence level into three categories: positive review, negative review, and neutral review. The K-means method is then used to cluster the positive-review and negative-review sentences respectively...
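The sentence-level Bayes classification step can be sketched with a tiny Naive Bayes polarity classifier. The training sentences and word features below are illustrative assumptions, not data from the paper, and the neutral category and K-means clustering step are omitted:

```python
import math
from collections import Counter, defaultdict

# Toy training data: (sentence, label) pairs, made up for illustration.
train = [
    ("great battery and great screen", "positive"),
    ("love the design", "positive"),
    ("terrible battery life", "negative"),
    ("screen is awful", "negative"),
]

word_counts = defaultdict(Counter)   # per-class word frequencies
class_counts = Counter()             # per-class sentence counts
vocab = set()
for text, label in train:
    class_counts[label] += 1
    for w in text.split():
        word_counts[label][w] += 1
        vocab.add(w)

def classify(text):
    """Pick the class with the highest log posterior under a unigram
    Naive Bayes model with add-one (Laplace) smoothing."""
    scores = {}
    total_sents = sum(class_counts.values())
    for label in class_counts:
        score = math.log(class_counts[label] / total_sents)  # log prior
        total_words = sum(word_counts[label].values())
        for w in text.split():
            score += math.log(
                (word_counts[label][w] + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

print(classify("great screen"))  # -> positive
```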
A representation of the World Wide Web as a directed graph, with vertices representing web pages and edges representing hypertext links, underpins the algorithms used by web search engines today. However, this representation involves a key oversimplification of the true complexity of the Web: an edge in the traditional Web graph represents only the existence of a hyperlink; information on the context...
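The contrast the abstract draws can be stated concretely: the traditional Web graph keeps only link existence, while a context-aware variant attaches information such as anchor text to each edge. Page names and anchor texts below are made-up examples:

```python
# Traditional Web graph: vertices are pages, and an edge records only that
# a hyperlink exists from one page to another.
plain_graph = {
    "pageA": ["pageB", "pageC"],
    "pageB": ["pageC"],
}

# A richer representation keys each edge by (source, target) and attaches
# context -- here the anchor text of the link, which the plain graph discards.
labeled_graph = {
    ("pageA", "pageB"): {"anchor_text": "contact us"},
    ("pageA", "pageC"): {"anchor_text": "our products"},
    ("pageB", "pageC"): {"anchor_text": "products"},
}

# Both structures describe the same link topology; only the edge
# annotations differ.
edges_from_labels = {}
for (src, dst) in labeled_graph:
    edges_from_labels.setdefault(src, []).append(dst)
```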
In web pages, reviews are written in natural language and follow an unstructured free-text scheme. Online product reviews are considered a significant informative resource that is useful for both potential customers and product manufacturers. Manually scanning through large numbers of reviews one by one is a computational burden and is not practical for businesses...
As the number of pages on the web is constantly increasing, there is a need to classify pages into categories to facilitate indexing and searching. In the method proposed here, we use both textual and visual information to find a suitable representation of web page content. In this paper, several term weights based on TF or TF-IDF weighting are proposed. The modification is based on visual areas,...
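The conventional TF-IDF baseline that the visual-area modification builds on can be sketched briefly; the visual weighting itself is not specified in the truncated snippet, and the toy document collection is an illustrative assumption:

```python
import math

# Toy document collection, made up for illustration.
docs = [
    "web page classification with terms",
    "web search and web mining",
    "image features for page layout",
]
tokenized = [d.split() for d in docs]
n_docs = len(tokenized)

def tf_idf(term, doc_tokens):
    """Plain TF-IDF: term frequency in the document times the log of the
    inverse document frequency (assumes the term occurs in the collection)."""
    tf = doc_tokens.count(term) / len(doc_tokens)
    df = sum(1 for toks in tokenized if term in toks)
    idf = math.log(n_docs / df)
    return tf * idf

# "web" occurs in two of the three documents, so its IDF is low; "image"
# occurs in only one, so it is weighted more strongly where it appears.
```

A visual-area modification of the kind the abstract mentions would scale such weights by where a term appears on the rendered page, e.g. boosting terms in headings or prominent regions.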