Arabic web content is growing rapidly, and the need for its efficient management is gaining importance; the morphological complexity of Arabic raises many challenges in this regard. This paper reports on some of our work aimed at designing text mining and query pre-processing tools that can efficiently process and search large quantities of Arabic web data. In our research we try to...
In this study, we propose a method that uses a Web search engine to extract inaccurate example sentences from multilingual parallel texts. We developed a multilingual parallel-text sharing system named Tack Pad for multilingual communication in the medical field. However, parallel texts created by people can be inaccurate. Hence, we cannot use these parallel texts in...
Information extraction (IE) from corpora is the analysis of texts in order to extract structured information such as named entities (NEs), which may be names of persons, organizations, addresses, dates, locations, etc. ... GATE is a software toolkit, written in Java and developed since 1995, that is widely used worldwide by many communities (scientists, companies, teachers, students) for natural language processing. We have experimented...
Distributional semantics is the branch of natural language processing that attempts to model the meanings of words, phrases and documents from the distribution and usage of words in a corpus of text. In the past three years, research in this area has been accelerated by the availability of the Semantic Vectors package, a stable, fast, scalable, and free software package for creating and exploring...
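The core idea the abstract describes — modeling word meaning from distribution and usage in a corpus — can be sketched minimally as follows. This is a generic illustration of distributional semantics with sparse co-occurrence vectors and cosine similarity, not the Semantic Vectors package itself; the function names and the toy sentences are assumptions for illustration only.

```python
import math
from collections import Counter, defaultdict

def cooccurrence_vectors(sentences, window=2):
    """Build a sparse context-count vector for each word, using a
    symmetric sliding window over whitespace-tokenized sentences."""
    vecs = defaultdict(Counter)
    for sent in sentences:
        toks = sent.lower().split()
        for i, w in enumerate(toks):
            for j in range(max(0, i - window), min(len(toks), i + window + 1)):
                if j != i:
                    vecs[w][toks[j]] += 1
    return vecs

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    num = sum(u[w] * v[w] for w in set(u) & set(v))
    den = (math.sqrt(sum(c * c for c in u.values()))
           * math.sqrt(sum(c * c for c in v.values())))
    return num / den if den else 0.0
```

Words that occur in similar contexts ("cat" and "dog" in parallel sentences, say) end up with similar vectors and hence high cosine similarity, which is the distributional hypothesis in its simplest form.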
Association mining is widely used in pattern discovery. For large-scale financial textual data analysis, however, association mining is relatively rarely applied, due to the low efficiency of text manipulation. This paper presents a fast financial text mining system, based on a search engine and a concept graph, for large-scale financial textual association mining and visualization. Through the experiments...
We have developed a tool designed so that a user can construct a word network while evaluating which words to set as nodes. A co-occurrence network of words is a complex network with a huge number of nodes and links, and cannot be interpreted by a human as it is. Therefore, we have designed an interface that displays the network as its confines gradually expand in...
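The two ingredients of such a tool — building a weighted word co-occurrence network, and showing only a gradually expanding neighborhood around a chosen node — can be sketched generically as below. This is not the interface described in the abstract; the sentence-level co-occurrence definition and all names are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations

def build_network(sentences, min_weight=1):
    """Edge weights count how many sentences contain both words."""
    edges = Counter()
    for sent in sentences:
        toks = set(sent.lower().split())
        for a, b in combinations(sorted(toks), 2):
            edges[(a, b)] += 1
    return {e: w for e, w in edges.items() if w >= min_weight}

def neighborhood(edges, seed, hops=1):
    """Nodes within `hops` links of seed: the gradually expanding
    view that keeps a huge network interpretable."""
    frontier, seen = {seed}, {seed}
    for _ in range(hops):
        nxt = set()
        for a, b in edges:
            if a in frontier and b not in seen:
                nxt.add(b)
            if b in frontier and a not in seen:
                nxt.add(a)
        seen |= nxt
        frontier = nxt
    return seen
```

Displaying `neighborhood(edges, seed, hops)` for increasing `hops` mimics the incremental expansion strategy, letting the user judge each newly revealed word before widening the view further.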
In the blogosphere, the amount of digital content is expanding, imposing new challenges on search engines. Because information needs are changing, automatic methods are needed to help blog search users filter information by different facets. In our work, we aim to support blog search with genre and facet information. Since we focus on the news genre, our approach is to classify blogs...
In this paper, we propose a framework for answering opinion-type questions. The data source is the set of web pages returned by a search engine. Using a Bayes classifier, the main texts on the pages are classified into three categories at the sentence level: positive review, negative review and neutral review. The K-means method is then used to cluster the sentences of the positive and negative reviews respectively...
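The sentence-level classification step can be sketched with a minimal multinomial naive Bayes classifier, as below. This is a generic illustration under assumed inputs (the tiny training pairs and labels are invented for the example), not the paper's trained model or its feature set.

```python
import math
from collections import Counter, defaultdict

def train_nb(labeled):
    """labeled: list of (token_list, label) pairs.
    Returns log-priors and Laplace-smoothed log-likelihoods."""
    label_counts = Counter(lbl for _, lbl in labeled)
    word_counts = defaultdict(Counter)
    vocab = set()
    for toks, lbl in labeled:
        word_counts[lbl].update(toks)
        vocab.update(toks)
    priors = {l: math.log(c / len(labeled)) for l, c in label_counts.items()}
    v = len(vocab)
    like = {}
    for l in label_counts:
        total = sum(word_counts[l].values())
        like[l] = {w: math.log((word_counts[l][w] + 1) / (total + v))
                   for w in vocab}
        like[l]["__unk__"] = math.log(1 / (total + v))  # unseen words
    return priors, like

def classify(tokens, priors, like):
    """Pick the label maximizing log-prior plus summed log-likelihoods."""
    scores = {l: priors[l] + sum(like[l].get(t, like[l]["__unk__"])
                                 for t in tokens)
              for l in priors}
    return max(scores, key=scores.get)
```

In the framework the abstract describes, the positive and negative sentence sets produced by such a classifier would then be clustered (e.g. by K-means over bag-of-words vectors) to group redundant opinions.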
A representation of the World Wide Web as a directed graph, with vertices representing web pages and edges representing hypertext links, underpins the algorithms used by web search engines today. However, this representation involves a key oversimplification of the true complexity of the Web: an edge in the traditional Web graph represents only the existence of a hyperlink; information on the context...
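A canonical example of an algorithm built on exactly this vertices-and-edges view of the Web is PageRank, sketched minimally below. This is the standard textbook formulation over a toy adjacency list, offered only to make the graph representation concrete; it is not the extension the abstract goes on to propose.

```python
def pagerank(links, d=0.85, iters=50):
    """links: dict mapping each page to its list of outgoing links
    (the directed Web graph). Returns a rank score per page."""
    nodes = set(links) | {v for outs in links.values() for v in outs}
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        new = {u: (1 - d) / n for u in nodes}
        for u, outs in links.items():
            if outs:
                share = d * rank[u] / len(outs)
                for v in outs:
                    new[v] += share
            else:  # dangling page: spread its rank evenly
                for v in nodes:
                    new[v] += d * rank[u] / n
        rank = new
    return rank
```

Note that each edge contributes only its existence to the computation, which is precisely the oversimplification the abstract criticizes: nothing about the hyperlink's context or anchor text survives in this representation.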
This paper presents a light-weight information retrieval and analysis architecture that addresses the complex task of gathering, combining, and storing documents to enable in-depth analysis. The growing interest in mining the Internet for conversation topics, opinions, and influencers has resulted in many free and commercial products. At the heart of such capability are two core technologies: information...
This paper presents the concept of surface text patterns for extracting purpose data from the web. In order to obtain an optimal set of patterns, we have developed a method for learning purpose patterns automatically. A corpus was downloaded from the Internet using bootstrapping by providing a few hand-crafted examples of each purpose pattern to a generic search engine. This corpus was then tagged...
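A surface text pattern of the kind described can be sketched as a lexical template matched directly against page text, as below. The seed patterns here are invented stand-ins for the paper's hand-crafted examples, and a real system would feed them to a search engine to bootstrap a corpus and learn further patterns; this fragment only shows the matching step.

```python
import re

# Hypothetical seed patterns for purpose expressions (illustrative only).
SEED_PATTERNS = [
    r"in order to ([^.,;]+)",
    r"for the purpose of ([^.,;]+)",
    r"so as to ([^.,;]+)",
]

def extract_purposes(text):
    """Return the purpose clauses captured by each seed pattern."""
    found = []
    for pattern in SEED_PATTERNS:
        found.extend(m.group(1).strip()
                     for m in re.finditer(pattern, text, re.IGNORECASE))
    return found
```

Bootstrapping would then take sentences matched by these seeds, generalize the surrounding context into new candidate patterns, and keep the ones that extract purpose data with sufficient precision.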
This paper details the implementation of Zycox, a document analyzer. It analyses the complete document by generating permutations and combinations of words and sentences, which are then searched on different search engines to find their relevant URLs; it thus gives a complete statistical analysis of the document, including its originality and the percentage of the document copied from the...
To improve the efficiency of the server-based Chinese input method for IPTV, the behaviors of querying for programs' text information are analyzed. We then propose integrating the Sphinx full-text search engine with the input method to mine accurate associated characters or words. To fit the main querying behavior of searching for program roles' names, the program synopses are mined...
The Web now plays an important part in people's real-life activities. Scientists not only in computer science but also in sociology and economics might be interested in mining information directly related to real-life events, or news-related information on the Web. In this paper we propose a system that enables mining of news-related articles instead of raw web pages. There are functionally two...
Free and open source software strongly promotes the reuse of source code. Some open source Java components/libraries are distributed as jar archives containing only the bytecode and some additional information. For anyone wanting to integrate such a jar into their own project, it is important to determine the license(s) of the code from which the jar archive was produced, as this affects the way that such...
Current automatic wrappers that use the DOM tree and visual properties of data records to extract the required information from search engine results pages generally have limitations, such as the inability to check the similarity of tree structures accurately. Our study of the properties of data records shows that data records located in search engine results pages not only have similar visual...
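Tree-structure similarity of the kind such wrappers need is often computed with the Simple Tree Matching algorithm, sketched below over bare (tag, children) tuples. This is the generic algorithm, not necessarily the measure this particular study uses or critiques; the tuple encoding is an assumption for the example.

```python
def tree_size(t):
    """t is a (tag, children) tuple; children is a list of such tuples."""
    return 1 + sum(tree_size(c) for c in t[1])

def simple_tree_match(a, b):
    """Count the nodes of the largest common top-down subtree of a and b,
    using dynamic programming over the ordered child sequences."""
    if a[0] != b[0]:
        return 0
    ca, cb = a[1], b[1]
    m, n = len(ca), len(cb)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            dp[i][j] = max(dp[i - 1][j], dp[i][j - 1],
                           dp[i - 1][j - 1]
                           + simple_tree_match(ca[i - 1], cb[j - 1]))
    return 1 + dp[m][n]

def structural_similarity(a, b):
    """Normalize the match count into a score in [0, 1]."""
    return 2.0 * simple_tree_match(a, b) / (tree_size(a) + tree_size(b))
```

Two data records rendered from the same result template typically share most of their DOM structure, so their normalized score approaches 1, while unrelated page regions score much lower.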
The development of the World Wide Web calls for efficient ways to exploit its information. Currently, search engines mostly return a set of related documents that contain the keywords. However, users expect an exact and concrete answer to each question. Therefore, it is necessary to build an automatic question answering (QA) system. In this paper, we focus on building a QA system for Vietnamese. This task especially...
Along with the fast development of network technology, the number of Web pages and of network search users has become enormous. To solve the problem of inefficiency and low precision in search for users with different demands and knowledge backgrounds, this paper presents a new text model called the vocabulary semantic net, which can be applied to build a personalized search engine and is tested with...
Today the Internet is found among almost all ethnic groups and cultures, and Web pages are developing very quickly in most countries and in many languages. The size and incoherence of the information available on the Internet have made the use of search engines obvious and necessary. Since search engines pay less attention to the linguistic and content features of documents in different languages...
Knowledge about herbal medicine can be contributed by experts from several cultures. With conventional techniques, it is hard for such experts to build a self-sustaining community for exchanging their information. In this paper, the Knowledge Unifying Initiator for Herbal Information (KUIHerb) is used as a platform for building a web community for collecting intercultural...