Search results

Items from 1 to 20 out of 154 results

chapter

Accelerating search and recognition workloads with SSE 4.2 string and text processing instructions

Guangyu Shi, Min Li, M Lipasti

(IEEE ISPASS) IEEE INTERNATIONAL SYMPOSIUM ON PERFORMANCE ANALYSIS OF SYSTEMS AND SOFTWARE > 145 - 153

2011 IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS 2011)

The amount of information available today is increasing rapidly, doubling every three years. Consequently, the search and recognition stages in computer applications will consume a growing portion of total CPU time. The SSE 4.2 instruction set, first implemented in Intel's Core i7, provides string and text processing instructions (STTNI) that use SIMD operations to process character data. Though originally...

chapter

Research on the Construction and Filter Method of Stop-word List in Text Preprocessing

Zhou Yao, Cao Ze-wen

2011 Fourth International Conference on Intelligent Computation Technology and Automation > 1 > 217 - 221

2011 International Conference on Intelligent Computation Technology and Automation (ICICTA)

In the text preprocessing stage of text mining, a stop-word list is constructed to filter the segmentation results of text documents so that the dimensionality of the text feature space can be reduced in a preliminary step. This paper summarized the definition, extraction principles, and extraction methods of stop words, and constructed a customized Chinese-English stop-word list from the classical stop-word list based on the difference...
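The abstract describes using a stop-word list to filter segmented text before building the feature space. As an illustration only, the filtering step can be sketched as below; the stop-word set is a small hypothetical list, not the customized list constructed in the paper, and the input is assumed to be already segmented into tokens.

```python
# Hypothetical mixed Chinese-English stop-word list, for illustration only.
STOP_WORDS = {"the", "of", "and", "a", "to", "in", "is", "的", "了", "和"}


def filter_tokens(tokens: list[str], stop_words: set[str]) -> list[str]:
    """Drop stop words from already-segmented tokens.

    Comparison lowercases each token, which only affects cased scripts
    such as English; Chinese tokens are compared as-is.
    """
    return [t for t in tokens if t.lower() not in stop_words]


english = ["The", "dimension", "of", "the", "feature", "space"]
chinese = ["文本", "的", "特征"]
print(filter_tokens(english, STOP_WORDS))  # ['dimension', 'feature', 'space']
print(filter_tokens(chinese, STOP_WORDS))  # ['文本', '特征']
```

Every token removed here is one fewer candidate dimension in a vector space model, which is the dimensionality reduction the abstract refers to.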

chapter

Study on question classification approach mixing multiple semantic characteristics together

LiGuo Duan, YanQin Niu, JunJie Chen

2011 3rd International Conference on Computer Research and Development > 1 > 354 - 357

2011 3rd International Conference on Computer Research and Development (ICCRD 2011)

This article proposes a question classification approach that integrates multiple semantic features. It addresses two problems in Chinese question classification models: inaccurate extraction of semantic information, and slow processing caused by an excessively high feature-vector (eigenvector) dimension. With the help of HowNet, a support vector machine, and the syntactic and semantic information of the question...

chapter

A document comparison approach using hybrid keyword and structured full text vocabulary searches

K Boonsuk, P Sophatsathit

2011 3rd International Conference on Computer Research and Development > 1 > 252 - 257

2011 3rd International Conference on Computer Research and Development (ICCRD 2011)

This paper proposes a systematic full-text document search that combines keyword matching with the structural similarity of the documents under consideration. The approach operates in two steps. The first step uses a set of designated keywords to retrieve potentially relevant documents by means of an open-source tool. The second step builds a suffix tree of frequently used vocabulary to retrieve the most similar...

chapter

ETAO: Symbol mapping transformation method for text compression

F M Baloul, M H Abdullah, E A Babikir

2011 3rd International Conference on Computer Research and Development > 3 > 133 - 138

2011 3rd International Conference on Computer Research and Development (ICCRD 2011)

This paper proposes a novel text transformation based on mapping single letters from the standard alphabetical order onto the same set of letters reordered by their relative frequencies. This method can be used as a complementary algorithm to enhance statistical compression techniques. We have designed and implemented an algorithm called the ETAO transformation method. It has been...
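The truncated abstract leaves the exact direction of the ETAO mapping unstated. One possible reading, sketched below purely as an assumption, ranks the letters of the input by frequency and maps the k-th most frequent letter to the k-th letter of the alphabet, so the transform is a reversible letter permutation that a later statistical coder could exploit. The function names are hypothetical, and the permutation must be stored alongside the output for decoding.

```python
import string
from collections import Counter

ALPHABET = string.ascii_lowercase


def build_mapping(text: str) -> dict[str, str]:
    """Map letters ranked by frequency in `text` onto 'a', 'b', 'c', ...

    Ties are broken alphabetically so the mapping is deterministic.
    All 26 letters are ranked, so the result is a full permutation.
    """
    counts = Counter(ch for ch in text if ch in ALPHABET)
    ranked = sorted(ALPHABET, key=lambda ch: (-counts[ch], ch))
    return dict(zip(ranked, ALPHABET))


def transform(text: str, mapping: dict[str, str]) -> str:
    # Characters outside the mapping (uppercase, digits, spaces) pass through.
    return "".join(mapping.get(ch, ch) for ch in text)


def invert(mapping: dict[str, str]) -> dict[str, str]:
    # A permutation is invertible, so decoding is the same lookup reversed.
    return {dst: src for src, dst in mapping.items()}


text = "hello world"
m = build_mapping(text)
encoded = transform(text, m)
print(encoded)  # edaab gbfac
print(transform(encoded, invert(m)) == text)  # True
```

In "hello world", 'l' occurs most often and becomes 'a', 'o' becomes 'b', and the remaining letters follow in frequency-then-alphabetical order.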

chapter

The Function of Fixed Word Combination in Chinese Chunk Parsing

Liqun Wang, Shoichi Yokoyama

2010 International Conference on Asian Language Processing > 73 - 76

2010 International Conference on Asian Language Processing (IALP 2010)

In this paper, we describe an approach to chunk parsing using fixed word combinations, which differs from previous research. We present a pattern extraction and matching method for Chinese sentences containing fixed word combinations. We then tested the patterns and obtained an accuracy of more than 96%. From the results of our experiment, we can see that syntactic analysis has been improved...

chapter

Auto-recognizing Letter-word Phrases in Chinese Texts

Zheng Zezhi

2010 Fourth International Conference on Genetic and Evolutionary Computing > 371 - 374

2010 Fourth International Conference on Genetic and Evolutionary Computing (ICGEC 2010)

Letter-word phrases used in Chinese texts are a class of unknown words in Chinese information processing, and existing segmentation software cannot identify them correctly. Here, an auto-tagging system for letter-word phrases based on rules and statistical data is presented. First, the system scans the sentences to obtain letter strings, and then takes each letter string as an anchor and scans...

chapter

Research on Text Clustering Based on Concept Weight

Yuqin Li, Xueqiang Lv, Yufang Liu, Shuicai Shi

2010 Fourth International Conference on Genetic and Evolutionary Computing > 232 - 235

2010 Fourth International Conference on Genetic and Evolutionary Computing (ICGEC 2010)

Through research on methods for calculating the weight of feature words in texts and the semantic similarity between words, we propose a feature-word weighting method based on concept weight, addressing both the semantic association among text features and the prevalent high-dimensionality problem of the text vector space model. This method reduces the semantic loss of the feature set and the dimension...

chapter

Converting printed Sinhala documents to formatted editable text

S Ajward, N Jayasundara, S Madushika, R Ragel

2010 Fifth International Conference on Information and Automation for Sustainability > 138 - 143

2010 5th International Conference on Information and Automation for Sustainability (ICIAfS)

Digitizing printed documents has long been a challenge for the computing community. Digitized text not only allows users to easily modify and reprint documents, but is also a present-day necessity given the word-search capabilities now available. Converting a printed document into a stream of characters using OCR (optical character recognition) techniques is a widely...

chapter

Word segmentation in a document image using spectral partitioning

V Manikandan, V Venkatachalam, M Kirthiga, K Harini, et al.

2010 IEEE International Conference on Computational Intelligence and Computing Research > 1 - 4

2010 IEEE International Conference on Computational Intelligence and Computing Research (ICCIC 2010)

State-of-the-art document segmentation algorithms employ ad hoc solutions that use certain document properties and iteratively segment the document image. These solutions must be adapted frequently and sometimes perform poorly on complex scripts. This calls for a generalized solution that achieves one-shot, globally optimal segmentation. This paper describes one such solution based on...

chapter

A Dictionary Mechanism for Chinese Word Segmentation Based on the Finite Automata

Wu Yang, Li-Yun Ren, Rong Tang

2010 International Conference on Asian Language Processing > 39 - 42

2010 International Conference on Asian Language Processing (IALP 2010)

The dictionary mechanism is the basis of Chinese word segmentation, and its quality directly affects segmentation speed and efficiency. Existing dictionary mechanisms suffer from wasted space, low efficiency, and difficult maintenance; establishing an effective mechanism is therefore an urgent problem for Chinese word segmentation. In this paper, the idea...
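The abstract is cut off before describing the proposed mechanism. For orientation only, a common generic way to realize a segmentation dictionary as a finite automaton is a character trie, in which each node is a state and word-final nodes are accepting states. The sketch below pairs such a trie with forward maximum matching; both are standard techniques chosen as an assumption, not the paper's design, and all class and function names are hypothetical.

```python
class TrieNode:
    __slots__ = ("children", "is_word")

    def __init__(self) -> None:
        self.children: dict[str, "TrieNode"] = {}
        self.is_word = False  # accepting state: a dictionary word ends here


class Dictionary:
    def __init__(self, words) -> None:
        self.root = TrieNode()
        for w in words:
            self.add(w)

    def add(self, word: str) -> None:
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def longest_match(self, text: str, start: int) -> int:
        """Length of the longest dictionary word beginning at text[start], or 0."""
        node, best = self.root, 0
        for i in range(start, len(text)):
            node = node.children.get(text[i])
            if node is None:  # no transition: the automaton rejects further input
                break
            if node.is_word:
                best = i - start + 1
        return best


def segment(text: str, dictionary: Dictionary) -> list[str]:
    """Forward maximum matching; unknown characters become one-character tokens."""
    tokens, i = [], 0
    while i < len(text):
        n = dictionary.longest_match(text, i) or 1
        tokens.append(text[i:i + n])
        i += n
    return tokens


print(segment("中国人民银行", Dictionary(["中国", "人民", "银行"])))
# ['中国', '人民', '银行']
print(segment("中国人民银行", Dictionary(["中国", "中国人", "人民", "银行"])))
# ['中国人', '民', '银行']
```

A trie shares common prefixes, so a lookup walks one state per character instead of testing every candidate substring against a hash set. The second call shows the known weakness of greedy maximum matching, which prefers "中国人" and strands "民".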

chapter

Annotation Guidelines for Hindi-English Word Alignment

Rahul Kumar Yadav, Deepa Gupta

2010 International Conference on Asian Language Processing > 293 - 296

2010 International Conference on Asian Language Processing (IALP 2010)

A language pair such as Hindi-English (Hin-Eng) differs considerably in grammar, so finding correspondences during word alignment is often unclear. Because Hindi is morphologically rich, aligning it with English is somewhat uncertain and introduces ambiguities into the annotation process. We present annotation guidelines for Hin-Eng word alignment through a contrastive analysis of the two languages. We applied...

chapter

Job Opportunity Mining by Text Categorization

Shilin Zhang, Mei Gu

2010 2nd International Conference on Information Engineering and Computer Science > 1 - 4

2010 2nd International Conference on Information Engineering and Computer Science (ICIECS)

Text classification is an important field of research, and there are a number of approaches to classifying text documents. However, improving computational efficiency and recall remains an important challenge. In this paper, we propose a novel framework to segment Chinese words, generate word vectors, train on the corpus, and make predictions. Based on text classification technology, we successfully...

chapter

Lexical Gap in English - Vietnamese Machine Translation: What to Do?

Le Manh Hai, Phan Thi Tuoi

2010 International Conference on Asian Language Processing > 265 - 269

2010 International Conference on Asian Language Processing (IALP 2010)

In the English-Vietnamese machine translation (EVMT) project at Ho Chi Minh City University of Technology, several problems cause the system to malfunction. One of the most undesirable phenomena is the lexical gap, which occurs when an English word lacks a Vietnamese equivalent. There are several approaches to this obstacle. Some researchers prefer replacing a lexical gap with its nearest...

chapter

A supervised ranking approach for detecting relationally similar word pairs

D Bollegala

2010 Fifth International Conference on Information and Automation for Sustainability > 323 - 328

2010 5th International Conference on Information and Automation for Sustainability (ICIAfS)

The similarity between the semantic relations that exist between two word pairs is defined as their relational similarity. For example, the semantic relation "is a large" holds between the words in the word pairs (lion, cat) and (ostrich, bird), because a lion is a large cat and an ostrich is the largest living bird on earth. Consequently, the two word pairs, (lion, cat) and (ostrich, bird), are considered...

chapter

Design and Implementation of Electronic Medical Record Template Based on XML Schema

Haijun Yang

2010 Second World Congress on Software Engineering > 1 > 225 - 228

2010 Second World Congress on Software Engineering (WCSE 2010)

Because electronic medical records (EMRs) are semi-structured, creating EMR templates is critical to their general use. Word processors are widely used to record electronic patient information. However, the main weakness of these editors is that medical data are hard to extract from text documents, and the data are difficult to present in other forms. This paper provides...

chapter

Affective-word based Chinese text sentiment classification

Yue Ning, Tingshao Zhu, Yan Wang

5th International Conference on Pervasive Computing and Applications > 111 - 115

2010 5th International Conference on Pervasive Computing and Applications (ICPCA 2010)

Browsing news on the web may evoke various emotions in readers and, in turn, influence their minds and lives in different ways. We expect that emotional analysis and classification of text can be useful and significant to users surfing the Internet. Most previous research focuses only on binary emotion classification, that is, positive and negative, e.g., identifying whether...

chapter

Self intelligence with text recognization

S L Wasankar, H Mahajan, D Deshmukh, H Munot

2010 International Conference on Signal and Image Processing > 521 - 525

2010 International Conference on Signal and Image Processing (ICSIP 2010)

The prime objective of this research is to develop effective reading skills in machines. After reading text and comprehending its meaning, the machine would program itself and then execute the resulting instructions. Here we explore a new era of computer vision and related research. The current investigation presents an algorithm and software which detects, recognizes...

chapter

Segmentation of text lines into words for Gujarati handwritten text

C Patel, A Desai

2010 International Conference on Signal and Image Processing > 130 - 134

2010 International Conference on Signal and Image Processing (ICSIP 2010)

This paper presents an attempt to extract words from handwritten text lines in Gujarati script. The highly cursive nature of most Indian scripts makes word extraction a critical step for optical character recognition (OCR). This cursive nature also causes difficulty during character extraction and modifier extraction. Word extraction is considered one of the...

chapter

Co-occurrence based predictors for estimating query difficulty

H Imran, A Sharan

2010 IEEE International Conference on Data Mining Workshops > 867 - 874

2010 10th IEEE International Conference on Data Mining Workshops (ICDMW 2010)

Query difficulty prediction aims to identify, in advance, how reliably an information retrieval system will perform when faced with a particular user request. Predicting the difficulty level of a query is an interesting and important issue in Information Retrieval (IR) and is still an open research problem. To illustrate the importance of query difficulty prediction, we present an example. Information...

Keywords:
WORD PROCESSING
Publication type:
book


Content availability

Available (149)
None (5)

Keywords

NATURAL LANGUAGE PROCESSING (49)
DATA MINING (46)
FEATURE EXTRACTION (34)
ACCURACY (23)
ALGORITHM DESIGN AND ANALYSIS (23)
TRAINING (23)
CLASSIFICATION ALGORITHMS (21)
TEXT CATEGORIZATION (20)
DICTIONARIES (18)
INFORMATION RETRIEVAL (18)
SEMANTICS (18)
TEXT PROCESSING (17)
PATTERN CLASSIFICATION (16)
SUPPORT VECTOR MACHINES (14)
INTERNET (13)
HIDDEN MARKOV MODELS (12)
COMPUTATIONAL MODELING (11)
STATISTICAL ANALYSIS (11)
CONTEXT (10)
DATABASES (10)
PATTERN CLUSTERING (10)
CLUSTERING ALGORITHMS (9)
INDEXES (9)
INDEXING (9)
LEARNING (ARTIFICIAL INTELLIGENCE) (9)
MACHINE LEARNING (9)
NATURAL LANGUAGES (9)
SPEECH (9)
TAGGING (9)
TEXT MINING (9)
OPTICAL CHARACTER RECOGNITION SOFTWARE (8)
PRESSES (8)
VOCABULARY (8)
WORD SEGMENTATION (8)
COMPUTERS (7)
DATA COMPRESSION (7)
DOCUMENT IMAGE PROCESSING (7)
ENCODING (7)
LINGUISTICS (7)
PROBABILITY (7)
STRING MATCHING (7)
SUPPORT VECTOR MACHINE (7)
SUPPORT VECTOR MACHINE CLASSIFICATION (7)
TEXT CLASSIFICATION (7)
ARTIFICIAL NEURAL NETWORKS (6)
COMPUTATIONAL LINGUISTICS (6)
CONFERENCES (6)
CORRELATION (6)
ENTROPY (6)
HUMANS (6)
IMAGE SEGMENTATION (6)
MATHEMATICAL MODEL (6)
QUERY PROCESSING (6)
SEARCH ENGINES (6)
SOFTWARE (6)
TEXT CLUSTERING (6)
TEXT PREPROCESSING (6)
TEXT RECOGNITION (6)
VECTOR SPACE MODEL (6)
BAYES METHODS (5)
BAYESIAN METHODS (5)
BOOKS (5)
CHARACTER RECOGNITION (5)
COMPUTER SCIENCE (5)
DATA MODELS (5)
FEATURE SELECTION (5)
MANUALS (5)
OPTICAL CHARACTER RECOGNITION (5)
PIXEL (5)
PROBABILITY DENSITY FUNCTION (5)
SPEECH RECOGNITION (5)
TESTING (5)
ADAPTATION MODEL (4)
ARRAYS (4)
CHINESE INFORMATION PROCESSING (4)
CHINESE WORD SEGMENTATION (4)
COMPLEXITY THEORY (4)
DISTANCE MEASUREMENT (4)
DOCUMENT CLUSTERING (4)
EDUCATIONAL INSTITUTIONS (4)
ELECTRONIC MAIL (4)
GRAMMARS (4)
HANDWRITTEN CHARACTER RECOGNITION (4)
IMAGE RECOGNITION (4)
INFORMATION PROCESSING (4)
LANGUAGE TRANSLATION (4)
LATENT SEMANTIC ANALYSIS (4)
PLAGIARISM (4)
PRAGMATICS (4)
TEXT COMPRESSION (4)
VISUALIZATION (4)
WORDNET (4)
WRITING (4)
ABSTRACTS (3)
ALGORITHM (3)
ANALYTICAL MODELS (3)
ARTIFICIAL INTELLIGENCE (3)
CHINESE TEXT PROCESSING (3)

INFONA - science communication portal