Search results

Items from 1 to 20 out of 228 results

chapter

A weight learning technique for cursive handwritten text categorization with fuzzy confusion matrix

G. Sarker

2016 2nd International Conference on Control, Instrumentation, Energy & Communication (CIEC) > 188 - 192

2016 2nd International Conference on Control, Instrumentation, Energy & Communication (CIEC)

A fuzzy confusion matrix based cursive handwritten text categorization has been implemented. Printed text is obtained from handwritten text through the Modified Optimal Clustering Algorithm (MOCA). The Optimal Clustering Algorithm (OCA) groups texts into different subject categories. Learning is conducted to extract the attributes along with corresponding weights for each subject. Fuzzy confusion matrix...

chapter

An Improved Information Gain Feature Selection Algorithm for SVM Text Classifier

Jiamin Xu, Hong Jiang

2015 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery > 273 - 276

2015 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC)

The feature selection algorithm has a great influence on the accuracy of text categorization. The traditional information gain (IG) feature selection algorithm usually selects features that rarely appear in the specified categories but frequently appear in other categories. To overcome this drawback, on the basis of in-depth analysis of the related algorithms, an improved IG feature selection method...

chapter

An experimental investigation on PCA based on cosine similarity and correlation for text feature dimensionality reduction

Maysa I Abdulhussain, John Q Gan

2015 7th Computer Science and Electronic Engineering Conference (CEEC) > 1 - 4

2015 7th Computer Science and Electronic Engineering Conference (CEEC)

Principal component analysis (PCA) is a commonly used method for feature extraction and dimensionality reduction. This paper proposes PCA based on similarity/correlation criteria instead of covariance to gain low-dimensional features with high performance in text classification. Experimental results have demonstrated the advantages and usefulness of the proposed method in text classification in high-dimensional...

chapter

A novel classifier based on meaning for text classification

Murat Can Ganiz, Melike Tutkan, Selim Akyokus

2015 International Symposium on Innovations in Intelligent SysTems and Applications (INISTA) > 1 - 5

2015 International Symposium on Innovations in Intelligent SysTems and Applications (INISTA)

Text classification is one of the key methods used in text mining. Generally, traditional classification algorithms from the machine learning field are used in text classification. These algorithms are primarily designed for structured data. In this paper, we propose a new classifier for textual data, called Supervised Meaning Classifier (SMC). The new SMC classifier uses a meaning measure, which is based...

chapter

Evaluation of classification models for language processing

Zeynep Hilal Kilimci, Murat Can Ganiz

2015 International Symposium on Innovations in Intelligent SysTems and Applications (INISTA) > 1 - 8

2015 International Symposium on Innovations in Intelligent SysTems and Applications (INISTA)

Naïve Bayes is a commonly used algorithm in text categorization because of its easy implementation and low complexity. Naïve Bayes has two main event models used for text categorization: the multivariate Bernoulli and multinomial models. A very large number of studies choose the multinomial model with Laplace smoothing based only on the assumption that it performs better than the multivariate model...

chapter

A novel feature selection based on Tibetan grammar for Tibetan text classification

Tao Jiang, Hongzhi Yu

2015 6th IEEE International Conference on Software Engineering and Service Science (ICSESS) > 445 - 448

2015 6th IEEE International Conference on Software Engineering and Service Science (ICSESS)

Feature selection is a strategy that aims at making text classifiers more efficient and accurate. In this paper, we propose a novel feature selection method based on Tibetan grammar for Tibetan text classification. The Tibetan language expresses grammatical meaning through function words and word order, and function words account for a large proportion of the text. By analyzing the Tibetan grammar and distribution of part...

chapter

Evaluating text features for lyrics-based songwriter prediction

Basar Kirmaci, Hasan Ogul

2015 IEEE 19th International Conference on Intelligent Engineering Systems (INES) > 405 - 409

2015 IEEE 19th International Conference on Intelligent Engineering Systems (INES)

We offer an automated way of estimating the author of a song using only its lyrics content. To this end, we introduce a complete text classification framework which takes raw lyrics data as input and reports the estimated songwriter. The performance of the system is evaluated based on its classification and retrieval ability on a large dataset of Turkish songs, which was collected in this study. The results...

chapter

Optimal stop word selection for text mining in critical infrastructure domain

Kasun Amarasinghe, Milos Manic, Ryan Hruska

2015 Resilience Week (RWS) > 1 - 6

2015 Resilience Week (RWS)

Eliminating all stop words from the feature space is a standard practice of preprocessing in text mining, regardless of the domain which it is applied to. However, this may result in loss of important information, which adversely affects the accuracy of the text mining algorithm. Therefore, this paper proposes a novel methodology for selecting the optimal set of domain specific stop words for improved...

chapter

Children story classification based on structure of the story

Harikrishna D M, K. Sreenivasa Rao

2015 International Conference on Advances in Computing, Communications and Informatics (ICACCI) > 1485 - 1490

2015 International Conference on Advances in Computing, Communications and Informatics (ICACCI)

The main objective of this work is to classify Hindi and Telugu stories based on their structure into three genres: Fable, Folk-tale and Legend. In this work, each story is divided into three parts: (i) introduction, (ii) main and (iii) climax. The objective of this work is to explore how story genre information is embedded in different parts of the story. We are proposing a framework for story classification...

chapter

Classification and clustering for neuroinformatics: Assessing the efficacy on reverse-mapped NeuroNLP data using standard ML techniques

Nidheesh Melethadathil, Priya Chellaiah, Bipin Nair, Shyam Diwakar

2015 International Conference on Advances in Computing, Communications and Informatics (ICACCI) > 1065 - 1070

2015 International Conference on Advances in Computing, Communications and Informatics (ICACCI)

Neuroinformatics Natural Language Processing (NeuroNLP) relies on clustering and classification for information categorization of biologically relevant extraction targets and for interconnections to knowledge-related patterns in event and text mined datasets. The accuracy of machine learning algorithms depended on quality of text-mined data while efficacy relied on the context of the choice of techniques...

chapter

Improved Single-Label Text Categorization by Instance Filtration

Kashif Ullah Khan, Usman Qamar

2015 Ninth International Conference on Complex, Intelligent, and Software Intensive Systems > 28 - 35

2015 Ninth International Conference on Complex, Intelligent, and Software Intensive Systems (CISIS)

Machine learning classifiers are widely used for text categorization; however, a classifier misclassifies some of the instances into a category that is relevant to their actual category. The categorization ability of a classifier can be improved by filtering the dataset with a better classifier and removing such a category for misclassified instances. In this paper we propose a two-level approach where level-1...

chapter

Comparison of Four Text Classifiers on Movie Reviews

Yaguang Wang, Wenlong Fu, Aina Sui, Yuqing Ding

2015 3rd International Conference on Applied Computing and Information Technology/2nd International Conference on Computational Science and Intelligence > 495 - 498

2015 3rd International Conference on Applied Computing and Information Technology/2nd International Conference on Computational Science and Intelligence (ACIT-CSI)

Text categorization plays an important role in the fields of information retrieval, machine learning, natural language processing, data mining and others. With the development of computer and information technology, many classification algorithms have emerged. Each text classification algorithm produces results with differing speed and efficiency, owing to the varying features of the test text. It has been...

chapter

A Text Classifier of English Movie Reviews Based on Information Gain

Lianjing Jin, Wei Gong, Wenlong Fu, Hongbin Wu

2015 3rd International Conference on Applied Computing and Information Technology/2nd International Conference on Computational Science and Intelligence > 454 - 457

2015 3rd International Conference on Applied Computing and Information Technology/2nd International Conference on Computational Science and Intelligence (ACIT-CSI)

Text classification is the foundation and core of text mining. Naive Bayes is an effective method for text classification. This paper improves the accuracy of Naive Bayes classification using improved information gain, one of the methods of feature extraction, by reducing the impact of low-frequency words. In this paper, we use a widely used corpus from NLTK. According to the test results, the accuracy of the...

chapter

A comparison of similarity measures for online social media Thai text classification

Supatta Viriyavisuthisakul, Parinya Sanguansat, Pisit Charnkeitkong, Choochart Haruechaiyasak

2015 12th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON) > 1 - 6

2015 12th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON)

Social media is widely used as a general-purpose channel of communication, including for comments related to retail business. It is a highly effective communication tool for interacting directly with customers. The number of users is growing rapidly, because they use this channel to receive information and share things of interest. In this paper, we present a comparison...

chapter

Topic identification of Arabic noisy texts based on KNN

Kheireddine Abainia, Siham Ouamour, Halim Sayoud

2015 International Conference on Information and Communication Technology Research (ICTRC) > 92 - 95

2015 International Conference on Information and Communication Technology Research (ICTRC)

This paper deals with the problem of topic identification of Arabic noisy texts, which is an important research field, regarding the growing amount of shared textual information in the world. The dataset used in this survey is constructed by collecting several corrupted Arabic texts from different discussion forums related to six different topics. The proposed algorithms use the k-nearest neighbor...

chapter

Performance of using LDA for Chinese news text classification

Xiaojun Wu, Liying Fang, Pu Wang, Nan Yu

2015 IEEE 28th Canadian Conference on Electrical and Computer Engineering (CCECE) > 1260 - 1264

2015 IEEE 28th Canadian Conference on Electrical and Computer Engineering (CCECE)

Chinese text classification is always challenging, especially when data are high dimensional and sparse. In this paper, we are interested in text representation and dimension reduction in Chinese text classification. First, we introduce a topic model, Latent Dirichlet Allocation (LDA), and use the LDA model as a dimension reduction method. Second, we choose Support Vector Machine (SVM)...

chapter

Investigate the Context Usage of Arabic Proverbs in Twitter

Rehab Nasser Al-Wehaibi, Muhammad Badruddin Khan

2015 International Conference on Cloud Computing (ICCC) > 1 - 8

2015 International Conference on Cloud Computing (ICCC)

Current technology facilitates and increases connections through social media, allowing individuals everywhere to spread their ideas to the world. One social media platform is Twitter. One characteristic of a tweet is the requirement of conveying a message in a limited number of words. Proverbs are a feature of language that convey messages effectively in the least number of words. Therefore, we selected...

chapter

Building Vietnamese Topic Modeling Based on Core Terms and Applying in Text Classification

Tinh Thanh Dao, Tinh Dao Thanh, Thanh Nguyen Hai, Vinh Ho Ngoc

2015 Fifth International Conference on Communication Systems and Network Technologies > 1284 - 1288

2015 Fifth International Conference on Communication Systems and Network Technologies (CSNT)

In a language, the occurrence of words indicates the meaning of the content of a text. Generative models for text, such as the topic model, have the potential to make important contributions to the statistical analysis of large document collections, and the development of a deeper understanding of human language learning and processing. In this paper, we propose a novel method for building Vietnamese...

chapter

Parallel Processing System for Marathi Content Generation

Sushma R. Vispute, Shrikant Patil, Sagar Sangale, Akshay Padwal, et al.

2015 International Conference on Computing Communication Control and Automation > 575 - 579

2015 International Conference on Computing Communication Control and Automation (ICCUBEA)

The objective of the present work is to design a Hadoop-based parallel Marathi content retrieval system using a clustering technique to obtain more efficient and optimized results than existing systems. The system also focuses on providing personalized documents in the Marathi language to the end user based on their interests identified from the browsing history and using time session mechanism for re ranking...

article

Towards Effective Bug Triage with Software Data Reduction Techniques

Jifeng Xuan, He Jiang, Yan Hu, Zhilei Ren, et al.

IEEE Transactions on Knowledge and Data Engineering > 2015 > 27 > 1 > 264 - 280

Software companies spend over 45 percent of cost in dealing with software bugs. An inevitable step of fixing bugs is bug triage, which aims to correctly assign a developer to a new bug. To decrease the time cost in manual work, text classification techniques are applied to conduct automatic bug triage. In this paper, we address the problem of data reduction for bug triage, i.e., how to reduce the...

Keywords:
ACCURACY

Content availability

Available (226)
None (2)

Publication type

book (225)
article (3)

Keywords

TRAINING (123)
CLASSIFICATION ALGORITHMS (114)
TEXT ANALYSIS (105)
SUPPORT VECTOR MACHINES (85)
TEXT CLASSIFICATION (69)
FEATURE EXTRACTION (63)
PATTERN CLASSIFICATION (51)
MACHINE LEARNING (50)
DATA MINING (41)
CLASSIFICATION (40)
FEATURE SELECTION (36)
SUPPORT VECTOR MACHINE CLASSIFICATION (30)
LEARNING (ARTIFICIAL INTELLIGENCE) (26)
ALGORITHM DESIGN AND ANALYSIS (21)
SUPPORT VECTOR MACHINE (21)
TESTING (21)
TEXT MINING (21)
COMPUTERS (20)
NATURAL LANGUAGE PROCESSING (20)
INTERNET (19)
BAYES METHODS (18)
KERNEL (17)
ARTIFICIAL NEURAL NETWORKS (16)
SEMANTICS (16)
EDUCATIONAL INSTITUTIONS (15)
NIOBIUM (15)
ENTROPY (14)
TRAINING DATA (14)
COMPUTATIONAL MODELING (13)
INFORMATION RETRIEVAL (13)
MACHINE LEARNING ALGORITHMS (13)
SVM (13)
DECISION TREES (12)
MUTUAL INFORMATION (12)
CLUSTERING ALGORITHMS (11)
STATISTICAL ANALYSIS (11)
CORRELATION (10)
VECTORS (10)
BAYESIAN METHODS (9)
PROBABILITY (9)
VECTOR SPACE MODEL (9)
WEB PAGES (9)
KNN (8)
MATHEMATICAL MODEL (8)
NOISE (8)
WEB SITES (8)
ELECTRONIC MAIL (7)
INFORMATION GAIN (7)
NAIVE BAYES (7)
NAIVE BAYES CLASSIFIER (7)
ORGANIZATIONS (7)
SOFTWARE (7)
ARTIFICIAL INTELLIGENCE (6)
CHINESE TEXT CATEGORIZATION (6)
CLUSTERING (6)
CONTEXT (6)
DATA MODELS (6)
DICTIONARIES (6)
GAIN (6)
PATTERN CLUSTERING (6)
ROUGH SET (6)
SENTIMENT ANALYSIS (6)
SUPERVISED LEARNING (6)
ARABIC TEXT CATEGORIZATION (5)
BUILDINGS (5)
COMPUTER SCIENCE (5)
DATABASES (5)
DOCUMENT CLASSIFICATION (5)
EQUATIONS (5)
FILTERING (5)
INDEXING (5)
MATRIX DECOMPOSITION (5)
NAïVE BAYES (5)
OPTIMIZATION (5)
ROUGH SET THEORY (5)
VOCABULARY (5)
WORD PROCESSING (5)
ARABIC TEXT CLASSIFICATION (4)
BIOLOGICAL SYSTEM MODELING (4)
BUSINESS (4)
CLASSIFICATION TREE ANALYSIS (4)
COMPLEXITY THEORY (4)
CONFERENCES (4)
DIMENSIONALITY REDUCTION (4)
DOCUMENT HANDLING (4)
DOCUMENT REPRESENTATION (4)
FEATURE WEIGHT (4)
GENETIC ALGORITHMS (4)
IMAGE EDGE DETECTION (4)
IMAGE SEGMENTATION (4)
INFORMATION FILTERING (4)
KNOWLEDGE BASED SYSTEMS (4)
LEARNING SYSTEMS (4)
LOGISTICS (4)
MATERIALS (4)
MEDIA (4)
NATURAL LANGUAGES (4)
NEURAL NETWORK (4)

INFONA - science communication portal
