Search results

Items from 1 to 8 out of 8 results

chapter

Research on Text Clustering Based on Concept Weight

Yuqin Li, Xueqiang Lv, Yufang Liu, Shuicai Shi

2010 Fourth International Conference on Genetic and Evolutionary Computing > 232 - 235

2010 Fourth International Conference on Genetic and Evolutionary Computing (ICGEC 2010)

Through research on methods for calculating the weight of feature words in texts and the semantic similarity between words, we propose a method for calculating feature-word weight based on concept weight, addressing the semantic association among text features and the high-dimensionality problem prevalent in the text vector space model. This method reduces the semantic loss of the feature set and the dimension...

chapter

Text categorization of Enron email corpus based on information bottleneck and maximal entropy

Man Wang, Yifan He, Minghu Jiang

IEEE 10th INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING PROCEEDINGS > 2472 - 2475

2010 10th International Conference on Signal Processing (ICSP 2010)

This paper addresses text categorization of the Enron email corpus. We use the information bottleneck (IB) method to cluster the key words based on their distribution over different class labels, then use threads and address groups as additional features of email texts, together with the maximal entropy model, to improve the accuracy of the classifier. Our experimental results show that these measures can improve...

chapter

E-Mail Filtering Based on Analysis of Structural Features and Text Classification

Xiao Li, Junyong Luo, Meijuan Yin

2010 2nd International Workshop on Intelligent Systems and Applications > 1 - 4

2010 2nd International Workshop on Intelligent Systems and Applications (ISA)

To meet the need for e-mail filtering that improves efficiency and accuracy in e-mail mining, topic detection, and many other specific applications, an approach based on feature analysis and text classification is proposed, drawing on traditional spam filtering methods. Utilizing some structural features which are very likely to identify an irrelevant e-mail, such as group sending, embedded...

chapter

Automatic text summarization based on sentences clustering and extraction

Zhang Pei-ying, Li Cun-he

2009 2nd IEEE International Conference on Computer Science and Information Technology > 167 - 170

2009 2nd IEEE International Conference on Computer Science and Information Technology (ICCSIT 2009)

Automatic text summarization technology plays an important role in information retrieval and text classification, and may provide a solution to the information overload problem. Text summarization is the process of reducing the size of a text while preserving its information content. This paper proposes a summarization approach based on sentence clustering. The proposed approach consists of three steps:...

chapter

Highly Scalable SVM Modeling with Random Granulation for Spam Sender Detection

Yuchun Tang, Yuanchen He, S. Krasser

2008 Seventh International Conference on Machine Learning and Applications > 659 - 664

2008 Seventh International Conference on Machine Learning and Applications

Spam sender detection based on email subject data is a complex large-scale text mining task. The dataset consists of email subject lines and the corresponding IP address of the email sender. A fast and accurate classifier is desirable in such an application. In this research, a highly scalable SVM modeling method, named Granular SVM with Random granulation (GSVM-RAND), is designed. GSVM-RAND applies...

chapter

Sem@ntica: A system for semantic extraction and logical querying of text corpora

D.W. McMichael, R. Fu, S. Williams, G.A. Jarrad

2008 IEEE International Conference on Intelligence and Security Informatics > 277 - 278

2008 IEEE International Conference on Intelligence and Security Informatics (ISI 2008)

Sem@ntica is a system for extracting the information contained in collections of documents into a knowledge base. It combines high quality conventional named entity analysis with an ontology class labeling capability for open class words. The ontology comprises an upper ontology and one or more domain ontologies. The system has tools for rapidly designing the ontology and mapping segments of Word...

chapter

Sequential Pattern Mining for Chinese E-mail Authorship Identification

Jianbin Ma, Ying Li, Guifa Teng, Fang Wang, et al.

2008 3rd International Conference on Innovative Computing Information and Control > 73

2008 3rd International Conference on Innovative Computing Information and Control (ICICIC)

With the rapid growth of computer technology and the popularization of the Internet, e-mail has become an economical and convenient form of communication. However, various types of crime and civil action involving e-mail documents have appeared, harming people's lives and social stability. The authorship of criminal e-mail therefore has to be identified automatically for the purposes of computer forensics. To solve...

chapter

A hybrid strategy to protein name recognition

Haochang Wang, Tiejun Zhao

2008 7th World Congress on Intelligent Control and Automation > 627 - 632

2008 7th World Congress on Intelligent Control and Automation

This paper presents a comprehensive approach to identifying protein names in biomedical texts. The new method integrates the generalized Winnow algorithm and heuristic rules to implement initial detection of protein names. Moreover, the system introduces a statistical method to analyze the reliability of the recognized protein boundary, which can then be used for expanding protein boundary which has...

Filter options

Data set:
ieee
Keywords:
DATA MINING
FEATURE EXTRACTION
ELECTRONIC MAIL
TEXT ANALYSIS

INFONA - science communication portal
