Search results

Items from 1 to 20 out of 179 results

chapter

Distinguishing between authentic and fictitious user-generated hotel reviews

Snehasish Banerjee, Alton Y. K. Chua, Jung-Jae Kim

2015 6th International Conference on Computing, Communication and Networking Technologies (ICCCNT) > 1 - 7

2015 6th International Conference on Computing, Communication and Networking Technologies (ICCCNT)

The objective of this paper is to distinguish between authentic and fictitious user-generated hotel reviews. To achieve this objective, it adopts a two-step approach. The first step seeks to classify authentic and fictitious reviews by leveraging their possible textual differences. The second step attempts to identify the textual traits that are unique to authentic and fictitious reviews. For the purpose...

chapter

Textual risk mining for maritime situational awareness

Amir H. Razavi, Diana Inkpen, Rafael Falcon, Rami Abielmona

2014 IEEE International Inter-Disciplinary Conference on Cognitive Methods in Situation Awareness and Decision Support (CogSIMA) > 167 - 173

2014 IEEE International Inter-Disciplinary Conference on Cognitive Methods in Situation Awareness and Decision Support (CogSIMA)

In this paper, we propose an auxiliary Machine Learning (ML) and Natural Language Processing (NLP) integrated system for maritime situational awareness (MSA) operations. We introduce a new and influential asset, human intuition and perception, into the existing semi-automated decision support systems that mostly rely on numerical data collected by electronic sensors or cameras located either...

chapter

Study on question classification approach mixing multiple semantic characteristics together

LiGuo Duan, YanQin Niu, JunJie Chen

2011 3rd International Conference on Computer Research and Development > 1 > 354 - 357

2011 3rd International Conference on Computer Research and Development (ICCRD 2011)

This article proposes a question classification approach that integrates multiple semantic features. It targets two problems in Chinese question classification models: inaccurate extraction of semantic information, and slow processing caused by an excessively high feature-vector dimension. With the help of HowNet, the support vector machine, and syntactic and semantic information of question...

chapter

A novel text classification based on Mahalanobis distance

Suli Zhang, Xin Pan

2011 3rd International Conference on Computer Research and Development > 3 > 156 - 158

2011 3rd International Conference on Computer Research and Development (ICCRD 2011)

In the text mining field, KNN (K Nearest Neighbors) is one of the oldest and simplest methods of text classification. However, it is known to be sensitive to the distance (or similarity) function used to classify a test instance; this weakness can lower classification accuracy and limit the use of the KNN classifier for text classification in text mining. In this paper, we introduce Mahalanobis...
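The abstract breaks off before the method details, so the following is only a sketch of the general idea under stated assumptions, not the authors' algorithm. It assumes documents are dense term-count vectors, that a single covariance matrix S is estimated from all training documents with a small ridge added to its diagonal, and that plain majority voting is used; all function names are hypothetical. The Euclidean distance of ordinary KNN is replaced by the Mahalanobis distance sqrt((x - y)^T S^-1 (x - y)).

```python
from collections import Counter


def covariance(rows, ridge=1e-3):
    """Sample covariance of the training vectors plus a ridge on the diagonal.

    The ridge (an assumption of this sketch) keeps the matrix invertible when
    features are collinear, which is common for sparse text vectors.
    Requires at least two training vectors.
    """
    n, d = len(rows), len(rows[0])
    mu = [sum(col) / n for col in zip(*rows)]
    cov = [[0.0] * d for _ in range(d)]
    for r in rows:
        diff = [x - m for x, m in zip(r, mu)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += diff[i] * diff[j] / (n - 1)
    for i in range(d):
        cov[i][i] += ridge
    return cov


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    d = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(d)]
           for i, row in enumerate(matrix)]
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(d):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[d:] for row in aug]


def mahalanobis(x, y, inv_cov):
    """Mahalanobis distance sqrt((x - y)^T S^-1 (x - y))."""
    diff = [a - b for a, b in zip(x, y)]
    d = len(diff)
    q = sum(diff[i] * inv_cov[i][j] * diff[j] for i in range(d) for j in range(d))
    return max(q, 0.0) ** 0.5  # clamp tiny negative rounding errors


def knn_mahalanobis(train_x, train_y, query, k=3):
    """Majority vote among the k training vectors nearest in Mahalanobis distance."""
    inv_cov = invert(covariance(train_x))
    ranked = sorted(range(len(train_x)),
                    key=lambda i: mahalanobis(query, train_x[i], inv_cov))
    votes = Counter(train_y[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]
```

The pure-Python inverse keeps the sketch dependency-free; with realistic vocabulary sizes, a numerical library and some form of dimensionality reduction would likely be needed, because S grows quadratically with the number of terms.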

chapter

Detection of Verbatim or Partial Duplication from Multiple Source Documents Using Data Mining Techniques and Case-Based Reasoning Methodologies

C Chaudhuri, A Chaudhuri

2011 Second International Conference on Emerging Applications of Information Technology > 129 - 132

Second International Conference on Emerging Applications of Information Technology (EAIT 2011)

This paper aims to specify a Case-Based Reasoning strategy for correctly classifying and storing electronic text material and preventing its duplication. Preserving complete source documents in order to check their similarity poses a daunting amount of spatial and computational complexity for researchers in this area. The problem is partially solved by applying certain preprocessing steps...

chapter

Text Mining Support for Software Requirements: Traceability Assurance

D Port, A Nikora, J H Hayes, LiGuo Huang

2011 44th Hawaii International Conference on System Sciences > 1 - 11

2011 44th Hawaii International Conference on System Sciences (HICSS 2011)

Requirements assurance aims to increase confidence in the quality of requirements through independent audit and review. One important and effort-intensive activity is assurance of the traceability matrix (TM). Here, determining the correctness and completeness of the many-to-many relationships between functional and non-functional requirements (NFRs) is a particularly tedious and error-prone activity...

article

Aspect-Based Opinion Polling from Customer Reviews

Jingbo Zhu, Huizhen Wang, Muhua Zhu, B K Tsou, et al.

IEEE Transactions on Affective Computing > 2011 > 2 > 1 > 37 - 49

Opinion polling has been traditionally done via customer satisfaction studies in which questions are carefully designed to gather customer opinions about target products or services. This paper studies aspect-based opinion polling from unlabeled free-form textual customer reviews without requiring customers to answer any questions. First, a multi-aspect bootstrapping method is proposed to learn aspect-related...

chapter

Dynamic Fluzzy Clustering Algorithm for Web Documents Mining

Qi Luo

2010 International Conference on Computational Intelligence and Security > 64 - 67

2010 International Conference on Computational Intelligence and Security (CIS 2010)

This paper first studies methods of web document mining and text clustering and summarizes fuzzy clustering algorithms and similarity measure functions; it then proposes a modified similarity function that addresses the problems of feature selection and feature extraction in high-dimensional space. Finally, the paper puts forward a dynamic fuzzy clustering algorithm (DCFCM) by combining...
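DCFCM itself is not described in the truncated abstract, so the sketch below shows only the standard fuzzy c-means (FCM) algorithm that such variants typically build on; it does not include the paper's modified similarity function or its dynamic behavior. It assumes Euclidean distance and caller-supplied initial centers, and the function names are hypothetical.

```python
import math


def memberships_for(points, centers, m, eps=1e-9):
    """u[i][j] = 1 / sum_k (d_ij / d_ik)^(2/(m-1)); each row sums to 1."""
    exp = 2.0 / (m - 1.0)
    rows = []
    for p in points:
        # eps guards against division by zero when a point sits on a center
        dists = [max(math.dist(p, c), eps) for c in centers]
        rows.append([1.0 / sum((dj / dk) ** exp for dk in dists) for dj in dists])
    return rows


def fuzzy_c_means(points, centers, m=2.0, iters=50):
    """Standard fuzzy c-means: alternate membership and center updates.

    points:  list of equal-length float vectors (e.g. document vectors)
    centers: initial cluster centers supplied by the caller
    m:       fuzzifier (> 1); larger values give softer memberships
    Returns (centers, memberships), with memberships computed for the final centers.
    """
    dim = len(points[0])
    for _ in range(iters):
        u = memberships_for(points, centers, m)
        new_centers = []
        for j in range(len(centers)):
            # Center j is the mean of all points weighted by u_ij^m
            w = [row[j] ** m for row in u]
            total = sum(w)
            new_centers.append([sum(wi * p[d] for wi, p in zip(w, points)) / total
                                for d in range(dim)])
        centers = new_centers
    return centers, memberships_for(points, centers, m)
```

Unlike hard k-means, every point keeps a graded membership in every cluster, which is what makes the "fuzzy" family suitable for documents that straddle several topics.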

chapter

Investigating analysis of speech content through text classification

S Ezzat, N E Gayar, M M Ghanem

2010 International Conference of Soft Computing and Pattern Recognition > 105 - 110

2010 International Conference of Soft Computing and Pattern Recognition (SoCPaR 2010)

Text mining has evolved over recent years as a way to analyze textual resources, but it can also serve several other applications. In this research, we are particularly interested in applying text mining techniques to audio material, after converting it into text, in order to detect the speakers' emotions. We describe our overall methodology and present our experimental results. In...

chapter

Job Opportunity Mining by Text Categorization

Shilin Zhang, Mei Gu

2010 2nd International Conference on Information Engineering and Computer Science > 1 - 4

2010 2nd International Conference on Information Engineering and Computer Science (ICIECS)

Text classification is an important field of research, and there are a number of approaches to classifying text documents. However, improving computational efficiency and recall remains an important challenge. In this paper, we propose a novel framework to segment Chinese words, generate word vectors, train on the corpus, and make predictions. Based on the text classification technology, we successfully...

chapter

Short Text Feature Selection for Micro-Blog Mining

Zitao Liu, Wenchao Yu, Wei Chen, Shuran Wang, et al.

2010 International Conference on Computational Intelligence and Software Engineering > 1 - 4

2010 International Conference on Computational Intelligence and Software Engineering (CiSE 2010)

Feature selection is a process that selects, from the original feature set, the feature subsets that best represent the original meaning. It greatly reduces text processing time and increases accuracy by removing some data outliers. With the rapid development of Web 2.0 and the further evolution of the Internet, short text like micro-blog plays an important role...

chapter

Classifying Web Pages Using Information Extraction Patterns: Preliminary Results and Findings

Lay-Ki Soon, Sang Ho Lee

2010 Sixth International Conference on Signal-Image Technology and Internet Based Systems > 195 - 202

Sixth International Conference on Signal-Image Technology & Internet-Based Systems (SITIS 2010)

Web page classification plays an essential role in facilitating more efficient information retrieval and information processing. Conventionally, web text documents are represented by a term-frequency matrix for classification purposes. However, considering the limitations of representing documents using terms or keywords, we propose to represent web pages using information extraction patterns that are...

chapter

An Approach Based on Tree Kernels for Opinion Mining of Online Product Reviews

Peng Jiang, Chunxia Zhang, Hongping Fu, Zhendong Niu, et al.

2010 IEEE International Conference on Data Mining > 256 - 265

2010 10th IEEE International Conference on Data Mining (ICDM 2010)

Opinion mining is the challenging task of identifying the opinions or sentiments underlying user-generated content, such as online product reviews, blogs, and discussion forums. Previous studies that adopt machine learning algorithms mainly focus on designing effective features for this complex task. This paper presents our approach based on tree kernels for opinion mining of online product reviews....

chapter

A Framework for Emotion Mining from Text in Online Social Networks

M Yassine, H Hajj

2010 IEEE International Conference on Data Mining Workshops > 1136 - 1142

2010 10th IEEE International Conference on Data Mining Workshops (ICDMW 2010)

Online social networks are so popular nowadays that they are a major component of an individual's social interaction. They are also emotionally rich environments where close friends share their emotions, feelings and thoughts. In this paper, a new framework is proposed for characterizing emotional interactions in social networks, and then using these characteristics to distinguish friends from acquaintances...

chapter

Vote-Based LELC for Positive and Unlabeled Textual Data Streams

Bo Liu, Yanshan Xiao, Longbing Cao, P S Yu

2010 IEEE International Conference on Data Mining Workshops > 951 - 958

2010 10th IEEE International Conference on Data Mining Workshops (ICDMW 2010)

In this paper, we extend the LELC (PU Learning by Extracting Likely Positive and Negative Micro-Clusters) method to cope with positive and unlabeled data streams. Our approach, called vote-based LELC, works in three steps. In the first step, we extract representative documents from unlabeled data and assign a vote score to each document. The assigned vote score reflects the degree of...

chapter

Automatic extraction and classification approach of opinions in texts

Rihab Bouchlaghem, Aymen Elkhlifi, Rim Faiz

2010 10th International Conference on Intelligent Systems Design and Applications > 918 - 922

10th International Conference on Intelligent Systems Design and Applications (ISDA 2010)

In this paper, we present an approach to automatically extract and classify opinions in texts. We propose a similarity measure that calculates the semantic distance between a word and predefined subgroups of seed words. We evaluated our algorithm on the corpus of the SemEval 2007 semantic evaluation campaign and obtained best precision and F1 values of 62% and 61%, respectively. As an improvement of 20%...

chapter

Research on Ontology-Based Text Representation of Vector Space Model

Guiying Wei, Mingming Bao, Sen Wu

2010 2nd International Workshop on Database Technology and Applications > 1 - 4

2010 2nd International Workshop on Database Technology and Applications (DBTA 2010)

In the traditional Vector Space Model (VSM), the TF*IDF method is widely used to adjust the weight of terms in text mining. However, TF*IDF cannot represent the semantic information of a text because it neglects the semantic relevance between terms. In this paper, an improved ontology-based VSM is presented, in which the ontology-based term similarity is used to readjust the weight of semantically related terms...
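As a baseline for the limitation this abstract describes, here is a minimal sketch of plain TF*IDF weighting with cosine similarity. It uses one common variant (raw term count times log(N / df)); other variants add smoothing or log-scaled term frequency. The function names are illustrative, and the paper's ontology-based reweighting is not implemented.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    tf  = raw count of the term in the document
    idf = log(N / df), where df is the number of documents containing the term
    """
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return weights


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```

Two documents such as ["cheap", "hotel"] and ["inexpensive", "inn"] share no terms, so their TF*IDF vectors have a cosine similarity of 0.0 even though they say nearly the same thing; this is the gap an ontology-based term similarity is meant to close.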

chapter

A refined weighted K-Nearest Neighbors algorithm for text categorization

Fang Lu, Qingyuan Bai

2010 IEEE International Conference on Intelligent Systems and Knowledge Engineering > 326 - 330

2010 IEEE International Conference on Intelligent Systems and Knowledge Engineering (ISKE 2010)

Text categorization, the automated classification of large numbers of documents, is an important task of text mining. Many useful supervised learning methods have been introduced to the field of text classification. Among them, the K-Nearest Neighbor (KNN) algorithm is widely used and is considered one of the best text classifiers because of its simplicity and efficiency. For text categorization,...
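The paper's specific refinement is cut off in the abstract. As a hedged illustration of what "weighted" KNN commonly means, the sketch below lets each of the k nearest neighbors vote with its cosine similarity to the query instead of casting one equal vote. The sparse dictionary vectors and the function names are assumptions, not the authors' design.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def weighted_knn(train, query, k=3):
    """Classify `query` by summing, per class, the cosine similarities of
    its k most similar training documents.

    train: list of (vector, label) pairs, vectors as {term: weight} dicts.
    """
    neighbors = sorted(train, key=lambda pair: cosine(query, pair[0]), reverse=True)[:k]
    scores = defaultdict(float)
    for vec, label in neighbors:
        scores[label] += cosine(query, vec)
    return max(scores, key=scores.get)
```

In the test data, the query's three nearest neighbors are one sports document with similarity close to 1.0 and two finance documents with similarity of about 0.32 each. An unweighted majority vote would pick finance, whereas the similarity-weighted vote picks sports.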

chapter

Improving Arabic document categorization: Introducing local stem

Eiman Tamah Al-Shammari

2010 10th International Conference on Intelligent Systems Design and Applications > 385 - 390

10th International Conference on Intelligent Systems Design and Applications (ISDA 2010)

Stemming is a fundamental step in processing textual data that precedes text mining, Information Retrieval (IR), and natural language processing (NLP) tasks. The common goal of stemming is to standardize words by reducing each word to its base (root or stem), so it can also be considered a feature reduction technique. This paper aims at presenting a new dictionary-free, content-based Arabic stemmer...

chapter

Text Clustering by 2D Cellular Automata Based on the N-Grams

R M Hamou, A Lehireche, A C Lokbani, M Rahmani

2010 First ACIS International Symposium on Cryptography, and Network Security, Data Mining and Knowledge Discovery, E-Commerce and Its Applications, and Embedded Systems > 271 - 277

2010 1st ACIS Intl. Symp. on Cryptography & Network Security, Data Mining & Knowledge Discovery, E-Commerce & Its Applications and Embedded Systems (CDEE 2010)

In this article, we present a 2D cellular automaton (Class_AC) to solve a text mining problem in the case of unsupervised classification (clustering). Before experimenting with the cellular automaton, we vectorized our data by indexing the textual documents of the Reuters-21578 collection with an N-gram approach. The cellular automaton that we propose in this paper is a grid cell structure with a flat neighborhood...
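The abstract does not say whether its N-grams are built from characters or from words. Assuming character N-grams (a common choice for language-independent indexing), a document could be vectorized roughly as follows; the function names and the lower-casing and whitespace normalization are illustrative choices, not taken from the paper.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Overlapping character n-grams of a lower-cased, whitespace-normalized text."""
    s = " ".join(text.lower().split())
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def ngram_vector(text, n=3):
    """Bag-of-n-grams frequency vector for one document."""
    return Counter(char_ngrams(text, n))
```

Each document then becomes a sparse frequency vector over n-grams, which a clustering method such as the proposed cellular automaton can compare with any vector similarity measure.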

Keywords:
CLASSIFICATION ALGORITHMS
TEXT ANALYSIS

Content availability

Available (173)
None (6)

Publication type

book (178)
article (1)

Keywords

FEATURE EXTRACTION (78)
TRAINING (62)
TEXT CATEGORIZATION (61)
PATTERN CLASSIFICATION (54)
SUPPORT VECTOR MACHINES (49)
TEXT MINING (45)
ACCURACY (42)
ALGORITHM DESIGN AND ANALYSIS (35)
LEARNING (ARTIFICIAL INTELLIGENCE) (35)
NATURAL LANGUAGE PROCESSING (34)
INTERNET (33)
MACHINE LEARNING (33)
CLASSIFICATION (32)
TEXT CLASSIFICATION (31)
SUPPORT VECTOR MACHINE CLASSIFICATION (28)
CLUSTERING ALGORITHMS (27)
PATTERN CLUSTERING (26)
INFORMATION RETRIEVAL (25)
FEATURE SELECTION (18)
SUPPORT VECTOR MACHINE (17)
SVM (17)
INFORMATION EXTRACTION (14)
TESTING (13)
TEXT CLUSTERING (13)
DATABASES (12)
COMPUTERS (11)
WEB PAGES (11)
KERNEL (10)
ONTOLOGIES (10)
STATISTICAL ANALYSIS (10)
VECTOR SPACE MODEL (10)
BAYES METHODS (9)
IMAGE CLASSIFICATION (9)
MACHINE LEARNING ALGORITHMS (9)
OPINION MINING (9)
OPTIMIZATION (9)
PARTITIONING ALGORITHMS (9)
SENTIMENT ANALYSIS (9)
WORLD WIDE WEB (9)
ARTIFICIAL NEURAL NETWORKS (8)
ONTOLOGIES (ARTIFICIAL INTELLIGENCE) (8)
SEMANTICS (8)
WORD PROCESSING (8)
COMPUTATIONAL MODELING (7)
DICTIONARIES (7)
DOCUMENT CLASSIFICATION (7)
DOCUMENT IMAGE PROCESSING (7)
ENTROPY (7)
INDEXING (7)
MATHEMATICAL MODEL (7)
PREDICTION ALGORITHMS (7)
PROBABILITY (7)
SEARCH ENGINES (7)
WEB MINING (7)
WEB SITES (7)
BAYESIAN METHODS (6)
CONFERENCES (6)
HIDDEN MARKOV MODELS (6)
IMAGE SEGMENTATION (6)
ASSOCIATION RULES (5)
BLOGS (5)
COMPLEXITY THEORY (5)
COMPUTATIONAL LINGUISTICS (5)
CONTEXT (5)
DISTANCE MEASUREMENT (5)
DOCUMENT CLUSTERING (5)
GENETIC ALGORITHMS (5)
GRAPH THEORY (5)
IMAGE COLOR ANALYSIS (5)
KNOWLEDGE DISCOVERY (5)
NATURAL LANGUAGES (5)
ONTOLOGY (5)
PROBABILITY DENSITY FUNCTION (5)
ROUGH SET THEORY (5)
SENTIMENT CLASSIFICATION (5)
SVM CLASSIFIER (5)
TAGGING (5)
VECTORS (5)
ARTIFICIAL INTELLIGENCE (4)
BIOLOGICAL SYSTEM MODELING (4)
CLASSIFICATION TREE ANALYSIS (4)
CLUSTERING METHODS (4)
CONSTRUCTION INDUSTRY (4)
CORRELATION (4)
DECISION TREE (4)
DECISION TREES (4)
ENCODING (4)
EQUATIONS (4)
FEATURE SELECTION METHOD (4)
FILTERING (4)
FUZZY SET THEORY (4)
GENETIC ALGORITHM (4)
IMAGE EDGE DETECTION (4)
INFORMATION FILTERING (4)
KNN (4)
NOISE (4)
PATTERN MATCHING (4)

INFONA - science communication portal
