Search results

Items from 1 to 20 out of 32 results

chapter

Classifying Web Exploits with Topic Modeling

Jukka Ruohonen

2017 28th International Workshop on Database and Expert Systems Applications (DEXA) > 93 - 97

2017 28th International Workshop on Database and Expert Systems Applications (DEXA)

This short empirical paper investigates how well topic modeling and database meta-data characteristics can classify web and other proof-of-concept (PoC) exploits for publicly disclosed software vulnerabilities. By using a dataset comprised of over 36 thousand PoC exploits, near a 0.9 accuracy rate is obtained in the empirical experiment. Text mining and topic modeling are a significant boost factor...

chapter

Incremental technique with set of frequent word item sets for mining large Indonesian text data

Dian Sa'adillah Maylawati, Muhammad Ali Ramdhani, Ali Rahman, Wahyudin Darmalaksana

2017 5th International Conference on Cyber and IT Service Management (CITSM) > 1 - 6

2017 5th International Conference on Cyber and IT Service Management (CITSM)

Indonesian text data from social media is one of large text data that interesting to be mined. Mining the insight knowledge from large text data need more effort and time to processed. Moreover, Indonesian text data from social media contains natural language, including slang that require special treatment. We propose incremental technique for more efficient mining process of large text data with...

chapter

Domain specific syntax based approach for text classification in machine learning context

Alaa Mohasseb, Mohamed Bader-El-Den, Han Liu, Mihaela Cocea

2017 International Conference on Machine Learning and Cybernetics (ICMLC) > 2 > 658 - 663

2017 International Conference on Machine Learning and Cybernetics (ICMLC)

Due to the vast amount of data, searching and obtaining relevant information on the web is a challenging task. Despite that a broad range of classification techniques have been proposed to improve the information retrieval methods, many difficulties are still present because of the continuous increase in the amount of web contents, as well as its diversity. In this paper, we propose a method that...

chapter

Centrality-Based Approach for Supervised Term Weighting

Niloofer Shanavas, Hui Wang, Zhiwei Lin, Glenn Hawe

2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW) > 1261 - 1268

2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW)

The huge amount of text documents has made the manual organization of text data a tedious task. Automatic text classification helps to easily handle the large number of documents by organising them automatically into predefined classes. The effectiveness and efficiency of automatic text classification largely depends on the way text documents are represented. A text document is usually viewed as a...

chapter

Method for classifying usability qualities and problems for action games from user reviews using text mining

Artinat Wattanaburanon, Nakornthip Propoon

2016 IEEE/ACIS 15th International Conference on Computer and Information Science (ICIS) > 1 - 6

2016 IEEE/ACIS 15th International Conference on Computer and Information Science (ICIS)

Game reviewing is one of the method for game users and critics to comment and discuss about a game. Game developers and marketers could use game reviews as insights to assist on designing a better game by specifying quality requirements and providing better game marketing. Usability and problems are major concerns of users and game developers since these quality affects users' satisfaction and opportunity...

chapter

Extracting Semantic Knowledge from Unstructured Text Using Embedded Controlled Language

Hazem Safwat, Normunds Gruzitis, Brian Davis, Ramona Enache

2016 IEEE Tenth International Conference on Semantic Computing (ICSC) > 87 - 90

2016 IEEE Tenth International Conference on Semantic Computing (ICSC)

Nowadays, most of the data on the Web is still in the form of unstructured text. Knowledge extraction from unstructured text is highly desirable but extremely challenging due to the inherent ambiguity of natural language. In this article, we present an architecture of an information extraction system based on the concept of Embedded Controlled Language that allows for extracting formal semantic knowledge...

chapter

Time-constrained requirements elicitation: reusing GitHub content

Roxana Lisette Quintanilla Portugal, Julio Cesar Sampaio do Prado Leite, Eduardo Almentero

2015 IEEE Workshop on Just-In-Time Requirements Engineering (JITRE) > 5 - 8

2015 IEEE Workshop on Just-In-Time Requirements Engineering (JITRE)

Requirements elicitation is the activity of identifying facts that compose the system requirements. One of the steps of this activity is the identification of information sources, which is a time-consuming task. Text documents are typically an important and abundant information source. However, their analysis to gather useful information is also time consuming and hard to automate. Because of its...

chapter

Building semantic richness among natural language content

Sulaiman Al-reyaee, P. Vijayakumar

Second International Conference on the Innovative Computing Technology (INTECH 2012) > 345 - 348

2012 Second International Conference on Innovative Computing Technology (INTECH)

In this work we propose Inclusive vector to keep the key words available in natural language database. The inclusive vectors are generated by the process of extraction of words given in the source and the cited items of records published in the ISI Thompson Citation Indexes. The proposed inclusive vector exhibits related words and the degree of their relationships. In this work we present the results...

chapter

Sentiment Analysis on Social Media

Federico Neri, Carlo Aliprandi, Federico Capeci, Montserrat Cuadros, et al.

2012 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining > 919 - 926

2012 International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2012)

The Web is a huge virtual space where to express and share individual opinions, influencing any aspect of life, with implications for marketing and communication alike. Social Media are influencing consumers' preferences by shaping their attitudes and behaviors. Monitoring the Social Media activities is a good way to measure customers' loyalty, keeping a track on their sentiment...

chapter

Mining textual significant expressions reflecting opinions in natural languages

Jan Zizka, Frantisek Darena

2011 11th International Conference on Intelligent Systems Design and Applications > 136 - 141

2011 11th International Conference on Intelligent Systems Design and Applications (ISDA)

Revealing an opinion hidden in a text document is a challenging task. The article presents a method based on the automatic extraction of expressions that are significant for specifying a document attitude to a given topic. The significant expressions are composed using revealed significant words in the documents. The significant words are selected by the c5 decision-tree generator based on the entropy...

chapter

New method of detection and wiping of sensitive information

George Pecherle, Cornelia Gyorodi, Robert Gyorodi, Bogdan Andronic, et al.

2011 IEEE 7th International Conference on Intelligent Computer Communication and Processing > 145 - 148

2011 IEEE International Conference on Intelligent Computer Communication and Processing (ICCP)

One of the biggest problems in sensitive data wiping is to determine if a file is sensitive or not. Data wiping applications have improved a lot, but they cannot determine by themselves if a file is sensitive. The method we propose tries to determine if a file is sensitive by using a pre-defined set of rules initially specified by the user. These rules can update themselves in time, by “learning”...

chapter

An automated domain specific stop word generation method for natural language text classification

Hakan Ayral, Sirma Yavuz

2011 International Symposium on Innovations in Intelligent Systems and Applications > 500 - 503

2011 International Symposium on Innovations in Intelligent Systems and Applications (INISTA)

In this paper we propose an automated method for generating domain specific stop words to improve classification of natural language content. Also we implemented a bayesian natural language classifier working on web pages, which is based on maximum a posteriori probability estimation of keyword distributions using bag-of-words model to test the generated stop words. We investigated the distribution...

chapter

Research on the Construction and Filter Method of Stop-word List in Text Preprocessing

Zhou Yao, Cao Ze-wen

2011 Fourth International Conference on Intelligent Computation Technology and Automation > 1 > 217 - 221

2011 International Conference on Intelligent Computation Technology and Automation (ICICTA)

In the text preprocessing of text mining, a stop-word list is constructed to filter the segment results of the text documents so that the dimensionality of the text feature space can be cut down primarily. This paper summarized the definition, extraction principles and method of stop-word, and constructed a customizing Chinese-English stop-word list with the classical stop-word list based on the difference...

chapter

A Framework for Emotion Mining from Text in Online Social Networks

M Yassine, H Hajj

2010 IEEE International Conference on Data Mining Workshops > 1136 - 1142

2010 10th IEEE International Conference on Data Mining Workshops (ICDMW 2010)

Online Social Networks are so popular nowadays that they are a major component of an individual's social interaction. They are also emotionally-rich environments where close friends share their emotions, feelings and thoughts. In this paper, a new framework is proposed for characterizing emotional interactions in social networks, and then using these characteristics to distinguish friends from acquaintances...

chapter

Developing a Dataset for Technology Structure Mining

B Qasemizadeh, P Buitelaar, F Monaghan

2010 IEEE Fourth International Conference on Semantic Computing > 32 - 39

2010 IEEE Fourth International Conference on Semantic Computing (ICSC)

This paper describes steps that have been taken to construct a development dataset for the task of Technology Structure Mining. We have defined the proposed task as the process of mapping a scientific corpus into a labeled digraph named a Technology Structure Graph as described in the paper. The generated graph expresses the domain semantics in terms of interdependencies between pairs of technologies...

article

OntoGene in BioCreative II.5

F Rinaldi, G Schneider, K Kaljurand, S Clematide, et al.

IEEE/ACM Transactions on Computational Biology and Bioinformatics > 2010 > 7 > 3 > 472 - 480

We describe a system for the detection of mentions of protein-protein interactions in the biomedical scientific literature. The original system was developed as a part of the OntoGene project, which focuses on using advanced computational linguistic techniques for text mining applications in the biomedical domain. In this paper, we focus in particular on the participation to the BioCreative II.5 challenge,...

chapter

Aliases discovered in Thai sports news articles

T. Suwanapong, T. Theeramunkong

2009 Eighth International Symposium on Natural Language Processing > 63 - 66

2009 Eighth International Symposium on Natural Language Processing. SNLP 2009

Aliases discovered in Thai articles are challenging. We apply a standard vector space model to explore and match aliases with formal names or each others. On first construct a term-by-document matrix (TDM), which contains term frequency of term occurring in document collection assuming that all terms exist in the typed named entity dictionary. Normalization techniques are used instead of standard...

chapter

Linguistic text mining for problem reports

J.T. Malin, C. Millward, H.A. Schwarz, F. Gomez, et al.

2009 IEEE International Conference on Systems, Man and Cybernetics > 1578 - 1583

2009 IEEE International Conference on Systems, Man and Cybernetics. SMC 2009

This paper describes a linguistic text mining tool for analyzing problem reports in aerospace engineering and safety organizations. The semantic trend analysis tool (STAT) helps analysts find and review recurrences, similarities and trends in problem reports. The tool is being used to analyze engineering discrepancy reports at NASA Johnson Space Center. The tool has been augmented with a statistical...

chapter

Text representation and classification based on multi-instance learning

He Wei, Wang Yu

2009 International Conference on Management Science and Engineering > 34 - 39

2009 16th International Conference on Management Science and Engineering (ICMSE)

In multi-instance learning, the training set comprises labeled bags which are composed of unlabeled instances, and the task is to predict the labels of unseen bags. In this paper, a text mining problem, i.e. text representation, is investigated from a multi-instance view. In detail, each text is regarded as a bag while each of its sentences is regarded as an instance. Bag can be labeled by its class...

chapter

Ontology's structuring based on the evolutional sequences and the preparation method of its filling

R. Pasichnyk, A. Sachenko

2009 IEEE International Workshop on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications > 570 - 573

2009 IEEE International Workshop on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications (IDAACS 2009)

A paper presents the ontology creation approach, based on evolutional sequences. The universal system of accumulation of the applied scientific knowledge is proposed. The method of compression of text content also is discussed.

Keywords:
NATURAL LANGUAGES

INFONA - science communication portal