A troll is a user intent on sowing discord on the internet. We propose an approach to detect such users from the sentiment of the textual content in online forums. Since trolls typically express negative sentiments in their posts, we derive features from sentiment analysis, and use SVMrank to do binary and ordinal classification of trolls. With a small labeled training set of 20 users, we achieved...
In this paper, we investigate the issue of detecting the real-life influence of people based on their Twitter account. We propose an overview of common Twitter features used to characterize such accounts and their activity, and show that these are ineffective in this context. In particular, retweet and follower counts and the Klout score are not relevant to our analysis. We thus propose several Machine...
This paper deals with the problem of automatic topic identification of noisy Arabic texts. Actually, there exist several works in this field based on statistical and machine learning approaches for different text categories. Unfortunately, most of the proposed methods are effective in clean and long texts. In this research work, we use an in-house dataset of noisy Arabic texts, which are collected...
Transliteration forms an essential part of transcription, which converts text from one writing system to another. The need for translating data has become greater than before as the world comes together through social media. Machine transliteration has emerged as a part of information retrieval and machine translation projects to translate named entities that are not registered in the dictionary,...
With the prominent advances in Web interaction and the enormous growth in user-generated content, sentiment analysis has gained more interest in commercial and academic purposes. Recently, sentiment analysis of Arabic user-generated content is increasingly viewed as an important research field. However, the majority of available approaches target the overall polarity of the text. To the best of our...
With the rapid growth of the World Wide Web, the availability and accessibility of regional-language content such as e-books, web pages, e-mails, and digital repositories have grown exponentially. As a result, automatic document classification has become a hotspot for retrieving information from millions of web documents. The idea of classifying text forms the baseline for many NLP applications...
Part-of-speech (POS) tagging is an important research area in natural language processing (NLP). Since it is among the first steps of NLP techniques such as machine translation, any tagging error propagates through the whole NLP process. So far, work on POS tagging has been based on SVM, MBLP, HMM, and n-grams. All of these methods were not...
The 4 Rounds (4R) training method is practiced at industrial office sites to reduce accidents caused by human factors. The 4R method raises workers' hazard-prediction capabilities, such as coping and decision-making to avoid dangerous situations. The workers, as trainees, train on their own by finding hazards hidden in the hazard prediction training (KYT in Japanese) sheet. However, there is a...
A Treebank is a linguistic resource that is composed of a large collection of manually annotated and verified syntactically analyzed sentences. Statistical Natural Language Processing (NLP) approaches have been successful in using these annotations for developing basic NLP tasks such as tokenization, diacritization, part-of-speech tagging, parsing, among others. In this paper, we address the problem...
This paper deals with the problem of topic identification of Arabic noisy texts, which is an important research field, regarding the growing amount of shared textual information in the world. The dataset used in this survey is constructed by collecting several corrupted Arabic texts from different discussion forums related to six different topics. The proposed algorithms use the k-nearest neighbor...
Hindi is written and spoken by the majority of people in India. Like other natural languages, Hindi is ambiguous, which creates an obstacle to the proper use of information technology. To use Hindi efficiently and effectively on the web, we require a tool that removes ambiguity from a single word or from all words, a task called word sense disambiguation (WSD). In this paper we introduce...
Determining a systems design, analysis or approach to be of high or low quality remains a subjective assessment. Our field requires the ability to objectively grade the quality of a systems approach in advance of implementation and then correlate that assessment with outcomes.
The world shows similarity in responses to various actions, and this similarity in behavior needs to be tapped. This phenomenon is called collective behavior: the like or similar response of the members of a society to a given stimulus or suggestion. The study of collective behavior can also be applied to the college campus environment. The system developed...
TnT is an efficient statistical part-of-speech (POS) tagger based on a hidden Markov model. TnT stands for Trigrams'n'Tags. The Viterbi algorithm is used to find the best tag sequence for a given observation sequence of words. TnT performs well on known word sequences, but its performance degrades as the number of unknown words increases. In this paper, we propose a method to overcome this performance...
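The Viterbi decoding step mentioned in the abstract above can be illustrated with a minimal sketch. This is not TnT itself: it assumes a simpler first-order (bigram) HMM rather than TnT's trigram model, and the function name `viterbi`, the probability tables, and the `unk` fallback probability for unknown words are all hypothetical choices made for illustration.

```python
import math
from typing import Dict, List


def viterbi(
    words: List[str],
    tags: List[str],
    start_p: Dict[str, float],
    trans_p: Dict[str, Dict[str, float]],
    emit_p: Dict[str, Dict[str, float]],
    unk: float = 1e-6,  # assumed fallback emission probability for unseen words
) -> List[str]:
    """Return the most probable tag sequence under a bigram HMM (log space)."""
    if not words:
        return []

    # best[i][t]: log-probability of the best path ending in tag t at position i
    # back[i][t]: the previous tag on that best path
    best = [{t: math.log(start_p[t]) + math.log(emit_p[t].get(words[0], unk))
             for t in tags}]
    back = [{t: None for t in tags}]

    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            # Pick the predecessor tag that maximizes the path score into t.
            prev = max(tags, key=lambda p: best[i - 1][p] + math.log(trans_p[p][t]))
            score = best[i - 1][prev] + math.log(trans_p[prev][t])
            best[i][t] = score + math.log(emit_p[t].get(words[i], unk))
            back[i][t] = prev

    # Trace back from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


# Toy model with made-up probabilities (all transition entries are nonzero).
TAGS = ["DET", "NOUN", "VERB"]
START = {"DET": 0.8, "NOUN": 0.1, "VERB": 0.1}
TRANS = {
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.1, "VERB": 0.8},
    "VERB": {"DET": 0.5, "NOUN": 0.4, "VERB": 0.1},
}
EMIT = {"DET": {"the": 0.9}, "NOUN": {"dog": 0.5}, "VERB": {"barks": 0.5}}

tagged = viterbi(["the", "dog", "barks"], TAGS, START, TRANS, EMIT)
```

A word missing from every emission table receives the same `unk` score under all tags, so its tag is decided by the transition probabilities alone. That is one simple way to see why accuracy tends to drop as unknown words become more frequent.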
This paper presents an approach to morphological analysis of Malayalam words as a classification problem. The idea here is to use the Memory Based Language Processing (MBLP) algorithm for Malayalam morphological analysis. MBLP is an approach to language processing based on exemplar storage during learning, and analogical reasoning during processing. The aim of the system is to find the citation forms...
Text-based sentiment analysis as a tool for monitoring online learning environments has elicited increasing interest and has been widely used in practice. Correctly identifying author sentiment in a stream of text presents a number of challenges including accurate language parsing, differing perspectives between author and reader, and the general difficulty in accurately classifying natural language...
The explosive growth of data and images on the World Wide Web makes information retrieval critical. Image retrieval has been recognized as an elementary problem among retrieval tasks and has received wide attention based on the underlying domain characteristics. For instance, social media data is noisy, diverse, heterogeneous, and interconnected. To confront these...
With the growth of the Internet community, textual data has proven to be the main tool of communication in human-machine and human-human interaction. This communication is constantly evolving towards the goal of making it as human and real as possible. One way of humanizing such interaction is to provide a framework that can recognize the emotions present in the communication or the emotions of the...
Work on training semantic slot labellers for use in Natural Language Processing applications has typically either relied on large amounts of labelled input data, or has assumed entirely unlabelled inputs. The former technique tends to be costly to apply, while the latter is often not as accurate as its supervised counterpart. Here, we present a semi-supervised learning approach that automatically...
N-grams are a building block in natural language processing and information retrieval. An N-gram is a sequence of contiguous words or other tokens in a text document. In this work, we study how N-grams can be computed efficiently using MapReduce for distributed data processing and a distributed database named HBase. This technique is applied to construct the training and testing processes...
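As a rough illustration of the N-gram counting described in the abstract above, the following single-process sketch mimics the map and reduce phases in plain Python. It does not use Hadoop or HBase, and the function names (`ngrams`, `map_phase`, `reduce_phase`, `count_ngrams`) and whitespace tokenization are assumptions made for the example, not details taken from the paper.

```python
from collections import Counter
from typing import Iterable, List, Tuple

NGram = Tuple[str, ...]


def ngrams(tokens: List[str], n: int) -> List[NGram]:
    """Return all contiguous n-token windows from a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def map_phase(document: str, n: int) -> List[Tuple[NGram, int]]:
    """Emit an (n-gram, 1) pair for each n-gram in one document."""
    tokens = document.split()  # assumed: simple whitespace tokenization
    return [(gram, 1) for gram in ngrams(tokens, n)]


def reduce_phase(pairs: Iterable[Tuple[NGram, int]]) -> Counter:
    """Sum the emitted counts per n-gram key."""
    counts: Counter = Counter()
    for gram, count in pairs:
        counts[gram] += count
    return counts


def count_ngrams(documents: Iterable[str], n: int) -> Counter:
    """Run the map phase per document, then reduce all pairs together."""
    pairs: List[Tuple[NGram, int]] = []
    for doc in documents:
        pairs.extend(map_phase(doc, n))
    return reduce_phase(pairs)


bigram_counts = count_ngrams(["a b a b", "a b"], 2)
```

In a real MapReduce job the map calls would run in parallel across documents and the framework would group pairs by key before reducing; here both steps run sequentially to keep the example self-contained.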