Most machine learning algorithms require the input to be represented as a fixed-length feature vector. In text classification, bag-of-words is a popular fixed-length feature representation. Despite its simplicity, it is limited in many tasks: it ignores the semantics of words and loses word order. In this paper, we propose a simple and efficient neural language model for sentence-level classification...
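The word-order limitation mentioned above can be illustrated with a minimal sketch. It is not taken from the paper: the `bag_of_words` helper, the three-word vocabulary, and the example sentences are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def bag_of_words(sentence, vocabulary):
    """Return a fixed-length count vector with one slot per vocabulary word.

    Hypothetical helper for illustration; tokenization is a plain
    lowercase whitespace split.
    """
    counts = Counter(sentence.lower().split())
    return [counts[word] for word in vocabulary]


# Assumed toy vocabulary; a real system would build it from a corpus.
vocabulary = ["dog", "bites", "man"]

vec_a = bag_of_words("dog bites man", vocabulary)
vec_b = bag_of_words("man bites dog", vocabulary)

# Both sentences map to the same vector, so the ordering is lost.
print(vec_a, vec_b, vec_a == vec_b)
```

Two sentences with opposite meanings receive identical vectors, which is the kind of information loss that motivates order-aware models.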
Since the advent of the IoT era, various IoT devices have proliferated, transforming ordinary spaces into smart spaces such as smart homes, smart offices, and smart buildings. To provide user-friendly services to people, the majority of previous studies have focused on activity recognition and prediction in single-user environments such as ambient assisted living (AAL) and activities of daily living (ADL)...
With the development of computer and network techniques and the explosion of digital Chinese news texts, readers and reporters face massive amounts of unstructured news data. A better way to extract and store knowledge can, on the one hand, help readers understand the core content of news and, on the other hand, support reporting through the accumulated news knowledge. In recent years, information extraction technology...
Bug-reports are valuable sources of information. However, studying the content of bug-reports, which is written in natural language, demands tedious human effort for manual interpretation. This difficulty limits the scale of empirical studies that rely on the interpretation and categorization of bug-reports. In this work, we investigate the effectiveness of Labeled Latent Dirichlet Allocation (LLDA) in automatic...
In mobile application development, frequent software releases limit the time available for testing. In order to detect bugs in early phases, researchers have proposed various test case prioritization (TCP) techniques in past decades. In practice, considering that some test cases are described by or contain text, researchers have also employed Natural Language Processing (NLP) to assist the TCP techniques. This paper...
The growing use of informal social text messages on Twitter is one of the known sources of big data. These types of messages are noisy and frequently rife with acronyms, slang, grammatical errors, and non-standard words, causing grief for natural language processing (NLP) techniques. In this study, our contribution is to target non-standard words in the short text and propose a method to which the given...
An understanding of success factor relationships in the business-to-business context, where inter-organizational relationships (IORs) exist between organizations, is crucial for effective strategic management to accomplish marketing goals. Several studies regarding those success factors and their influences have been conducted and published as articles. We apply the technique of Named Entity Recognition...
Detecting actions or verbs in still images is a challenging problem for a variety of reasons, such as the absence of temporal information and the polysemy of verbs, which make it difficult to generate large verb datasets. In this paper, we propose to first detect the prominent objects in the image and then infer the relevant actions or verbs using Natural Language Processing (NLP)-based techniques. The...
Nowadays, Twitter has become one of the most trusted platforms for on-line micro-texts (tweets) used to monitor public sentiment toward any entity (event, topic, or product). In recent years, many approaches have been used for sentiment analysis of on-line micro-texts to predict public opinion about real-world entities. However, the accuracy of prediction is highly dependent on the accuracy...
Part-of-speech (POS) information is one of the fundamental components in the natural language processing pipeline, and it helps in extracting higher-level information such as named entities, discourse, and the syntactic structure of a sentence. For some languages, such as English, Dutch, and Chinese, POS tagging is considered a solved problem due to the high accuracy (97%) of the prediction systems. Significant...
In this paper, we implement a Convolutional Neural Network (CNN) specially designed for Natural Language Processing. With the help of this CNN, we try to classify sentences for sentiment analysis, using embeddings learned from scratch rather than pre-trained word2vec vectors. Here we vary the different parameters and learn how they affect the performance of the CNN. From...
Comparable corpora contain significant quantities of useful data for Natural Language Processing tasks, especially in the area of Machine Translation. They are mainly a source of parallel text fragments. This paper investigates how to effectively extract bilingual texts from comparable corpora by relying on a small parallel training corpus. We propose a new technique to filter non-parallel articles...
In this paper, we attempt author identification of two ancient Arabic religious books dating from the 6th century: the holy Quran and the Hadith. The authorship identification process is carried out in four phases: document collection, text preprocessing, feature extraction, and classification model building. Two series of experiments are conducted and discussed. The first...
This paper presents our work on developing fundamental tools and a resource for Vietnamese analysis. These comprise tools for word segmentation, part-of-speech tagging, and diacritics restoration, together with a dictionary of orthographical variants. Until now, all of them have either not been publicly available or not attained sufficient performance. We have developed the tools and released them to the public, in...
Traditional approaches to Named Entity Recognition rely heavily on feature engineering. In this paper, we introduce a bidirectional recurrent neural network with long short-term memory (BLSTM) to capture bidirectional and long-range dependencies in a sentence without any feature set. Our model combines a BLSTM network with a Conditional Random Field (CRF) layer to jointly decode the best output. Additionally,...
Today's computing environments make it possible to carry out various data-intensive natural language processing tasks. Language tokenization methods applied to multi-class text classification have recently been investigated by many data scientists. The authors of this paper investigate the Logistic Regression method by evaluating classification accuracy, which correlates with the size of the training...
Robots often have limited knowledge about the environment and need to continuously acquire new knowledge in order to collaborate with humans. To address this issue, this paper presents a method that allows a human to teach a robot new object types and attributes through natural language (NL) instructions. A simple yet robust vision algorithm is proposed to segment objects and describe the relations...
The conventional password cracking methods view the consecutive digits in passwords as a single unit without understanding the internal structures of digits. In this paper, in order to enhance the analysis of numerical passwords, we borrow the idea of chunking in psychology, and segment each numerical password into small chunks to help understand the structures. Empirically, we learn chunks and structures...
Current healthcare practice is transitioning from a provider-centered model to a patient-centered model of care, where patients are no longer passive recipients of care but are encouraged to actively engage in and take greater responsibility for medical decision-making. As part of this trend, patients are gaining access to larger and more diverse sets of medical texts through Electronic Medical Record...
We report several experiments on using Recurrent Neural Networks (RNNs) for the binary sentence classification task. For sentence classification, RNNs have an important advantage over well-known traditional machine learning models (e.g. SVM and Maximum Entropy) in that they can naturally take into account neighboring information between contiguous words. In addition, to perform binary classification...