Content is one of the most essential parts of products on e-commerce websites such as eBay. It not only drives user engagement but also attracts traffic from search engine websites based on relevance. Generating content for products, however, comes with a wide set of challenges due to the complexity of commerce at scale, and requires new applications in text processing and information extraction...
Due to the large amount of multimedia data on the Internet, multimedia mining has become a very active area of research. Multimedia mining is a form of data mining, which uses algorithms to segment data, identify useful patterns, and make predictions. Despite successes in many areas, data mining remains a challenging task. In the past, multimedia mining was one of the fields where the...
Aspect-Oriented Programming (AOP) aims to address the scattering and tangling of cross-cutting concerns in a system. Many aspect mining techniques have been proposed based on the concept that concerns crosscut other modules of a system. Most of this research targets a single version during the development of a software system. However, it is also possible that the difference between versions during...
Because "bag of words" approaches discard the linkage between words, they can hardly be applied to explore the causal relations between terms described in text corpora. To discover networked causal knowledge from a corpus, we (1) propose the pv-swapping algorithm and the PV-parse-tree to adjust the term order of sentences by observing the relationship between...
In the early twenty-first century, social networks served only to let the world know our tastes, share our photos and share some thoughts. A decade later, these services are filled with an enormous amount of information. Now, industry and academia are exploring this information in order to extract implicit patterns. Twitter Jam is a tool that analyses the contents of the social network Twitter...
To efficiently utilize their cloud-based services, consumers have to continuously monitor and manage the Service Level Agreements (SLAs) that define the service performance measures. Currently this is still a time- and labor-intensive process, since SLAs are primarily stored as text documents. We have significantly automated the process of extracting, managing and monitoring cloud SLAs using natural...
Pharmacovigilance is the field of science devoted to the collection, analysis, and prevention of Adverse Drug Reactions (ADRs). Efficient strategies for the extraction of information about ADRs from free text sources are essential to support the important task of detecting and classifying unexpected pathologies, possibly related to (therapy-related) drug use. Narrative ADR descriptions may be collected...
In the biomedical and clinical domain, valuable information is frequently represented in free-text documents. Natural language processing (NLP) is a powerful tool that can extract structured information from these documents. Word sense disambiguation (WSD) is a critical component in an NLP pipeline that increases the accuracy of the extracted information. However, WSD is expensive to apply for all...
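The role WSD plays in such a pipeline can be illustrated with a simplified Lesk-style overlap heuristic: pick the sense whose gloss shares the most words with the surrounding context. This is a minimal sketch, not the abstract's method; the word, senses, and glosses below are invented for illustration, and clinical systems typically use trained classifiers instead.

```python
# Simplified Lesk-style WSD: choose the sense whose gloss
# overlaps most with the context. Senses/glosses are invented
# examples, not entries from a real clinical sense inventory.
SENSES = {
    "cold": {
        "illness": "a common viral infection of the nose and throat",
        "temperature": "a low temperature sensation of the environment",
    },
}

def disambiguate(word, context):
    """Return the sense label with the largest gloss/context word overlap."""
    ctx = set(context.lower().split())
    best, best_overlap = None, -1
    for sense, gloss in SENSES[word].items():
        overlap = len(ctx & set(gloss.split()))
        if overlap > best_overlap:
            best, best_overlap = sense, overlap
    return best

print(disambiguate("cold", "patient presents with a viral infection and cold"))
# → illness
```

A real pipeline would apply a step like this only to the ambiguous terms that matter downstream, which is exactly the cost concern the abstract raises.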
Adverse drug events (ADEs) trigger a high number of hospital emergency room (ER) visits. Information about ADEs is often available in online drug databases in the form of narrative texts, and serves as the physician's primary reference point for ADE attribution and diagnosis. Manually reviewing these narratives, however, is an error prone and time consuming process, especially due to the prevalence...
The Sejong Electronic (machine-readable) Dictionary, developed by the 21st Century Sejong Plan, contains systematically organized information on Korean words. It helps to solve the problems encountered in the electronic formatting of a still-commonly-used hard-copy dictionary. The Sejong Electronic Dictionary, however, has a limitation relating to sentence structure and selection-restricted nouns...
Automatically extracting events from large, unstructured/semi-structured textual data requires a mechanism for identifying the event, abstracting it from the text, validating the event's occurrence against some known values, and sharing the event with users effectively. Inherent in the challenge of Big Data is that it often exceeds a scale at which humans can effectively operate. In this paper, we...
Disease/Disorder Template Filling is a complicated task of relation extraction, requiring a combination of several methods in order to solve it. The aim of this paper is to propose a combined approach for disorder template filling. The system combined three methods: rule-based, regular expression, and machine learning-based. This system added several features for the machine learning-based method...
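The regular-expression component of such a hybrid system can be sketched as follows. The slot names and patterns here are illustrative assumptions, not the paper's actual rules; a real system would pair a much richer rule set with the machine-learning back-off the abstract describes.

```python
import re

# Illustrative patterns for two disorder template slots
# (negation cue and severity modifier). These are toy rules,
# not the paper's rule set.
PATTERNS = {
    "negation": re.compile(r"\b(no|denies|without|absent)\b", re.I),
    "severity": re.compile(r"\b(mild|moderate|severe)\b", re.I),
}

def fill_slots(sentence):
    """Return a slot -> matched-value dict for one sentence."""
    slots = {}
    for slot, pat in PATTERNS.items():
        m = pat.search(sentence)
        slots[slot] = m.group(0).lower() if m else None
    return slots

print(fill_slots("Patient denies severe chest pain."))
# → {'negation': 'denies', 'severity': 'severe'}
```

Slots the rules leave as `None` are the natural hand-off point to the machine-learning component of a combined approach.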
In this paper, a new technique is presented for mining key domain areas from scientific publications. A domain refers to a particular branch of scientific knowledge and hence largely defines the theme of any scientific research paper. The proposed technique stems from a fusion of knowledge derived from natural language processing and machine learning. Some words or phrases are extracted based on their...
A troll is a user intent on sowing discord on the internet. We propose an approach to detect such users from the sentiment of the textual content in online forums. Since trolls typically express negative sentiments in their posts, we derive features from sentiment analysis, and use SVMrank to do binary and ordinal classification of trolls. With a small labeled training set of 20 users, we achieved...
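The feature-derivation step behind this kind of approach can be sketched in a few lines. The lexicon below is a tiny hand-made stand-in for a real sentiment analyzer, and the threshold rule stands in for the SVMrank classifier the abstract actually uses; both are assumptions for illustration only.

```python
# Toy sketch of sentiment-based features for troll detection.
# NEGATIVE/POSITIVE are an invented mini-lexicon; a real system
# would use a sentiment analyzer and train SVMrank on the features.
NEGATIVE = {"hate", "stupid", "awful", "idiot"}
POSITIVE = {"thanks", "great", "helpful", "agree"}

def user_features(posts):
    """Fraction of negative and positive tokens across a user's posts."""
    tokens = [w.strip(".,!?").lower() for p in posts for w in p.split()]
    n = len(tokens) or 1
    neg = sum(t in NEGATIVE for t in tokens) / n
    pos = sum(t in POSITIVE for t in tokens) / n
    return neg, pos

def looks_like_troll(posts, threshold=0.2):
    """Threshold rule standing in for a trained ranker."""
    neg, pos = user_features(posts)
    return neg - pos > threshold

print(looks_like_troll(["You are a stupid idiot", "I hate this thread"]))
# → True
```

With features like these computed per user, a ranker such as SVMrank can order users by troll-likelihood rather than committing to a hard threshold.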
Our aim is to extract information about literary characters in unstructured texts. We employ natural language processing and reasoning on domain ontologies. The first task is to identify the main characters and the parts of the story where these characters are described or act. We illustrate the system in a scenario in the folktale domain. The system relies on a folktale ontology that we have developed...
Today, the economic crisis has worsened the outlook for universities, imposing new constraints that require them to be more economically sustainable. In addition, universities will also have to improve their research and teaching in order to obtain more research funds and attract more students. In this context, analytics can be a very useful tool, since it allows academics (and university managers) to get a more...
There's a big difference between driving suggestions that come from a newly licensed, know-it-all teenager and those that come from a professional racecar driver who has spent years honing skills on the course. The first you just want to be quiet; the second you actually want to speak up.
Social emotion analysis of online users has become an important task for mining public opinions, which aims at detecting the readers' emotions evoked by online news articles. In this paper, we focus on building a social emotion analysis system (SEAS) for online news. The system has implemented a text data crawler for mainstream online news websites, the modules of document preprocessing, document...
With the prominent advances in Web interaction and the enormous growth in user-generated content, sentiment analysis has gained more interest for commercial and academic purposes. Recently, sentiment analysis of Arabic user-generated content is increasingly viewed as an important research field. However, the majority of available approaches target the overall polarity of the text. To the best of our...
Automatic term extraction is an important issue in natural language processing. This paper presents a new approach to terminology extraction that combines machine learning, based on cascaded conditional random fields, with a corpus-based statistical model. In this approach, low-layer and high-layer conditional random fields (CRFs) are first used to extract simple and compound terminologies...
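The corpus-based statistical side of such a pipeline can be illustrated with a simple candidate-ranking score. The measure below (frequency weighted by log term length, a simplified C-value-style statistic) is an assumption for illustration, not the paper's exact model, and the candidate phrases are invented.

```python
import math
from collections import Counter

def rank_terms(candidates):
    """Rank candidate terms by frequency * log2(word count + 1),
    a simplified C-value-style statistic (illustrative only)."""
    freq = Counter(candidates)
    scores = {}
    for term, f in freq.items():
        words = len(term.split())
        scores[term] = f * math.log2(words + 1)
    return sorted(scores, key=scores.get, reverse=True)

cands = ["conditional random field", "random field",
         "conditional random field", "field"]
print(rank_terms(cands))
# → ['conditional random field', 'random field', 'field']
```

In a cascaded setup, CRF layers would first propose the candidate spans, and a statistic like this would then filter or re-rank them against corpus evidence.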