The proliferation of Web 2.0 technologies and the increasing use of computer-mediated communication have resulted in a new form of written text, termed microtext. This poses new challenges to natural language processing tools, which are usually designed for well-written text. This paper proposes a phonetic-based framework for normalizing microtext to plain English and, hence, improving the classification...
Network embedding aims at projecting the network data into a low-dimensional feature space, where each node is represented as a unique feature vector and the network structure can be effectively preserved. In recent years, more and more online application service sites can be represented as massive and complex networks, which are extremely challenging for traditional machine learning algorithms to deal...
Natural Language Processing and Machine Learning techniques can be used to automatically identify, extract and manipulate textual clinical data. Many of these methods are strongly dependent on annotated corpora that are very difficult to find in the clinical domain, especially for the Brazilian Portuguese language. The annotation task is expensive and time-consuming; hence, it is important to provide...
SNOMED CT has continued to expand its adoption as a comprehensive clinical terminology. In 2013, the US government mandated SNOMED CT as a requirement for Stage 2 meaningful use criteria for Electronic Health Records. Studies have, however, identified inconsistencies in the content of SNOMED CT that may lessen its effectiveness when used for encoding patient data. Auditing thus becomes an integral...
Background: Code smells are indicators of quality problems that make software hard to maintain and evolve. Given the importance of smells in the source code's maintainability, many studies have explored the characteristics of smells and analyzed their effects on the software's quality. Aim: We aim to investigate fundamental characteristics of code smells through an empirical study on frequently...
Diseases and chemicals play central roles in many areas of biomedical research and healthcare. Consequently, aggregating the disease knowledge and treatment research reports becomes an extremely critical issue, especially in rapid-growth knowledge bases (e.g., PubMed). Thus, a framework of disease/chemical named entity recognition and normalization has become increasingly important for biomedical text...
In this article we address the problem of expanding the set of papers that researchers encounter when conducting bibliographic research on their scientific work. Using classical search engines or recommender systems in digital libraries, some interesting and relevant articles could be missed if they do not contain the same search key-phrases that the researcher is aware of. We propose a novel model...
Traditional Chinese Medicine (TCM) is a discipline typically characterized as a complicated information science [1], which represents a distinctive way of thinking (e.g. theories drawn from the clinic should be applied in the clinic) compared with modern medicine. Symptoms are basic clinical concepts in TCM electronic health records, where they are written in plain text. Nevertheless, TCM concepts are inherently characterized...
The Unified Medical Language System (UMLS) is an important terminological system. By the policy of its curators, each concept of the UMLS should be assigned the most specific Semantic Types (STs) in the UMLS Semantic Network (SN). Hence, the Semantic Types of most UMLS concepts are assigned at or near the bottom (leaves) of the UMLS Semantic Network. While most ST assignments are correct, some errors...
Gene Ontology (GO) provides a controlled vocabulary for describing genes and related gene products. Quality assurance of GO is a vital aspect of the terminology management lifecycle. In this paper, we introduce a lexical-based inference approach to detecting subtype (or isa) inconsistencies among GO terms (i.e., biological concepts). We first model the name of each concept as a set...
External knowledge sources are commonly used in processing large amounts of data. Large external knowledge sources, such as ontologies, often contain hundreds of thousands of concepts and relationships, making comprehension and navigation difficult. Abstraction networks enhance the usability and comprehensibility of these resources by providing a higher level of abstraction. In this paper, we develop...
The importance of functional status information (FSI) has become increasingly evident in recent years [1, 2]. However, implementation, application, and normalization of FSI in health care and Electronic Health Records (EHRs) have been largely underexplored. The World Health Organization's International Classification of Functioning, Disability and Health (ICF) [3] is considered to be the international...
The Gene hierarchy of the National Cancer Institute (NCI) Thesaurus (NCIt) is of high priority for NCI. It is important to have quality assurance (QA) techniques to improve its content quality. We present a two-step methodology concentrating on auditing the modeling of complex concepts, which are shown to have a higher error rate compared to control concepts. In the first step, we test whether concepts...
In a non-uniform Constraint Satisfaction Problem CSP(Γ), where Γ is a set of relations on a finite set A, the goal is to find an assignment of values to variables subject to constraints imposed on specified sets of variables using the relations from Γ. The Dichotomy Conjecture for the non-uniform CSP states that for every constraint language Γ the problem CSP(Γ)...
Systems of Systems (SoS) are defined - amongst other distinguishing features - as a collection of component systems that produce results not achievable by the individual systems alone. But SoS is more than just a higher level focus on factors that are primarily concerned with the mechanics of data handling. SoS deals with the process of work-in-progress and how the design drives the implementation...
“EGGSORT: Avian Egg Gender Classification in Early Stages of Incubation” is an R&D project supported by TÜBİTAK that aims to find differentiating information on avian egg gender. The current study measures the performance of spectroscopy differentiation methods on egg white and egg yolk and the effect of the eggshell on these measurements. Preprocessing methods and correlation measurements between...
In recent years, there has been an increasing interest among researchers and practitioners concerning Enterprise Architecture (EA). Despite this increase, several studies have reported a lack of common understanding in EA. Some specific expressions like lack of common terminology, lack of shared meaning and fragmented literature have been frequently used to describe this lack. However, very few systematic...
Binary classification is one of the most frequently studied problems in applied machine learning across various domains, from medicine to biology to meteorology to malware analysis. Many researchers use performance metrics in their classification studies to report their success. However, the literature has shown widespread confusion about the terminology and ignorance of the fundamental aspects behind...
According to the concept system of the acupuncture and moxibustion discipline, the semantic types of acupoints and acupuncture methods are supplemented and adjusted, and their semantic relationships are discussed and studied, so as to realize the networking of related concepts and terminologies of acupoints and acupuncture methods. We design the classification framework, and construct acupuncture terminology...
In open software development environments, a large number of feature requests of mixed quality are often posted by stakeholders and usually managed in issue tracking systems. Thoroughly understanding and analyzing the real intents that feature requests imply is a labor-intensive and challenging task. In this paper, we introduce an approach to understand feature requests automatically. We generate...