In recent years, ontologies have become widely used as a semantic knowledge representation in many information systems. Manual creation of ontologies by domain experts and ontology developers is costly, time-consuming, and labor-intensive. Learning Non-Taxonomic Relationships is a subfield of ontology learning which targets automatic extraction of non-taxonomic relationships from input,...
Automatic multi-document summarization may help news readers retrieve information from digital news media efficiently. The summarizer creates a concise summary containing important information from a collection of articles, enabling readers to read only one text to gain information from multiple text sources. Building on previous research, we propose an automatic summarization system using sentence...
With the rapid development of the Internet, obtaining valuable information from massive volumes of messages has become a major problem to be solved in the era of information explosion. This paper introduces the development path of information extraction technology, and discusses four categories of Chinese entity relation extraction technologies in depth. Finally, the advantages and disadvantages of different...
A Chinese resume information extraction system (CRIES) based on semi-structured text is designed and implemented to obtain formatted information by extracting the text content of every field from resumes in different formats, and to update information automatically based on the web. First, ideas for classifying resumes, some constraints obtained by analyzing resume features, and the overall extraction strategy are...
We describe efforts to bring new methods of search analytics, machine learning, natural language processing and data visualization to address the challenge of finding and extracting meaning from unstructured text and multimedia content. We use the Polar domain to motivate the problem and our proposed solution. However, our techniques are applicable and scalable to other domains.
In an academic environment, plagiarism is the process of copying someone else's text, idea or data verbatim or without due recognition of the source, which is a serious academic offence. Many techniques have been proposed in the literature for detecting plagiarism in texts, but only a few techniques exist for detecting figure plagiarism. The main problem associated with existing techniques of plagiarism...
As the number of documents continues to increase steadily, shortening processing time has become an important issue in the field of natural language processing. In this paper, we describe a method to reduce the execution time of the Korean temporal information extraction module from a development perspective. While the rule-based approach is useful for finding time representations from natural...
Classification is a central problem in the fields of data mining and machine learning. Using a training set of labeled instances, the task is to build a model (classifier) that can be used to predict the class of new unlabelled instances. Data preparation is crucial to the data mining process, and its focus is to improve the fitness of the training data for the learning algorithms to produce more...
Relational aggregated search (RAS) is defined as a complementary set of approaches where relations between information nuggets are taken into account. From this viewpoint, the relational aggregated search should retrieve information nuggets and their relations, which are to be used to coherently assemble the final search result. Traditional approaches used for RAS are based on Information Extraction...
The core of remote sensing mineralization detection lies in extracting information on ore-forming structures and altered minerals. The key problem is the comprehensive analysis and application of spectral information from remote sensing images. Taking Xinjiang Taxkorgan prefecture as the study area and choosing multispectral ASTER remote sensing image data as the data source, which can statistically...
Event extraction is an important research topic in the field of information extraction, and news event extraction has particular practical significance. Existing methods for extracting news events start from the time element: they identify the sentences that contain dates using natural language processing and extract the events on each date by text clustering. However, they only process the news that holds a...
Text extraction is a crucial stage in analyzing journal papers. Journal papers are generally in PDF format, which is semi-structured data. They are divided into sections such as Introduction, Methodology, Experimental Setup, and Results and Analysis, so that it is easy to access information from any section according to the reader's interest. The main importance of section extraction is...
Stack Overflow is one of the most popular question-and-answer sites for programmers. However, it contains a great number of duplicate questions, which should be detected automatically and quickly. In this paper, we introduce two approaches to improve the detection accuracy: splitting the question body into different types of data and using word embeddings to treat word ambiguities that are not contained...
Knowledge graphs are useful in many different domains, such as search result ranking, recommendation, and exploratory search. They integrate structural information about concepts across multiple information sources and link these concepts together. The extraction of domain-specific relation triples (subject, verb phrase, object) is one of the important techniques for domain-specific knowledge graph construction...
Scene text information extraction plays an important role in many computer vision applications. Most features in existing text extraction algorithms are only applicable to one text extraction stage (text detection or recognition), which significantly weakens the consistency of an end-to-end system, especially for complex Chinese texts. To tackle this challenging problem, we propose a novel text...
Data retrieval is a key process for acquiring information as required. The need for proper information has increased. The most basic tool providing this service is the browser, which traverses the data according to the user's query and returns search results for all related information. Hence, finding the required information becomes a time-consuming process. In this paper, the focus is on content...
Scene text information extraction plays an important role in many computer vision applications. Unlike most existing text extraction algorithms for English texts, in this paper, we focus on Chinese texts, which are more complex in stroke and structure. To tackle this challenging problem, we propose a novel convolutional neural network (CNN) based text structure feature extractor for Chinese texts...
Due to the increasing amount of information in web-based environments, analysts nowadays need information extracted from different sources. Extracting this information to guide decision making from a national security perspective remains a challenging task. The major issue arises from the large amount of irrelevant information or the complexity of unstructured data, which makes information extraction and...
The Treatise on Invertebrate Paleontology is the most reliable information source for invertebrate paleontology research. Based on this Treatise, an Invertebrate Paleontology Knowledgebase (IPKB) has been built as a digital library to provide these data through a web interface. However, the search functions provided by the old IPKB system are based only on textual information, while some more important...
The implementation of electronic medical records (EMRs) produces a huge amount of unstructured clinical text. This domain-specific clinical text has opened up opportunities for temporal information extraction (TIE), owing to its importance in medical care and its rich temporal content. Processing temporal information in clinical text is much more difficult in comparison to newswire text due to...