Knowledge is stored in an enterprise in various forms, ranging from unstructured operational data and legal documents, through structured information such as programs and relational data stored in databases, to semi-structured information stored in XML files. All this information, if viewed from a holistic standpoint, can help an enterprise understand and reflect upon itself and thereby make knowledgeable...
Over recent years, the world has experienced a huge growth in the volume of shared web texts. Web users generate a huge volume of comments and reviews daily, related to different aspects of their lives. In general, opinion mining/sentiment analysis refers to the task of identifying positive and negative opinions, emotions and evaluations related to articles, news, products, services, etc. [1]. Arabic...
Semantic relation extraction is an important part of information extraction; it has application value in automatic question answering systems, retrieval systems, ontology learning, semantic web annotation, and many other areas. Previous semi-supervised semantic relation extraction based on bootstrapping used context patterns as its pattern representation, but it did not consider the role of the...
The Internet and Web 2.0 gave rise to a wide variety of user-generated content. This caused a massive growth in the amount and availability of opinionated information. This collection of complex, unstructured information is often referred to as Big Data. A common practical application of such Big Data is social media sentiment analysis. The general aim of sentiment analysis is to determine/extract...
Information Retrieval is a well-established interdisciplinary topic in which machine learning, computational linguistics, computer programming and data mining merge together. SLAIR stands for Sea Lab Advanced Information Retrieval and is an efficient software architecture that embeds these issues in a unique framework. SLAIR is expandable from both the data format and the algorithm point of view. A pluggable...
Relation Extraction is an important research field in Information Extraction. In this paper, we present a novel mixed model to extract relations between named entities in Chinese, which combines the merits of both the feature-based method and the tree-kernel-based method. The feature-based method captures the language information of the text, while the tree-kernel-based method shows the structured information...
Weblogs are an important source of information that requires automatic techniques to categorize them into “topic-based” content, to facilitate their future browsing and retrieval. In this paper we propose and illustrate the effectiveness of a new tf.idf measure. The proposed Conf.idf and Catf.idf measures are solely based on the mapping of terms-to-concepts-to-categories (TCONCAT) method that utilizes...
Because of its various applications in data mining and information technology, automatic document classification is one of the important topics in computer science. Classification plays a vital role in many information management and retrieval tasks. Document classification, also known as document categorization, is the process of assigning a document to one or more predefined category labels. Classification...
Document classification is a key task for many text mining applications. However, traditional text classification requires labeled data to construct reliable and accurate classifiers. Unfortunately, labeled data are seldom available. In this work, we propose a universal text classifier, which does not require any labeled document. Our approach simulates the capability of people to classify documents...
Semi-supervised learning has received increasing attention and is widely used in many fields such as data mining, information retrieval and knowledge management, as it can utilize both labeled and unlabeled data. Laplacian SVM (LapSVM) is a classical method whose effectiveness has been validated by a large number of experiments. However, LapSVM is sensitive to labeled data and it is exposed to cubic...
Character recognition has been important for several decades. A lot of research interest is now focused on applying pattern recognition and computer vision algorithms to camera-captured documents to retrieve information from them. This paper presents a novel approach for extracting text in camera-captured images using an edge-based algorithm. Extensive experiments have been carried out on...
This paper studies automatic summarization and presents a method for extracting summaries from multi-topic documents, which combines a statistical model with a document relationship map and uses a sub-topic community detection algorithm. The experimental results show that this method is more efficient for summary extraction from multi-topic documents.
From a large data set of documents, we need to find documents that relate to human interests. The relevance feedback method needs a set of relevant and non-relevant documents to work effectively. However, the initially retrieved documents, which are displayed to a user, sometimes don't include relevant documents. In order to solve this problem, we propose a new feedback method using information of non-relevant...
We need to find documents that relate to human interests from a large data set of documents. The relevance feedback method needs a set of relevant and non-relevant documents to work effectively. However, the initially retrieved documents, which are displayed to a user, sometimes don't include relevant documents. In order to solve this problem, we propose a new feedback method using information of non-relevant...
Expert finding is the task of identifying persons with expertise on a given topic. Existing methods try to model the dependencies between candidates and terms with a distance measure or a sequential measure, both of which have been proven to be effective. However, to the best of our knowledge, no work has been conducted on the combination of the two dependencies. In this paper, we propose a language model based...
Extracting instances of a given target relation from a given Web page corpus seems to be the basic task in exploiting the nearly endless source of knowledge provided by the World Wide Web. Supervised learning requires a large amount of labeled data, but the data labeling process can be expensive and time-consuming. In this paper we present a kernel-based weakly supervised machine learning algorithm...
As the number of Chinese websites has grown in the era of explosive Internet expansion, building efficient information retrieval systems has become one of the major endeavors, especially in the field of Chinese recognition. In this paper, the authors study the integration of a subsequence kernel function based on ontology. Using the vector space model (VSM) to create subsequence kernels, the kernel methodology described...
This paper presents the experimentation of a system, developed through the RAMPE research project, intended for the assistance and information of blind or Visually Impaired People (VIP) so that they can increase their mobility and autonomy in public transport. The system is intended to equip bus or tramway stops. It is based either on a simple remote command or on a smart hand-held device that is...
This paper proposes a novel approach to entity relation extraction. Unlike previous approaches, it treats syntactic knowledge extraction as a dedicated component, which automatically extracts characteristic words and patterns based on hierarchical bootstrapping machine learning. It advocates using a small amount of seed information and a large collection of easily obtained unlabeled...
In this paper, we construct and compare several feature extraction approaches in order to find a better solution for classification of Turkish Web documents in the marketing domain. We produce our feature extraction techniques using characteristics of the Turkish language, structures of Web documents and online content in the marketing domain. We form datasets in different feature spaces and we apply...