This paper presents a corpus-based approach for extracting keywords from text written in a language that has no word boundaries. Based on the concept of the Thai character cluster, a Thai running text is first segmented into a sequence of inseparable units, called TCCs. To enable the handling of a large-scale
In this paper we develop an approach to automatic, data-driven generation of pronunciation dictionaries for keyword spotting (KWS) systems. In practical applications, KWS tasks often have to deal with keywords whose pronunciations cannot be found in the dictionary. To solve this problem, we study how to derive
propose MARK, a keyword-based framework for semi-automated review analysis. MARK allows an analyst to describe their interests in one or more mobile apps with a set of keywords. It then finds and lists the reviews most relevant to those keywords for further analysis. It can also plot the trends of those keywords over time and
Searching published papers is a necessary part of the research process. Because articles are written in various languages, precise queries are hard to formulate. In this paper, we propose an automatic thesis clustering method based on bilingual and synonymous keyword sets, which include Chinese and English
The problem of automatically extracting the most interesting and relevant keyword phrases in a document has been studied extensively as it is crucial for a number of applications. These applications include contextual advertising, automatic text summarization, and user-centric entity detection systems. All these
In cross-language information retrieval (CLIR), the query sentence is often composed of a series of query keywords rather than a complete natural sentence. The lack of necessary contextual syntactic information in such a query sentence makes it impossible to achieve a unique translation of the query sentence with
Text keywords at different semantic levels have different semantic representation abilities. Although words have been organized by semantic dictionaries (e.g. WordNet) with exact semantics, such dictionaries cannot be constructed automatically by machine, and there are still many words which are not included in the
-processing of Web search results have been extensively studied to help users obtain useful information effectively. This paper has three parts. The first part is a review study of how a keyword is expanded through truncation or wildcards (a little-known feature, but one of the most powerful ones) by using
paper, we propose simulating an automated system (SAS) which consists of a source dictionary, a destination dictionary, and a keyword comparison method. Our preliminary work uses MEDLINE and MEDICINENET as the two vocabularies, and simple comparison and Levenshtein distance as the two keyword comparison methods.
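The two comparison methods named in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the case and whitespace normalization, and the `max_distance` threshold are all assumptions chosen for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions, and substitutions needed to turn a into b."""
    # previous[j] is the distance between the prefix of a processed so far
    # (initially empty) and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def simple_match(a: str, b: str) -> bool:
    """Simple comparison: equal after trimming and lowercasing (assumed normalization)."""
    return a.strip().lower() == b.strip().lower()


def fuzzy_match(a: str, b: str, max_distance: int = 2) -> bool:
    """Edit-distance comparison; max_distance is a hypothetical threshold."""
    return levenshtein(a.strip().lower(), b.strip().lower()) <= max_distance
```

Under these assumptions, `simple_match` only pairs keywords that are identical after normalization, while `fuzzy_match` also pairs spelling variants such as "anaemia" and "anemia", at the cost of possible false matches between short, unrelated terms.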
neighbor search of videos from the Internet. The fundamental problem lies in the scalability of a search technique in the face of the intractable volume of videos that keep arriving on the Web. In this paper, we investigate the scalability of several well-known features, including color signature and visual keywords, for Web-based
This paper presents an integrated approach to automatically providing an overview of content on Thai websites based on a tag cloud. This approach is intended to address the information overload issue by presenting the overview to users so that they can assess whether the information meets their needs. The approach incorporates Web content extraction, Thai word segmentation, and information...
With the rapid development of the World Wide Web, malicious attackers have begun actively obfuscating Chinese text, transforming the form of keywords so that existing software cannot detect them. How to filter unhealthy Web pages quickly and effectively has therefore become a central concern of Web security. Because of the limitations of traditional rigid string matching on keywords...
context information and semantic similarity together. We searched a series of context structures for keywords in a sentence. Experiments were carried out to show the effectiveness of our method.
Keyword extraction is one of the most significant tasks in information retrieval. High-quality keyword extraction significantly influences progress in the following subtasks of information retrieval: classification and clustering, data mining, knowledge extraction and representation, etc. The research
in both Thai and English is built to help users handle the many keywords for the same term, and (3) a set of keywords from herbal usages can be combined with the name keyword. The results show that information collected from KUIHerb is useful for searching.
Today the major search engine companies all provide keyword-based Web search, but none provides highly accurate and efficient information services oriented to user requirements. A task-focused method and system for sharing and utilizing massive multi-source heterogeneous information is introduced
to an external hierarchical resource to improve the accuracy of text matching. In addition, a complete framework of text processing, keyword extraction, and information matching is applied for the first time to identifying complementarity among Chinese SMEs. Using a machine learning algorithm, complementarities are digitized and potential
keyword specified by the investigator or suggested by the system. Experiments were conducted on a dummy crime dataset to test the accuracy and the scalability of the proposed system. The experimental results showed that subject suggestion improved accuracy and thus sped up the process of searching for evidence.
supervised machine learning algorithms, which require abundant, expensive labeled patent data. Because labeled Chinese patent data is scarce, this paper adopts a semi-supervised machine learning method named co-training, which starts from a small amount of labeled data. This method combines keyword extraction with list