facility. Automating the transcription of these documents using Optical Character Recognition (OCR) systems is also challenging due to the very complex cursive nature of Urdu text. To overcome these limitations, a keyword spotting based information retrieval system for document images is introduced in this study. The proposed
Feature weighting is a technique used to approximate the optimal degree of influence of individual features. This paper presents a feature weighting method for a Document Image Retrieval System (DIRS) based on keyword spotting. In this method, we weight the features using Weighted Principal Component Analysis (PCA). The
This paper presents a new method for keyword spotting in degraded document images. Two prevalent word indexing methods, OCR and word shape coding, are combined compactly based on recognition confidence evaluation. The basic procedures are as follows. First, OCR candidates are used for OCR indexing. Second, a new stroke
The total information available on the WWW (World Wide Web) is huge and is increasing at lightning speed. The existing web is dominated by search engines that run on keyword-based search, which wastes the end user's precious time if they do not know the key terms which are utilized to index
A document surrogate is usually represented in a list of words. Because not all words in a document reflect its content, it is necessary to select important words from the document that relate to its content. Such important words are called keywords and are selected with a particular equation based on Term Frequency
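As a rough illustration of frequency-based keyword selection, the sketch below ranks the words of a document by raw term frequency after dropping common words. It is not the equation used in the abstract above, which is cut off in this listing; the function name `top_keywords` and the stop-word list are assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "of", "and", "is", "in", "to"}


def top_keywords(text, k=3):
    """Return the k most frequent non-stop-word terms in text."""
    # Lowercase the text and keep alphabetic tokens only.
    tokens = re.findall(r"[a-z]+", text.lower())
    # Count every token that is not a stop word.
    counts = Counter(t for t in tokens if t not in STOP_WORDS)
    # most_common sorts by descending count.
    return [term for term, _ in counts.most_common(k)]
```

Raw term frequency favors words that are common everywhere, which is why practical weighting schemes usually combine it with a collection-level statistic.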
This paper proposes an extended vector space model (VSM), which is called M2VSM (meta keyword-based modified VSM). When conventional VSM is applied to document clustering, it is difficult to adjust the granularity of clusters in terms of topic. In order to solve the problem, M2VSM considers meta keywords such as
The quality of indexing is important for successful retrieval results. This paper describes a novel keyword extracting tool that distinguishes itself from existing ones by its efficient keyword significance measure. The measure integrates term frequency retrieval characteristics, document collection characteristics
Keyword-based search is popular in Information Retrieval (IR) communities and Internet search engines on the Web. Nowadays, social networking, micro-blogging, and other data-driven websites store large amounts of information in relational databases. But to search in databases, users need to know a database schema
Recent research has shown that keyword search is a friendly and potentially effective way to retrieve information of interest over relational databases. Existing work has generally focused on implementing keyword search in centralized databases. This paper addresses keyword search over distributed databases. We adopt
Keyword search is the dominant information discovery method in Information Retrieval (IR) systems and search engines on the Web. Nowadays, there is an increasing amount of data stored in structured databases (Relational Databases). Searching on a traditional database management system is done through customized
Indexing of news video streams with semantic keywords is of interest to agencies that regularly monitor many news channels. In this paper, we describe a new method for indexing news video in different languages, for which there are inadequate language tools. Our approach involves combining multimodal inputs, namely
The tool for keyword extraction developed within the AXMEDIS project has been designed to work in a multilingual environment, and new algorithms have been developed to generate keywords with higher representativeness for content search and identification. The paper specifies the linguistic criteria followed for
Some popular Internet applications such as instant messaging, blogs, Twitter and Google Buzz generate huge volumes of short text. These data can then be summarized, mined, and queried by other applications. To this end, a suitable storage design with outstanding performance must be offered to address the question of real-time full text indexing and searching. This paper studies Lucene indexing and searching...
Traditionally, full text retrieval over structured peer-to-peer networks has been implemented with an inverted index by keywords. However, search based on this index scheme only supports literal word matching and does not take into account the meaning of words. In this paper, we present a new index scheme, inverted index by
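The limitation described above can be seen in a minimal, single-machine sketch of a keyword inverted index. The peer-to-peer distribution and the new index scheme from the abstract are not modeled; `build_inverted_index` and `search` are hypothetical names chosen for the example.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every query word literally."""
    words = query.lower().split()
    if not words:
        return set()
    # Start from the postings of the first word, then intersect the rest.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result
```

With documents such as "fast car rental" and "quick automobile hire", a query for "car" finds only the first one, even though both describe the same thing: the index matches strings, not meanings.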
needs. In this paper, we present the design, architecture and implementation of an open-source keyword-based paradigm for the search of software resources in Grid infrastructures, called Minersoft. A key goal of Minersoft is to annotate automatically all the software resources with keyword-rich metadata. Using advanced
Social applications associate a set of user-defined keywords, called tags, with social objects when publishing them so that they can be located later. We propose T-DHT, a hybrid unstructured-structured DHT-based approach, to cope with the highly demanding requirements of social applications in a fully scalable, distributed and balanced
huge irrelevant search hits. In this paper, we propose an improved method for ranking search results to reduce the human effort needed to locate interesting hits. The search results are re-ranked using adaptive user interest hierarchies (AUIH), which consider both investigator-defined keywords and user interest learnt from
This paper presents the comparison of the text document space dimension reduction and the text document clustering and also the keyword space dimension reduction and keyword clustering by the latent semantic analysis and by the Hebbian neural network with Oja learning rule. Results of this neural network are compared
Recently, peer-to-peer systems have become one of the most popular distributed applications. Many previous works have investigated identifier-based indexing systems that support a query-by-identifier service. However, clients usually have only partial information about an object, and prefer to query by keywords. In