Traditionally, page images undergo pre-processing before the later stages of document analysis are applied. One common pre-processing step is to calculate and correct for the presence of simple page skew through a compensating rotation. Such operations modify the original input image, however, and in doing so may discard or obscure useful information. In this paper, we examine the impact of page deskewing...
Hidden Markov Support Vector Machines (HMSVM) is a novel structural SVM model. Its efficiency has been demonstrated on label sequence learning tasks such as English text chunking. In this paper, we treat Chinese chunk recognition as a label sequence learning problem. After defining the Chinese chunk, we apply HMSVM to the Chinese chunking problem. The experimental results show that it achieves a better...
Since the Urdu language has more isolated letters than Arabic and Farsi, research on Urdu handwritten words is desirable. This paper presents a novel approach that uses compound features and a Support Vector Machine (SVM) for offline Urdu word recognition. Because of the cursive style of Urdu, a holistic classification approach can be adopted efficiently. Compound feature sets, which involve structural and...
This paper proposes a new method to improve the performance of text categorization. The method combines HMMs (Hidden Markov Models) and SVMs (Support Vector Machines). HMMs are used as a feature extractor, and the resulting feature vector is normalized and used as the input to the SVMs, so that the trained SVMs can classify unseen texts. The experimental results show that the method is more effective...
The traditional text chunking approach identifies many phrase types using a single model and the same features for all of them, so the features that are helpful for each particular phrase type are ignored. In fact, different phrases have different helpful features. In this paper, the concept of a "sensitive feature" is proposed, and the sensitive features of eleven English types and seven Chinese types of...
We present results of an experiment dealing with combining outputs of five part-of-speech taggers via tagger voting in order to improve the overall accuracy of morphosyntactic tagging of Croatian texts using a subset of the Multext-East v3 tagset. The increase in accuracy over the best-performing single tagger is shown to exist, but not to be statistically significant. We discuss the performance of...
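The idea of tagger voting can be illustrated with a minimal sketch. It assumes simple per-token majority voting with ties broken in favour of the tagger listed first; the abstract does not say which voting scheme the authors actually used, and the function name `vote_tags` and the single-letter tags below are hypothetical placeholders rather than Multext-East v3 tags.

```python
from collections import Counter
from typing import List, Sequence


def vote_tags(outputs: Sequence[Sequence[str]]) -> List[str]:
    """Combine tag sequences from several taggers by per-token majority vote.

    `outputs` holds one tag sequence per tagger, all over the same tokens.
    Ties are resolved in favour of the tagger that appears first in
    `outputs` (an assumption: callers list their most trusted tagger first).
    """
    if not outputs:
        return []
    length = len(outputs[0])
    if any(len(seq) != length for seq in outputs):
        raise ValueError("all taggers must label the same number of tokens")

    voted = []
    for candidates in zip(*outputs):  # tags proposed for one token
        counts = Counter(candidates)
        top = max(counts.values())
        # First tag, in tagger order, that reaches the highest vote count.
        winner = next(tag for tag in candidates if counts[tag] == top)
        voted.append(winner)
    return voted


# Five hypothetical taggers labelling the same three tokens.
example_outputs = [
    ["N", "V", "A"],
    ["N", "N", "A"],
    ["N", "V", "R"],
    ["V", "V", "R"],
    ["N", "A", "Q"],
]
combined = vote_tags(example_outputs)
```

For the third token the vote is split two against two, so the first tagger's choice wins under the assumed tie-break rule.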
Text mining tools and algorithms are being used successfully for information extraction, especially on large corpora such as biomedical publications. These tools aid not only in information extraction but also in forming new theories and relationships between various fields of biomedical research. Extraction of gene-gene or gene-disease relationships is one such application. In this paper, we introduce...
In this paper, we present a novel approach for incorporating structural information into the hidden Markov modeling (HMM) framework for offline handwriting recognition. Traditionally, structural features have been used in recognition approaches that rely on accurate segmentation of words into smaller units (sub-words or characters). However, such segmentation-based approaches do not perform well on...
As much valuable domain knowledge is hidden in enterprises' text repositories (e.g., email archives, digital libraries, etc.), it is desirable to develop effective knowledge management tools to process this unstructured data so as to extract domain knowledge for business decision making. Ontology-based semantic annotation of documents is one of the promising ways for knowledge discovery from text...
Transforming handwriting into digital text and recognizing handwritten patterns open a vast range of application opportunities, from searching handwritten notes and managing documents to triggering actions by writing symbols. Despite receiving great attention, a massive number of applications, and a huge research effort, recognition of handwritten text has still not reached the desired efficiency...
This paper presents a new approach to estimating the readability of handwritten text. The estimation task is posed as a regression problem. A novel support vector regression (SVR) system is used to estimate the recognition rate of a text recognizer on a given text. The estimated recognition rates are used to classify text as either readable or unreadable. Unreadable text can then be filtered out prior...
Information extraction (IE) aims to extract from textual documents only the fragments that correspond to data fields required by the user. In this paper, we present new experiments evaluating a hybrid machine learning approach for IE that combines text classifiers and hidden Markov models (HMM). In this approach, a text classifier technique generates an initial output, which is refined by an HMM,...
Recognizing and extracting exact named entities, such as persons, locations, organizations, dates, and times, is very useful for mining information from electronic resources and text. Learning to extract these types of data is called the Named Entity Recognition (NER) task. Proper named entity recognition and extraction is important for solving most problems in active research areas such as Question Answering and...
Text classification is considered a hot research area in data mining. This paper presents a new approach combining hidden Markov models (HMMs) with support vector machines (SVMs) for text classification. HMMs are used as a feature extractor, and the resulting feature vector is normalized and used as the input to the SVMs, so that the trained SVMs can classify unseen texts. The experimental results...
Current similarity computation methods for Chinese sentences and their shortcomings are analyzed first. Based on the characteristics of Chinese question sentences, the general chunk and the special chunk of a Chinese question are defined, and a chunk-based similarity computation method for Chinese questions is then proposed. In this method, the semantic similarity of words is computed on the basis...