This work presents an experimental study towards modeling the readers' emotional state as a response to font and typesetting elements of documents presented on an LCD display. Any content- or domain-dependent information was excluded from the tested document. An automated computer-based experimental procedure was followed, based on the paper-and-pencil Self-Assessment Manikin test...
Document representation is one of the crucial components that determine the effectiveness of text classification tasks. Traditional approaches typically adopt the popular bag-of-words method as the underlying document representation. Although it is a simple and efficient method, the major shortcoming of the bag-of-words representation is its assumption that word features are independent...
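The bag-of-words representation mentioned above can be illustrated with a minimal sketch. The function name `bag_of_words` and the whitespace tokenization are assumptions made for illustration and are not taken from the paper. Each document is reduced to word counts, so two sentences with different meanings but the same words become indistinguishable, which is one consequence of treating words as independent features.

```python
from collections import Counter


def bag_of_words(text):
    """Map a document to its word counts, discarding word order.

    Hypothetical helper: lowercases the text and splits on whitespace.
    """
    return Counter(text.lower().split())


# Two sentences with opposite meanings share the same representation,
# because relations between words are not modeled.
a = bag_of_words("the dog bit the man")
b = bag_of_words("the man bit the dog")
print(a == b)
```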
We used human protein-protein interaction (PPI) data transformed into documents to perform text-mining via concept clusters. The advantage of text-mining PPI data is that words (proteins) that are very sparse or over-abundant can be dropped, leaving the remaining bulk of data for clustering and rule mining. Libraries of tissue-specific binary PPIs were constructed from a list of 36,137 binary PPIs...
In this paper, an adaptive edge-based approach to text detection in images and video frames is proposed. The approach selects among different edge detection methods according to the complexity of the image background. It consists of four main stages. First, images are classified into different background-complexity classes. Second, different edge detectors are applied to the images according to their background...
In this paper we describe a model for classifying binary data using classifiers based on Bernoulli mixture models. We show how Bernoulli mixtures can be used for feature extraction and dimensionality reduction of raw input data. The extracted features are then used for training a classifier for supervised labeling of individual sample points. We have applied this method to two different types of datasets,...
The document similarity measure is a key point in textual data processing, since it largely determines the performance of a processing system. For the past decade, kernels have been used as similarity functions within inner-product-based algorithms such as the SVM for NLP problems, and especially for text categorization. In this paper, we present a semantic space constructed from latent concepts. The concepts...
In this paper, we propose a method for distinguishing between handwritten and machine-printed characters with no need to locate the positions of characters or text lines. We call the proposed method the "spectrum-based local fluctuation detection" method. The method transforms local regions in document images into power spectra to extract feature values which represent fluctuations caused by handwriting...
In this paper, we present an image-processing approach to extracting flowchart information from digital imagery. First, the flowchart imagery is processed to extract the text components and then the geometrical shape components. We analyze the text and the various geometrical shapes present in the flowchart and carry out a variety of processes such as image segmentation, shape description, text...
This paper presents a novel segmentation algorithm for offline cursive handwriting recognition. An over-segmentation algorithm is introduced to dissect the words from handwritten text based on the pixel density between upper and lower baselines. Each segment from the over-segmentation is passed to a multiple expert-based validation process. The first expert compares the total foreground pixels of the segmentation...
Interest and prior knowledge are supposed to influence reading comprehension and learning from natural language texts. The effects of interest have been well studied in the literature, but little effort has been made to establish empirically the influence of prior knowledge on reading attention and engagement, and therefore on comprehension and learning. A quantitative characterization of this...
Document classification uses different types of word weightings as features for the representation of documents. We find that the class document frequency, dfc, of a word is the most important feature in document classification. Machine learning algorithms trained with the dfc of words show similar performance in terms of correct classification of test documents when compared to more complicated...
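As a rough illustration of the dfc feature, the sketch below assumes that the class document frequency of a word is the number of documents in a given class that contain the word at least once. The abstract does not give the definition, so this reading, the function name `class_document_frequency`, and the toy documents are all assumptions.

```python
from collections import defaultdict


def class_document_frequency(docs):
    """Count, per class, how many documents contain each word.

    docs: iterable of (class_label, text) pairs (hypothetical input format).
    Returns a nested mapping {class_label: {word: document count}}.
    """
    dfc = defaultdict(lambda: defaultdict(int))
    for label, text in docs:
        # set() ensures a word is counted once per document,
        # no matter how often it occurs in that document.
        for word in set(text.lower().split()):
            dfc[label][word] += 1
    return dfc


# Toy data, invented for illustration only.
docs = [
    ("sports", "the match ended in a draw"),
    ("sports", "the team won the match"),
    ("finance", "the market closed higher"),
]
dfc = class_document_frequency(docs)
print(dfc["sports"]["match"], dfc["finance"]["market"])
```

Under this assumed definition, each word yields one count per class, and those counts could serve as the feature values passed to a classifier.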
In this paper, we present an artificial neural network (ANN) architecture for segmenting unconstrained handwritten sentences in the English language into single words. Feature extraction is performed on a line of text to feed an ANN that classifies each column image as belonging to a word or gap between words. Thus, a sequence of columns of the same class represents words and inter-word gaps. Through...