Digitizing printed documents has long been a challenge for the computing community. Digitized text not only lets users easily modify and reprint printed documents, but is also increasingly necessary because it enables the word-search capability that is now widely available. Converting a printed document into a stream of characters using OCR (optical character recognition) techniques is a widely...
We propose a model, based on Bloom's taxonomy, for classifying the cognitive level of examination question items. The model uses an artificial neural network trained with the scaled conjugate gradient learning algorithm. Several data preprocessing techniques such as word extraction, stop word removal, stemming, and vector representation are applied to a feature set...
Word frequency analysis plays an essential role in many data mining tasks on large-scale data sets built from text corpora, and the hash list is a very simple but efficient structure for frequent pattern discovery. In this paper, a Poisson approximation approach is used to analyze the space efficiency of the hash list under different probability parameters. Based on our theoretical model, an optimal...
General-purpose search engines take a very simple view of text documents: they treat them as bags of words. As a result, the semantics of the documents is lost after indexing. In this paper, we introduce a novel approach to improve the accuracy of Web retrieval. We use WordNet and the WordNet SenseRelate All Words Software as the main tools to preserve the semantics of the sentences of documents...
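To illustrate the bag-of-words limitation mentioned in this abstract, here is a minimal sketch in Python. It is not the paper's method: the `bag_of_words` helper and the two example sentences are assumptions chosen for illustration, and tokenization is a plain whitespace split.

```python
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Hypothetical helper: lowercase, split on whitespace, count each token."""
    return Counter(text.lower().split())


# Two sentences with opposite meanings (illustrative data, not from the paper).
a = bag_of_words("the dog bit the man")
b = bag_of_words("the man bit the dog")

# Word order is discarded, so both sentences map to the same bag.
print(a == b)
```

Because only token counts survive, an index built on such bags cannot tell the two sentences apart, which is the loss of semantics the abstract refers to.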
This research proposes applying NTC (neural text categorizer) to the categorization of news articles. Although research on text categorization has progressed considerably, documents must still be encoded into numerical vectors. Such encoding causes two main problems: huge dimensionality and sparse distribution. The idea of this research, as a solution to these problems, is to encode documents...
This research proposes NTSO (neural text self organizer) as an approach to text clustering and uses an inverted index as the basis for executing NTSO. Traditional approaches require documents to be encoded into numerical vectors, and such encoding causes two main problems: huge dimensionality and sparse distribution. This research proposes that documents should be encoded...
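The two problems named in these abstracts, huge dimensionality and sparse distribution, can be seen even on a toy corpus. The sketch below is only an illustration of conventional count-vector encoding, not of NTC or NTSO; the `encode` and `sparsity` functions and the three sample headlines are assumptions.

```python
def encode(docs):
    """Encode each document as a count vector over the shared vocabulary."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for d in docs:
        v = [0] * len(vocab)  # one dimension per distinct word in the corpus
        for w in d.lower().split():
            v[index[w]] += 1
        vectors.append(v)
    return vocab, vectors


def sparsity(vectors):
    """Fraction of zero entries across all vectors."""
    total = sum(len(v) for v in vectors)
    zeros = sum(1 for v in vectors for x in v if x == 0)
    return zeros / total


# Illustrative headlines (made up for this sketch).
docs = [
    "stocks fall as markets open",
    "team wins final match",
    "new phone model released today",
]
vocab, vectors = encode(docs)
print(len(vocab), round(sparsity(vectors), 2))
```

Each vector has one dimension per distinct word in the whole corpus, yet a single document uses only a handful of them, so both the dimensionality and the share of zeros grow as the corpus grows.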