Document image binarization is a central problem in many document analysis systems. Indeed, it represents one of the basic challenges, especially in the case of historical document analysis. In this paper, we propose a novel, robust multi-stage framework that combines different existing document image thresholding methods to obtain a better binarization result. The CLAHE technique is introduced...
Skew detection is a crucial step for document analysis systems. Indeed, it represents one of the basic challenges, especially in the case of historical document analysis. In this paper, we propose a novel, robust skew angle detection and correction technique. A morphological skeleton is introduced to significantly reduce the amount of data to process by removing the redundant pixels and keeping only the central...
Social network analysis is an important set of techniques used in many different areas. One such area is intelligence and law enforcement, where social network analysis is used to study various kinds of networks. One of the problems with social networks extracted from social media is that they easily become very large and, as a consequence, difficult to analyze. Therefore, there is a need...
The availability of large corpora of online software-related documents today presents an opportunity to use machine learning to improve integrated development environments by first automatically collecting code examples along with associated descriptions. Digital libraries of computer science research and education conference and journal articles can be a rich source for code examples that are used...
Violent lone offenders such as school shooters and lone actor terrorists pose a threat to modern society, but since they act alone or with minimal help from others, they are very difficult to detect. Previous research has shown that violent lone offenders show signs of certain psychological warning behaviors that can be viewed as indicators of an increasing or accelerating risk of committing targeted...
Nowadays, communication between people is mediated by technology, and more specifically by the Internet, through either email or social networking sites. Since any online activity generates an electronic trace, an automated tool that collects and analyzes the communication between people can be valuable for extracting useful information about their behavioral characteristics. Combining these characteristics,...
The objective of this study is to examine the status of Higher Education Institution (HEI) policies in supporting lecturers when providing mobile-centric services to students. The research was undertaken as a single case study within the Open and Distance Learning (ODL) context in South Africa. Qualitative data was captured through policy document analysis using the Framework for Qualitative Data...
In this work we study information leakage through discussions in online social networks. In particular, we focus on articles published by news pages, in which a person's name is censored, and we examine whether the person is identifiable (de-censored) by analyzing comments and social network graphs of commenters. As a case study for our proposed methodology, in this paper we considered 48 articles...
Due to the heavy use of electronic devices, most information is now available in electronic format, and a substantial portion of it is stored as text, such as in news articles, technical papers, books, digital libraries, email messages, blogs, and web pages. Mining knowledge from this text, such as finding patterns or clustering similar words, is one of the important issues today. This...
Table understanding is a well studied problem in document analysis, and many academic and commercial approaches have been developed to recognize tables in several document formats, including plain text, scanned page images and born-digital, object-based formats such as PDF. Despite the abundance of these techniques, an objective comparison of their performance is still missing. The Table Competition...
With the rise of email communication, enterprises strive to manage incoming documents from all input channels for achieving customer satisfaction. Their overall goal is to reduce request processing time and to increase processing quality. Previously, we proposed the approach of process-driven document analysis (DA) using the concepts of Attentive Tasks (ATs) and the Specialist Board (SB). The ATs...
Many enterprises strive toward the integration of input communication channels into their internal business processes. To help them, we propose to drive input channel document analysis (DA) by formalizing information expectations from current process instances in Attentive Task (AT) templates. This requires, however, mapping incoming request documents to the related AT from a set of ATs. For this purpose,...
This article describes our analytic process and experience of using the Jigsaw system in working on the VAST 2011 Mini Challenge 3. We describe how we extracted and worked with entities from the documents, and how Jigsaw's computational analysis capabilities and visualizations scaffolded the investigation. Based on our experiences, we discuss desirable features that would enhance the analytic power...
This text analyses the present management of small and medium-sized real estate enterprises, argues that green business process management is a key guarantee for real estate enterprises in the low-carbon age, and elaborates a closed-loop green business process management system that takes new technology and new materials as the catalyst, the customer as the centre, and low carbon as the objective.
Developing the photovoltaic industry is not only an inevitable choice for China in responding to the trend toward new resources, but also an objective necessity for optimizing the resource structure, fostering new industries and facilitating industrial transfer. This text analyses the current situation and dilemma of China's photovoltaic industry, and puts forward an industry development strategy.
Through research on methods for calculating the weight of feature words in texts and the semantic similarity between words, we propose a concept-weight-based method for calculating feature-word weights that addresses the semantic association among text features and the prevalent high-dimensionality problem in the text vector space model. This method reduces the semantic loss of the feature set and the dimension...
Automatic titling (i.e. providing titles) is one of the key domains of Web site accessibility. This paper provides an approach for the automatic titling of texts (e.g. emails, fora, etc.) that relies on the morphosyntactic study of human-written titles in a corpus of various texts. The method is developed in four stages: corpus acquisition, candidate sentence determination for titling, noun phrase extraction...
Anaphora resolution (AR) is the process of identifying the appropriate antecedent of an anaphor, where the antecedent occurs before the anaphor. AR can improve many NLP applications, such as question answering, short-answer examination systems and information extraction. Most AR systems deal with the English language. Since the 1990s, research on AR has been extended to other languages, such as Arabic,...
Multi-pattern matching with wildcards is the problem of finding all occurrences of a set of patterns with wildcards in a text. This problem arises in various fields, such as computational biology and network security. However, it has not been studied as extensively as the single-pattern case, and there is no efficient algorithm for it. In this paper, we present efficient algorithms based on fast Fourier...
The following topics are dealt with: data mining; local clustering; spatiotemporal event detection; time series; Markov models; email classification; data stream; parallel mining; Bayesian network; unsupervised learning; missing values prediction; anomaly detection; decision tree; binary classifier; data similarity matrix; data mapping; support vector machine; MapReduce; document similarity; social...