This paper proposes a multi-pattern matching algorithm, the APT (Anti-plagiarism Trie) algorithm, for Chinese-English mixed text, based on a text anti-plagiarism detector. The APT algorithm adopts the structural idea of the multi-pattern matching algorithm with an Absolute Hash Trie tree, uses a similarity measurement method in string matching, combines the strategy of skipping characters and adding condition...
This paper proposes a measure, based on Minimum Edit Distance (MED), of the similarity between two sets of MultiWord Expressions (MWEs), which we use to calculate the matching degree between two documents. We test the matching algorithm in a position searching system. Experiments show that the new measure performs better than the cosine distance.
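For readers unfamiliar with the underlying notion, here is a minimal sketch of the classic minimum edit (Levenshtein) distance between two sequences. It is not the paper's MWE-set measure, which the abstract does not specify; the function name `edit_distance` and the unit cost for each operation are assumptions made for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences.

    Assumption: insertion, deletion and substitution each cost 1.
    Works on strings (character level) or lists of tokens (word level).
    """
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))
    print(edit_distance(["machine", "learning"], ["deep", "learning"]))
```

Applied to token lists rather than characters, the same routine compares multiword expressions word by word.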
Copyright protection and authentication of digital content have become a major concern in the current digital era. Plain text is a widely used means of information exchange on the Internet, and it is essential to verify the authenticity of information in any form of communication. Very few techniques are available for plain text watermarking, authentication, and tamper detection. This paper...
The word matching problem is to find all the occurrences of a pattern P[0...m-1] in the text T[0...n-1], where P neither contains any white space nor is preceded and followed by a space. In the multi-pattern word matching problem, all the occurrences of multiple words P0, P1, P2, ..., Pr-1 (r ≥ 1) in the given text T are to be reported. In the present discussion, we assume that all the patterns have equal size...
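The problem statement can be illustrated with a naive baseline. This is only a sketch under one reading of the definition, namely that a match must be a whole whitespace-delimited word of T; it is not the algorithm of the paper, and the name `find_words` is hypothetical.

```python
import re


def find_words(text, patterns):
    """Report (start_index, word) for every whole-word occurrence of any pattern.

    Assumption: words are maximal runs of non-whitespace characters,
    and a pattern matches only if it equals such a run exactly.
    """
    wanted = set(patterns)  # constant-time membership test per word
    hits = []
    for match in re.finditer(r"\S+", text):
        if match.group() in wanted:
            hits.append((match.start(), match.group()))
    return hits


if __name__ == "__main__":
    # "theme" is not reported for the pattern "the": only whole words count.
    print(find_words("the theme of the mat", ["the", "mat"]))
```

Storing the patterns in a hash set makes each word lookup independent of the number of patterns r, which is the simplest way to handle the multi-pattern case.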
Chinese word segmentation is an important technique for Chinese Web data mining. After a review of some current Chinese word segmentation methods, this paper proposes an improved algorithm. The algorithm updates the dictionary using a two-way Markov chain, and performs word segmentation by applying an improved forward maximum matching method based on word frequency statistics. The simulation shows...
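As background, the basic forward maximum matching method can be sketched as follows. This is the textbook baseline only, not the improved, frequency-based variant the abstract describes; the function name `forward_max_match` and the toy dictionary are assumptions for illustration.

```python
def forward_max_match(text, dictionary, max_len=None):
    """Segment text greedily, left to right, taking the longest dictionary word.

    Assumption: a character that starts no dictionary word is emitted
    as a single-character token.
    """
    if max_len is None:
        max_len = max((len(word) for word in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink one character at a time.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


if __name__ == "__main__":
    toy_dictionary = {"研究", "研究生", "生命", "起源"}  # hypothetical entries
    print(forward_max_match("研究生命起源", toy_dictionary))
```

On this input the greedy choice takes 研究生 first and so cannot recover 研究 / 生命 / 起源, which is the kind of ambiguity that frequency statistics are typically used to reduce.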
This article aims to solve the problem of extracting cultural terms and their corresponding English translations from the heterogeneous literature on the translation of the ancient Chinese classics. As a text processing tool, regular expressions can help to realize matching in patterned text. This research focuses on designing target-oriented regular expressions to fit the pattern...
The task of definition extraction aims to acquire definitions of terms from texts. This task is a subtask of terminology extraction, ontology construction, semantic relation learning, question answering, and so on. This paper presents a bootstrapping approach to automatically extracting definitions of domain-specific terms from unannotated Chinese free texts. Experimental results in three domains of...
An important problem in text mining is the automatic extraction of semantic relations. The paper provides a domain-independent method for automatic extraction of part-whole relations in Chinese corpora. The method consists of three phases. First, a set of lexico-syntactical patterns for part-whole relations is designed using known pairs of concepts encoding part-whole relations as seeds, and manually...