In this paper, we propose the first deep reinforcement learning framework to estimate the optimal Dynamic Treatment Regimes from observational medical data. This framework is more flexible and adaptive for high dimensional action and state spaces than existing reinforcement learning methods to model real life complexity in heterogeneous disease progression and treatment choices, with the goal to...
Email spam filtering is considered an online supervised learning task for binary text classification (TC). Previous statistical TC algorithms normally treat an email as a single plain-text document, ignoring the multi-field feature of email documents. This paper investigates the multi-field feature, and proposes a multi-field learning (MFL) approach for email spam filtering. The MFL approach...
Single term based document representations, e.g. bag-of-words, have been widely accepted in the machine learning, information retrieval and text mining communities. One notable limitation of such methods is that they do not consider the rich information resident in the semantic relations among terms. This paper reports our approach to concept handling in document representation and its effect on the...
Chinese word segmentation ambiguity can be divided into two categories: overlapped ambiguity and combinational ambiguity. This paper focuses only on resolving combinational ambiguity in Chinese word segmentation. We select 36 typical combinational ambiguity strings, and use transformation-based learning methods to learn the rules of combinational ambiguity. Using these rules to test...
To obtain the inherent laws from large amounts of data records in the retail industry and to provide valuable information for retailers, this paper presents a neural-network-based forecasting algorithm, which adopts the Holt-Winters model and a neural network. Unlike traditional forecasting algorithms, this algorithm rearranges the Holt-Winters model, and builds a neural network on it. Furthermore, it...
In content-based image retrieval, the "semantic gap" between visual image features and user semantics makes it hard to predict abstract image categories from low-level features. We present a hybrid system that integrates global features (G-features) and region features (R-features) for predicting image semantics. As an intermediary between image features and categories, we introduce the notion...
A good concept drifting stream classifier should have the following two characteristics: 1) sensitivity to the new concept when the concept drifts; 2) stable, high accuracy when the concept is stable. Most published methods and algorithms may succeed in one aspect while neglecting the other. In this paper, we propose an adaptive ensemble classifier for concept drifting stream classification which focuses...
This paper introduces support vector machine classifiers into entering tone recognition. Not every syllable needs recognition in a statistical way. The recognition accuracy of syllables which need recognition in the support vector machine approach is about 90%, which makes it possible to analyze poems' rhymes and translate Mandarin into many Chinese dialects. The experiments also check the influence...
Building on HMM-based text chunking, transformation-based learning is used to further improve the precision of chunk tags. The training data and the test data are from Penn treebank 4.0, and 13 text chunks are used. Rules are learned automatically according to the rule templates. The precision is improved by 4.48%. A detailed analysis of what affects the text chunking is given. Different threshold,...
Under-sampling is a class-imbalance learning method which uses only a subset of major class examples and thus is very efficient. The main deficiency is that many major class examples are ignored. We propose two algorithms to overcome the deficiency. EasyEnsemble samples several subsets from the major class, trains a learner using each of them, and combines the outputs of those learners. BalanceCascade...
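The sampling-and-combining idea described for EasyEnsemble can be illustrated with a minimal Python sketch. This is not the authors' implementation: the base learner (`train_centroid`, a nearest-centroid classifier), the majority-vote combination, the function name `undersample_ensemble`, and the convention that label 1 marks the minor class are all assumptions made for illustration, since the abstract does not specify them.

```python
import random


def train_centroid(X, y):
    """Hypothetical base learner: nearest-centroid classifier for labels 0/1."""
    dims = len(X[0])
    centroids = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(r[d] for r in rows) / len(rows) for d in range(dims)]

    def predict(x):
        # Squared Euclidean distance from x to each class centroid.
        dist = {
            label: sum((a - b) ** 2 for a, b in zip(x, c))
            for label, c in centroids.items()
        }
        return 1 if dist[1] < dist[0] else 0

    return predict


def undersample_ensemble(X, y, n_subsets=5, seed=0):
    """Train one learner per balanced subset and combine them by majority vote."""
    rng = random.Random(seed)
    minor = [x for x, t in zip(X, y) if t == 1]  # assumption: label 1 is the minor class
    major = [x for x, t in zip(X, y) if t == 0]
    learners = []
    for _ in range(n_subsets):
        # Draw as many major-class examples as there are minor-class examples.
        subset = rng.sample(major, len(minor))
        labels = [1] * len(minor) + [0] * len(subset)
        learners.append(train_centroid(minor + subset, labels))

    def predict(x):
        votes = sum(learner(x) for learner in learners)
        return 1 if 2 * votes > len(learners) else 0

    return predict


if __name__ == "__main__":
    # Toy data: 50 major-class points near the origin, 5 minor-class points near (5, 5).
    major = [[(i % 10) * 0.1, (i // 10) * 0.1] for i in range(50)]
    minor = [[5.0 + 0.1 * i, 5.0] for i in range(5)]
    model = undersample_ensemble(major + minor, [0] * 50 + [1] * 5)
    print(model([5.0, 5.0]), model([0.0, 0.0]))
```

Because each learner sees a different random subset of the major class, the ensemble as a whole uses more major-class examples than a single under-sampled learner would, which is the deficiency the abstract sets out to address.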
In real-world applications the number of examples in one class may overwhelm the other class, but the primary interest is usually in the minor class. Cost-sensitive learning has been deemed a good solution to these class-imbalanced tasks, yet it is not clear how class imbalance affects cost-sensitive classifiers. This paper presents an empirical study using 38 data sets, which discloses...
Text chunking is an effective method to decrease the difficulty of natural language parsing. In this paper, a statistical method based on hidden Markov model (HMM) is used for Chinese text chunking. Moreover, a transformation based error-driven learning approach is adopted to improve the performance. The definition of transformation rule templates is the key problem of this machine learning approach...
After an examination of a Chinese-English bilingual corpus with 2239 sentence pairs, a new definition of Chinese noun phrase (NP), quasi-equivalent noun phrase (equNP), is proposed with a goal of translation from Chinese NPs to English NPs. Firstly, all the equNPs in the corpus are tagged manually according to the definition in this paper. A set of part of speech (POS) templates for equNP is automatically...