The advent of the big data era brings many opportunities and challenges for data analytic research in TCM (Traditional Chinese Medicine), such as diverse clinical record sources, differing symptom descriptions, large numbers of collected clinical symptoms, and multiple syndromes attached to a single clinical record. Novel methods on support vector machines, ensemble learning, feature selection, multi-label learning...
Neuroinformatics Natural Language Processing (NeuroNLP) relies on clustering and classification to categorize biologically relevant extraction targets and to connect them to knowledge-related patterns in event and text-mined datasets. The accuracy of machine learning algorithms depends on the quality of the text-mined data, while their efficacy depends on the context in which techniques are chosen...
Imbalanced datasets arise from the uneven distribution of real-world data, such as the disposition of complaints to government offices in Bandung. Consequently, multi-label text categorization algorithms may not perform at their best, because classifiers tend to be dominated by the majority of the data and ignore the minority. In this paper, Bagging and Adaptive Boosting algorithms...
Short text is a popular text form, widely used in short commentaries, microblogs and many other fields. With the development of social software and movie websites, the volume of data keeps growing. Most of this data is useless, while some of it is important. It is therefore necessary to extract the useful short texts from big data. However, there...
With the rapid development of network information technology, text, as a basic information carrier, has begun to grow exponentially. Existing text classification methods cannot extract information from these vast resources in a timely and accurate way. To solve this problem, the paper puts forward a new text categorization method. It is a KNN algorithm based...
In today's information age, microblogs have developed rapidly. As news on microblogs is constantly updated, emotion analysis of this information becomes urgent and important to keep users from getting lost in the ocean of information. This paper implements microblog emotion mining with a Bayesian classifier and an SVM classification algorithm, making a comparison...
This study proposes a method that classifies positive and negative comments on a Chinese social network (Weibo) using a naive Bayes algorithm trained on a corpus from an English social network (Twitter). We train our text classifier on the Twitter corpus (in English) and use this classifier to classify Chinese text. In previous research, Chinese sentences are processed using Chinese word segmentation algorithms...
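The naive Bayes classifier named in the abstract above can be illustrated with a minimal multinomial sketch using Laplace smoothing. This is not the authors' cross-lingual pipeline: it assumes the input is already tokenized (Chinese text would first need word segmentation), and the function names `train_nb` and `predict_nb` and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """docs: list of (tokens, label). Returns (log_priors, per-class word counts, vocabulary)."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        class_docs[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    n = sum(class_docs.values())
    log_priors = {c: math.log(k / n) for c, k in class_docs.items()}
    return log_priors, word_counts, vocab

def predict_nb(model, tokens, alpha=1.0):
    """Return the label with the highest log posterior (Laplace smoothing with `alpha`)."""
    log_priors, word_counts, vocab = model
    v = len(vocab)
    best_label, best_score = None, -math.inf
    for label, log_prior in log_priors.items():
        total = sum(word_counts[label].values())
        score = log_prior
        for t in tokens:
            if t not in vocab:
                continue  # words never seen in training carry no evidence here
            score += math.log((word_counts[label][t] + alpha) / (total + alpha * v))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy usage with whitespace tokenization (an assumption for illustration only).
train = [
    ("good great fun".split(), "pos"),
    ("love this movie".split(), "pos"),
    ("bad boring awful".split(), "neg"),
    ("hate this movie".split(), "neg"),
]
model = train_nb(train)
print(predict_nb(model, "great fun movie".split()))  # pos
```

Working in log space avoids floating-point underflow when documents contain many tokens.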
Text Categorization plays an important role in the fields of information retrieval, machine learning, natural language processing, data mining and others. With the development of computer and information technology, many classification algorithms have emerged. Each text classification algorithm produces results with different speed and efficiency, depending on the features of the test text. It has been...
In recent years, research on text classification algorithms has remained a hot topic in text mining. KNN is a classic text classification algorithm. The rule for finding the nearest neighbors directly affects the performance and precision of categorization. In this paper, we mainly focus on distance measures and similarity. We propose a new text classification algorithm which combines KNN and Choquet...
Text classification is the foundation and core of text mining. Naive Bayes is an effective method for text classification. This paper improves the accuracy of Naive Bayes classification by using an improved information gain, a feature extraction method, that reduces the impact of low-frequency words. In this paper, we use a widely used NLTK corpus. According to the test results, the accuracy of the...
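For reference, the standard (unimproved) information gain of a term can be computed as the class entropy minus the entropy conditioned on whether the term occurs in a document. The sketch below shows only that baseline; the paper's improved variant is not reproduced. The `min_df` cutoff in `rank_terms` is a hypothetical, simple way to drop low-frequency words and is an assumption, not the authors' method.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())

def information_gain(docs, term):
    """docs: list of (set_of_tokens, label). IG = H(C) - H(C | term present/absent)."""
    labels = [lab for _, lab in docs]
    with_t = [lab for toks, lab in docs if term in toks]
    without_t = [lab for toks, lab in docs if term not in toks]
    n = len(docs)
    conditional = (len(with_t) / n) * entropy(with_t) + (len(without_t) / n) * entropy(without_t)
    return entropy(labels) - conditional

def rank_terms(docs, min_df=2):
    """Rank terms by IG, skipping terms that occur in fewer than min_df documents (hypothetical filter)."""
    df = Counter(t for toks, _ in docs for t in toks)
    candidates = [t for t, c in df.items() if c >= min_df]
    return sorted(candidates, key=lambda t: information_gain(docs, t), reverse=True)

docs = [
    ({"goal", "match"}, "sport"),
    ({"goal", "team"}, "sport"),
    ({"vote", "party"}, "politics"),
    ({"vote", "team"}, "politics"),
]
print(information_gain(docs, "goal"))  # 1.0: the term separates the classes perfectly
print(information_gain(docs, "team"))  # 0.0: the term tells us nothing about the class
```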
Feature selection is often considered a key step in text categorization. In this paper, we propose a new feature selection algorithm, named AD, which comprehensively measures the degree of relevance and distinction of terms that occur in a document set. We evaluated AD on three benchmark document collections, 20-Newsgroups, Reuters-21578 and WebKB, using two classification algorithms, Naive Bayes and...
Classification is the grouping of information or objects into predefined labeled categories based on similarities. The exponential growth of scientific document collections makes manual classification unmanageable. Feature extraction is the central prerequisite of automatic document classification. TF-IDF (term frequency-inverse document frequency) is commonly used to express text feature weights...
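The TF-IDF weighting mentioned above has several variants. A minimal sketch of one common form, assuming tf is the term count divided by document length and idf is the natural log of N divided by document frequency (no smoothing), looks like this; the function name `tf_idf` is hypothetical.

```python
import math
from collections import Counter

def tf_idf(corpus):
    """corpus: list of token lists. Returns one {term: weight} dict per document.

    Assumed variant: tf = count / doc_length, idf = ln(N / df).
    """
    n = len(corpus)
    # Document frequency: in how many documents each term appears at least once.
    df = Counter(t for doc in corpus for t in set(doc))
    weights = []
    for doc in corpus:
        if not doc:
            weights.append({})  # empty document: no features
            continue
        counts = Counter(doc)
        length = len(doc)
        weights.append({t: (c / length) * math.log(n / df[t]) for t, c in counts.items()})
    return weights

w = tf_idf([["a", "b", "a"], ["a", "c"]])
# "a" occurs in every document, so its idf (and weight) is 0;
# "b" gets (1/3) * ln(2) in the first document.
print(w[0])
```

Terms that appear in every document receive zero weight under this variant, which is why some implementations add smoothing to the idf term.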
Chinese text classification is always challenging, especially when the data are high-dimensional and sparse. In this paper, we are interested in text representation and dimension reduction in Chinese text classification. First, we introduce a topic model, Latent Dirichlet Allocation (LDA), and use the LDA model as a dimension reduction method. Second, we choose Support Vector Machine (SVM)...
k-NN is one of the most popular and easy-to-implement algorithms for classifying data. Its greatest strength is that it readily lends itself to improved versions. Despite its many advantages, k-NN also faces several issues: the complexity of distance/similarity calculation, the complexity of the training dataset in the classification phase, the proper selection of k, and getting duplicate values...
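A basic k-NN text classifier, as discussed in the abstracts above, can be sketched with bag-of-words vectors and cosine similarity. The tie-breaking rule here (most votes, then highest summed similarity) is one assumed answer to the duplicate-vote issue, not the method of either paper; `cosine` and `knn_classify` are hypothetical names.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors (dicts)."""
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn_classify(train, query_tokens, k=3):
    """train: list of (tokens, label). Majority vote among the k most similar documents."""
    q = Counter(query_tokens)
    scored = sorted(
        ((cosine(q, Counter(toks)), label) for toks, label in train),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes, sim_sum = Counter(), Counter()
    for sim, label in scored[:k]:
        votes[label] += 1
        sim_sum[label] += sim
    # Assumed tie-break: equal vote counts are resolved by total similarity.
    return max(votes, key=lambda lab: (votes[lab], sim_sum[lab]))

train = [
    ("goal match team".split(), "sport"),
    ("team wins match".split(), "sport"),
    ("vote party election".split(), "politics"),
    ("party leader vote".split(), "politics"),
]
print(knn_classify(train, "match team goal".split()))  # sport
```

Every query is compared against the entire training set, which illustrates the classification-phase cost listed among k-NN's issues.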
Wikipedia is considered one of the most famous online encyclopedias. We study issues related to trending articles on Arabic Wikipedia and how they are influenced by certain external stimuli: for example, breaking news, celebrities' tweets, special events from the past, instant messages on any social media application, or any other reasons that could affect the Arabic Wikipedia articles in...
Multi-label classification originated primarily from the need for automatic text categorization and medical diagnosis. It has received great attention and is used in several real-world applications. Demand for it has since spread to additional fields such as functional genomics, music, biology, scene classification, video, etc. For example, a text document may belong to many subjects...
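One standard way to handle a document that belongs to several subjects is binary relevance: the multi-label problem is split into one independent yes/no problem per label, and the predicted label set is every label whose binary classifier says yes. The sketch below assumes that approach (it is not drawn from the abstract), uses hypothetical function names, and stands in simple keyword rules for trained per-label classifiers.

```python
def binary_relevance_split(dataset, labels):
    """dataset: list of (features, set_of_labels).

    Returns {label: [(features, 1 if the label applies else 0), ...]},
    i.e. one binary training set per label.
    """
    return {lab: [(x, int(lab in ys)) for x, ys in dataset] for lab in labels}

def predict_multilabel(classifiers, x):
    """classifiers: {label: callable returning 0/1}. Returns the set of predicted labels."""
    return {lab for lab, clf in classifiers.items() if clf(x) == 1}

dataset = [
    ({"gene", "protein"}, {"biology"}),
    ({"gene", "melody"}, {"biology", "music"}),
    ({"melody", "rhythm"}, {"music"}),
]
splits = binary_relevance_split(dataset, ["biology", "music"])
print([y for _, y in splits["music"]])  # [0, 1, 1]

# Hypothetical keyword rules standing in for trained binary classifiers.
classifiers = {
    "biology": lambda x: int("gene" in x),
    "music": lambda x: int("melody" in x),
}
print(sorted(predict_multilabel(classifiers, {"gene", "melody"})))  # ['biology', 'music']
```

Binary relevance ignores correlations between labels, which is one reason more elaborate multi-label methods exist.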
Recommendation systems are tools on e-commerce websites that help users find the most suitable products. With a huge number of books available, choosing a particular book is difficult. Recommendation system techniques therefore play a very important role in helping users find books that match their needs and interests. This paper presents an online book recommendation system for users who purchase...
The objective of the present work is to design a HADOOP-based parallel Marathi content retrieval system that uses a clustering technique to obtain more efficient and optimized results than existing systems. The system also focuses on providing personalized documents in the Marathi language to the end user, based on interests identified from their browsing history, and uses a time session mechanism for re-ranking...
Text categorization plays an important role in applications where information is filtered, monitored, personalized, categorized, organized or searched. Feature selection remains an effective and efficient technique in text categorization. Traditional feature selection methods ignore the effects of unbalanced categories and the distribution of a term across different categories. On this basis, we improved...
Acquiring text features is the key to constructing a text classifier. To address the problem of reducing the dimension of the original feature vector while keeping it accurate, we put forward a text feature acquisition algorithm based on co-evolution. The algorithm uses the optimization performance of the genetic algorithm, and co-evolution enables mutual evaluation and competition among multiple populations,...