With the advent of communication media, a large surge in the volume of text messages has been observed in recent years. Mobile phone users exchange messages to share information quickly. However, not every message is solicited. Delivery of irrelevant or spam messages in a particular scenario often frustrates users. Also, it is of utmost...
The processing and analysis of customer reviews have received increasing attention recently. Filtering noise and useless sentences from the reviews is the first step in this work. In this paper, a new informative review identification method is proposed based on dependency parsing and sentiment analysis. These two linguistic concepts can be used to extract effective features from the...
SMS (Short Message Service) is still the primary choice as a communication medium, even though mobile phones now offer a variety of messenger applications. However, the reduction in SMS tariffs has led to an increase in SMS spam, which some people use as a channel for advertising and fraud. Therefore, it has become an important issue as it can bug and...
With the rapid spread of smartphones, binary short text classification (STC) is becoming a basic and challenging issue, and relevant STC algorithms can be successfully used in spam filtering for short message service (SMS), WeChat, microblogging, and so on. In this manuscript, we address the structural feature of SMS documents and propose a structural learning framework, which decomposes...
This paper proposes an approach using a MapReduce-based Rocchio relevance feedback algorithm, which improves on the traditional Rocchio algorithm by implementing it in the MapReduce paradigm, to resolve the problem of massive information filtering. Traditional text classification algorithms have a vital impact on information filtering.
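For readers unfamiliar with the traditional Rocchio relevance feedback mentioned in the abstract above, the following is a minimal single-machine sketch of the textbook update rule (new query = alpha * query + beta * centroid of relevant documents - gamma * centroid of non-relevant documents). It is not the paper's MapReduce variant; the function names, the sparse dictionary representation, the default weights, and the clipping of negative weights are all assumptions made for illustration.

```python
from typing import Dict, List

# Sparse term-weight vector (hypothetical representation for this sketch).
Vector = Dict[str, float]


def centroid(docs: List[Vector]) -> Vector:
    """Average a list of sparse term-weight vectors; empty list gives {}."""
    total: Vector = {}
    for doc in docs:
        for term, weight in doc.items():
            total[term] = total.get(term, 0.0) + weight
    n = len(docs)
    return {term: weight / n for term, weight in total.items()} if n else {}


def rocchio_update(
    query: Vector,
    relevant: List[Vector],
    non_relevant: List[Vector],
    alpha: float = 1.0,   # weight of the original query (assumed default)
    beta: float = 0.75,   # weight of the relevant centroid (assumed default)
    gamma: float = 0.15,  # weight of the non-relevant centroid (assumed default)
) -> Vector:
    """Textbook Rocchio update; terms with non-positive weight are dropped."""
    rel_c = centroid(relevant)
    non_c = centroid(non_relevant)
    updated: Vector = {}
    for term in set(query) | set(rel_c) | set(non_c):
        weight = (
            alpha * query.get(term, 0.0)
            + beta * rel_c.get(term, 0.0)
            - gamma * non_c.get(term, 0.0)
        )
        if weight > 0:  # assumption: clip negative weights, a common convention
            updated[term] = weight
    return updated


if __name__ == "__main__":
    new_query = rocchio_update(
        {"spam": 1.0},
        relevant=[{"spam": 1.0, "offer": 2.0}],
        non_relevant=[{"news": 1.0}],
    )
    print(sorted(new_query.items()))
```

In a MapReduce setting, the per-document sums inside `centroid` are the kind of work that could be distributed across mappers and combined in a reducer, but how the paper actually partitions the computation is not stated in the abstract.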
Short messages are one of the most common communication media for mobile subscribers, so major mobile operators are devoted to improving their Short Message Service (SMS). However, annoying and undesired messages, also called message spam or simply spam, not only worsen the users' experience but also prompt complaints about SMS. In this paper, we present a novel Chinese SMS spam filtering framework...
Spam has been a serious and annoying problem for decades. Even though plenty of solutions have been put forward, much remains to be improved in filtering spam emails more efficiently. Nowadays a major problem in spam filtering, as well as in text classification in natural language processing, is the huge size of the vector space due to the numerous feature terms, which is usually the cause of...
Nowadays, online social networks are a standard means of communication, used to share, link, and broadcast a substantial amount of data about people's lives. Constant, day-to-day communication involves the interchange of many kinds of information, including images, free text, video, and audio. A chief portion of social network content is established...
Image-based spam is a recent trick developed by the spammers' community with the intention of bypassing successful text-based spam filters. Most traditional text-based filters have been based on Naïve Bayes classification combined with text categorization methods. This work concentrates on developing a spam filtering system that accurately blocks image spam. The system analyzes images sent...
This paper proposes an efficient text classification method that combines software and hardware. It uses the Xilinx CAM IP core to achieve high-speed term searching, and exploits features of the CAM to design an efficient algorithm that merges two steps, extracting terms from the text to be classified and eliminating redundant terms, into a single step. This method can effectively solve the slow speed problem which...
With the rapid development of mobile SMS (short message service), spam messages have grown explosively, seriously troubling our daily lives and causing losses for telecom operators. In this paper, an online spam filter is proposed based on the analysis of two criteria: content representation and the relationship between senders and receivers in the social network. A Naïve Bayesian classifier is...
Text clustering, an important part of machine learning and pattern recognition, has extensive applications in the field of natural language processing. In this paper, a method is given to address the shortcomings of the classic TFIDF algorithm. The paper classifies text with a Naive Bayesian classifier and uses an iterative algorithm to optimize the selection of feature words, and then...
In this study, a novel “SMS spam message filter” utilizing effective feature selection and pattern classification techniques is proposed. The proposed filter detects and filters out SMS spam messages in a smart manner, rather than relying on black/white list approaches that require intervention by phone users. In the study, a Gini index based approach is used as the feature selection method. The feature vectors...
This is preliminary work for a project that will automatically filter comments made on news and papers. Our database has over 1 million news items and comments. Due to the volume of our data, 30,677 comments made on 15,064 articles in 44 different categories are used as experimental data. The proposed anomaly-based method obtained fast and highly accurate results without the high storage...
In Taiwan, famous bloggers can now be regarded as professional writers. More and more people subscribe to their RSS (Really Simple Syndication) feeds to receive updated information. But readers might only be interested in a few categories of articles, so they need to filter out other articles by themselves. In order to help people select the information they want, this research proposed a two-layer SVM classification...
Preprocessing is an important task and critical step in information retrieval and text mining. The objective of this study is to analyze the effect of preprocessing methods in text classification on Turkish texts. We compiled two large datasets from Turkish newspapers using a crawler. On these compiled data sets and using two additional datasets, we perform a detailed analysis of preprocessing methods...
Many learners have found that it is difficult to complete their web-based learning plan. Once on the Internet, they can't help browsing more interesting Web pages instead of continuing with their learning tasks. We call this situation Information Trek. To solve this problem, this study proposes a learning detection system which can discover whether the contents of a web page a student is viewing...
Current search engines are not very effective in filtering out harmful information since the technology they use for filtering is often based on traditional text classification in which texts are often classified according to feature words. To improve the effectiveness of filtering, in this paper, we propose a new filtering scheme in which we combine the neural network and ontology categorization...
In the age of Web 2.0 people organize large collections of web pages, articles, or emails in hierarchies of topics, or arrange a large body of knowledge in ontologies. This scenario requires automatic text categorization systems able to cope with underlying taxonomies in an effective and efficient way, so that information overload and input imbalance can be suitably dealt with. In this work, we propose...
The explosive growth of the Internet inevitably leads to the proliferation of harmful information such as pornography, drugs, and violence. A great many filtering techniques based on image and text categorization have been proposed in the literature. Among them, text-based filtering plays a leading role because of its good performance. Existing text filtering algorithms can be seen as a classical text categorization...