With the large number of Web sites promoting the use of illicit drugs, it has become important to screen these sites to protect children on the Internet. Conventional keyword-based approaches are insufficient because these Web sites often contain many images and few meaningful words other than prices. We
We study the problem of learning to rank images for image retrieval. For a noisy set of images indexed or tagged by the same keyword, we learn a ranking model from training examples and then use the learned model to rank new images. Unlike previous work on image retrieval, which usually coarsely divides the images
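The snippet does not say what form the ranking model takes. The following is a minimal sketch that assumes a linear scoring function over hypothetical image feature vectors. It learns the weights from (preferred, less-preferred) training pairs with a perceptron-style update, then uses them to order new images. This is a generic pairwise learning-to-rank illustration, not the authors' model. `train_pairwise_ranker`, `rank`, and the feature values are all hypothetical.

```python
from typing import List, Tuple

Vector = List[float]

def dot(w: Vector, x: Vector) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))

def train_pairwise_ranker(pairs: List[Tuple[Vector, Vector]], dim: int,
                          epochs: int = 20, lr: float = 0.1) -> Vector:
    """Learn weights w such that dot(w, preferred) > dot(w, other) for each training pair.

    Perceptron-style update on the feature difference of each mis-ordered pair."""
    w = [0.0] * dim
    for _ in range(epochs):
        for preferred, other in pairs:
            diff = [p - o for p, o in zip(preferred, other)]
            if dot(w, diff) <= 0:  # pair mis-ordered or tied: move w toward diff
                w = [wi + lr * di for wi, di in zip(w, diff)]
    return w

def rank(images: List[Vector], w: Vector) -> List[int]:
    """Return indices of images ordered from highest to lowest score."""
    return sorted(range(len(images)), key=lambda i: dot(w, images[i]), reverse=True)

if __name__ == "__main__":
    # Hypothetical 2-D image features; the first image of each pair is the relevant one.
    train = [([0.9, 0.1], [0.2, 0.8]), ([0.7, 0.3], [0.1, 0.9])]
    w = train_pairwise_ranker(train, dim=2)
    print(rank([[0.3, 0.7], [0.8, 0.2], [0.5, 0.5]], w))  # -> [1, 2, 0]
```

Training only on pairwise preferences avoids assigning absolute relevance labels to the noisy tagged set. The model only needs to learn which image of a pair should come first.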
We consider the problem of privately searching for sensitive or classified file signatures on an untrusted server. Inspired by the private stream searching system of Ostrovsky and Skeith, we propose a new scheme optimized for matching individual file signatures (versus keyword matching in documents). Our optimization
special data record and a new record model based on fuzzy sets are given. By calculating the membership degree of each keyword, new fuzzy closeness functions are proposed to classify the information. Finally, examples show that this algorithm can effectively and automatically classify the input information of a database; the accuracy and intelligence
method based on skew detection. OCR keyword recognition is then used to classify the spam faxes a second time. The method is simple to implement, achieves high accuracy, and has been applied successfully.
In this paper, a new method for question classification is proposed that uses ensemble learning algorithms to train multiple question classifiers. These component learners are combined to produce the final hypothesis. Specifically, the feature spaces are obtained by extracting high-frequency keywords from
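The following is a minimal sketch of the idea under several assumptions. The feature space is taken to be the k most frequent training words. Diversity among component learners comes from each one dropping a different slice of that keyword list; this is a random-subspace-style choice made here and is not stated in the snippet. Each component learner is a simple keyword-count scorer, and the final hypothesis is a majority vote. The class names, the toy questions, and `KeywordLearner` are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

def tokenize(text: str) -> List[str]:
    return text.lower().split()

def top_keywords(questions: Sequence[str], k: int) -> List[str]:
    """Feature space: the k most frequent words across the training questions."""
    counts = Counter(w for q in questions for w in tokenize(q))
    return [w for w, _ in counts.most_common(k)]

class KeywordLearner:
    """Component learner: scores a class by summing, over the question's keywords,
    how many training questions of that class contain each keyword."""

    def __init__(self, features: Sequence[str]) -> None:
        self.features = set(features)
        self.table: Dict[str, Counter] = defaultdict(Counter)

    def fit(self, data: Sequence[Tuple[str, str]]) -> "KeywordLearner":
        for question, label in data:
            # Accessing table[label] registers the label even if no feature matches.
            self.table[label].update(set(tokenize(question)) & self.features)
        return self

    def predict(self, question: str) -> str:
        words = set(tokenize(question)) & self.features
        labels = sorted(self.table)  # ties go to the alphabetically first label
        return max(labels, key=lambda lab: sum(self.table[lab][w] for w in words))

def train_ensemble(data: Sequence[Tuple[str, str]], k: int = 6,
                   n_learners: int = 3) -> List[KeywordLearner]:
    keywords = top_keywords([q for q, _ in data], k)
    learners = []
    for i in range(n_learners):
        dropped = set(keywords[i::n_learners])  # each learner drops a different slice
        learners.append(KeywordLearner([w for w in keywords if w not in dropped]).fit(data))
    return learners

def predict_ensemble(learners: List[KeywordLearner], question: str) -> str:
    """Final hypothesis: majority vote over the component learners."""
    votes = Counter(learner.predict(question) for learner in learners)
    return votes.most_common(1)[0][0]

# Hypothetical toy training set of (question, class) pairs.
TRAIN = [
    ("who wrote hamlet", "PERSON"),
    ("who invented the telephone", "PERSON"),
    ("where is paris", "LOCATION"),
    ("where is the eiffel tower", "LOCATION"),
    ("when did the war end", "TIME"),
    ("when was hamlet written", "TIME"),
]

if __name__ == "__main__":
    learners = train_ensemble(TRAIN)
    print(predict_ensemble(learners, "who discovered penicillin"))  # -> PERSON
```

Because each keyword is dropped by only one learner, the vote still recovers the right class when one component learner is blind to the decisive word.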
Spatial co-location patterns are similar to association rules but rely more on spatial autocorrelation. They represent subsets of Boolean spatial features whose instances are often located in close geographic proximity. Existing research on co-location pattern mining considers only spatial attributes, and few methods can handle the large number of non-spatial attributes in spatial datasets...
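Co-location mining commonly measures how prevalent a pattern is with the participation index. For each feature in the pattern, the participation ratio is the fraction of that feature's instances that take part in at least one neighboring row instance. The participation index is the minimum of these ratios. The sketch below computes it for a size-2 pattern, assuming Euclidean distance with a threshold `d` as the neighbor relation. It covers only spatial attributes; the non-spatial extension the snippet refers to is not shown. The function name and data are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]

def participation_index(instances: Dict[str, List[Point]], a: str, b: str, d: float) -> float:
    """Participation index of the size-2 co-location {a, b}.

    A row instance is a pair (instance of a, instance of b) within distance d.
    Each feature's participation ratio is the fraction of its instances that
    appear in at least one row instance; the index is the smaller ratio."""
    pa, pb = instances[a], instances[b]
    used_a, used_b = set(), set()
    for i, p in enumerate(pa):
        for j, q in enumerate(pb):
            if math.dist(p, q) <= d:  # neighbor relation (Python 3.8+)
                used_a.add(i)
                used_b.add(j)
    return min(len(used_a) / len(pa), len(used_b) / len(pb))

if __name__ == "__main__":
    # Hypothetical instances of two Boolean spatial features A and B.
    data = {"A": [(0, 0), (10, 10)], "B": [(1, 0), (0, 1), (20, 20)]}
    print(participation_index(data, "A", "B", d=1.5))  # -> 0.5
```

Here only one of the two A instances has a B neighbor (ratio 1/2), while two of the three B instances do (ratio 2/3), so the index is 0.5.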
This article proposes a question classification approach that integrates multiple semantic features. It addresses two problems in Chinese question classification models: inaccurate extraction of semantic information, and slow processing caused by excessively high feature vector dimensionality. With the help of HowNet, the support vector machine, and the syntactic and semantic information of questions...
Abstract: By analyzing the classification process and the MapReduce computing paradigm, we find that the parallel and distributed computing model of MapReduce is well suited to constructing classifier models. This paper presents a MapReduce algorithm for parallel and distributed classification that aims to reduce the computational time of training on large-scale document collections. Our experiment...
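The snippet does not name the classifier. As one hedged illustration, assume a multinomial naive Bayes model. Its training reduces to counting words per class, which maps directly onto MapReduce:

- mappers emit `((label, word), 1)` pairs;
- the framework groups the pairs by key;
- reducers sum each group.

The phases are simulated in a single process here; on a real cluster, each map task would read its own shard of documents. All function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple

Key = Tuple[str, str]  # (class label, word)

def map_document(label: str, text: str) -> Iterator[Tuple[Key, int]]:
    """Map phase: emit ((label, word), 1) for every token of one training document."""
    for word in text.lower().split():
        yield (label, word), 1

def shuffle(pairs: Iterable[Tuple[Key, int]]) -> Dict[Key, List[int]]:
    """Shuffle phase: group intermediate values by key (the framework does this on a cluster)."""
    groups: Dict[Key, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_counts(key: Key, values: List[int]) -> Tuple[Key, int]:
    """Reduce phase: sum the partial counts for one (label, word) key."""
    return key, sum(values)

def train_counts(docs: List[Tuple[str, str]]) -> Dict[Key, int]:
    """Run map, shuffle, and reduce over (label, text) documents in one process."""
    mapped = chain.from_iterable(map_document(label, text) for label, text in docs)
    return dict(reduce_counts(key, values) for key, values in shuffle(mapped).items())

if __name__ == "__main__":
    docs = [("sports", "goal match goal"), ("tech", "cpu match")]
    print(train_counts(docs))
```

Multinomial naive Bayes needs three statistics: these per-class word counts, per-class document counts (gathered the same way with a second key type), and the vocabulary. Since summation is associative, a combiner can pre-aggregate counts on each mapper.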
Computer forensics applies computer investigation and analysis techniques to identify and obtain potential evidence that has legal effect. It mainly comprises data access, data analysis, and data submission, among other steps. Data analysis is the key step in computer forensics, and it faces the problem of extracting useful information from massive...
It is well known that the working condition of a pipeline, including leaks, can be identified by analyzing its pressure signal. Because of high-frequency data collection and continuous on-line leak detection, the pressure signal produces massive amounts of data. A methodology for pipeline leak detection using data mining and working-condition identification is presented here. Sixteen groups of raw...
the current Web in which information is given well-defined meaning, better enabling computers and people to work in cooperation. This paper therefore presents an algorithm for Chinese text classification on the Semantic Web. After extracting keywords from the Web text, we resolve their ambiguity. Then we get the semantic
Current classification techniques use word matching and clustering to classify webpages. These techniques take an ad hoc approach of checking and matching all keywords in a webpage. These methods are efficient but not without problems. In general, they suffer from the following
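The following is a minimal sketch of the keyword-matching approach described above, assuming hand-written keyword lists per category. It strips the tags from the page text, counts every keyword of every category, and picks the category with the most hits. `CATEGORY_KEYWORDS` and `classify_page` are hypothetical, and the tag stripping is deliberately crude.

```python
import re
from typing import Dict, List, Optional

# Hypothetical category keyword lists; a real system would curate or learn these.
CATEGORY_KEYWORDS: Dict[str, List[str]] = {
    "sports": ["football", "score", "league"],
    "finance": ["stock", "market", "dividend"],
}

def classify_page(html: str,
                  categories: Dict[str, List[str]] = CATEGORY_KEYWORDS) -> Optional[str]:
    """Return the category whose keywords occur most often in the page, or None if none occur."""
    text = re.sub(r"<[^>]+>", " ", html).lower()  # crude tag stripping
    words = re.findall(r"[a-z]+", text)
    scores = {cat: sum(words.count(kw) for kw in kws) for cat, kws in categories.items()}
    best = max(scores, key=scores.get)  # ties go to the first category listed
    return best if scores[best] > 0 else None

if __name__ == "__main__":
    print(classify_page("<p>The stock market rallied; stock prices rose.</p>"))  # -> finance
```

Every keyword of every category is checked against the whole page. Pages that contain none of the listed keywords come back as `None`.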
favorite restaurant. The sentiment analysis system for restaurant rating rates a restaurant based on the reviews given by users. The system breaks user comments down to check for sentiment keywords. Once the keywords are found, it associates the comment with a sentiment rank. Sentiment analysis can also be extended
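The keyword-to-rank step can be sketched as follows. The sketch assumes a small hypothetical lexicon that maps sentiment keywords to integer scores, and a rank scale of 1 to 5 with 3 as neutral. A comment's rank is its summed keyword score, shifted and clamped to that scale. The restaurant rating is the mean rank over all of its comments. The lexicon, the scale, and the function names are assumptions, not the system's actual design.

```python
import re
from statistics import mean
from typing import Dict, List

# Hypothetical sentiment lexicon mapping keywords to scores.
LEXICON: Dict[str, int] = {
    "excellent": 2, "great": 2, "good": 1,
    "bad": -1, "cold": -1, "terrible": -2,
}

def comment_rank(comment: str) -> int:
    """Sum keyword scores, then shift and clamp onto a 1..5 rank (3 = neutral)."""
    words = re.findall(r"[a-z']+", comment.lower())
    score = sum(LEXICON.get(word, 0) for word in words)
    return max(1, min(5, 3 + score))

def restaurant_rating(comments: List[str]) -> float:
    """Average comment rank for one restaurant."""
    return mean(comment_rank(c) for c in comments)

if __name__ == "__main__":
    reviews = ["Great food, excellent service", "The soup was cold and bad", "Okay place"]
    print([comment_rank(r) for r in reviews], restaurant_rating(reviews))
```

This sketch does not handle negations such as "not good": the keyword "good" still adds +1.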
This paper presents a solution to classifying sentences with multiple labels. This problem is an essential part of the semantic search process. Correct automated labeling of sentences or keywords can improve the efficiency and performance of search. The technique introduces a vector space of relevance for keywords
better service quality. This study aims to measure GO-JEK and Grab customer satisfaction through sentiment analysis of Twitter data. Both companies use Twitter to reach their customers and promote their services. We collect 126,405 tweets containing the keywords GO-JEK and Grab, posted from February to March 2016. Then, we pre-process
To perform a semantic search on a large dataset of images, we need to be able to transform the visual content of images (colors, textures, shapes) into semantic information. This transformation, called image annotation, assigns a caption or keywords to the visual content in a digital image. In this paper we try to
Text classification is an important research topic for managing large numbers of electronic documents. Feature reduction is the key issue for text classification over high-dimensional keyword spaces. A document analysis method called the discriminant coefficient was proposed to reduce features and achieve high precision text
Web page classification plays an essential role in facilitating more efficient information retrieval and information processing. Conventionally, web text documents are represented by a term frequency matrix for classification. However, considering the limitations of representing documents using terms or keywords
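For reference, here is a minimal sketch of the conventional term frequency matrix mentioned above. It assumes whitespace tokenization and lowercase terms. Rows are documents, columns are vocabulary terms, and each cell counts occurrences. The function name is hypothetical.

```python
from collections import Counter
from typing import List, Tuple

def term_frequency_matrix(docs: List[str]) -> Tuple[List[str], List[List[int]]]:
    """Return (vocabulary, matrix) where matrix[i][j] counts vocabulary[j] in docs[i]."""
    tokenized = [doc.lower().split() for doc in docs]
    vocabulary = sorted({token for tokens in tokenized for token in tokens})
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        matrix.append([counts[term] for term in vocabulary])
    return vocabulary, matrix

if __name__ == "__main__":
    vocab, tf = term_frequency_matrix(["web page about cats", "cats and dogs page page"])
    print(vocab)  # -> ['about', 'and', 'cats', 'dogs', 'page', 'web']
    print(tf)     # -> [[1, 0, 1, 0, 1, 1], [0, 1, 1, 1, 2, 0]]
```

Word order and any relation between terms are lost in this representation: each document becomes a bag of term counts.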
, naive Bayes, and rule-based (Ripper) classification algorithms. The classifiers from the three algorithms were able to classify the tweets into one of six dialects, with some error, and the study revealed that the algorithms were able to pick out the keywords that are the salient features of the
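The snippet names naive Bayes among the algorithms compared. Below is a minimal multinomial naive Bayes with add-one (Laplace) smoothing. It scores each class by its log prior plus the log likelihood of each token and predicts the highest-scoring class. The labels and tokens in the example are placeholders, not real dialect data. The class is a generic sketch, not the study's setup.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

class MultinomialNB:
    """Multinomial naive Bayes over whitespace tokens with add-one smoothing."""

    def fit(self, docs: List[Tuple[str, str]]) -> "MultinomialNB":
        self.n_docs = len(docs)
        self.class_docs = Counter(label for _, label in docs)
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        for text, label in docs:
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def log_joint(self, text: str, label: str) -> float:
        """log P(label) + sum of log P(token | label) over the text's tokens."""
        counts = self.word_counts[label]
        denom = sum(counts.values()) + len(self.vocab)
        score = math.log(self.class_docs[label] / self.n_docs)
        for token in text.lower().split():
            score += math.log((counts[token] + 1) / denom)
        return score

    def predict(self, text: str) -> str:
        return max(self.class_docs, key=lambda label: self.log_joint(text, label))

if __name__ == "__main__":
    # Placeholder tokens and labels, not real dialect data.
    tweets = [("x1 x2 common", "A"), ("x1 x3", "A"), ("y1 y2 common", "B"), ("y1 y3", "B")]
    model = MultinomialNB().fit(tweets)
    print(model.predict("x1 common"))  # -> A
```

Working in log space avoids underflow on long tweets. Tokens with a high per-class count relative to the other classes are the ones that drive the decision, which loosely corresponds to the "salient keywords" the study reports.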