We investigate the scalable image classification problem with a large number of categories. Hierarchical visual data structures are helpful for improving the efficiency and performance of large-scale multi-class classification. We propose a novel image classification method based on learning hierarchical inter-class structures. Specifically, we first design a fast algorithm to compute the similarity...
Document clustering is an important tool for managing the vast amount of digital text documents. This paper introduces a new approach to clustering text documents. First, text is preprocessed and indexed using an inverted index. Then the index is trimmed using TF-DF thresholding. After that, a Term Document Matrix is built based on TF-IDF. The next step uses Latent Semantic Indexing to extract important feature...
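The indexing, trimming, and weighting steps named in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define its TF-DF thresholding rule, so a plain document-frequency cutoff is assumed here, the names `build_tfidf`, `min_df`, and `max_df_ratio` are hypothetical, and the Latent Semantic Indexing step is left out.

```python
import math
from collections import Counter


def build_tfidf(docs, min_df=1, max_df_ratio=1.0):
    """Build a TF-IDF term-document matrix after trimming terms by document frequency.

    min_df and max_df_ratio are assumed thresholds, not values from the paper.
    """
    # Whitespace tokenization stands in for real preprocessing.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    # Trim the vocabulary: drop terms that are too rare or too common.
    vocab = sorted(
        term for term, count in df.items()
        if count >= min_df and count / n_docs <= max_df_ratio
    )

    # One row per document, one column per surviving term.
    matrix = []
    for tokens in tokenized:
        tf = Counter(tokens)
        matrix.append([tf[term] * math.log(n_docs / df[term]) for term in vocab])
    return vocab, matrix


docs = ["cats chase mice", "dogs chase cats", "mice eat cheese"]
vocab, matrix = build_tfidf(docs, min_df=2)
print(vocab)
```

Terms that appear in only one of the three sample documents are removed by `min_df=2`, so the matrix keeps only the columns shared across documents. A dimensionality-reduction step such as LSI would then operate on `matrix`.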
The proposed work is based on abstractive summarization, which is a branch of text summarization. It develops a multi-document summary using the semantic relationships between the input documents rather than the exact text of the input documents. This is necessary because generating abstracts manually is difficult, and it is also a challenging task. In our system, the summary is generated...
Aspect extraction is one of the most important tasks in text mining. Semi-supervised methods have been proposed to solve this problem. However, these methods require the seed terms to be given in advance. The current methods categorize the aspects without expanding to more aspect terms. Moreover, most of the methods are based on English corpora, so there is considerable room for research on the aspect extraction...
Entity alignment is an important issue in the areas of ontology alignment and computational intelligence. Ontology alignment is a key technology for solving the semantic heterogeneity problem of ontologies and the Semantic Web, and for realizing knowledge reuse and integration. The task of entity alignment is to identify entities represented in textual documents or web pages which refer to the same entities...
Synonym extraction is a fundamental research task that is helpful for text mining and information retrieval. In this paper, we propose a method to extract synonyms from text; the method employs spectral clustering and word2vec. First, the word2vec model is trained on a large-scale English Wikipedia corpus. Then, we extract keywords from a text and use the trained model to generate similarities among these...
Time-sync comments offer a new way of extracting online video tags. However, such time-sync comments contain a lot of noise due to users' diverse comments, which poses great challenges for accurate and fast video tag extraction. In this paper, we propose an unsupervised video tag extraction algorithm named Semantic Weight-Inverse Document Frequency (SW-IDF). SW-IDF first generates corresponding...
In recent years, due to the growth of information on the internet, the number of available Web services has increased. Clustering Web services based on their functional features into different domains has started to play a major role in several service management tasks such as efficient Web service discovery and recommendations. In this paper, we propose a novel ontology-based approach for Web service clustering...
Due to the rapid growth in both the number and diversity of Web services on the web, it is becoming increasingly difficult to find the desired and appropriate Web services. Clustering Web services according to their functionalities is an efficient way to facilitate Web service discovery as well as service management. Existing methods for Web services clustering mostly focus...
To extract key topics from news articles, this paper investigates an efficient way to construct text vectors and to improve the efficiency and accuracy of document clustering based on the Word2Vec model. This paper proposes a novel algorithm, which combines the Jaccard similarity coefficient and inverse dimension frequency to calculate the importance degree between each dimension...
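For readers unfamiliar with the Jaccard similarity coefficient mentioned in this abstract, here is a minimal sketch of the standard set-based definition, |A ∩ B| / |A ∪ B|. It does not reproduce the paper's algorithm: how the coefficient is combined with inverse dimension frequency is not stated in the excerpt, and the function name `jaccard` and the empty-set convention are assumptions made for illustration.

```python
def jaccard(a, b):
    """Standard Jaccard similarity between two collections, treated as sets."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        # Assumed convention: two empty sets are given similarity 0.0.
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


headline_1 = "stock market falls".split()
headline_2 = "stock market rises".split()
print(jaccard(headline_1, headline_2))  # 2 shared terms out of 4 distinct -> 0.5
```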
This paper presents a comparative study of Arabic Multi-Document Summarization System (AMD-SS) methods. These methods are compared and analyzed, aiming to detect which method generates a genuine summary and achieves the best results in comparison with human summarization techniques. The comparative study shows that there is a gap in the area of Arabic Automatic Text Summarization systems. Therefore,...
With the development of the Internet, detecting web-based anomalies is vital for Internet security. Clustering based on manual feature extraction has been verified as a significant way to detect new anomalies. But the representations of these features cannot express the semantic information of the URLs. In addition, few studies try to cluster the anomalies into specific types like SQL-injection...
In the proposed approach, Word Sense Disambiguation (WSD) in the Bengali language is performed using an unsupervised methodology. This work consists of two sequential sub-tasks. The first is grouping Bengali sentences into a certain number of clusters, where each cluster contains sentences of similar meaning, and the second is labeling the clusters with their inner meanings with the help...
The last decade has witnessed a dramatic growth of social networks, such as Twitter, Sina Microblog, etc. Messages/short texts on these platforms are generally of limited length, which makes them difficult for machines to understand. Moreover, it is rarely possible for users to read and understand all the content due to the large quantity. So it is imperative to cluster and extract the viewpoints of these...
As semantic information is often missing in text representation, this paper proposes a semantic graph structure to represent text and optimizes the graph structure using a semantic similarity matrix. The similarity of semantic graph structures is then calculated using the maximum common sub-graph from graph theory. Finally, the K-means algorithm is applied to Chinese text clustering to improve text clustering...
A story is defined as actors taking actions that culminate in resolutions. In this paper, we extract subject-verb-object relationships from paragraphs and generalize them into semantic conceptual representations. Overlapping generalized concepts and relationships correspond to archetypes/targets and actions that characterize story forms. We present an analytic framework which implements co-clustering...
With the rapid growth of Internet consumption, the varied forms of product comments and their redundant information make it hard for customers to grasp the hot opinions in historical comments. In view of this, this paper studies the hot opinions in product comments and takes hotel comment data as the main research object. We filter the comment data from the length of the comments...
Document clustering is a popular topic in data mining and information retrieval. Most models and methods for this problem are based on computing the similarity between pairs of documents modeled in a space of all terms, or in a new feature space obtained by applying a topic modeling technique to a given corpus. In this paper, we regard these two ideas as clustering on term feature and on semantic feature,...
This paper presents a new Bag-of-Features (BoF) model to enhance the efficiency of automatic image annotation. Since the traditional BoF ignores the semantics of its vocabulary, it cannot be seen as a descriptive representation of images in many image applications. To handle this critical limitation, we first propose the RGB compressive texton. By using compressive sensing theory, the image can...
Clustering product features is an essential task for mining opinions from unstructured online reviews, because different customers usually express the same feature with different words or phrases. Several supervised and unsupervised methods have been applied to accomplish this task. In this paper, we propose an orthogonal nonnegative matrix tri-factorization model to solve the problem. We first construct...