Keyword extraction is an automated process that collects a set of terms giving an overview of a document. A keyword is defined by how well it identifies the core information of a particular document. When analyzing a huge number of documents to find relevant information, keyword extraction is the key approach. This approach will help us to understand the depth of it even before we read...
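The abstract does not say which extraction method it uses. As a generic illustration only (an assumption, not the paper's method), one common baseline ranks a document's terms by TF-IDF, so that words frequent in the document but rare across the collection score highest. The function names and the crude tokenizer below are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep alphabetic tokens only (a deliberately crude tokenizer).
    return re.findall(r"[a-z]+", text.lower())


def tfidf_keywords(docs, doc_index, top_k=3):
    """Return the top_k terms of docs[doc_index] ranked by TF-IDF (baseline sketch)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))
    tokens = tokenized[doc_index]
    if not tokens:
        return []
    tf = Counter(tokens)
    # Term frequency times inverse document frequency; terms found in every
    # document get log(1) = 0 and therefore never rank as keywords.
    scores = {
        term: (count / len(tokens)) * math.log(n_docs / df[term])
        for term, count in tf.items()
    }
    # Sort by descending score, then alphabetically so ties are deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_k]]


docs = [
    "keyword extraction selects terms that summarize a document",
    "support vector machines classify a document",
    "word embeddings represent a document",
]
print(tfidf_keywords(docs, 0, top_k=2))
```

Because "a" and "document" occur in every toy document, they receive a score of zero and are never selected.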
Coreference resolution plays a significant role in natural language processing systems. It is the task of identifying all the noun phrases that refer to the same real-world entity. Several studies have addressed noun phrase coreference resolution using machine learning techniques. Our paper proposes a machine learning approach using support vector machines (SVM) towards...
This paper presents the results of systematic and comparative experimentation with major types of methodologies for automatic duplicate question detection when these are applied to datasets of progressively larger sizes, thus allowing us to study the learning profiles of this task under these different approaches and evaluate their merits. This study was made possible by resorting to the recent release...
Fine-grained activity understanding in videos has attracted considerable recent attention with a shift from action classification to detailed actor and action understanding that provides compelling results for perceptual needs of cutting-edge autonomous systems. However, current methods for detailed understanding of actor and action have significant limitations: they require large amounts of finely...
Machine-learning algorithms have shown outstanding image recognition performance for computer vision applications. While these algorithms are modeled to mimic brain-like cognitive abilities, they lack the remarkable energy-efficient processing capability of the brain. Recent studies in neuroscience reveal that the brain resolves the competition among multiple visual stimuli presented simultaneously...
Zero-shot Learning (ZSL) can leverage attributes to recognise unseen instances. However, the training data is limited and cannot adequately discriminate fine-grained classes with similar attributes. In this paper, we propose a complementary procedure that inversely makes use of attributes to infer discriminative visual features for unseen classes. In this way, ZSL is fully converted into conventional...
Wireless capsule endoscopy video summarization (WCE-VS) is in high demand for eliminating redundant frames with high similarity. Conventional WCE-VS methods extract various hand-crafted features as image representations. Studies show that such features reflect only the low-level characteristics of a single frame and are essentially ineffective at capturing the semantic similarity between WCE frames...
Attributes are defined as mid-level image characteristics shared among different categories. These characteristics are well suited to handling classification problems, especially when training data are scarce. In this paper, we design discriminative real-valued attributes by learning nonlinear inductive maps. Our method is based on solving a constrained optimization problem that mixes three criteria;...
We present a novel algorithm for the semantic labeling of photographs shared via social media. Such imagery is diverse, exhibiting high intra-class variation that demands large training data volumes to learn representative classifiers. Unfortunately, image annotation at scale is noisy, resulting in errors in the training corpus that confound classifier accuracy. We show how evolutionary algorithms may...
Word2vec is a neural network language model that can convert words and phrases into high-quality distributed vectors (called word embeddings) that capture semantic relationships between words, so it offers a unique perspective on text classification and other natural language processing (NLP) tasks. In this paper, we propose to combine an improved TF-IDF algorithm with word embeddings as a way to represent documents...
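The abstract is truncated before it describes its "improved" TF-IDF, so the sketch below uses plain smoothed TF-IDF as a stand-in (an assumption). It shows one common way of combining TF-IDF with word embeddings: a document vector is the TF-IDF-weighted average of its words' vectors. The two-dimensional `embeddings` table is hypothetical toy data, not Word2vec output.

```python
import math
from collections import Counter


def idf_table(corpus):
    """Smoothed IDF for every term in a tokenized corpus (stand-in, not the paper's variant)."""
    n = len(corpus)
    df = Counter()
    for doc in corpus:
        df.update(set(doc))
    # log((1 + n) / (1 + df)) + 1 keeps every weight strictly positive.
    return {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}


def doc_vector(doc, idf, embeddings):
    """TF-IDF weighted average of the embeddings of the words in doc."""
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for term, count in Counter(doc).items():
        if term not in embeddings:
            continue  # out-of-vocabulary words are skipped
        w = count * idf.get(term, 0.0)  # raw term count times IDF
        for i, x in enumerate(embeddings[term]):
            total[i] += w * x
        weight_sum += w
    # Normalize by the total weight; an all-OOV document stays a zero vector.
    return [x / weight_sum for x in total] if weight_sum else total


# Hypothetical toy embeddings; real ones would come from a trained Word2vec model.
embeddings = {"good": [1.0, 0.0], "bad": [-1.0, 0.0], "movie": [0.0, 1.0]}
corpus = [["good", "movie"], ["bad", "movie"]]
idf = idf_table(corpus)
print(doc_vector(corpus[0], idf, embeddings))
```

Because "movie" appears in both documents, it gets a lower IDF than "good" or "bad", so the sentiment-bearing word dominates each document vector.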
This paper develops a large-scale classification algorithm for cargo X-ray images using an ensemble of exemplar-SVMs. Large-scale or fine-grained classification can help customs authorities improve inspection efficiency and relieve the workload of their inspectors. However, large intra-class variation combined with small inter-class variation among cargo images makes it almost impossible to classify them using...
In the area of natural language processing, applying machine learning techniques to customer or movie reviews for sentiment analysis has been frequently attempted. While methods such as support vector machines (SVM) were much favored in the 2000s, there has recently been a steadily rising share of implementations using vector representations and artificial neural networks. In this article we present an approach...
Word-level sentiment analysis is an essential issue in opinion mining. One challenge in this field is that far fewer lexical items than expected have been labeled with sentiment, especially in Chinese. There are two ways of rating words: one is manual labeling, which costs considerable resources, effort, and time; the other is machine labeling, which is efficient, convenient, and time-saving....
This paper presents a method named SoSVMRank, which integrates the social information of a Web document to generate a high-quality summary. To do so, summarization was formulated as a learning-to-rank task, in which the rank of a sentence or comment was determined by its informativeness. Informativeness was measured by a set of local and social features in...
To manage and organize information on the web, we propose a novel web page classification strategy that integrates a topic model with SVM. We use the topic model to harness the implicit information on web pages for feature extraction. The strategy achieves an accuracy of 84.15%, 2.23% higher than the traditional CHI-based classification strategy.
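The abstract does not describe how its CHI baseline is configured. Assuming CHI refers to the chi-square statistic commonly used to select features in text classification, the score of a term for a class can be computed from a 2x2 contingency table of document counts, as sketched below. A baseline of this kind would then keep the highest-scoring terms as classifier features. The function name is hypothetical.

```python
def chi_square(a, b, c, d):
    """Chi-square score of a term for a class, from document counts.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class without the term
    d: documents outside the class without the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    # N * (AD - CB)^2 / ((A+C)(B+D)(A+B)(C+D)); a degenerate table scores 0.
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0


# A term that appears only in the class is maximally associated with it,
# while a term spread evenly across classes carries no class information.
print(chi_square(10, 0, 0, 10))
print(chi_square(5, 5, 5, 5))
```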
In this paper, we describe our practical efforts to apply speech emotion recognition (SER) in customer care scenarios. We systematically analyze the challenges we observe in our data, which differ greatly from acted speech emotion databases. Our contributions are two-fold. First, we propose a 2-level framework to measure customers' satisfaction scores at the conversation level....
Word segmentation is the first step in Chinese natural language processing, and errors caused by word segmentation can propagate through the whole system. To reduce the impact of word segmentation and improve the overall performance of Chinese short text classification systems, we propose a hybrid model of character-level and word-level features based on a recurrent neural network (RNN) with...
Traditional text classification methods usually follow this process: first, a sentence is treated as a bag of words (BOW) and then transformed into a sentence feature vector, which can be classified by methods such as maximum entropy (ME), Naive Bayes (NB), support vector machines (SVM), and so on. However, when these methods are applied to text classification, we usually cannot obtain...
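The traditional pipeline described above can be sketched with a bag-of-words representation feeding a multinomial Naive Bayes classifier with add-one smoothing, one of the classifiers the abstract lists. This is a minimal illustration under simplifying assumptions (whitespace tokenization, toy labels); the class and function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def bag_of_words(sentence):
    # Word order is discarded; only word counts are kept.
    return Counter(sentence.lower().split())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, sentences, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for sentence, label in zip(sentences, labels):
            bow = bag_of_words(sentence)
            self.word_counts[label].update(bow)
            self.vocab.update(bow)
        return self

    def predict(self, sentence):
        bow = bag_of_words(sentence)
        n = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            total = sum(self.word_counts[label].values())
            score = math.log(count / n)  # log prior of the class
            for word, k in bow.items():
                if word not in self.vocab:
                    continue  # ignore words never seen in training
                p = (self.word_counts[label][word] + 1) / (total + len(self.vocab))
                score += k * math.log(p)  # log likelihood of each word occurrence
            if score > best_score:
                best_label, best_score = label, score
        return best_label


clf = NaiveBayes().fit(
    ["great fun film", "dull boring film", "great acting", "boring plot"],
    ["pos", "neg", "pos", "neg"],
)
print(clf.predict("great film"))
```

Because the bag of words discards order, "film great" and "great film" receive identical scores, which is one limitation of this representation.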
People write online documents from different personal perspectives. The competitive perspectives they hold reflect the conflicts in their fundamental stances and viewpoints. For many security-related applications, it is both beneficial and critical to identify the competitive perspectives implied in online documents. Previous work on competitive perspective identification is based on word features,...
Nowadays, spam messages are flooding many countries. They seriously violate personal rights and may even harm national security. Existing filtering techniques usually rely on traditional text classifiers, which are better suited to ordinary long texts; as a result, they often face serious challenges such as data sparsity and noisy data in SMS messages. This...