Selecting relevant features in data modeling is critical to ensure effective and accurate prediction of future effects. The problem becomes compounded when the relevance of previously selected features cannot be guaranteed due to changes in the underlying dataset. We propose an algorithm based on the statistical plaid model for the discovery and tracking of feature relevance scores in datasets that...
To extract key topics from news articles, this paper investigates an efficient way to construct text vectors and to improve the efficiency and accuracy of document clustering based on the Word2Vec model. It proposes a novel algorithm that combines the Jaccard similarity coefficient and inverse dimension frequency to calculate the importance degree between each dimension...
In this paper, a new method of saliency-based traffic sign detection is presented. On the basis of the visual attention mechanism model, edge and color information are extracted as early visual features, and each feature is computed and normalized to obtain feature maps, conspicuity maps and the saliency map. Then the candidate regions containing traffic signs are determined with self-organizing map...
This paper presents the author clustering problem and compares it to related authorship attribution questions. The proposed model is based on a distance measure called Spatium derived from the Canberra measure (weighted version of L1 norm). The selected features consist of the 200 most frequent words and punctuation symbols. An evaluation methodology is presented and the test collections are extracted...
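As a rough illustration of the Canberra measure mentioned in the abstract above, the sketch below computes the standard Canberra distance between two feature vectors, for example relative frequencies of frequent words. It is not the Spatium measure itself, whose exact definition is not given in the excerpt; the function name `canberra_distance` and the sample vectors are assumptions made for illustration only.

```python
def canberra_distance(x, y):
    """Standard Canberra distance: sum over i of |x_i - y_i| / (|x_i| + |y_i|).

    Terms where both components are zero are skipped (treated as 0),
    which is the usual convention. This is NOT the Spatium measure.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    total = 0.0
    for a, b in zip(x, y):
        denom = abs(a) + abs(b)
        if denom > 0:
            total += abs(a - b) / denom
    return total


# Hypothetical relative frequencies of three frequent words in two texts.
text_a = [0.05, 0.03, 0.00]
text_b = [0.04, 0.03, 0.01]
distance = canberra_distance(text_a, text_b)
```

Each term is an absolute difference scaled by the magnitude of the two values, which is why the measure can be read as a weighted L1 norm: differences between small frequencies count as much as differences between large ones.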
This paper presents a comparative study of Arabic Multi-Document Summarization System (AMD-SS) methods. The methods are compared and analyzed to determine which one generates a genuine summary and achieves the best results in comparison with human summarization techniques. The comparative study shows that there is a gap in the area of Arabic Automatic Text Summarization systems. Therefore,...
Conventional pedestrian detection methods construct models based on hand-crafted features or deep learning. They are powerful but limited due to finite capabilities of single classifiers. Ensemble models escape these problems by assembling multiple classifiers using some man-made criteria which synthetically utilize information from all combined models. However, these criteria lack theoretical support...
The scale of big data grows every minute, so handling massive data is becoming increasingly important. The familiar problem with big data is not only its huge volume but also its distribution across many locations and its high dimensionality, which complicates feature selection. In numerous big data applications, feature selection is important for selecting the essential features from a known data set, and it removes unrelated and disused...
To facilitate the transition of brain-computer interface (BCI) systems from laboratory settings to real-world application, it is very important to minimize or even completely eliminate the subject-specific calibration requirement. There has been active research on calibrationless BCI systems for classification applications, e.g., P300 speller. To our knowledge, there is no literature on calibrationless...
With the emergence of the big data age, how to extract valuable hot topics from the vast amount of digitized textual material quickly and accurately has attracted more and more attention. This paper proposes a parallel Two-phase Micmac Hot Topic Detection (TMHTD) method specially designed for microblogging in a “Big Data” environment, implemented on the Apache Spark cloud computing environment....
As a prerequisite technique, Deep Packet Inspection (DPI) plays a major role in contemporary network security and management. The key to DPI is a repository of protocol fingerprints. However, inferring fingerprints for varied and new protocols, and keeping them up to date as the protocols continuously evolve, is very difficult. In this paper, we propose ProDigger, a robust...
The large number of SNS users brings marketers and managers huge opportunities and tough challenges simultaneously to extract managerial implications from SNS user behaviors. To gain insight into user behaviors, researchers divide users into roles (i.e. user groups) to analyze the difference of user behaviors between distinct roles. In traditional role discovery algorithms, the number of roles is...
Key frame selection is important to dense 3D reconstruction, especially for unordered image sets. A novel method for key frame selection from unordered image sets is proposed based on the Distance Dependent Chinese Restaurant Process (DDCRP). First, a bag-of-features word package is constructed to describe each image in a document-like manner, which can be dealt with by the DDCRP model. Second, the overlapping...
In a large-population speaker identification (SI) system, likelihood computations during the testing stage can be time-consuming. In such cases, a clustering method is applied. However, the traditional clustering algorithm based on K-means is sensitive to the randomly chosen initial cluster centers. To address this issue, the paper proposes an improved clustering algorithm which uses an initial...
In this paper, we present a scene recognition framework that processes images and recognizes the scenes in them. We demonstrate and evaluate the performance of our system on a dataset of typical Oxford landmarks. We put forward a novel method of local k-meriod for building a vocabulary and introduce a novel quantization method of soft-assignment based on the Gaussian mixture...
Aesthetic tendency discovery is a useful and interesting application in social media. This paper proposes to categorize large-scale Flickr users into multiple circles. Each circle contains users with similar aesthetic interests (e.g., landscapes or abstract paintings). We notice that: 1) an aesthetic model should be flexible as different visual features may be used to describe different image sets,...
Recently there has been great interest in the application of word representation techniques to various natural language processing (NLP) scenarios. Word representation features from techniques such as Brown clustering or spectral clustering are generally computed from large corpora of unlabeled data in a completely unsupervised manner. These features can then be directly included as supplementary...
In this model, we address the common problem that comprehension and perception capacities vary from individual to individual. To understand a given concept, different individuals may require different levels of difficulty. Thus, we propose a model that performs clustering of text based on difficulty. Initially, with different feature extraction techniques, the scores of various textual characteristics...
This paper proposes a recommendation method for standard bibliographies based on a topic model that fuses multiple features. First, the LDA topic model is used to analyze the standard resources the user is concerned with, and a user attention model is then created by combining this with the user's information. Second, by analyzing the features of standard bibliography documents in attribute, classification and association...
Tor is a well-known anonymous communication system for preserving users' online privacy. It supports TCP applications and packs application data into encrypted equal-sized cells to hide some private information of users, such as the type of application running (Web, P2P, FTP, Others). Knowledge of the application type is harmful because it can be used to reduce the anonymity set and facilitate other attacks...
In this paper, we address the problem of recognizing group activities that include interactions between human objects based on their motion trajectory analysis. In order to resolve the complexity and ambiguity problems caused by a large number of human objects, we propose a Group Interaction Zone (GIZ) to detect meaningful groups in a scene so as to be robust against noisy information. Two novel features,...