Due to the emerging Big Data paradigm, traditional data management techniques prove inadequate in many real-life scenarios. In particular, the availability of huge amounts of data on social interactions among users calls for advanced analysis strategies. Furthermore, the heterogeneity and high speed of this data require suitable data storage and management tools to be designed from scratch...
Gathering the data most relevant to one's needs from the huge collection of data on the Internet is very difficult. To make it easier, we propose an application of text clustering, the automatic grouping of text documents into clusters such that documents within a cluster are similar to one another but dissimilar to documents in other clusters. Most existing...
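The grouping criterion described above (similar within a cluster, dissimilar across clusters) can be illustrated with a minimal sketch. It is not the authors' method: it assumes a plain bag-of-words representation, cosine similarity, and a simple threshold-based "leader" clustering, and the threshold of 0.3 and the sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased word counts; a deliberately simple document representation."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def leader_cluster(docs, threshold=0.3):
    """Assign each document to the first cluster whose leader is similar enough;
    otherwise start a new cluster led by this document (threshold is an assumption)."""
    leaders, clusters = [], []
    for i, doc in enumerate(docs):
        vec = bag_of_words(doc)
        for k, leader in enumerate(leaders):
            if cosine(vec, leader) >= threshold:
                clusters[k].append(i)
                break
        else:
            leaders.append(vec)
            clusters.append([i])
    return clusters


# Hypothetical sample documents
docs = [
    "deep web database query interface",
    "query the deep web database",
    "credit card fraud detection online",
    "online fraud detection for credit card transactions",
]
print(leader_cluster(docs))  # [[0, 1], [2, 3]]
```

Leader clustering needs only one pass over the documents, but its result depends on input order; k-means or hierarchical clustering are the more common choices when that matters.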
Online Social Networks (OSNs) provide a platform for raising opinions on various issues and for creating and rapidly spreading news in Online Social Network Forums (OSNFs). This work proposes a novel method for dynamically Profiling Forum Users (PFU) by exploring their behavioral characteristics, based on their involvement in various discussion topics and the number of posts they make on each topic in OSNFs...
The service architecture of the Internet is becoming increasingly complex as the Internet expands as a medium for the large-scale distribution of diverse content. The dynamic growth of various content distribution systems, deployed by influential Internet companies, content distributors, aggregators, and owners, has a substantial impact on the distribution of network traffic and the scalability of various Internet services...
The role of an intrusion detection system (IDS) is to enforce the pattern-matching policies defined for the network. The proposed IDS operates on the KDD'99 data set, which is used internationally to evaluate the performance of intrusion detection systems. The first step is an association phase, in which frequent itemsets are generated by the Apriori algorithm. The...
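The association phase relies on standard Apriori frequent-itemset generation. The sketch below shows that textbook algorithm only, not the rest of the proposed IDS. It assumes each connection record has already been discretised into `attribute=value` items (KDD'99 also contains continuous features that would need binning first); the records and the support threshold are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support_count} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1 candidates: every single item that occurs anywhere
    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        # Count support of each candidate by scanning all transactions
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join step: merge pairs of frequent k-itemsets into (k+1)-item candidates
        keys = list(level)
        candidates = {a | b for a in keys for b in keys if len(a | b) == k + 1}
        # Prune step: every k-subset of a candidate must itself be frequent
        current = {c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k))}
        k += 1
    return frequent


# Hypothetical discretised connection records, not real KDD'99 rows
records = [
    {"proto=tcp", "service=http", "flag=SF"},
    {"proto=tcp", "service=http", "flag=SF"},
    {"proto=tcp", "service=ftp", "flag=REJ"},
    {"proto=udp", "service=domain", "flag=SF"},
]
frequent_sets = apriori(records, min_support=2)
for itemset, support in sorted(frequent_sets.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

The prune step is what distinguishes Apriori from brute-force enumeration: a candidate is discarded without a data scan if any of its subsets is already known to be infrequent.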
In this paper, we address three weaknesses of the clustering method: it is very sensitive to the initial center values, its requirements on the data set are too high, and it cannot handle noisy data. The proposed method uses information entropy to initialize the cluster centers and introduces weighting parameters to adjust the locations of the cluster centers and to deal with noise. The navigation data sets are sequential in nature. Clustering...
Due to the importance of high-quality customer service, many companies use intelligent helpdesk systems (e.g., case-based systems) to improve customer service quality. However, these systems face two challenges: 1) Case retrieval measures: most case-based systems use traditional keyword-matching-based ranking schemes for case retrieval and have difficulty capturing the semantic meaning of cases...
Fraud is increasing with the extensive use of the Internet and the growth of online transactions. More advanced solutions are needed to protect financial service companies and credit card holders from constantly evolving online fraud attacks. The main objective of this paper is to construct an efficient fraud detection system that adapts to behavior changes by combining classification and...
To address the current problem of serious academic plagiarism in the web environment, this article proposes a text copy detection algorithm based on topic bags, which uses the ideas of semantic clustering and multi-instance learning. First, a paper is divided into a three-layer tree: a leaf node denotes a sentence, a branch node represents a topic bag, and...
Analysis of EST data usually starts with EST clustering, the process of grouping fragments according to the original long consensus sequence they derive from. Similarity between ESTs means that parts of their sequences match each other in some way. Accurate clustering takes time quadratic in both the average EST length and the number of ESTs, and the number of ESTs in public EST databases is increasing exponentially. With...
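One common way to avoid the quadratic all-pairs comparison mentioned above is to index short exact substrings (k-mers) and only link ESTs that share one. The sketch below is a generic illustration of that heuristic, not the method of this paper: it assumes that sharing a single k-mer is enough to place two ESTs in the same cluster and uses union-find to take the transitive closure. Real pipelines would verify candidate overlaps by alignment and use a larger k; the sequences and k=6 are hypothetical.

```python
from collections import defaultdict


def cluster_ests(seqs, k=8):
    """Union ESTs that share at least one exact k-mer; return clusters as sorted index lists."""
    parent = list(range(len(seqs)))

    def find(x):
        # Union-find lookup with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Index each k-mer by the first EST seen containing it; later hits are merged into it
    first_seen = {}
    for i, s in enumerate(seqs):
        for j in range(len(s) - k + 1):
            kmer = s[j:j + k]
            if kmer in first_seen:
                union(first_seen[kmer], i)
            else:
                first_seen[kmer] = i

    groups = defaultdict(list)
    for i in range(len(seqs)):
        groups[find(i)].append(i)
    return sorted(groups.values())


# Hypothetical toy ESTs: the second overlaps the end of the first
ests = [
    "ACGTACGTTGCA",
    "GTTGCAAATCCG",
    "TTTTGGGGCCCC",
]
print(cluster_ests(ests, k=6))  # [[0, 1], [2]]
```

Each sequence is scanned once, so the indexing cost grows roughly linearly with total sequence length rather than with the number of EST pairs.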
Document clustering is an unsupervised approach extensively used to navigate, filter, summarize, and manage large document repositories such as the World Wide Web (WWW). Recently, the focus in this domain has shifted from traditional vector-based document similarity to suffix-tree-based document similarity for clustering, as the latter offers a more semantic representation of the text in a document...
We present data-mining-based techniques for enabling data integration across deep web data sources. We target query processing across interdependent data sources. Thus, besides input-input and output-output matching of attributes, we also need to consider input-output matching. We develop data mining techniques for discovering the instances used to query deep web data sources from the information...
With the rapid development of online shopping, the ability to segment e-shoppers based on their preferences and characteristics has become a key source of competitive advantage for firms. This paper presents realistic algorithms for clustering e-shoppers in e-commerce applications. Multi-dimensional range search is presented to solve the range-searching problem. This is a multi-level structure...
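To illustrate what a multi-dimensional range query over shopper attributes looks like, the sketch below uses a k-d tree. Note that this is a swapped-in structure: the abstract describes a multi-level structure (closer to a range tree), whose details are not given here. The two features (age, monthly spend) and all data values are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, ...]


@dataclass
class Node:
    point: Point
    axis: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(points: List[Point], depth: int = 0) -> Optional[Node]:
    """Build a k-d tree by splitting on the median along alternating axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return Node(points[mid], axis,
                build(points[:mid], depth + 1),
                build(points[mid + 1:], depth + 1))


def range_search(node: Optional[Node], lo: Point, hi: Point, out: List[Point]) -> List[Point]:
    """Collect every point p with lo[i] <= p[i] <= hi[i] on all axes."""
    if node is None:
        return out
    p, a = node.point, node.axis
    if all(l <= x <= h for l, x, h in zip(lo, p, hi)):
        out.append(p)
    # Left subtree holds points with coordinate <= p[a], right subtree >= p[a];
    # only descend where the query box can still intersect
    if lo[a] <= p[a]:
        range_search(node.left, lo, hi, out)
    if p[a] <= hi[a]:
        range_search(node.right, lo, hi, out)
    return out


# Hypothetical shoppers as (age, monthly_spend)
shoppers = [(23, 120.0), (35, 560.0), (41, 80.0), (29, 300.0), (52, 950.0), (31, 410.0)]
tree = build(shoppers)
hits = sorted(range_search(tree, (25, 250.0), (40, 600.0), []))
print(hits)  # [(29, 300.0), (31, 410.0), (35, 560.0)]
```

A k-d tree keeps storage linear but gives weaker worst-case query bounds than a multi-level range tree, which trades extra storage for faster orthogonal queries.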
Searching on the Internet today can be compared to dragging a net across the surface of the ocean. While a great deal may be caught in the net, a wealth of information lies deeper and is therefore missed. Deep Web sources store their content in searchable databases that produce results dynamically, only in response to a direct request. In this paper, we propose an automatic classification...
Clustering organizes text in an unsupervised fashion. In this paper, we propose an algorithm for the fuzzy clustering of text documents using the naive Bayesian concept. Fuzzy clustering means that each text document can be assigned to multiple clusters, ranked in descending order of probability. The Vector Space Model is used to represent our dataset as a term-weight matrix. In any natural language,...
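The membership-ranking idea (every cluster ranked by probability for a document) can be sketched with multinomial naive Bayes. This is only an illustration under stated assumptions, not the paper's full unsupervised algorithm: it assumes some seed documents per cluster already exist, uses raw term frequency as the term weight, and applies add-one (Laplace) smoothing. The cluster names and documents are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def term_weights(text: str) -> Counter:
    """One row of a term-weight matrix; raw term frequency is used here for simplicity."""
    return Counter(text.lower().split())


def fuzzy_memberships(doc: str, clusters: Dict[str, List[str]],
                      alpha: float = 1.0) -> List[Tuple[str, float]]:
    """Rank every cluster by naive Bayes posterior P(cluster | doc), highest first."""
    vocab = set()
    cluster_counts = {}
    for name, docs in clusters.items():
        counts = Counter()
        for d in docs:
            counts.update(term_weights(d))
        cluster_counts[name] = counts
        vocab.update(counts)
    total_docs = sum(len(d) for d in clusters.values())
    log_post = {}
    for name, counts in cluster_counts.items():
        total = sum(counts.values())
        # Log prior plus log likelihood of each known term, with add-alpha smoothing
        score = math.log(len(clusters[name]) / total_docs)
        for term, tf in term_weights(doc).items():
            if term in vocab:
                score += tf * math.log((counts[term] + alpha) / (total + alpha * len(vocab)))
        log_post[name] = score
    # Normalise log scores into probabilities (log-sum-exp for numerical stability)
    m = max(log_post.values())
    z = sum(math.exp(s - m) for s in log_post.values())
    ranked = [(name, math.exp(s - m) / z) for name, s in log_post.items()]
    return sorted(ranked, key=lambda x: x[1], reverse=True)


# Hypothetical seed clusters
seed = {
    "security": ["intrusion detection network attack", "fraud detection attack"],
    "web": ["deep web query interface", "web page wrapper template"],
}
ranked = fuzzy_memberships("intrusion detection attack", seed)
print(ranked)
```

Because every cluster receives a probability, a document is not forced into a single group; a cut-off on the ranked list decides how many clusters it is assigned to.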
To enable effective access to databases on the Web, it is critical to integrate large-scale deep Web sources. Schema matching is a basic problem in many database application domains, such as data integration, e-business, data warehousing, and semantic query processing. Current implementations of schema matching still have significant limitations, and there are also some problems...
This paper studies web wrapper generation for web pages of forum, blog, and news web sites. More and more web pages are dynamically generated using a common template populated with data from databases. This paper proposes a novel method that uses tree alignment and transfer learning to generate wrappers for this kind of web page. We present a new tree alignment algorithm to find...
The Internet is becoming a platform for spreading public opinion. It is important to grasp Internet public opinion in a timely manner and to understand its trends correctly. Text classification plays a fundamental role in a number of information management and retrieval tasks. However, Web-page classification is much more difficult than pure-text classification due to the large variety of noisy information...
Internet technology has developed rapidly, and both software systems and hardware equipment have improved greatly in recent years. However, the Internet brings people not only convenience but also great potential threats. Facts show that security hazards have existed since the emergence of the Internet. As an effective information security safeguard, intrusion detection makes up for the defects...
With the development of personalized service technology, this paper analyzes the status of deep Web databases and personalized services, puts forward a new method of personalized service for deep Web databases based on current user behavior, and discusses the key technologies. The experiment shows that the system can overcome the limitation of personalized services depending too heavily on user behavior.