Schema matching is a fundamental schema operation and a basic problem in many application fields. This paper analyses the advantages and limitations of iMAP and CM, and proposes an extended schema matching system architecture between database schemas (ESM). First, ESM filters out unreasonable matches through preprocessing and clustering. Then it employs a set of special-purpose searchers to explore a...
This paper deals with content-based image retrieval. We propose a logo recognition algorithm based on local regions, where the trademark (or logo) image is segmented by clustering the points of interest obtained with the Harris corner detector. The minimum rectangle surrounding each cluster is detected, forming the regions of interest. Global features such as Hu moments and histograms of each local region...
To monitor the extent to which objects in a network are used, each network operation is generally stored for subsequent analysis, which determines which objects are accessed and modified, and how often. The self-organization of such multi-state operations is used in this work to help cluster relationships amongst users or between users and network objects; and enhance...
We present data mining-based techniques for enabling data integration across deep web data sources. We target query processing across inter-dependent data sources. Thus, besides input-input and output-output matching of attributes, we also need to consider input-output matching. We develop data mining techniques for discovering the instances for querying deep web data sources from the information...
The prediction of gene function from genome sequences is one of the main issues in Bioinformatics. Most computational approaches are based on the similarity between sequences to infer gene function. However, the availability of several fully sequenced genomes has enabled alternative approaches, such as phylogenetic profiles. Phylogenetic profiles are vectors which indicate the presence or absence...
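As a minimal illustration of the phylogenetic profile idea mentioned in this abstract, the sketch below encodes each gene as a 0/1 vector over a fixed list of genomes and compares two profiles by Hamming distance. The gene names, genome names, and the choice of Hamming distance are assumptions made for illustration only; they are not taken from the paper.

```python
def build_profiles(presence, genomes):
    """Map each gene to a 0/1 tuple: 1 if the gene is present in that genome."""
    return {
        gene: tuple(1 if genome in found_in else 0 for genome in genomes)
        for gene, found_in in presence.items()
    }


def hamming(profile_a, profile_b):
    """Count the genomes in which the two profiles disagree."""
    return sum(a != b for a, b in zip(profile_a, profile_b))


# Hypothetical data: which genomes contain a homolog of each gene.
genomes = ["G1", "G2", "G3", "G4"]
presence = {
    "geneA": {"G1", "G3"},
    "geneB": {"G1", "G3"},
    "geneC": {"G2"},
}
profiles = build_profiles(presence, genomes)
```

Genes whose profiles are close under such a distance occur together across genomes, which is the kind of signal that profile-based approaches use in place of direct sequence similarity.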
Nowadays, there are many data mining tools for various data processing purposes, e.g. classification and clustering. Among them, Weka and GeneXproTools are widely discussed. In this paper, we evaluate these tools' classification function in authentic emotion recognition. Meanwhile, we develop a hybrid classification algorithm and compare it with these data mining tools. Finally, we list the recognition...
Public transportation IC card systems hold a large amount of daily bus passenger travel information, and are a valuable source of information on the city transport system. This paper discusses the need for data mining of public transportation IC cards, and research methods for public transportation IC card data pre-processing. The data from application of statistical clustering on the behavior of...
With the rapid development of online shopping, the ability to segment e-shoppers based on their preferences and characteristics has become a key source of competitive advantage for firms. This paper presents realistic algorithms for clustering e-shoppers in e-commerce applications. Multi-dimensional range search is presented to solve the range-searching problem. This is a multi-level structure...
Energy efficiency is the most important issue in all facets of wireless sensor network (WSN) operations because of the limited and non-replenishable energy supply. WSNs are also deployed in environments where sensors can be exposed to conditions that might interfere with the sensor readings. Moreover, a variety of sensors may be attached to WSNs to monitor the environment. Data aggregation, eliminating...
The Web is overcrowded with news articles, an information source that is overwhelming in both its volume and its diversity. Assigning news articles to similar groups, on the other hand, provides a very powerful data mining and manipulation technique for topic discovery from text documents. In this paper, we investigate the application of a great spectrum of clustering algorithms, as well as similarity...
In this paper, the problem of efficiently representing a large database of target radar cross sections (RCS) is investigated in order to minimize memory requirements and recognition search time, using a tree-structured hierarchical wavelet representation. Synthetic RCS of large aircraft, in the HF-VHF bands, are used as experimental data. Hierarchical trees are built using wavelet multiresolution representation...
Location fingerprinting is one of the radio positioning techniques that have been proposed in the field of Location Based Services (LBS). Considering the current trends towards energy-efficient systems and green networking, reducing energy consumption has become a challenging issue in the context of fingerprinting systems. In this paper we present a clustering technique which aims to compress the...
Searching on the Internet today can be compared to dragging a net across the surface of the ocean. While a great deal may be caught in the net, there is still a wealth of information that is deep, and therefore, missed. Deep Web sources store their content in searchable databases that only produce results dynamically in response to a direct request. In this paper, we propose an automatic classification...
The Human Genome Project, completed in 2003, was undertaken by the National Institutes of Health (NIH), USA, and the US Department of Energy to determine the sequence of chemical base pairs that make up deoxyribonucleic acid (DNA). This mega project generated a huge amount of data and information. The enormous amount of data generated through whole-genome sequencing efforts, microarray technologies, and mapping of single...
K-means clustering is an important algorithm for identifying the structure in data. K-means is among the simplest clustering algorithms. It takes a predefined number of clusters as input. The mean is the average location of all the members of a particular cluster. The algorithm starts from a random selection of cluster centers and iteratively improves the result. In this work,...
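The procedure outlined in this abstract (a fixed number of clusters, randomly chosen initial centers, iterative refinement of the means) can be sketched as follows. This is a generic textbook version of K-means written for illustration, not the method developed in the paper; the function name `kmeans`, its parameters, and the sample points are assumptions.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iterations=100, seed=0):
    """Plain K-means: returns (centers, labels) for a list of tuple points."""
    rng = random.Random(seed)
    # Random selection of k distinct input points as the initial centers.
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster of its nearest center.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                mean = tuple(sum(axis) / len(members) for axis in zip(*members))
                new_centers.append(mean)
            else:
                # Keep the old center if a cluster ends up empty.
                new_centers.append(centers[c])
        if new_centers == centers:
            break  # Centers stopped moving, so the result has converged.
        centers = new_centers
    return centers, labels


# Hypothetical sample data: two well-separated groups of 2-D points.
sample_points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
                 (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
sample_centers, sample_labels = kmeans(sample_points, k=2)
```

Because the initial centers are random, different seeds can give different results on harder data, which is one reason the choice of initialization and of the cluster count k receives so much attention.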
With the explosive growth of information available on the World Wide Web, it has become much more difficult to access relevant information. One of the promising approaches is web usage mining, which mines web logs for user models and recommendations. Different from most web recommender systems that are mainly based on clustering and association rule mining, this paper proposes a web personalization...
The traditional K-means algorithm is a widely used clustering algorithm with a broad range of applications. To address its shortcomings, this paper improves the traditional K-means algorithm by proposing a K-value learning algorithm, which uses a genetic algorithm to optimize the value of K and thereby improve clustering performance.
To address the efficiency bottleneck of the DBSCAN algorithm, we present P-DBSCAN, a novel parallel version of the algorithm for distributed environments. The database is separated into several parts, and the computer nodes carry out clustering independently; the sub-results are then aggregated into one final result. P-DBSCAN achieves good results and much better efficiency than DBSCAN. Experiments...
Several real applications need to manage fuzzy information. Among the languages proposed for this type of data, the Fuzzy SQL (FSQL) language has been very successful, owing to its modeling power and to the fact that it extends the well-known SQL language. In this paper, we propose an alternative to the FCM algorithm for fuzzy databases described with FSQL. The conventional fuzzy clustering algorithms form fuzzy...
The fast evolution of hardware and the internet made large volumes of data more accessible. This data is composed of heterogeneous data types such as text, numbers, multimedia, and others. Non-overlapping research communities work on processing homogeneous data types. Nevertheless, from the user perspective, these heterogeneous data types should behave and be accessed in a similar fashion. Processing...