Designing a Chinese-Uighur-English online dictionary is very important for the development of ethnic scientific research and education, and it is the basis for work on Uighur semantics. An online dictionary with a huge thesaurus can be implemented with a knowledge graph, which existing works have not addressed in much detail. In order to design the thesaurus of the online dictionary, this paper is based...
Text summarization, a field of data mining, is very important for developing various real-life applications. Many techniques have been developed for summarizing English text, but only a few attempts have been made for Bengali text because of its multifaceted structure. This paper presents a method for text summarization which extracts important sentences from single or multiple Bengali documents...
Data mining concerns theories, methodologies, and in particular, computer systems for knowledge extraction or mining from large amounts of data. Association rule mining is a general purpose rule discovery scheme. It has been widely used for discovering rules in medical applications. The diagnosis of diseases is a significant and tedious task in medicine. The detection of heart disease from various...
Proteins are the structural components of living cells and tissues, and thus an important building block in all living organisms. Patterns in protein sequences are subsequences that appear frequently. Patterns often denote important functional regions in proteins and can be used to characterize a protein family or discover the function of proteins. Moreover, they provide valuable information...
Online Social Networks are so popular nowadays that they are a major component of an individual's social interaction. They are also emotionally-rich environments where close friends share their emotions, feelings and thoughts. In this paper, a new framework is proposed for characterizing emotional interactions in social networks, and then using these characteristics to distinguish friends from acquaintances...
In this paper, computational verb theory (CVT) is applied to the analysis of stock market data. By using CVT, stock market data are clustered into different categories and represented by typical curves for each category. Research on market data samples from the Shanghai Stock Exchange in March 2010 is reported. First, MATLAB programs are used to preprocess the stock data. The preprocess...
In order to improve the poor accuracy and stability of traditional geomagnetic matching navigation by TERCOM, this paper discusses in detail a new method based on the integration of TERCOM, the K-means clustering algorithm and an INS (inertial navigation system). Through an experiment, we find that the new method has higher accuracy and stability than the traditional method, especially...
In this paper, the problem of efficiently representing a large database of target radar cross sections (RCS) is investigated in order to minimize memory requirements and recognition search time, using a tree-structured hierarchical wavelet representation. Synthetic RCS of large aircraft, in the HF-VHF bands, are used as experimental data. Hierarchical trees are built using wavelet multiresolution representation...
The traditional K-means algorithm is a widely used clustering algorithm with many applications. In light of the disadvantages of the traditional K-means algorithm, an improvement is made and a K-value learning algorithm is proposed, which uses a genetic algorithm to optimize the value of K and improve clustering performance.
Cluster analysis is one of the main analytical methods in data mining, and the choice of clustering algorithm directly influences the clustering results. This paper discusses the standard k-means clustering algorithm and analyzes its shortcomings, such as the need to calculate the distance between each data object and all cluster...
K-means clustering algorithm is a method of cluster analysis which aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean. It is one of the simplest unsupervised learning algorithms that solve the well known clustering problem. It is similar to the expectation-maximization algorithm for mixtures of Gaussians in that they both attempt...
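The partitioning idea described in this abstract can be illustrated with a minimal sketch of Lloyd's iteration, the usual way k-means is carried out. This is not the code of any paper listed here; the function names `sq_dist` and `kmeans`, the random initialization from data points, and the fixed seed are all assumptions made for illustration.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=100, seed=0):
    """Partition `points` (a list of tuples) into k clusters.

    Returns (centroids, labels). Initial centroids are k distinct data
    points chosen at random (an assumption; other schemes exist).
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each observation joins the nearest mean.
        labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each mean moves to the centroid of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members))
                )
            else:
                # Keep the old centroid if a cluster becomes empty.
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break  # assignments can no longer change
        centroids = new_centroids
    return centroids, labels


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    centers, assignment = kmeans(data, k=2)
    print(centers, assignment)
```

The result depends on the initial centroids, so in practice the procedure is often restarted from several seeds and the partition with the lowest within-cluster sum of squares is kept.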
By analyzing the advantages and disadvantages of anomaly detection and misuse detection, a mixed intrusion detection system (IDS) model is designed. First, data is examined by the misuse detection module; then it is checked for abnormal behavior by the anomaly detection module. In this model, the anomaly detection module is built using an unsupervised clustering method, and the algorithm is an...
Music evokes various human emotions or creates musical moods through low-level musical features. In fact, typical music consists of one or more moods, and this can be used as an important factor for determining the similarity between pieces of music. In this paper, we propose a new music retrieval scheme based on the mood change pattern. For this, we first divide music clips into segments based on low-level musical...
Many enterprises incorporate information gathered from a variety of data sources into an integrated input for some learning task. For example, aiming towards the design of an automated diagnostic tool for some diseases, one may wish to integrate data gathered from many different hospitals. Analyzing and mining these distributed heterogeneous data sources require distributed machine learning and data...
This paper proposes a new similarity measure for content-based image retrieval (CBIR) systems. The similarity measure is based on the multidimensional generalization of the Wald-Wolfowitz (MWW) runs test and the k-means clustering algorithm. The proposed method was compared with the current CBIR method based on the MWW runs test, and it can be seen that the proposed...
The partition-based K-means algorithm and the density-based DBSCAN algorithm are analyzed. Weighing the advantages and disadvantages of the two algorithms, an improved algorithm, DBSK, is proposed. Because the data set is partitioned, DBSK reduces memory requirements; a method of computing variable values is put forward; for uneven data sets, because of adopting different variable values...
The basic concept of clustering and related research work are first presented. A new algorithm based on the least clustering cell (LCC) is then proposed and analyzed, taking into account the advantages and disadvantages of the k-means and grid clustering algorithms. This algorithm is efficient in dealing with huge amounts of data and supports parallel processing, and it is shown to be correct, efficient and fast...