We propose a scalable outlier detection method that identifies outliers in large datasets, with the goal of building unsupervised intrusion detection. In our work, the strengths of the Kolmogorov-Smirnov test and the K-means clustering algorithm, both with linear time complexity, are combined to create a fast outlier detector. While still maintaining a high detection rate and a low false alarm rate, our method can easily...
Big data is a set of very large and complex data that is hard to load on computers. The main challenge in the big data world is searching, categorizing and analyzing such data, especially when they are unbalanced. Although there is a lot of work in the field of big data, analyzing unbalanced big data is still a fundamental challenge in this area. In this paper we try to solve the problem of RSIO-LFCM...
Clustering genes based on their expression patterns is one of the important subjects in analyzing microarray data. Different clustering algorithms have been used to discover genes co-expressed under particular conditions. In these methods, similar genes are located in the same cluster. Thus, the closer the similar genes, the further the dissimilar ones will be. Each of the applied methods...
A cluster is a group of similar items. Unsupervised classification of patterns into clusters is known as clustering. It is useful for knowledge discovery in data. Clustering can deal with different data types. Fuzzy rules are used to illustrate data intelligence. The user gets highly interpretable discovered clusters using fuzzy rules. To generate accurate fuzzy rules, triangular membership...
Outlier detection is used to detect abnormalities in various application domains, including clustering-based disease onset identification, gene expression analysis, computer network intrusion, financial fraud detection and human behaviour analysis. Existing methods to detect outliers are inadequate due to poor accuracy and the lack of any general technique. Most techniques consider either small clusters...
Big data, such as complex networks with millions of vertices and edges, are infeasible to process using conventional computation. MapReduce is a programming model that empowers us to analyze big data on a cluster of computers. In this paper we propose a Parallel Structural Clustering Algorithm for big Networks (PSCAN) in MapReduce for the detection of clusters or community structures in big networks...
Liver cancer is one of the major causes of death in the world. Transplantation and tumor resection are the two main therapies in common clinical practice. Both tasks need image-assisted planning and quantitative evaluations. An efficient and effective automatic liver segmentation is required for the corresponding quantitative evaluations. Computed Tomography (CT) is highly accurate for liver cancer diagnosis...
Counting dead and living cells after cancer drug treatment is a mandatory step in in vitro studies that evaluate the effectiveness of the treatment in cancer research. The conventional process, using trypan blue dye staining, requires expertise and is time-consuming and tedious. The aim of this study was to develop a computer-assisted program that counts the number of cells by using image...
Microarray technology helps biologists monitor the expression of thousands of genes in a single experiment on a small chip. A microarray, also called a DNA chip, gene chip, or biochip, is used to analyze gene expression. DNA microarrays are rapidly becoming a fundamental tool in genomic research. Bioinformatics and data mining provide exciting and challenging research in several application areas...
In this paper, a SIFT-NMI algorithm is proposed for image matching, based on the SIFT (Scale-Invariant Feature Transform) and NMI (Normalized Moment of Inertia) algorithms. Firstly, the SIFT algorithm is used to obtain the coordinates and vector matrix of the image's feature points. Then, the moment of inertia of the vector is obtained based on the NMI algorithm and the pairs of matching feature points are...
In the traditional collaborative filtering recommendation algorithm, all users' scores carry the same weight, and the shift of users' preferences over time is not considered, so recommendation quality is poor. To avoid these problems, a novel collaborative filtering algorithm based on the shift of users' preferences is presented. The method adjusts the weight of users' scores according to...
Among the many text mining methods, this paper adopts the fuzzy c-means text mining method. To address the defect that fuzzy c-means is sensitive to its initial values and has poor stability, an improved GAFCM text mining method is put forward. GAFCM uses the global search features of genetic algorithms to improve fuzzy c-means. Finally, it is shown that the improved text mining method has...
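For readers unfamiliar with the baseline named in this abstract, the following is a minimal sketch of the standard fuzzy c-means iteration in plain Python. It is not the GAFCM method, whose details the truncated abstract does not give. The function name `fuzzy_c_means` and its parameters (`m`, `iters`, `tol`, `seed`) are assumptions made for illustration. The random initial membership matrix is the "initial value" the abstract refers to: different seeds can lead to different local solutions.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means (illustrative sketch, not GAFCM).

    points: list of equal-length numeric tuples
    c:      number of clusters
    m:      fuzzifier (> 1); larger values give softer memberships
    Returns (centers, memberships), where memberships[i][j] is the
    degree to which point i belongs to cluster j.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial membership matrix; each row is normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(iters):
        # Center update: mean of the points weighted by u_ij ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / total
                 for d in range(dim)]
            )

        # Membership update: u_ij proportional to d_ij ** (-2 / (m - 1)).
        max_change = 0.0
        for i in range(n):
            dists = [math.dist(points[i], centers[j]) for j in range(c)]
            if any(d == 0.0 for d in dists):
                # Point coincides with one or more centers: share full membership.
                raw = [1.0 if d == 0.0 else 0.0 for d in dists]
            else:
                raw = [d ** (-2.0 / (m - 1.0)) for d in dists]
            total = sum(raw)
            new_row = [v / total for v in raw]
            max_change = max(
                max_change,
                max(abs(a - b) for a, b in zip(new_row, u[i])),
            )
            u[i] = new_row

        # Stop once memberships have effectively stopped changing.
        if max_change < tol:
            break

    return centers, u
```

Calling `fuzzy_c_means(points, c=2, seed=s)` with several values of `s` on the same data is a simple way to observe the sensitivity to initialization that motivates global-search variants.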
This paper presents a hybrid clustering algorithm based on density and the ant colony algorithm. It determines the initial cluster centers according to the distribution density of the objects to be clustered, and then uses the swarm intelligence and randomness of the ant colony algorithm to find clusters of arbitrary shape, to avoid falling into local convergence, and to obtain a relatively stable global optimal solution...
This paper proposes an improved genetic algorithm. It keeps population diversity by performing similarity checks on the population before selection, solves the premature convergence problem of population evolution, and proposes a formula for mutation probability related to the similarity rate and the number of iterations. The algorithm not only maintains good population diversity, but also guarantees...
This paper presents a new method for Car License Plate Characters' Segmentation. The proposed approach is not only simple but also more effective than some of the existing methods reported earlier. The novelty lies in its treatment, which not only unites projection and template matching, but also improves the techniques. The quality of images shot in the natural environment...
With the purpose of reducing the surplus information in a decision table and extracting the determinative rules, an autonomous clustering algorithm based on graded datum subtraction (ACGDS) is proposed to reduce the data area, and an attribute reduction algorithm based on ant colony optimization (ARACO) is presented to reduce the surplus attributes. ACGDS uses the quick sort method and subtraction to every...
The algorithm of locally adaptive clustering for high-dimensional data (LAC) performs soft subspace clustering by local weightings of features. To solve the localization of LAC in specifying the number of clusters, this paper reworks the validity index for fuzzy clustering to evaluate the clustering results of LAC. A comparison with real clustered data shows the method to be feasible. In the new algorithm,...
Based on clustering technology in data mining, we aimed to establish a new schoolwork identification mechanism. So that the normal answer can better adapt to actual situations, we first generalized the normal answer, and then calculated the similarity between every sample and the normal answer, as well as the degree of similarity between pieces of schoolwork. Based on the similarity, we clustered all schoolwork...
Nowadays most search engines, such as Google and Baidu, present their query results ranked by item value, listing them across several pages. As we are now in an age of information explosion, the number of pages can be huge and users have to glance over several before they get what they want. Clustering the results would solve this problem. There are several clustering methods, but they are not quite accurate...
We have investigated a technique for recognising faces independently of facial expression. We apply multi-linear tensor algebra, which subsumes linear algebra, to analyse and recognise 3D face surfaces. This potent framework possesses a remarkable ability to deal with the shortcomings of principal component analysis in less constrained situations. A set of vector spaces can be used to represent the variation...