In recent years, both the volume and the complexity of the data generated by mankind have increased. Because computers have evolved and now provide more processing power, real-time analysis of large volumes of data has become possible. This paper proposes the architecture of a big data processing platform called BigTim, which is able to run clustering...
Ride sharing is one possible answer to the traffic congestion problem in today's cities. In this scenario, recommenders determine the similarity among different paths in order to suggest possible ride shares. In this paper, we propose a novel dissimilarity function between pairs of paths based on the construction of a shared path, which visits all points of the two...
Telecom networks produce a huge volume of alarm logs every day. These alarms usually arrive at different times from the different regions and network equipment of mobile operators. In a typical network operator, Network Operations Centers (NOCs) constantly monitor those alarms in a central location and try to fix issues raised by intelligent warning systems by performing a trouble ticketing based management...
Clustering in vehicular ad hoc networks (VANETs) is a challenging issue due to highly dynamic vehicle mobility and frequent communication disconnections. Research in recent years has shown that mobility-based clustering mechanisms that consider speed, moving direction, position, destination and density are more effective in improving cluster stability. In this paper, we propose a new...
Breast cancer is a highly heterogeneous disease and very common among women worldwide. Inter-observer and intra-observer errors occur frequently in analyzing the lesion portion of medical images, giving high variability in the interpretation of results. A Computer Aided Diagnosis (CAD) system plays a vital role in overcoming this variability. Segmentation is the second critical stage in a CAD system to extract...
A set of spectral endmembers can be used to model the spectral variability of an endmember in a hyperspectral image. Clustering analysis is used to group similar spectral endmember signatures into endmember classes. The resulting clusters are used to model the endmember variability in the image. In this paper, hierarchical, partitional and spectral clustering techniques are compared for endmember...
In this paper, a fuzzy approach to the neighbourhood metric is proposed for use with histogram equalization for image contrast enhancement. Neighbourhood metrics, or local image properties, are used to subdivide the large histogram bins produced by Global Histogram Equalization. Large histogram bins in the image cause visual deterioration. We propose a fuzzy approach to subdivide the large...
Data clustering analysis is the process of assigning data to groups that are as homogeneous as possible internally and as heterogeneous as possible from one another. There are several analysis methods, of which the K-means clustering algorithm is the most widely used in different research areas. Therefore, this paper reviews the best-known variants of clustering methods, which are K-means, IRP-K-means and...
As a branch of statistics, cluster analysis has been extensively studied and widely used in many applications. Cluster analysis has recently become a highly active topic in data mining research. As a data mining function, cluster analysis can be used as a standalone tool to gain insight into the distribution of data, to observe the characteristics of each cluster. Alternatively, it may serve as a...
The amount of unstructured text data available is growing exponentially due to the proliferation of digital information such as emails, text messages, blogs, social media posts, and product reviews. For users of e-commerce websites such as Amazon, navigating thousands of reviews before buying a product can be a daunting task. Unsupervised machine learning techniques can be used to automatically analyze...
Web services are useful integrated software components that are widely used for machine-to-machine interaction over a network. They have gained great importance nowadays. Hence, a recommendation system that recommends web services to service-oriented application developers is vital. In this work, a personalized web service recommendation system is proposed, which recommends...
Document clustering addresses the problem of identifying groups of similar documents without human supervision. Unlike most existing solutions, which perform document clustering based on keyword matching, we propose an algorithm that considers the meaning of the terms in the documents. For example, a document that contains the words "dog" and "cat" multiple times may be placed in...
Cloning is the process of reusing existing code to develop fresh code or to modify an existing system. It involves using a known pattern or source code as a base on which new code is designed, with or without modifying the original source. Several approaches are used for the detection of clones. In our work, we modified the LSH-based approach of Deckard to find clones. Deckard is a scalable...
The proposed contribution uses a median fuzzy c-means (MFCM) approach for the detection of masses and macrocalcification in mammogram images. Median clustering is a powerful methodology for prototype-based clustering of similarity/dissimilarity data. In MFCM, instead of calculating the mean of each cluster to determine its centroid, the median is calculated. This has the consequence of reducing error on the...
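The difference between a mean centroid and a median centroid can be illustrated with a minimal sketch. This is not the authors' MFCM implementation: the function name `cluster_center` and the one-dimensional intensity values are assumptions made only for illustration, and fuzzy memberships are left out for brevity.

```python
from statistics import mean, median


def cluster_center(values, use_median):
    """Return the center of a 1-D cluster (hypothetical helper).

    use_median=False gives the classic mean centroid;
    use_median=True gives the median centroid described in the abstract.
    """
    return median(values) if use_median else mean(values)


# Illustrative pixel intensities for one cluster, with a single outlier (200).
intensities = [10, 11, 12, 13, 200]

mean_center = cluster_center(intensities, use_median=False)
median_center = cluster_center(intensities, use_median=True)

# The outlier drags the mean far from the bulk of the data,
# while the median stays inside it.
print(mean_center, median_center)
```

In this toy cluster the median stays among the typical values while the mean is pulled toward the outlier, which is the kind of robustness the abstract appears to rely on.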
The streaming data scenario has brought unique challenges with it, with outlier detection, high dimensionality and scalability as the primary concerns. Temporal locality is quite important when processing an evolving data stream (EDS). The inherent patterns present in the data evolve, and hence the past clusters are no longer valid for the future, and also the initial centroids...
Inference of network state and detection of anomalous network behavior based on the available data play important roles in the big-data-empowered self-organizing networks that enable 5G. In this paper, we propose a novel framework for efficient network monitoring and proactive cell anomaly detection based on dimension reduction and fuzzy classification techniques. The enhanced semi-supervised classification...
In this paper, we first show that there exists a daily pattern in equity volatility and that this volatility pattern differs from the daily volume profile. To further emphasize the most important volatility changes during the day, we fold the continuous minute-by-minute stock data into an n-by-p matrix, where n is the number of days and p is the number of minutes in the trading hours, and decompose the matrix using...
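The folding step described above can be sketched as follows. The function name `fold_minutes` and the sample values are assumptions for illustration; the decomposition applied to the matrix afterwards is not named in the truncated abstract and is therefore not shown.

```python
def fold_minutes(series, minutes_per_day):
    """Fold a flat minute-by-minute series into an n-by-p matrix.

    n is the number of days (rows) and p = minutes_per_day (columns).
    Hypothetical helper; raises if the series is not a whole number of days.
    """
    if minutes_per_day <= 0:
        raise ValueError("minutes_per_day must be positive")
    if len(series) % minutes_per_day != 0:
        raise ValueError("series length must be a multiple of minutes_per_day")
    return [
        list(series[start:start + minutes_per_day])
        for start in range(0, len(series), minutes_per_day)
    ]


# Two toy "days" of three minutes each (real trading days have far more minutes).
matrix = fold_minutes([0.1, 0.3, 0.2, 0.4, 0.6, 0.5], minutes_per_day=3)
print(matrix)
```

Each row then holds one trading day and each column the same minute of the day across days, so column-wise structure corresponds to the intraday pattern.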
Subspace clustering has typically been approached as an unsupervised machine learning problem. However, in several applications where the union of subspaces model is useful, it is also reasonable to assume access to a small number of labels. In this paper, we investigate the benefit that labeled data brings to the subspace clustering problem. We focus on incorporating labels into the k-subspaces...
Distributed Denial of Service (DDoS) has become one of the most serious threats to network security, and entropy-based approaches to DDoS attack detection are appealing because they provide more detailed insights than traditional traffic volume-based methods. In this paper, we propose a novel entropy-based DDoS attack detection approach by constructing entropy vectors of different features from...
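An entropy vector of this kind can be sketched with the Shannon entropy of each feature's empirical distribution. The feature names (`src_ip`, `dst_port`) and the packet records below are hypothetical, since the abstract is truncated before it names its features or data source, and the detection step itself is not shown.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of values."""
    total = len(values)
    if total == 0:
        return 0.0
    counts = Counter(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_vector(packets, features):
    """One entropy value per feature, computed over a window of packets."""
    return [shannon_entropy([p[f] for p in packets]) for f in features]


# Hypothetical traffic window: many sources hitting a single destination port.
window = [
    {"src_ip": "10.0.0.1", "dst_port": 80},
    {"src_ip": "10.0.0.2", "dst_port": 80},
    {"src_ip": "10.0.0.3", "dst_port": 80},
    {"src_ip": "10.0.0.4", "dst_port": 80},
]
print(entropy_vector(window, ["src_ip", "dst_port"]))
```

In this toy window the source-address entropy is high while the destination-port entropy is zero, which shows how a per-feature vector can carry more detail than a single traffic-volume count.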
Exact methods for Agglomerative Hierarchical Clustering (AHC) with average linkage do not scale well when the number of items to be clustered is large. The best known algorithms are characterized by quadratic complexity. This is a generally accepted fact and cannot be improved without using specifics of certain metric spaces. Twister tries is an algorithm that produces a dendrogram (i.e., Outcome...