In this paper, we examine the possibility of using well-known approximations of the Jaccard metric to reduce the computational complexity of Edit Distance estimation. Our analytical results apply to the representing strings rather than the original (raw) textual data; still, in practice we obtained a solid indication that the results can be applied to (raw) strings that have low...
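As a rough illustration of what an approximation of the Jaccard metric can look like, the sketch below compares exact Jaccard similarity on character k-grams with a MinHash estimate. MinHash is assumed here only as one well-known example; the abstract does not say which approximations the paper uses, and the helper names (`shingles`, `minhash_signature`, `minhash_estimate`) and the MD5-based hash family are hypothetical choices made for this example.

```python
import hashlib


def shingles(text, k=2):
    """Return the set of character k-grams of text (empty if len(text) < k)."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def jaccard(a, b):
    """Exact Jaccard similarity |a & b| / |a | b| of two sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def _seeded_hash(seed, item):
    # Assumption: MD5 salted with the seed stands in for a random hash family.
    digest = hashlib.md5(f"{seed}:{item}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(items, seeds):
    """One minimum hash value per seed; items must be a non-empty set."""
    if not items:
        raise ValueError("items must be non-empty")
    return [min(_seeded_hash(seed, item) for item in items) for seed in seeds]


def minhash_estimate(sig_a, sig_b):
    """Fraction of matching positions, an estimate of the Jaccard similarity."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)
```

Signatures have a fixed length regardless of string size, so comparing two strings costs time proportional to the number of seeds rather than to the string lengths. How such an estimate relates to Edit Distance is the subject of the paper and is not reproduced here.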
This paper compares the performance and stability of two Big Data processing tools: Apache Spark and the High Performance Analytics Toolkit (HPAT). The comparison was performed using two applications: a unidimensional vector sum and the K-means clustering algorithm. The experiments were performed in distributed and shared memory environments with different numbers and configurations of virtual...
To improve the spectrum utilization rate of Device-to-Device (D2D) communication, we study the hybrid resource allocation problem, in which the resource reuse mode and the resource dedicated mode operate simultaneously. Meanwhile, multiple D2D devices are permitted to share uplink cellular resources with some designated cellular user equipment (CUE). Combined with the transmission requirement...
This paper proposes a novel cluster-based geometrical dynamic stochastic model for multiple-input multiple-output (MIMO) communication environments. It is assumed that the base station (BS) has an adaptive learning ability and preprocesses the received data in advance. In the proposed channel model, scatterers can be grouped into different clusters based on a machine learning clustering algorithm...
Recent research efforts show the benefits of using machine learning and interactive visualizations in data analytics. However, there is a void in the implementation of these techniques for the analysis of large and complex 4-dimensional (4D) unsteady flows. Hence, this paper presents an initial development of a virtual environment (VE) to fill this void. The VE has a two-layer architecture with different...
Matrix factorization is a popular low-dimensional representation approach that plays an important role in many pattern recognition and computer vision domains. Among them, convex and semi-nonnegative matrix factorizations have attracted considerable interest, owing to their clustering interpretation. On the other hand, the generalized correlation function (correntropy) as the error measure does not...
In this paper, we propose a rigid motion segmentation algorithm based on grid-based optical flow. The algorithm selects several adjacent points among the grid-based optical flows to estimate a motion hypothesis based on a so-called entropy, generates motion hypotheses between two images, and thus separates objects that move independently of each other. The grid-based entropy is accumulated as a new motion...
Social data from online social networks is expanding rapidly as the number of users and articles posted increases, making public opinion analysis a greater challenge. Real-time topic detection is a key part of public opinion analysis. The complex data processing involved in traditional clustering and text categorization can lead to time delays in topic detection. In this paper we construct similar...
Using solely the information retrieved by audio fingerprinting techniques, we propose methods to treat a possibly large dataset of user-generated audio content that (1) enable the grouping of several audio files that contain a common audio excerpt (i.e. relate to the same event), and (2) give information about how those files are correlated in terms of time and quality inside each event. Furthermore,...
The random Fourier features method has been found very effective in approximating kernel functions. Our former studies show that, through a mixing mechanism of the feature space formed by random Fourier features and certain linear algorithms, the fuzzy clustering results in the approximated feature space are comparable to or even exceed those of the classical kernel-based algorithms. To increase the robustness...
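For readers unfamiliar with random Fourier features, the sketch below shows the standard construction for approximating a Gaussian (RBF) kernel with an explicit feature map. It assumes the kernel k(x, y) = exp(-gamma * ||x - y||^2); the abstract does not state which kernel or parameters the authors use, and the names `make_rff` and `rbf` are hypothetical. It covers only the kernel approximation, not the fuzzy clustering step.

```python
import math
import random


def make_rff(dim, n_features, gamma, seed=0):
    """Build a random Fourier feature map for exp(-gamma * ||x - y||^2)."""
    rng = random.Random(seed)
    # Frequencies are drawn from N(0, 2 * gamma) per coordinate,
    # the spectral density of this Gaussian kernel.
    std = math.sqrt(2.0 * gamma)
    weights = [[rng.gauss(0.0, std) for _ in range(dim)]
               for _ in range(n_features)]
    # Phases are drawn uniformly from [0, 2 * pi).
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    scale = math.sqrt(2.0 / n_features)

    def feature_map(x):
        return [scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
                for w, b in zip(weights, phases)]

    return feature_map


def rbf(x, y, gamma):
    """Exact Gaussian kernel value, for comparison."""
    return math.exp(-gamma * sum((a - c) ** 2 for a, c in zip(x, y)))


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * c for a, c in zip(u, v))
```

With `phi = make_rff(2, 4000, 0.5)`, the inner product `dot(phi(x), phi(y))` approximates `rbf(x, y, 0.5)`, and the error shrinks roughly as one over the square root of the number of features. Linear algorithms can then run on the mapped vectors instead of on a full kernel matrix.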
For text clustering, selecting distinctive text features is important because of the high dimensionality of the feature space. Reducing the dimension of the feature space is essential to increase accuracy and decrease processing time. In this work, we introduce a novel hybrid feature selection model for text clustering. This method measures term importance based on the correlation coefficient among four...
Reducing the computation time and improving the quality of the clustering result are two major research issues. Although several efficient and effective clustering algorithms have been presented, none of them is perfect. As such, an effective clustering algorithm, which is based on the prediction of searching information to determine the search directions at later iterations and employs...
One of the best-known clustering methods for wireless sensor networks is undoubtedly the low energy adaptive clustering hierarchy (LEACH), because it is simple and easy to implement. Although LEACH tries to provide a fair selection mechanism by randomly selecting a number of sensors as the cluster-heads, it does not take into account the distribution of sensors, the main reason that LEACH...
Modularity is an evaluation measure for graph clustering. The Louvain method is built on local optimization of modularity and is a bottom-up method, like agglomerative hierarchical clustering. Cluster validity measures, like modularity, are used to evaluate cluster partitions; they are traditional evaluation measures in the field of clustering. We propose a novel graph clustering which is...
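To make the modularity measure concrete, the sketch below computes the standard Newman modularity of a partition of an undirected, unweighted graph, Q = sum over communities c of [L_c / m - (d_c / (2m))^2], where m is the number of edges, L_c the number of edges inside c, and d_c the total degree of c. The edge-list input format and the function name `modularity` are assumptions made for this example; the sketch does not implement the Louvain method or the clustering proposed in the paper.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman modularity of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community: dict mapping every node to its community label.
    """
    m = len(edges)
    if m == 0:
        raise ValueError("graph must have at least one edge")
    intra_edges = defaultdict(int)   # L_c: edges with both ends in c
    degree_sum = defaultdict(int)    # d_c: sum of node degrees in c
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            intra_edges[cu] += 1
    return sum(intra_edges[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)
```

For two triangles joined by a single edge, placing each triangle in its own community gives Q = 5/14, while placing all six nodes in one community gives Q = 0, which matches the intuition that the first partition is the better clustering.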
To address the problem of complex roof reconstruction in airborne LiDAR data processing, this paper proposes an algorithm for reconstructing building models based on isoheight. By extracting points within elevation ranges and fitting contours at different elevations, the key points of the roof models can be obtained. From the key points, the intersection lines of the roof planes can be obtained approximately. Through the...
Localization of a viewer's region of interest (ROI) on eye gaze signal trajectories acquired by eye trackers is a widely used approach in scene analysis, image compression, and quality of experience assessment. In this paper, we propose a novel clustering approach for ROI estimation from potentially noisy raw eye gaze data, based on signal processing on graphs. The clustering approach adapts graph...
Detecting groundwater runoff connectivity is important for mining and environmental protection. However, traditional approaches based on physical and chemical experiments are neither efficient nor effective. Experimental results have shown that the bacterial community in an isolated well contains unique DNA sequences, and the bacterial communities in connected wells have common DNA sequences that are not...
Semi-supervised clustering is one of the vital tasks and aims at grouping data objects into classes (clusters) such that the similarity of objects within clusters is high and the similarity of objects between clusters is low. The dataset may sometimes be of a mixed nature, that is, it may consist of both numeric and categorical data. So two types...
The World Wide Web has been considered the most important information store in recent years. Web development expands to a great extent with new technologies. Search engines become ineffective as the number of documents on the web multiplies. Likewise, queries retrieve results most of which are unrelated to what the user was looking for. The documents on the web are varied and flexible; there are tough...
Ant colony optimization (ACO) is a fairly mature optimization algorithm for combinatorial problems, but it still attracts many researchers trying to raise its efficiency and/or performance. Some of them endeavor to speed up or improve ACO by choosing more suitable iteration parameters or update formulas. This work introduces Λ-means clustering to enhance the efficiency of ACO for the traveling...