One of the most important machine learning techniques is clustering, the grouping of data into clusters or categories. Several sound algorithms and techniques exist for clustering small to medium-scale data. In the era of Big Data, with applications that are large-scale and data-intensive in nature, there is a significant increase in the volume, variety and velocity of data...
Organizing data into its natural groupings based on intrinsic characteristics is the most sensible thing to do with unlabeled data. Mean shift is a non-parametric mode-seeking algorithm widely used for data clustering, image segmentation and object tracking, but its use in real-time applications is limited by its high computational cost. In this paper we propose a hybrid, sequentially unfolded,...
We have to reduce the electric energy consumed by servers in a cluster in order to realize an eco-society. Several types of algorithms by which a request process selects an energy-efficient server in a cluster of servers were proposed in our previous studies. Furthermore, algorithms for energy-efficiently migrating a process on a host server to a more energy-efficient guest server are discussed. Virtual machines...
This paper applies a hybrid parallel model that combines shared and distributed memory architectures to improve the performance of the Smith-Waterman (SW) algorithm. The hybrid model uses both MPI and OpenMP as programming techniques for the different memory architectures. Our improved implementation executes a parallel version of the SW algorithm with a row-wise computation of the alignment matrix,...
A distributed system consists of several autonomous nodes. In such a system, some nodes may be overloaded by a large number of job arrivals while other nodes remain idle. The performance of a distributed system depends crucially on dividing work effectively among the computing nodes, so a mechanism is needed to share load across all of them. In...
Given a set of n entities to be classified and a matrix of dissimilarities between pairs of them, this paper considers the Minimum Sum of Diameters Clustering Problem, which seeks a partition of the set of entities into k clusters such that the sum of the diameters of these clusters is minimized. Brucker showed that the problem is NP-hard when k ≥ 3 [1]. For the case of...
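As a rough illustration of the objective named in the abstract above, the sketch below evaluates the sum of cluster diameters for a given partition. It assumes the usual definition of a cluster's diameter as the largest dissimilarity between any two of its members (0 for a singleton); the function name, the matrix layout and the sample data are hypothetical and not taken from the paper.

```python
def sum_of_diameters(dissimilarity, clusters):
    """Sum of cluster diameters for one partition.

    dissimilarity: square symmetric matrix (list of lists), an assumed layout.
    clusters: list of clusters, each a list of integer entity indices.
    """
    total = 0
    for cluster in clusters:
        # Diameter = largest pairwise dissimilarity inside the cluster;
        # a singleton (no pairs) contributes 0.
        total += max(
            (dissimilarity[i][j] for i in cluster for j in cluster if i < j),
            default=0,
        )
    return total


# Hypothetical 4-entity dissimilarity matrix, for illustration only.
D = [
    [0, 1, 4, 5],
    [1, 0, 3, 6],
    [4, 3, 0, 2],
    [5, 6, 2, 0],
]
print(sum_of_diameters(D, [[0, 1], [2, 3]]))
print(sum_of_diameters(D, [[0, 1, 2], [3]]))
```

This only scores a partition; it does not search for the minimizing one, which is the hard part of the problem the abstract describes.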
While accurate computational models that embody learning efficiency remain a distant and elusive goal, big data learning analytics approaches this goal by recognizing competency growth of learners, at various levels of granularity, using a combination of continuous, formative, and summative assessments. Our earlier research employed the conventional Particle Swarm Optimization (PSO) based clustering...
In this work we propose an efficient parallel algorithm to evaluate an observation sequence on a Hidden Markov Model, starting from the sequential Forward Algorithm (FA). The Cell Broadband Engine (Cell/B.E.) hybrid architecture allows us to approach two levels of parallelization in developing our algorithms. Two strategies were implemented and tested in order to obtain a parallel version of the FA,...
Distributed vertex-centric graph processing systems have recently been proposed to perform different types of analytics on large graphs. These systems utilize the parallelism of shared-nothing clusters. In this work we propose a novel model for the performance cost of such clusters. We also define novel metrics related to the workload balance and network communication cost of clusters processing massive...
Complex networks are relational data sets commonly represented as graphs. The analysis of their intricate structure is relevant to many areas of science and commerce, and data sets may reach sizes that require distributed storage and processing. We describe and compare programming models for distributed computing with a focus on graph algorithms for large-scale complex network analysis. Four frameworks...
Synchronous iterative algorithms are often less scalable than asynchronous ones. Performing large-scale experiments with different kinds of network parameters is not easy, because on supercomputers such parameters are fixed. One solution is to use simulations first in order to analyze which parameters may or may not influence the behavior of an algorithm. In this paper, we show...
Nowadays, cloud computing is considered an evolution of the internet and is expected to support future internet development. In this paper, we abstract the load balancing problem in cloud computing as a model in which a few users occupy the computing resources, and we introduce price variation into the model. We formulate this problem as a cooperative game among job processing nodes. Processors work...
In this paper, a parallel algorithm for the Hill Cipher on MapReduce is proposed to reduce the encryption time. As the data in the cloud grows extremely large, there is a strong need to reduce the encryption time as well as to secure storage in the cloud. To address this need, the Parallel Modified Hill Cipher, a symmetric encryption scheme, is employed on the MapReduce framework, and the parallelism...
One of the most useful measures of cluster quality is the modularity of the partition, which measures the difference between the number of edges joining vertices from the same cluster and the expected number of such edges in a random graph. In this paper, we show that the problem of finding a partition maximizing the modularity of a given graph $G$ can be reduced to a minimum weighted cut (MWC)...
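To make the modularity measure in the abstract above concrete, here is a minimal sketch. It assumes the common normalized Newman-Girvan form for an undirected, unweighted graph without self-loops, Q = Σ_c [e_c/m − (d_c/2m)²], where m is the number of edges, e_c the number of edges inside cluster c and d_c the total degree of its vertices. The paper may use a different variant; the function name and sample graph are hypothetical.

```python
from collections import defaultdict


def modularity(edges, cluster_of):
    """Newman-Girvan modularity of a partition (assumed variant).

    edges: list of (u, v) pairs of an undirected graph without self-loops.
    cluster_of: dict mapping each vertex to its cluster label.
    """
    m = len(edges)
    if m == 0:
        raise ValueError("modularity is undefined for a graph with no edges")
    intra = defaultdict(int)        # e_c: edges with both ends in cluster c
    degree_sum = defaultdict(int)   # d_c: sum of vertex degrees in cluster c
    for u, v in edges:
        cu, cv = cluster_of[u], cluster_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            intra[cu] += 1
    # Observed intra-cluster edge fraction minus its expectation
    # in a random graph with the same degree sequence.
    return sum(intra[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Hypothetical example: two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_clusters = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_cluster = {v: "all" for v in range(6)}
print(modularity(edges, two_clusters))
print(modularity(edges, one_cluster))
```

Splitting the graph at the bridge scores higher than keeping all vertices in one cluster, which always scores zero under this form.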
Agent-based crowd simulation, which aims to simulate large crowds of autonomous agents with realistic behavior, is a challenging but important problem. One key issue is scalability. Parallelism and distribution are an obvious approach to achieving scalability for agent-based crowd simulation. Parallel and distributed agent-based crowd simulation, however, introduces its own challenges, in particular,...
Fuzzy/similarity joins have been widely studied in the research community and extensively used in real-world applications. This paper proposes and evaluates several algorithms for finding all pairs of elements from an input set that meet a similarity threshold. The computation model is a single MapReduce job. Because we allow only one MapReduce round, the Reduce function must be designed so a given...
For a real-time, high-availability cluster, the key requirements are that data computation finishes before the deadline in every computing period and that the system never fails over a long service life. First, a real-time, high-availability cluster platform based on a task-distributing table is described in the paper; then measures concerning the process scheme and message transfer are introduced to guarantee...
Parallel task graphs (PTGs) arise when parallel programs are combined into larger applications, e.g., scientific workflows. Scheduling these PTGs onto clusters is a challenging problem due to the additional degree of parallelism stemming from moldable tasks. Most algorithms are based on the assumption that the execution time of a parallel task is monotonically decreasing as the number of processors...
Network model partitioning is a key component of distributed network simulations. Simulations slow down considerably due to inequitable load balancing and heavy inter-host communication, leading to unbounded synchronization overhead. Also, regularly refreshing the node partition is necessary due to the dynamic nature of simulation load and event generation. In this paper, we propose a distributed...
Recent advances in parallel and distributed computing have made it very challenging for programmers to reach the performance potential of current systems. In addition, recent advances in numerical algorithms and software optimizations have tremendously increased the number of alternatives for solving a problem, which further complicates the software tuning process. Indeed, no single algorithm can...