Traditional machine learning algorithms often require computations on centralized data, but modern datasets are collected and stored in a distributed way. In addition to the cost of moving data to centralized locations, increasing concerns about privacy and security warrant distributed approaches. We propose keybin, a distributed key-based binning clustering algorithm for high-dimensional spaces....
Big data technology refers to the rapid extraction of valuable information from large amounts of data of various types. It can be divided into eight technologies: data acquisition, data access, infrastructure, data processing, statistical analysis, data mining, model prediction, and results presentation. The paper presents an improved statistical analysis method based on big data technology. A statistical...
Efficient execution of distributed database operators such as joins and aggregations is critical for the performance of big data analytics. As the compute speed of modern CPUs increases, reducing the network communication time of these operators in large systems is becoming increasingly important, and is challenging current techniques. Significant performance improvements have been achieved...
Big data analytics has attracted close attention from both industry and academia because of its great benefits in cost reduction and better decision making. With the fast growth of various global services, there is an increasing need for big data analytics across multiple data centers (DCs) located in different countries or regions. This calls for the support of a cross-DC data processing platform optimized...
Recent technological advancements in typical domains (e.g. the internet, financial companies, health care, user-generated data, supply chain systems, etc.) have led to an inundation of data from these domains. This data explosion gave real meaning to the buzzword 'Big Data'. Compared with traditional data, Big Data exhibits some unique characteristics: it is commonly enormous and unstructured...
Due to its simplicity and scalability, MapReduce has become a de facto standard computing model for big data processing. Since the original MapReduce model was only appropriate for embarrassingly parallel batch processing, many follow-up studies have focused on improving the efficiency and performance of the model. Spark follows one of these recent trends by providing in-memory processing capability...
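As a minimal illustration of the MapReduce model the abstract refers to, the canonical word-count job can be sketched in a single process (the function names are ours, not from any framework; real MapReduce distributes the map, shuffle, and reduce phases across machines):

```python
from collections import defaultdict

def map_phase(docs):
    # Map: emit an intermediate (word, 1) pair for every word in every document.
    for doc in docs:
        for word in doc.split():
            yield (word, 1)

def reduce_phase(pairs):
    # Shuffle + Reduce: group intermediate pairs by key and sum the counts.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

docs = ["big data big", "data processing"]
result = reduce_phase(map_phase(docs))
```

Spark's contribution, as the abstract notes, is keeping such intermediate results in memory between stages instead of writing them to disk after every job.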
In the emerging field of big data, a large volume of data has to be managed; operating on data of huge volume becomes easier when it is sorted and structured. The data can be structured using a simple algorithm, i.e. an index algorithm, which stores and categorizes data on the basis of their application. This in turn is beneficial at the business level as well as the software level.
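To make the indexing idea concrete, a categorical index of the kind described can be sketched as a dictionary that maps each key to the records filed under it (the key function and sample data are illustrative assumptions, not from the abstract):

```python
def build_index(records, key_fn):
    # Map each category key to the list of records that fall under it,
    # so later lookups by category avoid a full scan of the data.
    index = {}
    for rec in records:
        index.setdefault(key_fn(rec), []).append(rec)
    return index

orders = [
    {"id": 1, "region": "EU"},
    {"id": 2, "region": "US"},
    {"id": 3, "region": "EU"},
]
by_region = build_index(orders, lambda r: r["region"])
```

A lookup such as `by_region["EU"]` then returns only the matching records in one step.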
Cloud Computing allows users to control substantial computing power for complex data processing, generating huge and complex data. However, the virtual resources requested by users are rarely utilized to their full capacities. To mitigate this, providers often perform over-commitment to maximize profit, which can result in node overloading and consequent task eviction. This paper presents a novel...
This paper analyzes the challenges of data management in army data engineering, such as big data volume, data heterogeneity, high rates of data generation and update, strict time requirements for data processing, and widely separated data sources. We discuss the disadvantages of traditional data management technologies in dealing with these problems. We also highlight the key problems of data management...
Terrorism is a matter of great concern in many nations because of its impact on sustainable development, which is critical for developing countries. Security agencies need to stay a step ahead of terrorist threats to effectively prevent their occurrence. Many research efforts that sought to combat terrorism using big data have been reported in the literature. However, most...
For over forty years, relational databases have been the leading model for data storage, retrieval and management. However, due to increasing needs for scalability and performance, alternative systems have emerged, namely NewSQL technology. NewSQL is a class of modern relational database management systems (RDBMS) that provide the same scalable performance of NoSQL systems for online transaction processing...
Support vector machines (SVMs) are widely used for classification in machine learning and data mining tasks. However, they have traditionally been applied to small and medium datasets. The recent need to scale with data size has attracted research attention to new methods and implementations of SVMs that perform at scale. Distributed SVMs are relatively new and have been studied only recently, but the distributed...
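One simple distributed-training pattern relevant here is parameter averaging: train a linear model on each data partition independently, then average the learned parameters. The sketch below uses a plain perceptron rather than a true max-margin SVM, purely to keep the example short; the partitions, data, and function names are all illustrative:

```python
def train_perceptron(data, epochs=20, lr=0.1):
    # data: list of (features, label) pairs with label in {-1, +1}.
    # Standard perceptron updates: adjust weights on misclassified points.
    w = [0.0] * len(data[0][0])
    b = 0.0
    for _ in range(epochs):
        for x, y in data:
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def average_models(models):
    # Parameter averaging: one simple way to combine per-partition models.
    ws, bs = zip(*models)
    w_avg = [sum(col) / len(ws) for col in zip(*ws)]
    return w_avg, sum(bs) / len(bs)

# Two data partitions, as if held on two different nodes.
part1 = [([2.0, 1.0], 1), ([-2.0, -1.0], -1)]
part2 = [([1.5, 2.0], 1), ([-1.0, -2.0], -1)]
w, b = average_models([train_perceptron(p) for p in (part1, part2)])
```

Real distributed SVM systems use more careful combination schemes (e.g. consensus optimization), but the partition-train-combine shape is the same.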
As we move into the big data era, big probabilistic numerical data are prevalent because they occur in many modern applications, including sensor databases, spatio-temporal databases, biology information systems, etc. However, traditional associative classifier methods proposed to handle probabilistic numerical data are not suitable for big probabilistic numerical data because of memory usage, parallelization,...
In the current scenario, data is considered the biggest asset. Whoever holds the most relevant data is considered rich in the information industry. But collecting data alone is not enough; it needs to be analyzed. This huge amount of data, termed Big Data, cannot be analyzed by traditional tools and techniques; rather, it requires more advanced techniques which can make data...
In this paper we present Digree, an experimental middleware system that can execute graph pattern matching queries over databases hosting voluminous graph datasets. First, we formally present the employed data model and the processes of re-writing a query into an equivalent set of subqueries and subsequently combining the partial results into the final result set. Our framework guarantees the correctness...
The Atmospheric Radiation Measurement (ARM) Climate Research Facility (www.arm.gov) provides atmospheric observations from diverse climatic regimes around the world. Currently, ARM archives over 22 million user-accessible data files, primarily stored in NetCDF file format, with total data volumes close to one petabyte. In this paper, we will discuss how ARM is currently storing, distributing, cataloging...
We have implemented an updated Hierarchical Triangular Mesh (HTM) as the basis for a unified data model and an indexing scheme for geoscience data to address the variety challenge of Big Earth Data. In the absence of variety, the volume challenge of Big Data is relatively easily addressable with parallel processing. The more important challenge in achieving optimal value with a Big Data solution for...
In this paper we propose a two-stage algorithm for robust K-subspaces recovery. In the first stage, a large number of local candidate subspaces are generated by probabilistic farthest insertion, and then the initial near-optimal K-subspaces are solved by combinatorial selection with randomized greedy method. In the second stage, the K-subspaces are further refined by assigning each data vector to...
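The assignment step in the second stage (each data vector goes to its nearest subspace) can be sketched for the simplest case of one-dimensional subspaces, i.e. lines through the origin, where the distance is the squared projection residual; all names and data below are illustrative, not from the paper:

```python
import math

def unit(v):
    # Normalize a direction vector to unit length.
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def residual_sq(x, u):
    # Squared distance from x to the line spanned by unit vector u:
    # ||x||^2 minus the squared projection of x onto u.
    dot = sum(a * b for a, b in zip(x, u))
    return sum(a * a for a in x) - dot * dot

def assign(points, directions):
    # Assign each point to the subspace (line) with the smallest residual.
    units = [unit(d) for d in directions]
    return [min(range(len(units)), key=lambda k: residual_sq(x, units[k]))
            for x in points]

# A point near the x-axis and a point near the y-axis.
labels = assign([[2.0, 0.1], [0.1, 3.0]], [[1.0, 0.0], [0.0, 1.0]])
```

The full algorithm alternates this assignment with re-fitting each subspace to its assigned points; the robust candidate-generation stage is the paper's contribution and is not reproduced here.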
Social media captures the voice of customers at a rapid pace. Consumer perception of a brand is crucial to its success. Current techniques for measuring brand perception, which rely on lengthy surveys of handpicked users conducted in person, by mail, phone, or online, are time consuming and increasingly inadequate. A more effective technique to measure brand perception is to interpret the customer voice directly from social media...
Citizens expect that data collected by the government can be released for more diverse uses. However, to achieve the open-data vision, sensitive data can be published only after proper privacy-preserving processing. In this paper, we present a scalable privacy-preserving system for open/big data which leverages a k-anonymity algorithm and the Hadoop framework. We use an experiment...
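A minimal check of the k-anonymity property such a system relies on can be sketched as follows (the field names, generalized values, and choice of k are illustrative; the paper's Hadoop pipeline is not reproduced here):

```python
from collections import Counter

def is_k_anonymous(records, quasi_ids, k):
    # A table is k-anonymous if every combination of quasi-identifier
    # values appears in at least k records, so no individual can be
    # singled out by those attributes alone.
    groups = Counter(tuple(r[q] for q in quasi_ids) for r in records)
    return all(count >= k for count in groups.values())

# Zip code and age have already been generalized into coarse buckets.
table = [
    {"zip": "100**", "age": "20-29", "disease": "flu"},
    {"zip": "100**", "age": "20-29", "disease": "cold"},
    {"zip": "200**", "age": "30-39", "disease": "flu"},
    {"zip": "200**", "age": "30-39", "disease": "asthma"},
]
ok = is_k_anonymous(table, ["zip", "age"], k=2)
```

The hard part, which the anonymization algorithm handles, is choosing generalizations that make this check pass while losing as little detail as possible.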