The most widely adopted approach for knowledge extraction from raw data generated at the edges of the Internet (e.g., by IoT or personal mobile devices) is through global cloud platforms, where data is collected from devices and analysed. However, with the increasing number of devices spread throughout the physical environment, this approach raises several concerns. The data gravity concept, one of the basis...
Recent technological advancements in typical domains (e.g., the Internet, financial companies, health care, user-generated data, supply chain systems, etc.) have led to an inundation of data from these domains. This data outburst gave real meaning to the buzzword ‘Big Data’. Compared with traditional data, Big Data exhibits some unique characteristics: it is commonly enormous and unstructured...
Fast data analytics at an increasingly large scale has become a critical task in any Internet service company. For example, in Baidu, the major search engine company in China, large volumes of Web and business data at PB scale are constantly acquired and analyzed in a timely manner for the purposes of evaluating product revenue, tracking product demand on the market, predicting user behavior, upgrading...
This paper analyzes the challenges of data management in army data engineering, such as big data volume, data heterogeneity, high rates of data generation and update, strict time requirements for data processing, and widely separated data sources. We discuss the disadvantages of traditional data management technologies in dealing with these problems. We also highlight the key problems of data management...
Data shuffling is one of the fundamental building blocks of distributed learning algorithms, increasing the statistical gain of each step of the learning process. In each iteration, differently shuffled data points are assigned by a central node to a distributed set of workers to perform local computation, which leads to communication bottlenecks. The focus of this paper is on formalizing and...
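The paper's formalization is beyond this excerpt; as a rough illustration of the shuffling step described above, a minimal sketch (function and variable names are hypothetical, not taken from the paper):

```python
import random

def shuffle_and_assign(data, num_workers, seed):
    """Centrally shuffle the dataset and split it across workers.

    Each iteration uses a fresh permutation, so every worker sees a
    different slice of the data over time (the source of the statistical
    gain the abstract mentions), at the cost of the central node
    re-communicating assignments every round.
    """
    rng = random.Random(seed)
    indices = list(range(len(data)))
    rng.shuffle(indices)
    # Round-robin the permuted indices onto workers.
    return {w: [data[i] for i in indices[w::num_workers]]
            for w in range(num_workers)}

# Example: 10 data points, 3 workers, a new shuffle each iteration.
data = list(range(10))
for iteration in range(2):
    assignment = shuffle_and_assign(data, num_workers=3, seed=iteration)
    print(iteration, assignment)
```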
We have implemented an updated Hierarchical Triangular Mesh (HTM) as the basis for a unified data model and an indexing scheme for geoscience data to address the variety challenge of Big Earth Data. In the absence of variety, the volume challenge of Big Data is relatively easily addressable with parallel processing. The more important challenge in achieving optimal value with a Big Data solution for...
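The details of the updated HTM are not given in this excerpt; as background, here is a minimal, unoptimized sketch of classic HTM trixel indexing (recursive subdivision of eight spherical triangles), the scheme the unified data model builds on:

```python
import numpy as np

def _norm(v):
    return v / np.linalg.norm(v)

# The 8 initial spherical triangles of classic HTM (octahedron faces).
V = [np.array(p, dtype=float) for p in
     [(0, 0, 1), (1, 0, 0), (0, 1, 0), (-1, 0, 0), (0, -1, 0), (0, 0, -1)]]
FACES = {  # name -> corner vertices, counter-clockwise seen from outside
    "N0": (V[1], V[0], V[4]), "N1": (V[4], V[0], V[3]),
    "N2": (V[3], V[0], V[2]), "N3": (V[2], V[0], V[1]),
    "S0": (V[1], V[5], V[2]), "S1": (V[2], V[5], V[3]),
    "S2": (V[3], V[5], V[4]), "S3": (V[4], V[5], V[1]),
}

def _inside(p, v0, v1, v2):
    # p lies in the spherical triangle if it is on the inner side of
    # all three great circles bounding it.
    return (np.dot(np.cross(v0, v1), p) >= 0 and
            np.dot(np.cross(v1, v2), p) >= 0 and
            np.dot(np.cross(v2, v0), p) >= 0)

def htm_name(p, depth):
    """Return the name of the depth-level HTM trixel containing unit vector p."""
    p = _norm(np.asarray(p, dtype=float))
    for name, (v0, v1, v2) in FACES.items():
        if _inside(p, v0, v1, v2):
            break
    for _ in range(depth):
        # Subdivide: edge midpoints become corners of four child trixels.
        w0, w1, w2 = _norm(v1 + v2), _norm(v0 + v2), _norm(v0 + v1)
        for child, tri in (("0", (v0, w2, w1)), ("1", (v1, w0, w2)),
                           ("2", (v2, w1, w0)), ("3", (w0, w1, w2))):
            if _inside(p, *tri):  # boundary ties go to the first match
                name += child
                v0, v1, v2 = tri
                break
    return name

print(htm_name((0.5, 0.5, 0.7071), depth=5))
```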
Driven by the trends of Big Data and cloud computing, there is a growing demand for processing and analyzing data that are generated and stored across geo-distributed data centers. However, due to the limited network bandwidth between data centers and the growing data volume spread across different locations, it has become increasingly inefficient to aggregate data and to perform computations at a...
Big data applications that rely on relational databases have gradually exposed limitations in scalability and performance. In recent years, the Hadoop ecosystem has been widely adopted as an evolving solution. This paper presents the migration of a legacy data analytics application in a provincial data center. The target platform follows a "no one size fits all" approach. Considering different workloads,...
An exponential growth in the availability of data from various sources has enabled large-scale adoption of data-driven decision making. Much of the present day's data was generated in recent years; complementing this, there has been a substantial reduction in data storage costs. Hence, the analysis of the data collected will assist decision making in our future smart environments. Here sustainability...
The vigorous growth of big data has triggered both opportunities and challenges in business and industry. However, Web big data distributed across diverse sources with multiple data structures frequently conflict with each other, i.e., there is inconsistency in cross-source Web big data. In this paper, we propose a state-of-the-art architecture for auto-discovering inconsistency in Web big data. Our contributions...
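The architecture itself is not detailed in this excerpt; as a loose illustration of the underlying idea, records referring to the same entity can be grouped across sources and scanned for conflicting attribute values (the record schema below is hypothetical, chosen only for illustration):

```python
from collections import defaultdict

def find_inconsistencies(records):
    """Group records by (entity, attribute) and flag values that
    disagree across sources.

    `records` is an iterable of (source, entity_id, attribute, value)
    tuples -- a hypothetical schema, not the paper's.
    """
    values = defaultdict(set)   # (entity, attr) -> {values}
    origin = defaultdict(set)   # (entity, attr) -> {sources}
    for source, entity, attr, value in records:
        values[(entity, attr)].add(value)
        origin[(entity, attr)].add(source)
    return [
        {"entity": e, "attribute": a,
         "values": sorted(values[(e, a)]),
         "sources": sorted(origin[(e, a)])}
        for (e, a) in values
        if len(values[(e, a)]) > 1   # conflicting values = inconsistency
    ]

records = [
    ("site_a", "movie:42", "release_year", 1999),
    ("site_b", "movie:42", "release_year", 2000),  # conflicts with site_a
    ("site_b", "movie:42", "director", "L. Wachowski"),
]
print(find_inconsistencies(records))
```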
In their quest for data-driven insight, firms align their resources to produce information that is actionable. Moreover, the bundling and utilization of these valuable resources is what defines an organizational capability. Thus, in this paper we conceptualize a new type of capability, data analytics capability (DAC), as the ability to assemble, coordinate, mobilize, and deploy analytics-based resources...
Digital libraries' large data resources suffer from a lack of analysis and use; in order to mine the value of these big data resources, a platform-based analysis and processing mode is proposed. By integrating R and Hadoop to construct a distributed data analysis platform, many big data analysis tasks can be decomposed into "large" and "small" data processing sections, overcoming the difficulties of previous schemes for analytical...
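As a loose sketch of that "large"/"small" decomposition (written in Python rather than the paper's R-plus-Hadoop stack, with the Hadoop side simulated locally):

```python
from itertools import groupby
from operator import itemgetter
from statistics import mean

# "Large" section: a MapReduce-style aggregation that would run on the
# Hadoop side of the platform (simulated in-process here).
def map_phase(records):
    for user, pages in records:      # emit one key-value pair per record
        yield user, pages

def reduce_phase(pairs):
    pairs = sorted(pairs, key=itemgetter(0))  # stand-in for the shuffle
    for user, group in groupby(pairs, key=itemgetter(0)):
        yield user, sum(pages for _, pages in group)

# "Small" section: the reduced output is tiny, so the statistical
# analysis (done in R on the actual platform) fits on a single node.
logs = [("u1", 3), ("u2", 5), ("u1", 2), ("u3", 1), ("u2", 4)]
totals = dict(reduce_phase(map_phase(logs)))
print(totals, "mean:", mean(totals.values()))
```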
The analysis of data is typically accompanied by concern as to the correctness of recorded data points; some of the points might be contaminated, thereby distorting the result of the analysis. This paper proposes a novel cluster-based and distribution-independent method for outlier detection. Based on Monte Carlo simulations, the new method is tested with different data distributions and compared...
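The paper's method itself is not spelled out in this excerpt; a generic cluster-based outlier check, with k and the quantile threshold chosen purely for illustration, might look like:

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_outliers(X, k=2, quantile=0.95):
    """Flag points unusually far from their cluster centroid.

    Generic illustration only: cluster with k-means, measure each
    point's distance to its own centroid, and mark points beyond the
    `quantile` of those distances as outliers. Distribution-independent
    in the sense that no parametric model is fit to the data.
    """
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    dists = np.linalg.norm(X - km.cluster_centers_[km.labels_], axis=1)
    return dists > np.quantile(dists, quantile)

# Two clean Gaussian clusters plus one injected contaminant at (20, 20).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)),
               rng.normal(8, 1, (50, 2)),
               [[20.0, 20.0]]])
print(np.where(cluster_outliers(X, k=2))[0])  # includes index 100
```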
More varied data channels, increasingly diverse analytic methods, and new deployment models, along with some fundamental technology shifts, will significantly impact the next generation of big data systems.
Data mining brings to light hidden and valuable information in data; the facts revealed by data mining were previously unknown, are potentially useful, and of high quality. Data mining offers a means by which we can explore the knowledge in a database. Data stream mining and outlier detection are dynamic research areas of data mining. It is thought that ‘data stream...
As data volumes in scientific applications have grown exponentially, new scientific methods to analyze and organize the data are required. MapReduce programming is driving Internet services, and those services operate in a cloud environment. Hence it is necessary to efficiently provision resources for handling diverse MapReduce applications. In this paper we present the Hadoop application with...
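The application itself is truncated here; a minimal Hadoop Streaming job in Python (word count as a stand-in workload; the streaming jar path varies by installation and is an assumption) illustrates the MapReduce model being provisioned for:

```python
#!/usr/bin/env python3
"""Minimal Hadoop Streaming mapper/reducer in one file. Test locally:

    cat input.txt | python3 wc.py map | sort | python3 wc.py reduce

or submit to a cluster (jar path is an assumption; it varies by install):

    hadoop jar hadoop-streaming.jar \
        -input /data/in -output /data/out \
        -mapper "python3 wc.py map" -reducer "python3 wc.py reduce" \
        -file wc.py
"""
import sys

def mapper():
    for line in sys.stdin:
        for word in line.split():
            print(f"{word}\t1")            # emit (word, 1)

def reducer():
    current, count = None, 0
    for line in sys.stdin:                 # input arrives sorted by key
        word, n = line.rsplit("\t", 1)
        if word != current:
            if current is not None:
                print(f"{current}\t{count}")
            current, count = word, 0
        count += int(n)
    if current is not None:
        print(f"{current}\t{count}")

if __name__ == "__main__":
    mapper() if sys.argv[1] == "map" else reducer()
```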
With the rapid development of the Internet, e-commerce websites now routinely have to work with log datasets that are up to a few terabytes in size. How to remove messy data promptly at low cost and find useful information is a problem we have to face. The mining process involves several steps, from pre-processing the raw data to establishing the final models. In this paper we describe our method to...
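The method is truncated here; as an illustration of the pre-processing step, a sketch that drops malformed and noisy log records (the log format and noise rules below are hypothetical):

```python
import re

# Hypothetical combined-log-format line; real field layouts vary by site.
LOG = re.compile(r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
                 r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3})')

NOISE = re.compile(r'\.(css|js|png|jpg|gif|ico)(\?|$)')  # static assets

def clean(lines):
    """Drop malformed lines, failed requests, and static-asset noise,
    keeping only the fields later mining steps need."""
    for line in lines:
        m = LOG.match(line)
        if not m:
            continue                       # messy or truncated record
        if m["status"].startswith(("4", "5")) or NOISE.search(m["url"]):
            continue                       # error responses and assets
        yield m["ip"], m["ts"], m["method"], m["url"]

sample = [
    '1.2.3.4 - - [10/Oct/2023:13:55:36 +0000] "GET /item/42 HTTP/1.1" 200 512',
    '1.2.3.4 - - [10/Oct/2023:13:55:37 +0000] "GET /style.css HTTP/1.1" 200 80',
    'garbled line that the parser rejects',
]
print(list(clean(sample)))   # only the /item/42 request survives
```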
Earth and environmental scientists collect and use a wide range of observational data. This data often exhibits high structural and semantic heterogeneity due to the variety of data collected and the ways in which observational datasets are structured in practice. However, to address questions at broad temporal, geographic, and biological scales, researchers often need to access and combine data from...
User-friendliness and performance are important properties of data mining and analysis tools. In this demo, we introduce an agent-based distributed data mining platform that allows users to manage and share data-mining-related resources conveniently. Furthermore, the platform employs agents for workflow enactment, in which performance is improved through the agents' capabilities. We also present an example...
The Web has been flooded with highly heterogeneous data sources that freely offer their data to the public. Careful design and compliance with standards is a way to cope with the heterogeneity. However, any agreement and compliance is practically hard to achieve across different communities. In this work we describe a framework that enables the exploitation of content across different scientific disciplines...