Traditional machine learning algorithms often require computations on centralized data, but modern datasets are collected and stored in a distributed way. In addition to the cost of moving data to centralized locations, increasing concerns about privacy and security warrant distributed approaches. We propose keybin, a distributed key-based binning clustering algorithm for high-dimensional spaces....
Big data technology refers to the rapid extraction of valuable information from large amounts of data of various types. It can be divided into eight technologies: data acquisition, data access, infrastructure, data processing, statistical analysis, data mining, model prediction, and results presentation. The paper presents an improved statistical analysis method based on big data technology. A statistical...
Efficient execution of distributed database operators such as joins and aggregations is critical for the performance of big data analytics. As the compute speed of modern CPUs increases, reducing the network communication time of these operators in large systems is becoming increasingly important, and is challenging current techniques. Significant performance improvements have been achieved...
Big data analytics has attracted close attention from both industry and academia because of its great benefits in cost reduction and better decision making. With the fast growth of various global services, there is an increasing need for big data analytics across multiple data centers (DCs) located in different countries or regions. This calls for the support of a cross-DC data processing platform optimized...
Recent technological advancements in typical domains (e.g. the internet, financial companies, health care, user-generated data, supply chain systems, etc.) have led to an inundation of data from these domains. This data explosion gave real meaning to the buzzword 'Big Data'. Compared with traditional data, Big Data exhibits some unique characteristics: it is commonly enormous and unstructured...
Due to its simplicity and scalability, MapReduce has become a de facto standard computing model for big data processing. Since the original MapReduce model was only appropriate for embarrassingly parallel batch processing, many follow-up studies have focused on improving the efficiency and performance of the model. Spark follows one of these recent trends by providing in-memory processing capability...
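As a minimal illustration of the MapReduce model the abstract refers to, the canonical word-count job can be sketched in a single process (the function names are ours, not from any framework; real MapReduce distributes the map, shuffle, and reduce phases across machines):

```python
from collections import defaultdict

def map_phase(docs):
    # Map: emit an intermediate (word, 1) pair for every word in every document.
    for doc in docs:
        for word in doc.split():
            yield (word, 1)

def reduce_phase(pairs):
    # Shuffle + Reduce: group intermediate pairs by key and sum the counts.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

docs = ["big data big", "data processing"]
result = reduce_phase(map_phase(docs))
```

Spark's contribution, as the abstract notes, is keeping such intermediate results in memory between stages instead of writing them to disk after every job.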
In the emerging field of big data, a large volume of data has to be managed; operating on data of huge volume becomes easier when it is sorted and structured. The data can be structured using a simple algorithm, i.e. an index algorithm, which stores and categorizes data on the basis of their application. This in turn is beneficial at the business level as well as the software level.
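To make the indexing idea concrete, a categorical index of the kind described can be sketched as a dictionary that maps each key to the records filed under it (the key function and sample data are illustrative assumptions, not from the abstract):

```python
def build_index(records, key_fn):
    # Map each category key to the list of records that fall under it,
    # so later lookups by category avoid a full scan of the data.
    index = {}
    for rec in records:
        index.setdefault(key_fn(rec), []).append(rec)
    return index

orders = [
    {"id": 1, "region": "EU"},
    {"id": 2, "region": "US"},
    {"id": 3, "region": "EU"},
]
by_region = build_index(orders, lambda r: r["region"])
```

A lookup such as `by_region["EU"]` then returns only the matching records in one step.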
Cloud Computing allows users to control substantial computing power for complex data processing, generating huge and complex data. However, the virtual resources requested by users are rarely utilized to their full capacities. To mitigate this, providers often perform over-commitment to maximize profit, which can result in node overloading and consequent task eviction. This paper presents a novel...
This paper analyzes the challenges of data management in army data engineering, such as big data volume, data heterogeneity, high rates of data generation and update, strict time requirements for data processing, and widely separated data sources. We discuss the disadvantages of traditional data management technologies in dealing with these problems. We also highlight the key problems of data management...
Terrorism is a matter of great concern in many nations because of its impact on sustainable development, which is critical for developing countries. Security agencies need to stay a step ahead of terrorist threats to effectively prevent their occurrence. Many research efforts that sought to combat terrorism using big data have been reported in the literature. However, most...
For over forty years, relational databases have been the leading model for data storage, retrieval and management. However, due to increasing needs for scalability and performance, alternative systems have emerged, namely NewSQL technology. NewSQL is a class of modern relational database management systems (RDBMS) that provide the same scalable performance of NoSQL systems for online transaction processing...
Support vector machines (SVMs) are widely used for classification in machine learning and data mining tasks. However, they have traditionally been applied to small and medium datasets. The recent need to scale with data size has attracted research attention to new methods and implementations of SVMs that perform at scale. Distributed SVMs are relatively new and have been studied only recently, but the distributed...
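One simple distributed-training pattern relevant here is parameter averaging: train a linear model on each data partition independently, then average the learned parameters. The sketch below uses a plain perceptron rather than a true max-margin SVM, purely to keep the example short; the partitions, data, and function names are all illustrative:

```python
def train_perceptron(data, epochs=20, lr=0.1):
    # data: list of (features, label) pairs with label in {-1, +1}.
    # Standard perceptron updates: adjust weights on misclassified points.
    w = [0.0] * len(data[0][0])
    b = 0.0
    for _ in range(epochs):
        for x, y in data:
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def average_models(models):
    # Parameter averaging: one simple way to combine per-partition models.
    ws, bs = zip(*models)
    w_avg = [sum(col) / len(ws) for col in zip(*ws)]
    return w_avg, sum(bs) / len(bs)

# Two data partitions, as if held on two different nodes.
part1 = [([2.0, 1.0], 1), ([-2.0, -1.0], -1)]
part2 = [([1.5, 2.0], 1), ([-1.0, -2.0], -1)]
w, b = average_models([train_perceptron(p) for p in (part1, part2)])
```

Real distributed SVM systems use more careful combination schemes (e.g. consensus optimization), but the partition-train-combine shape is the same.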
As we move into the big data era, big probabilistic numerical data are prevalent because they occur in many modern applications, including sensor databases, spatio-temporal databases, biology information systems, etc. However, traditional associative classifier methods proposed to handle probabilistic numerical data are not suitable for big probabilistic numerical data because of memory usage, parallelization,...
In the current scenario, data is considered the biggest asset. Whoever holds the most relevant data is considered rich in the information industry. But collecting data alone is not enough; it needs to be analyzed. This huge amount of data, termed Big Data, cannot be analyzed by traditional tools and techniques; rather, it requires more advanced techniques which can make data...
In this paper we present Digree, an experimental middleware system that can execute graph pattern matching queries over databases hosting voluminous graph datasets. First, we formally present the employed data model and the processes of re-writing a query into an equivalent set of subqueries and subsequently combining the partial results into the final result set. Our framework guarantees the correctness...
The Atmospheric Radiation Measurement (ARM) Climate Research Facility (www.arm.gov) provides atmospheric observations from diverse climatic regimes around the world. Currently, ARM archives over 22 million user-accessible data files, primarily stored in NetCDF file format, with total data volumes close to one petabyte. In this paper, we will discuss how ARM is currently storing, distributing, cataloging...
We have implemented an updated Hierarchical Triangular Mesh (HTM) as the basis for a unified data model and an indexing scheme for geoscience data to address the variety challenge of Big Earth Data. In the absence of variety, the volume challenge of Big Data is relatively easily addressable with parallel processing. The more important challenge in achieving optimal value with a Big Data solution for...
In this paper we propose a two-stage algorithm for robust K-subspaces recovery. In the first stage, a large number of local candidate subspaces are generated by probabilistic farthest insertion, and then the initial near-optimal K-subspaces are solved by combinatorial selection with randomized greedy method. In the second stage, the K-subspaces are further refined by assigning each data vector to...
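The assignment step in the second stage (each data vector goes to its nearest subspace) can be sketched for the simplest case of one-dimensional subspaces, i.e. lines through the origin, where the distance is the squared projection residual; all names and data below are illustrative, not from the paper:

```python
import math

def unit(v):
    # Normalize a direction vector to unit length.
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def residual_sq(x, u):
    # Squared distance from x to the line spanned by unit vector u:
    # ||x||^2 minus the squared projection of x onto u.
    dot = sum(a * b for a, b in zip(x, u))
    return sum(a * a for a in x) - dot * dot

def assign(points, directions):
    # Assign each point to the subspace (line) with the smallest residual.
    units = [unit(d) for d in directions]
    return [min(range(len(units)), key=lambda k: residual_sq(x, units[k]))
            for x in points]

# A point near the x-axis and a point near the y-axis.
labels = assign([[2.0, 0.1], [0.1, 3.0]], [[1.0, 0.0], [0.0, 1.0]])
```

The full algorithm alternates this assignment with re-fitting each subspace to its assigned points; the robust candidate-generation stage is the paper's contribution and is not reproduced here.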
Social media captures the voice of customers at a rapid pace. Consumer perception of a brand is crucial to its success. Current techniques for measuring brand perception, which rely on lengthy surveys of handpicked users conducted in person, by mail, phone, or online, are time consuming and increasingly inadequate. A more effective technique to measure brand perception is to interpret the customer voice directly from social media...
Citizens expect that data collected by the government can be released for more diverse uses. However, to achieve the open-data vision, sensitive data can be published only after proper privacy-preserving processing. In this paper, we present a scalable privacy-preserving system for open/big data which leverages a k-anonymity algorithm and the Hadoop framework. We use an experiment...
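A minimal check of the k-anonymity property such a system relies on can be sketched as follows (the field names, generalized values, and choice of k are illustrative; the paper's Hadoop pipeline is not reproduced here):

```python
from collections import Counter

def is_k_anonymous(records, quasi_ids, k):
    # A table is k-anonymous if every combination of quasi-identifier
    # values appears in at least k records, so no individual can be
    # singled out by those attributes alone.
    groups = Counter(tuple(r[q] for q in quasi_ids) for r in records)
    return all(count >= k for count in groups.values())

# Zip code and age have already been generalized into coarse buckets.
table = [
    {"zip": "100**", "age": "20-29", "disease": "flu"},
    {"zip": "100**", "age": "20-29", "disease": "cold"},
    {"zip": "200**", "age": "30-39", "disease": "flu"},
    {"zip": "200**", "age": "30-39", "disease": "asthma"},
]
ok = is_k_anonymous(table, ["zip", "age"], k=2)
```

The hard part, which the anonymization algorithm handles, is choosing generalizations that make this check pass while losing as little detail as possible.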