Achieving high-quality clustering is one of the most well-known problems in data mining, and k-means is by far the most commonly used clustering algorithm. It converges fairly quickly, but a good solution is not guaranteed: the clustering quality is highly dependent on the selection of the initial centroids. Moreover, when the number of clusters increases, it starts to suffer from...
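The abstract above notes that k-means quality hinges on the initial centroids. As an illustrative sketch (not code from the paper), a plain pure-Python Lloyd's algorithm run from several random seeds makes that dependence visible: different seeds can converge to different local optima with different inertia (sum of squared distances to the nearest centroid).

```python
import random

def kmeans(points, k, seed, iters=50):
    """Plain Lloyd's algorithm; result quality depends on the random initial centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: (p[0] - centroids[c][0]) ** 2 + (p[1] - centroids[c][1]) ** 2)
            clusters[i].append(p)
        # Recompute centroids as cluster means (keep the old centroid if a cluster emptied).
        centroids = [
            (sum(p[0] for p in cl) / len(cl), sum(p[1] for p in cl) / len(cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return centroids

def inertia(points, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(min((p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids) for p in points)

# Three well-separated blobs; an unlucky init (two centroids in one blob) merges two of them.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (20, 1), (21, 0)]
scores = {s: inertia(data, kmeans(data, 3, seed=s)) for s in range(10)}
```

Comparing `scores` across seeds shows why initialization schemes such as k-means++ exist: the same data and the same algorithm can yield different inertia depending purely on the starting centroids.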
The growth in the capacity and capability of NAND Flash-based storage systems has changed the face of data-oriented computational systems. These systems have become both more capable and more flexible in how they are used. With these changes come both increased potential and increased user complexity. While many systems attempt to hide this complexity through the addition of more layers of storage caches, the...
In this paper, we propose a new method for addressing post-purchase recommendations for a dynamic marketplace. The proposed method uses the transactional data as the primary data source to mine co-purchase relationships. The item listings from the transactional data are mapped to their static ‘cluster’ representation and a cluster-cluster directed graph is generated. Clusters have explicit definitions...
Genomic analysis [1] usually includes a pipeline of three stages: sequence alignment, data conversion, and advanced analysis. The analysis pipeline needs to handle hundreds of gigabytes of data as well as run complex analytics algorithms, which traditionally take a long time to execute (20+ hours) for a full-genome analysis. Parallelizing the execution of analytics algorithms is one way to speed...
A recent trend in big data analytics is to provide heterogeneous architectures that support hardware specialization. Considering the time dedicated to creating such hardware implementations, an analysis that estimates how much benefit we gain, in terms of speed and energy efficiency, by offloading various functions to hardware is necessary. This work analyzes data mining and machine...
This paper describes a functional view of a privacy architecture based on a shared-services model. The architecture exposes 7 functional management components: Master Management, Privacy Monitoring, Private Data Identification, Policy Management, Privacy Service Injection, Privacy Logging, and Privacy Analytics for (re)use by multiple applications operating in heterogeneous Big Data environments....
R is a free, powerful, open-source software package with extensive statistical computing and graphics capabilities. Due to its high-level expressiveness and multitude of domain-specific packages, R has become a popular tool for data analysis in many scientific fields. While there are a number of packages that enable running R in parallel using the Message Passing Interface (MPI) across multiple...
With rapidly growing computing power, ultra-high-resolution Earth science simulations spanning long periods of time have become feasible. However, it is still very challenging to distribute and analyze the huge amount of simulation results, which can exceed 100 TB. One key reason is that typical Earth science data are represented in NetCDF, which is not supported by the popular and powerful Hadoop Distribute...
Big Data constitutes an opportunity for companies to empower their analyses. However, at the moment there is no standard way of approaching Big Data projects. This, coupled with the complex nature of Big Data, means that many Big Data projects fail or rarely obtain the expected return on investment. In this paper, we present a methodology for tackling Big Data projects in a systematic way, avoiding...
Erasure codes such as Reed-Solomon (RS) codes are widely used to improve data reliability in distributed storage systems. Although erasure codes greatly reduce the storage overhead compared to replication schemes, repairing a failed node is still very costly in terms of network bandwidth. To address this problem, we employ the Zigzag code, an MDS array code with the optimal repair property,...
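The repair cost the abstract mentions can be seen even in the simplest erasure code: a single XOR parity block (RAID-5-style, far simpler than RS or Zigzag codes, and purely illustrative). Storage overhead is only (k+1)/k, e.g. 1.25x for k=4, versus 3x for triple replication, but repairing one lost block requires reading all k surviving blocks; this read amplification is exactly the network-bandwidth cost that codes with better repair properties aim to reduce.

```python
from functools import reduce

def xor_blocks(blocks):
    """Bytewise XOR of equal-length byte blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

k = 4
data = [bytes([i] * 8) for i in range(k)]   # four 8-byte data blocks
parity = xor_blocks(data)                   # one parity block; total storage is (k+1)/k of the data

# Repairing one lost block means reading ALL k surviving blocks over the network.
lost = 2
survivors = [b for i, b in enumerate(data) if i != lost] + [parity]
recovered = xor_blocks(survivors)

assert recovered == data[lost]
```

A single parity block tolerates only one failure; RS codes generalize this to any number of parities, which is why they dominate in practice despite the repair cost.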
Big data analysis technologies are becoming more widely used in industry. The ever-increasing data volume, however, puts data analytic platforms such as Hadoop under constant pressure. Several compression methods have been made available on the Hadoop platform to effectively reduce data size and efficiently deliver data between cluster nodes. In the Hadoop context, compressed data can be categorized...
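As a rough, Hadoop-independent illustration of the compression trade-offs the abstract alludes to, the Python standard library's gzip and bz2 modules can be compared on repetitive record data. (In Hadoop a further distinction matters: a plain gzip file is not splittable across mappers, while a bzip2 file is, so codec choice affects parallelism as well as size.)

```python
import gzip, bz2

# Repetitive, text-like payload, typical of the log/record data analytic jobs shuffle.
payload = b"userid,timestamp,action\n" * 10_000

gz = gzip.compress(payload)   # gzip: fast, widely supported
bz = bz2.compress(payload)    # bzip2: slower, usually smaller on text

ratios = {name: len(blob) / len(payload) for name, blob in [("gzip", gz), ("bzip2", bz)]}
```

Both codecs shrink this payload dramatically, which is the point: smaller data means less disk I/O and less traffic between cluster nodes, at the price of CPU time spent compressing and decompressing.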
Big-data systems are increasingly important for solving data-driven problems in many science domains, including the geosciences. However, existing big-data systems cannot support self-describing data formats such as NetCDF, which are commonly used by scientific communities for data distribution and sharing. This limitation presents a serious hurdle to the further adoption of big-data systems by...
Healthcare applications typically require big data management as well as intensive computation. This is especially true of recently developed next-generation sequencing technology, which increases interest in processing the huge amount of information in a timely fashion. In this paper, we focus on testing whether healthcare applications can scale well on commercial big data platforms that implement...
Cloud services are widely used across the globe to store and analyze Big Data, and security breaches of these services, exposing huge amounts of private data, regularly make the news. This paper studies the current security threats to Cloud Services, Big Data, and Hadoop. The paper analyzes a newly proposed Big Data security system based on the EnCoRe system...
Existing Big Data analytics platforms, such as Hadoop, lack support for user activity monitoring. Several diagnostic tools, such as Ganglia, Ambari, and Cloudera Manager, are available to monitor the health of a cluster; however, they do not provide algorithms to detect security threats or perform user activity monitoring. Hence, there is a need to develop a scalable system that can detect malicious user...
Hadoop has emerged as the de facto state-of-the-art system for MapReduce-based data analytics. The reliability of Hadoop systems depends in part on how well they handle failures. Currently, Hadoop handles machine failures by re-executing all the tasks of the failed machines (i.e., executing recovery tasks). Unfortunately, this elegant solution is entirely entrusted to the core of Hadoop and hidden from...
The success of the Hadoop MapReduce programming model has greatly propelled research in big data analytics. In recent years, there has been growing interest in the High Performance Computing (HPC) community in using Hadoop-based tools for processing scientific data. This interest is due to the facts that data movement has become prohibitively expensive and high-performance data analytics has become an important part...
Many emerging Semantic Web applications combine and aggregate data across domains for analysis. Such analytical queries compute aggregates over multiple groupings of data, resulting in query plans with complex grouping-aggregation constraints. In the context of an RDF analytical query, each such grouping maps to a graph pattern subquery with multiple join operations, and related groups often result...