Dixin Tang

chapter

SparkArray: An Array-Based Scientific Data Management System Built on Apache Spark

Wenjuan Wang, Taoying Liu, Dixin Tang, Hong Liu, et al.

2016 IEEE International Conference on Networking, Architecture and Storage (NAS), pp. 1-10

With the growing demand for manipulating large scientific datasets, scientists need flexible cluster-level software for fast scientific data analysis. In this paper, we discuss whether the Apache Spark framework is suitable for scientific data management. We present our system SparkArray, which extends Spark with a multidimensional array data model and a set of common...

chapter

A Case Study of Optimizing Big Data Analytical Stacks Using Structured Data Shuffling

Dixin Tang, Taoying Liu, Rubao Lee, Hong Liu, et al.

2016 IEEE International Congress on Big Data (BigData Congress), pp. 91-100

Current major big data analytical stacks often consist of a general-purpose, multi-staged cluster computation framework (e.g. Hadoop) with a SQL query execution system (e.g. Hive) on top of it. In such stacks, a key factor in query execution performance is the efficiency of data shuffling between two execution stages (e.g. Map/Reduce). However, current stacks often execute data shuffling in a data-oblivious...

chapter

A Case Study of Optimizing Big Data Analytical Stacks Using Structured Data Shuffling

Dixin Tang, Taoying Liu, Rubao Lee, Hong Liu, et al.

2015 IEEE International Conference on Cluster Computing (CLUSTER), pp. 70-73

Current major big data analytical stacks often consist of a general-purpose, multi-staged computation framework (e.g. Hadoop) with an SQL query system (e.g. Hive) on top of it. A key factor in query performance is the efficiency of data shuffling between two execution stages (e.g. Map/Reduce). In current data shuffling, various useful information about the shuffled data and the query on the data is simply...

chapter

RHJoin: A fast and space-efficient join method for log processing in MapReduce

Dixin Tang, Taoying Liu, Hong Liu, Wei Li

2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS), pp. 975-980

Equi-join is heavily used in MapReduce-based log processing. With the rapid growth of dataset sizes, join methods on MapReduce have been studied extensively in recent years. We find that existing join methods usually cannot achieve both high query performance and affordable storage consumption when faced with a huge amount of log data. They either only optimize one aspect but significantly sacrifice the...

chapter

Optimizing the Join Operation on Hive to Accelerate Cross-Matching in Astronomy

Liang Li, Dixin Tang, Taoying Liu, Hong Liu, et al.

2014 IEEE International Parallel & Distributed Processing Symposium Workshops (IPDPSW), pp. 1735-1745

Cross-matching in astronomy is a basic procedure for comprehensively analyzing the relations among different celestial objects. The aim is to search for celestial objects in different catalogs and to determine whether they are the same object. Basically, cross-matching can be expressed as a join query statement. Since celestial catalogs usually contain billions of stars, the join operator must be carefully designed...

INFONA - science communication portal

Search results for: Dixin Tang
