Yi Pan

chapter

Effective Multi-stream Joining in Apache Samza Framework

Zhenyun Zhuang, Tao Feng, Yi Pan, Haricharan Ramachandra, et al.

2016 IEEE International Congress on Big Data (BigData Congress) > 267 - 274

The increasing adoption of Big Data in business environments has driven the need for stream joining in real time. Multi-stream joining is an important type of stream processing at today's Internet companies, where it is used to generate higher-quality data in business pipelines. Multi-stream joining can be performed in two models: (1) All-In-One (AIO) Joining and (2) Step-By-Step (SBS) Joining...

chapter

SamzaSQL: Scalable Fast Data Management with Streaming SQL

Milinda Pathirage, Julian Hyde, Yi Pan, Beth Plale

2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) > 1627 - 1636

As the data-driven economy evolves, enterprises have come to realize a competitive advantage in being able to act on high-volume, high-velocity streams of data. Technologies such as distributed message queues and stream processing platforms that can scale to thousands of data stream partitions on commodity hardware are a response. However, the programming API provided by these systems is often...

chapter

A memory capacity model for high performing data-filtering applications in Samza framework

Tao Feng, Zhenyun Zhuang, Yi Pan, Haricharan Ramachandra

2015 IEEE International Conference on Big Data (Big Data) > 2600 - 2605

Data quality is essential in the big data paradigm, as poor data can have serious consequences when dealing with large volumes of data. While it is trivial to spot poor data in small-scale, offline use cases, it is challenging to detect and fix data inconsistency in a large-scale, online (real-time or near-real-time) big data context. An example of such a scenario is spotting and fixing poor data using...

chapter

PLAR: Parallel Large-Scale Attribute Reduction on Cloud Systems

Junbo Zhang, Tianrui Li, Yi Pan

2013 International Conference on Parallel and Distributed Computing, Applications and Technologies > 184 - 191

2013 International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT)

Attribute reduction for big data is viewed as an important preprocessing step in the areas of pattern recognition, machine learning and data mining. In this paper, a novel parallel method based on MapReduce for large-scale attribute reduction is proposed. By using this method, several representative heuristic attribute reduction algorithms in rough set theory have been parallelized. Further, each...
