Jatin Chhugani

chapter

Matrix factorizations at scale: A comparison of scientific data analytics in Spark and C+MPI using three case studies

Alex Gittens, Aditya Devarakonda, Evan Racah, Michael Ringenburg, et al.

2016 IEEE International Conference on Big Data (Big Data), pp. 204-213

We explore the trade-offs of performing linear algebra using Apache Spark, compared to traditional C and MPI implementations on HPC platforms. Spark is designed for data analytics on cluster computing platforms with access to local disks and is optimized for data-parallel tasks. We examine three widely-used and important matrix factorizations: NMF (for physical plausibility), PCA (for its ubiquity)...

chapter

A Multi-Platform Evaluation of the Randomized CX Low-Rank Matrix Factorization in Spark

Alex Gittens, Jey Kottalam, Jiyan Yang, Michael F. Ringenburg, et al.

2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp. 1403-1412

We investigate the performance and scalability of the randomized CX low-rank matrix factorization and demonstrate its applicability through the analysis of a 1TB mass spectrometry imaging (MSI) dataset, using Apache Spark on an Amazon EC2 cluster, a Cray XC40 system, and an experimental Cray cluster. We implemented this factorization both as a parallelized C implementation with hand-tuned optimizations...

chapter

Interactive Modeling, Simulation and Control of Large-Scale Crowds and Traffic

Ming C. Lin, Stephen Guy, Rahul Narain, Jason Sewall, et al.

Lecture Notes in Computer Science: Motion in Games, Crowd Simulation, pp. 94-103

We survey some of our recent work on interactive modeling, simulation, and control of large-scale crowds and traffic for urban scenes. The driving applications of our work include real-time simulation for computer games, virtual environments, and avatar-based online 3D social networks. We also present some preliminary results and proof-of-concept demonstrations.

chapter

Billion-particle SIMD-friendly two-point correlation on large-scale HPC cluster systems

Jatin Chhugani, Changkyu Kim, Hemant Shukla, Jongsoo Park, et al.

2012 International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-11

2012 SC - International Conference for High Performance Computing, Networking, Storage and Analysis

The two-point correlation function (TPCF) is widely used in astronomy to characterize the distribution of matter/energy in the Universe and to help derive physics that can be traced back to the creation of the universe. However, it is prohibitively slow for current-sized datasets, and will continue to be a critical bottleneck as dataset sizes trend toward billions of particles and more,...
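For readers unfamiliar with TPCF, its core cost is counting pairs of points by separation distance. The sketch below is a brute-force O(N^2) baseline in Python, not the SIMD-friendly cluster algorithm of the paper. The function names `pair_counts` and `natural_estimator` are hypothetical, and the estimator shown is the simple "natural" estimator DD/RR - 1 (data-data versus random-random pair counts), chosen here purely for illustration.

```python
import math
from itertools import combinations


def pair_counts(points, bin_edges):
    """Count unordered pairs of points whose separation falls in each bin.

    Brute-force O(N^2) baseline; bin i covers [bin_edges[i], bin_edges[i+1]).
    Pairs outside every bin are ignored.
    """
    counts = [0] * (len(bin_edges) - 1)
    for p, q in combinations(points, 2):
        d = math.dist(p, q)  # Euclidean separation
        for i in range(len(counts)):
            if bin_edges[i] <= d < bin_edges[i + 1]:
                counts[i] += 1
                break
    return counts


def natural_estimator(dd, rr, n_data, n_rand):
    """Natural estimator xi = DD/RR - 1, with each count normalized by its
    number of distinct pairs n * (n - 1) / 2. Bins with RR == 0 yield NaN."""
    norm = (n_rand * (n_rand - 1)) / (n_data * (n_data - 1))
    return [d / r * norm - 1.0 if r > 0 else float("nan") for d, r in zip(dd, rr)]


if __name__ == "__main__":
    pts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
    print(pair_counts(pts, [0.0, 1.5, 3.0]))  # [1, 2]
```

Production codes typically replace the quadratic double loop with spatial data structures (such as k-d trees) that prune pairs too far apart to land in any bin.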

chapter

Large-scale energy-efficient graph traversal: A path to efficient data-intensive supercomputing

Nadathur Satish, Changkyu Kim, Jatin Chhugani, Pradeep Dubey

2012 International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-11

2012 SC - International Conference for High Performance Computing, Networking, Storage and Analysis

Graph traversal is a widely used algorithm in a variety of fields, including social networks, business analytics, and high-performance computing among others. There has been a push for HPC machines to be rated not just in Petaflops, but also in "GigaTEPS" (billions of traversed edges per second), and the Graph500 benchmark has been established for this purpose. Graph traversal on single...
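The GigaTEPS metric is the number of traversed edges divided by search time, scaled by 10^9. A minimal sketch, assuming an undirected graph stored as a Python dict of adjacency lists (each edge listed in both directions, no self-loops) and counting each undirected edge in the reached component once, which only approximates the Graph500 counting convention. The names `bfs_parents` and `traversal_rate` are hypothetical, and this is a plain level-synchronous BFS rather than the paper's optimized implementation.

```python
import time


def bfs_parents(adj, root):
    """Level-synchronous BFS: expand the whole frontier, then move to the next level.

    Returns a dict mapping each reached vertex to its BFS parent (root maps to itself).
    """
    parent = {root: root}
    frontier = [root]
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in adj[u]:
                if v not in parent:  # first visit claims the vertex
                    parent[v] = u
                    next_frontier.append(v)
        frontier = next_frontier
    return parent


def traversal_rate(adj, root):
    """Return (edges_per_second, parent) for one BFS from `root`.

    Edges counted = undirected edges inside the reached component,
    i.e. half the sum of degrees of reached vertices (assumes symmetric
    adjacency lists without self-loops).
    """
    start = time.perf_counter()
    parent = bfs_parents(adj, root)
    elapsed = time.perf_counter() - start
    edges = sum(len(adj[u]) for u in parent) // 2
    rate = edges / elapsed if elapsed > 0 else float("inf")
    return rate, parent


if __name__ == "__main__":
    graph = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4], 4: [3]}
    rate, parent = traversal_rate(graph, 0)
    print(sorted(parent), f"{rate / 1e9:.6f} GTEPS")
```

Vertices 3 and 4 lie in a different component, so they are neither reached nor counted; dividing the rate by 10^9 gives GigaTEPS.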

chapter

Can traditional programming bridge the Ninja performance gap for parallel computing applications?

Nadathur Satish, Changkyu Kim, Jatin Chhugani, Hideki Saito, et al.

2012 39th Annual International Symposium on Computer Architecture (ISCA), pp. 440-451

2012 ACM/IEEE 39th International Symposium on Computer Architecture (ISCA)

Current processor trends of integrating more cores with wider SIMD units, along with a deeper and more complex memory hierarchy, have made it increasingly challenging to extract performance from applications. Some believe that traditional approaches to programming do not apply to these modern processors and hence radical new languages must be discovered. In this paper, we question this thinking...

chapter

Fast and Efficient Graph Traversal Algorithm for CPUs: Maximizing Single-Node Efficiency

Jatin Chhugani, Nadathur Satish, Changkyu Kim, Jason Sewall, et al.

2012 IEEE 26th International Parallel and Distributed Processing Symposium, pp. 378-389

2012 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)

Graph-based structures are being increasingly used to model data and relations among data in a number of fields. Graph-based databases are becoming more popular as a means to better represent such data. Graph traversal is a key component in graph algorithms such as reachability and graph matching. Since the scale of data stored and queried in these databases is increasing, it is important to obtain...

article

DySER: Unifying Functionality and Parallelism Specialization for Energy-Efficient Computing

Venkatraman Govindaraju, Chen-Han Ho, Tony Nowatzki, Jatin Chhugani, et al.

IEEE Micro, vol. 32, no. 5, pp. 38-51, 2012

The DySER (Dynamically Specializing Execution Resources) architecture supports both functionality specialization and parallelism specialization. By dynamically specializing frequently executing regions and applying parallelism mechanisms, DySER provides efficient functionality and parallelism specialization. It outperforms an out-of-order CPU, Streaming SIMD Extensions (SSE) acceleration, and GPU...

chapter

High-performance lattice QCD for multi-core based parallel systems using a cache-friendly hybrid threaded-MPI approach

Mikhail Smelyanskiy, Karthikeyan Vaidyanathan, Jee Choi, Balint Joo, et al.

2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC), pp. 1-10

2011 SC - International Conference for High Performance Computing, Networking, Storage and Analysis

Lattice Quantum Chromodynamics (LQCD) is a computationally challenging problem that solves the discretized Dirac equation in the presence of an SU(3) gauge field. Its key operation is a matrix-vector product, known as the Dslash operator. We have developed a novel multicore architecture-friendly implementation of the Wilson-Dslash operator which delivers 75 Gflops (single-precision) on an Intel®...

chapter

3.5-D Blocking Optimization for Stencil Computations on Modern CPUs and GPUs

Anthony Nguyen, Nadathur Satish, Jatin Chhugani, Changkyu Kim, et al.

2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-13

Stencil computation sweeps over a spatial grid across multiple time steps to perform nearest-neighbor computations. The bandwidth-to-compute requirement for a large class of stencil kernels is very high, and their performance is bound by the available memory bandwidth. Since memory bandwidth grows slower than compute, the performance of stencil kernels will not scale with increasing compute density...
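To make the bandwidth argument concrete, here is a minimal sketch of a naive (unblocked) 2-D, 5-point Jacobi stencil in pure Python, assuming fixed boundary values and double buffering. The name `jacobi_sweeps` is hypothetical; this is the kind of baseline that blocking schemes such as the paper's 3.5-D blocking aim to improve, not the paper's method. Each interior update performs 5 floating-point operations (4 additions, 1 multiplication), yet even when every neighbor is a cache hit it must stream one value in and one value out per point per time step: 16 bytes in double precision, or roughly 0.3 flop/byte, ignoring write-allocate traffic.

```python
def jacobi_sweeps(grid, steps):
    """Apply a 5-point averaging stencil to interior points for `steps` time steps.

    Boundary cells keep their initial values. Two buffers are swapped after
    each sweep, so every step reads only values from the previous step.
    The input grid is not modified.
    """
    ny, nx = len(grid), len(grid[0])
    cur = [row[:] for row in grid]
    nxt = [row[:] for row in grid]
    for _ in range(steps):
        # One full sweep: every interior point reads itself and its 4 neighbors.
        for i in range(1, ny - 1):
            for j in range(1, nx - 1):
                nxt[i][j] = 0.2 * (cur[i][j] + cur[i - 1][j] + cur[i + 1][j]
                                   + cur[i][j - 1] + cur[i][j + 1])
        cur, nxt = nxt, cur
    return cur


if __name__ == "__main__":
    g = [[1.0, 1.0, 1.0],
         [1.0, 0.0, 1.0],
         [1.0, 1.0, 1.0]]
    print(jacobi_sweeps(g, 1)[1][1])  # 0.8
```

Because each time step rereads the whole grid, a naive loop like this one moves the full grid through memory once per step; temporal blocking instead applies several time steps to a cache-resident block before moving on.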

chapter

Atomic Vector Operations on Chip Multiprocessors

S. Kumar, Daehyun Kim, M. Smelyanskiy, Yen-Kuang Chen, et al.

2008 International Symposium on Computer Architecture, pp. 441-452

35th International Symposium on Computer Architecture

The current trend is for processors to deliver dramatic improvements in parallel performance while only modestly improving serial performance. Parallel performance is harvested through vector/SIMD instructions as well as multithreading (through both multithreaded cores and chip multiprocessors). Vector parallelism can be more efficiently supported than multithreading, but is often harder for software...

INFONA - science communication portal

Search results for: Jatin Chhugani
