Search results for: Sangmin Seo

Items from 1 to 20 out of 20 results

chapter

GLTO: On the Adequacy of Lightweight Thread Approaches for OpenMP Implementations

Adrian Castello, Sangmin Seo, Rafael Mayo, Pavan Balaji, et al.

2017 46th International Conference on Parallel Processing (ICPP) > 60 - 69

OpenMP is the de facto standard application programming interface (API) for on-node parallelism. The most popular OpenMP runtimes rely on POSIX threads (pthreads) implementations that offer excellent performance for coarse-grained parallelism and match perfectly with the current hardware. However, a recent trend in runtimes/applications points in the direction of leveraging massive on-node parallelism...

chapter

Advanced Thread Synchronization for Multithreaded MPI Implementations

Hoang-Vu Dang, Sangmin Seo, Abdelhalim Amer, Pavan Balaji

2017 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID) > 314 - 324

Concurrent multithreaded access to the Message Passing Interface (MPI) is gaining importance to support emerging hybrid MPI applications. The interoperability between threads and MPI, however, is complex and renders efficient implementations nontrivial. Prior studies showed that threads waiting for communication progress (waiting threads) often interfere with others (active threads) and degrade their...

chapter

A software-SVM-based transactional memory for multicore accelerator architectures with local memory

Jun Lee, Sangmin Seo, Jaejin Lee

2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT) > 567 - 568

We propose a software transactional memory (STM) for heterogeneous multicores with small local memory. The heterogeneous multicore architecture consists of a general-purpose processor element (GPE) and multiple accelerator processor elements (APEs). The GPE is typically backed by a deep, on-chip cache hierarchy and hardware cache coherence. On the other hand, the APEs have small, explicitly addressed...

chapter

An OpenCL framework for heterogeneous multicores with local memory

Jaejin Lee, Jungwon Kim, Sangmin Seo, Seungkyun Kim, et al.

2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT) > 193 - 204

In this paper, we present the design and implementation of an Open Computing Language (OpenCL) framework that targets heterogeneous accelerator multicore architectures with local memory. The architecture consists of a general-purpose processor core and multiple accelerator cores that typically do not have any cache. Each accelerator core, instead, has a small internal local memory. Our OpenCL runtime...

chapter

COMIC: A coherent shared memory interface for cell BE

Jaejin Lee, Sangmin Seo, Chihun Kim, Junghyun Kim, et al.

2008 International Conference on Parallel Architectures and Compilation Techniques (PACT) > 303 - 314

The Cell BE processor is a heterogeneous multicore that contains one PowerPC Processor Element (PPE) and eight Synergistic Processor Elements (SPEs). Each SPE has a small software-managed local store. Applications must explicitly control all DMA transfers of code and data between the SPE local stores and the main memory, and they must perform any coherence actions required for data transferred. The...

chapter

A Review of Lightweight Thread Approaches for High Performance Computing

Adrian Castello, Antonio J. Pena, Sangmin Seo, Rafael Mayo, et al.

2016 IEEE International Conference on Cluster Computing (CLUSTER) > 471 - 480

High-level, directive-based solutions are becoming the programming models (PMs) for multi/many-core architectures. Several solutions relying on operating system (OS) threads work perfectly with a moderate number of cores. However, exascale systems will spawn hundreds of thousands of threads in order to exploit their massively parallel architectures, and thus conventional OS threads are too heavy for...

chapter

SWAP-Assembler 2: Optimization of De Novo Genome Assembler at Extreme Scale

Jintao Meng, Sangmin Seo, Pavan Balaji, Yanjie Wei, et al.

2016 45th International Conference on Parallel Processing (ICPP) > 195 - 204

In this paper, we analyze and optimize the most time-consuming steps of the SWAP-Assembler, a parallel genome assembler, so that it can scale to a large number of cores for huge genomes with sequencing data ranging from terabytes to petabytes. Performance analysis results show that the most time-consuming steps are input parallelization, k-mer graph construction, and graph simplification (edge merging)...

chapter

Systemwide Power Management with Argo

Daniel Ellsworth, Tapasya Patki, Swann Perarnau, Sangmin Seo, et al.

2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) > 1118 - 1121

The Argo project is a DOE initiative for designing a modular operating system/runtime for the next generation of supercomputers. A key focus area in this project is power management, which is one of the main challenges on the path to exascale. In this paper, we discuss ideas for systemwide power management in the Argo project. We present a hierarchical and scalable approach to maintain a power bound...

chapter

MPI+ULT: Overlapping Communication and Computation with User-Level Threads

Huiwei Lu, Sangmin Seo, Pavan Balaji

2015 IEEE 17th International Conference on High Performance Computing and Communications, 2015 IEEE 7th International Symposium on Cyberspace Safety and Security, and 2015 IEEE 12th International Conference on Embedded Software and Systems > 444 - 454

2015 IEEE 17th International Conference on High Performance Computing and Communications (HPCC), 2015 IEEE 7th International Symposium on Cyberspace Safety and Security (CSS) and 2015 IEEE 12th International Conf on Embedded Software and Systems (ICESS)

As the core density of future processors keeps increasing, MPI+Threads is becoming a promising programming model for large-scale SMP clusters. Generally speaking, a hybrid MPI+Threads runtime can greatly improve intra-node parallelism and data sharing on shared-memory architectures. However, it does not help much with inter-node communication due to the inefficient integration of existing communication...

article

A Performance Model for GPUs with Caches

Thanh Tuan Dao, Jungwon Kim, Sangmin Seo, Bernhard Egger, et al.

IEEE Transactions on Parallel and Distributed Systems > 2015 > 26 > 7 > 1800 - 1813

To exploit the abundant computational power of the world’s fastest supercomputers, an even workload distribution to the typically heterogeneous compute devices is necessary. While relatively accurate performance models exist for conventional CPUs, accurate performance estimation models for modern GPUs do not exist. This paper presents two accurate models for modern GPUs: a sampling-based linear model,...

chapter

Lessons Learned Implementing User-Level Failure Mitigation in MPICH

Wesley Bland, Huiwei Lu, Sangmin Seo, Pavan Balaji

2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing > 1123 - 1126

2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid)

User-level failure mitigation (ULFM) is becoming the front-running solution for process fault tolerance in MPI. While not yet adopted into the MPI standard, it is being used by applications and libraries and is being considered by the MPI Forum for future inclusion into MPI itself. In this paper, we introduce an implementation of ULFM in MPICH, a high-performance and widely portable implementation...

chapter

Implementation and Evaluation of MPI Nonblocking Collective I/O

Sangmin Seo, Robert Latham, Junchao Zhang, Pavan Balaji

2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing > 1084 - 1091

2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid)

The well-known gap between relative CPU speeds and storage bandwidth results in the need for new strategies for managing I/O demands. In large-scale MPI applications, collective I/O has long been an effective way to achieve higher I/O rates, but it poses two constraints. First, although overlapping collective I/O and computation represents the next logical step toward a faster time to solution, MPI's...

chapter

SWAP-Assembler 2: Scalable Genome Assembler towards Millions of Cores -- Practice and Experience

Jintao Meng, Yanjie Wei, Sangmin Seo, Pavan Balaji

2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing > 769 - 772

2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid)

There is a widening gap between the throughput of massively parallel sequencing machines and the ability to analyze the resulting sequencing data, which can reach terabytes or even petabytes. Previously, our assembly tool, SWAP-Assembler, scaled to 2048 cores on TianHe 1A for the human Yanhuang genome. This work further scales SWAP-Assembler to millions of cores on Mira. SWAP-Assembler can be divided into...

chapter

Automatic OpenCL work-group size selection for multicore CPUs

Sangmin Seo, Jun Lee, Gangwon Jo, Jaejin Lee

Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques > 387 - 397

2013 22nd International Conference on Parallel Architectures and Compilation Techniques (PACT)

In this paper, we address the effect of the work-group size on the performance of OpenCL kernels. We propose a profiling-based algorithm that finds a good work-group size, in terms of performance, for the target multicore CPU architecture. Our algorithm reduces misses in the private L1 data cache and achieves load balancing between cores. It exploits the polyhedral model to estimate the working-set...

chapter

A study of secure data transmissions in mobile cloud computing from the energy consumption side

Jin-A Hong, Sangmin Seo, Namgi Kim, Byoung-Dai Lee

The International Conference on Information Networking 2013 (ICOIN) > 250 - 255

2013 International Conference on Information Networking (ICOIN)

For mobile cloud computing, one of the key issues is to minimize energy consumption in data communication. Although many studies have examined energy consumption of data transmissions, they are limited in that they have focused mainly on bitstream transmissions over existing 3G networks or Wi-Fi environments. Thus, the present paper explores energy efficiency of mobile devices when transferring data...

chapter

Performance characterization of the NAS Parallel Benchmarks in OpenCL

Sangmin Seo, Gangwon Jo, Jaejin Lee

2011 IEEE International Symposium on Workload Characterization (IISWC) > 137 - 148

Heterogeneous parallel computing platforms, which are composed of different processors (e.g., CPUs, GPUs, FPGAs, and DSPs), are widening their user base in all computing domains. With this trend, parallel programming models need to achieve portability across different processors as well as high performance with reasonable programming effort. OpenCL (Open Computing Language) is an open standard and...

chapter

An OpenCL Framework for Homogeneous Manycores with No Hardware Cache Coherence

Jun Lee, Jungwon Kim, Junghyun Kim, Sangmin Seo, et al.

2011 International Conference on Parallel Architectures and Compilation Techniques > 56 - 67

2011 International Conference on Parallel Architectures and Compilation Techniques (PACT)

Recently, Intel has introduced a research prototype manycore processor called the Single-chip Cloud Computer (SCC). The SCC is an experimental processor created by Intel Labs. It contains 48 cores on a single chip, and each core has its own L1 and L2 caches without any hardware support for cache coherence. It supports up to 64GB of external memory that can be accessed by all cores and each core...

chapter

SFMalloc: A Lock-Free and Mostly Synchronization-Free Dynamic Memory Allocator for Manycores

Sangmin Seo, Junghyun Kim, Jaejin Lee

2011 International Conference on Parallel Architectures and Compilation Techniques > 253 - 263

2011 International Conference on Parallel Architectures and Compilation Techniques (PACT)

As parallel programming becomes mainstream due to multicore processors, dynamic memory allocators used in C and C++ can limit the performance of multi-threaded applications if they are not scalable. In this paper, we present a new dynamic memory allocator for multi-threaded applications. The allocator never uses any synchronization for common cases. It uses only lock-free synchronization mechanisms...

chapter

COMIC++: A software SVM system for heterogeneous multicore accelerator clusters

Jaejin Lee, Jun Lee, Sangmin Seo, Jungwon Kim, et al.

HPCA - 16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture > 1 - 12

2010 IEEE 16th International Symposium on High Performance Computer Architecture (HPCA)

In this paper, we propose a software shared virtual memory (SVM) system for heterogeneous multicore accelerator clusters with explicitly managed memory hierarchies. The target cluster consists of a single manager node and many compute nodes. The manager node contains a general-purpose processor and larger main memory, and each compute node contains a heterogeneous multicore processor and smaller main...

chapter

Design and implementation of software-managed caches for multicores with local memory

Sangmin Seo, Jaejin Lee, Z. Sura

2009 IEEE 15th International Symposium on High Performance Computer Architecture > 55 - 66

HPCA - 15 2009. IEEE 15th International Symposium on High Performance Computer Architecture

Heterogeneous multicores, such as Cell BE processors and GPGPUs, typically do not have caches for their accelerator cores because coherence traffic, cache misses, and latencies from different types of memory accesses add overhead and adversely affect instruction scheduling. Instead, the accelerator cores have internal local memory to place their code and data. Programmers of such heterogeneous multicore...

Filter options

Publication type

book (19)
article (1)

Keywords

RUNTIME (8)
OPENMP (5)
COMPUTATIONAL MODELING (4)
HARDWARE (4)
INSTRUCTION SETS (4)
KERNEL (4)
LIBRARIES (4)
SYNCHRONIZATION (4)
COHERENCE (3)
INDEXES (3)
MESSAGE SYSTEMS (3)
MULTICORE PROCESSING (3)
OPENCL (3)
OPTIMIZATION (3)
SOFTWARE (3)
BANDWIDTH (2)
BENCHMARK TESTING (2)
CACHE STORAGE (2)
COMPILERS (2)
COMPUTER ARCHITECTURE (2)
CONTEXT (2)
HETEROGENEOUS MULTICORES (2)
LIGHTWEIGHT THREADS (2)
MEMORY CONSISTENCY (2)
MPI (2)
PERFORMANCE OPTIMIZATION (2)
PROGRAMMING (2)
PROGRAMMING MODELS (2)
PROTOCOLS (2)
RESOURCE MANAGEMENT (2)
SOFTWARE SHARED VIRTUAL MEMORY (2)
SOFTWARE-MANAGED CACHES (2)
STANDARDS (2)
4G NETWORKS (1)
APE (1)
ARRAYS (1)
ASSEMBLY (1)
AUTOMATIC SELECTION (1)
BIOINFORMATICS (1)
CACHE COHERENCE (1)
CACHING (1)
CELL BE (1)
CELL BE PROCESSORS (1)
CLOUD COMPUTING (1)
CONCURRENT COMPUTING (1)
CONCURRENT MEMORY ALLOCATOR (1)
COPPER (1)
DATA MODELS (1)
DATA STRUCTURES (1)
DYNAMIC MEMORY MANAGEMENT (1)
ENERGY EFFICIENCY (1)
ESTIMATION (1)
EXTENDED GENERALIZED REQUEST (1)
EXTENDED SET-INDEX CACHE (1)
FASTENERS (1)
FAULT TOLERANCE (1)
FAULT TOLERANT SYSTEMS (1)
GENERAL PURPOSE PROCESSOR ELEMENT (1)
GENOME ASSEMBLER (1)
GENOME ASSEMBLY (1)
GENOMICS (1)
GLT (1)
GPE (1)
GPGPU (1)
GRAPHICS PROCESSING UNIT (1)
GRAPHICS PROCESSING UNITS (1)
HARDWARE CACHE COHERENCE (1)
HAZARDS (1)
HETEROGENEOUS MULTICORE ACCELERATOR CLUSTERS (1)
HETEROGENEOUS MULTICORE ARCHITECTURES (1)
HETEROGENEOUS MULTICORE PROCESSOR (1)
HIERARCHICAL CENTRALIZED RELEASE CONSISTENCY (1)
INSTRUCTION SCHEDULING (1)
LOCAL MEMORY (1)
LOCK (1)
LOCK-FREE (1)
MAGNETIC CORES (1)
MAIN MEMORY (1)
MANYCORE (1)
MEMORY ARCHITECTURE (1)
MEMORY MANAGEMENT (1)
MESSAGE PASSING INTERFACE (1)
MICROPROCESSOR CHIPS (1)
MICROPROCESSORS (1)
MOBILE CLOUD COMPUTING (1)
MPI I/O (1)
MPI+X (1)
MPICH (1)
MULTICORE CPU (1)
MUTEX (1)
NONBLOCKING COLLECTIVE I/O (1)
ON-CHIP CACHE HIERARCHY (1)
OVERLAPPING COMMUNICATION AND COMPUTATION (1)
PARALLEL ALGORITHMS (1)
PARALLEL PROCESSING (1)
PERFORMANCE PORTABILITY (1)
POSIX THREADS (1)
POWER MANAGEMENT (1)
PRELOAD-POSTSTORE BUFFERING (1)
PRODUCTION (1)

INFONA - science communication portal
