Wenjing Ma

chapter

Towards Highly Efficient DGEMM on the Emerging SW26010 Many-Core Processor

Lijuan Jiang, Chao Yang, Yulong Ao, Wanwang Yin, more

2017 46th International Conference on Parallel Processing (ICPP) > 422 - 431

2017 46th International Conference on Parallel Processing (ICPP)

The matrix-matrix multiplication is an essential building block that can be found in various scientific and engineering applications. High-performance implementations of the matrix-matrix multiplication on state-of-the-art processors may be of great importance for both the vendors and the users. In this paper, we present a detailed methodology of implementing and optimizing the double-precision general...

chapter

Localized Fault Recovery for Nested Fork-Join Programs

Gokcen Kestor, Sriram Krishnamoorthy, Wenjing Ma

2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS) > 397 - 408

2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS)

Nested fork-join programs scheduled using work stealing can automatically balance load and adapt to changes in the execution environment. In this paper, we design an approach to efficiently recover from faults encountered by these programs. Specifically, we focus on localized recovery of the task space in the presence of fail-stop failures. We present an approach to efficiently track, under work stealing,...

chapter

26 PFLOPS Stencil Computations for Atmospheric Modeling on Sunway TaihuLight

Yulong Ao, Chao Yang, Xinliang Wang, Wei Xue, more

2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS) > 535 - 544

2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS)

Stencil computation arises from a broad set of scientific and engineering applications and often plays a critical role in the performance of extreme-scale simulations. Due to its memory-bound nature, it is challenging to optimize stencil computation kernels on modern supercomputers, which have relatively high computing throughput but relatively low data-moving capability. This work serves as...

chapter

An integer programming framework for optimizing shared memory use on GPUs

Wenjing Ma, Gagan Agrawal

2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT) > 553 - 554

2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT)

General purpose computing using GPUs is becoming increasingly popular, because of GPU's extremely favorable performance/price ratio. Like standard processors, GPUs also have a memory hierarchy, which must be carefully optimized for in order to achieve efficient execution. Specifically, modern NVIDIA GPUs have a very small programmable cache, referred to as shared memory, accesses to which are nearly...

chapter

Data-Oriented Runtime Scheduling Framework on Multi-GPUs

Tao Li, Kezhao Zhao, Qiankun Dong, Jiabing Ling, more

2016 IEEE Trustcom/BigDataSE/ISPA > 1311 - 1318

2016 IEEE Trustcom/BigDataSE/ISPA

The GPU has been generally accepted as an efficient accelerator in the field of high performance computing (HPC). On some heterogeneous systems, multiple GPUs are installed on each computing node. To make things more complicated, these GPUs may even have different architectures. Therefore, it is a challenge to efficiently schedule tasks and data on heterogeneous systems. In this paper, we present DoSFoG,...

chapter

Online variational Bayesian Support Vector Regression

Siqi Deng, Kan Gao, Changying Du, Wenjing Ma, more

2016 International Joint Conference on Neural Networks (IJCNN) > 3950 - 3957

2016 International Joint Conference on Neural Networks (IJCNN)

Traditional Support Vector Regression (SVR) solvers require a user-specified penalty (regularization) parameter as input and typically model the training data with the maximum a posteriori (MAP) principle. The resultant point estimates can be seriously affected by inappropriate regularization, outliers and noise, especially when training online. In this paper, we address the aforementioned problems by...

chapter

HPSVM: Heterogeneous Parallel SVM with Factorization Based IPM Algorithm on CPU-GPU Cluster

Tao Li, Xuechen Liu, Qiankun Dong, Wenjing Ma, more

2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP) > 74 - 81

2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP)

Support vector machine (SVM) is a supervised method widely used in the statistical classification and regression analysis. SVM training can be solved via the interior point method (IPM) with the advantages of low storage, fast convergence and easy parallelization. However, it is still confronted with the challenges of training speed and memory use. In this paper, we propose a parallel primal-dual...

chapter

GB-RC4: Effective brute force attacks on RC4 algorithm using GPU

Pei Xue, Tao Li, Han Dong, Chunbo Liu, more

2016 Seventh International Green and Sustainable Computing Conference (IGSC) > 1 - 6

2016 Seventh International Green and Sustainable Computing Conference (IGSC)

Encryption algorithms are applied in a variety of fields, and their security depends heavily on the computational infeasibility of exhaustive key-space search. The RC4 algorithm is widely used for stream encryption; however, the traditional serial RC4 algorithm suffers from a large amount of computation and slow computation speed, which means a great challenge...

chapter

Detect Similar Mobile Applications with Transfer Learning

Ning Bu, Lei Yu, Wenjing Ma, Changying Du, more

2015 IEEE International Conference on Smart City/SocialCom/SustainCom (SmartCity) > 856 - 859

2015 IEEE International Conference on Smart City/SocialCom/SustainCom (SmartCity)

Recent years have witnessed the fast growth of the use of the mobile applications (a.k.a. "apps"). Detecting similar apps is a basic problem in the app ecosystem. It is not only beneficial to app search and recommender systems, but also helpful for people to discover new apps. State-of-the-art studies defined several app similarity functions by the metainformation of apps, such as descriptions...

chapter

Simple scene simulation for polarized hyperspectral imaging composed of canopy and wall

Junping Zhang, Beifen Wu, Wenjing Ma

2014 12th International Conference on Signal Processing (ICSP) > 170 - 174

2014 12th International Conference on Signal Processing (ICSP 2014)

The polarized hyperspectral remote sensing combines hyperspectral remote sensing with polarization. However, research on polarized hyperspectral remote sensing started late and the data is far from enough. This paper presents the simulation of a simple scene based on hyperspectral and polarized models, which consists of canopy and wall. When analyzing wall's secondary scattering effects to the surrounding...

chapter

Thermal-reforming of toluene over core-shell Ni/γ-Al₂O₃ catalysts

Wenjing Ma, Ling Han, Liangmiao Zhang, Wencong Lu

2013 International Conference on Materials for Renewable Energy and Environment > 2 > 492 - 495

2013 International Conference on Materials for Renewable Energy and Environment (ICMREE)

In the present study, core-shell Al₂O₃ was synthesized by hydrothermal method and employed as support for the preparation of Ni/Al₂O₃ catalysts via an impregnation method. The toluene thermal-reforming was investigated in a fluidized bed reactor using these core-shell Ni/Al₂O₃ catalysts. The catalysts were characterized with TEM, BET, XRD and H₂-TPR techniques. Compared with the catalysts supported...

chapter

GMProf: A low-overhead, fine-grained profiling approach for GPU programs

Mai Zheng, Vignesh T. Ravi, Wenjing Ma, Feng Qin, more

2012 19th International Conference on High Performance Computing > 1 - 10

2012 19th International Conference on High Performance Computing (HiPC)

Driven by the cost-effectiveness and the power-efficiency, GPUs are being increasingly used to accelerate computations in many domains. However, developing highly efficient GPU implementations requires a lot of expertise and effort. Thus, tool support for tuning GPU programs is urgently needed, and more specifically, low-overhead mechanisms for collecting fine-grained runtime information are critically...

chapter

Water quality model parameters inversion based on improved stochastic optimization

Junping Zhang, Wenjing Ma, Jiaguo Qi

2012 IEEE International Geoscience and Remote Sensing Symposium > 2032 - 2035

IGARSS 2012 - 2012 IEEE International Geoscience and Remote Sensing Symposium

As inherent optical properties (IOPs) are directly related to the constituents in the water, the condition of water quality can be reflected by fundamental IOPs absorption and scattering coefficients. And these values can be derived by analytically inverting the remote sensing spectral reflectance. In this paper, the relations between the remote sensing reflectance and water quality information are...

chapter

Parameterized Micro-benchmarking: An Auto-tuning Approach for Complex Applications

Wenjing Ma, Sriram Krishnamoorthy, Gagan Agrawal

2011 International Conference on Parallel Architectures and Compilation Techniques > 181 - 182

2011 International Conference on Parallel Architectures and Compilation Techniques (PACT)

Auto-tuning has emerged as an important practical method for creating highly optimized code. However, the growing complexity of architectures and applications has resulted in a prohibitively large search space that precludes empirical auto-tuning. Here, we focus on the challenge to auto-tuning presented by applications that require auto-tuning of not just a small number of distinct kernels, but a large...

chapter

Approaches for parallelizing reductions on modern GPUs

Xin Huo, V T Ravi, Wenjing Ma, G Agrawal

2010 International Conference on High Performance Computing > 1 - 10

2010 International Conference on High Performance Computing (HiPC 2010)

GPU hardware and software have been evolving rapidly. CUDA versions 1.1 and higher started supporting atomic operations on device memory, and CUDA versions 1.2 and higher started supporting atomic operations on shared memory. This paper focuses on parallelizing applications involving reductions on GPUs. Prior to the availability of support for locking, these applications could only be parallelized...

chapter

An integer programming framework for optimizing shared memory use on GPUs

Wenjing Ma, G Agrawal

2010 International Conference on High Performance Computing > 1 - 10

2010 International Conference on High Performance Computing (HiPC 2010)

General purpose computing using GPUs is becoming increasingly popular, because of GPU's extremely favorable performance/price ratio. Besides application development using CUDA, automatic code generation for GPUs is also receiving attention. Like standard processors, GPUs also have a memory hierarchy, which must be carefully optimized for in order to achieve efficient execution. Specifically, modern...

chapter

Parallelizing an Information Theoretic Co-clustering Algorithm Using a Cloud Middleware

V Ramanathan, Wenjing Ma, V T Ravi, Tantan Liu, more

2010 IEEE International Conference on Data Mining Workshops > 186 - 193

2010 10th IEEE International Conference on Data Mining Workshops (ICDMW 2010)

The emerging cloud environments are well suited for storage and analysis of large datasets, since they can allow on-demand access to resources. However, developing high-performance implementations of data analysis tasks is a challenging problem. In our prior work, we have developed a middleware called FREERIDE (FRamework for Rapid Implementation of Data mining Engines). FREERIDE is based upon the...

chapter

Acceleration of Streamed Tensor Contraction Expressions on GPGPU-Based Clusters

Wenjing Ma, Sriram Krishnamoorthy, Oreste Villa, Karol Kowalski

2010 IEEE International Conference on Cluster Computing > 207 - 216

2010 IEEE International Conference on Cluster Computing (CLUSTER 2010)

Tensor contractions are generalized multidimensional matrix multiplication operations that widely occur in quantum chemistry. Efficient execution of tensor contractions on GPUs requires tackling several challenges, including index permutation and small dimension-sizes reducing thread block utilization. In this paper, we present our approach to automatically generate CUDA code to execute...

chapter

A Light-Size AKA Mechanism for Optimal Distributed AAA authorization Architecture

Wenjing Ma, Mei Song

2010 IEEE 71st Vehicular Technology Conference > 1 - 6

2010 IEEE Vehicular Technology Conference (VTC 2010-Spring)

According to the different identities of end users and attributes and roles that they have been granted, this paper presents an optimal distributed AAA authorization architecture to assign them different network resources or services. To improve the performance, this paper thus gives a detailed analysis about its security issues and proposes a light-size key agreement mechanism, including three kinds...

INFONA - science communication portal

Search results for: Wenjing Ma
