The matrix-matrix multiplication is an essential building block that can be found in various scientific and engineering applications. High-performance implementations of the matrix-matrix multiplication on state-of-the-art processors may be of great importance for both the vendors and the users. In this paper, we present a detailed methodology of implementing and optimizing the double-precision general...
Nested fork-join programs scheduled using work stealing can automatically balance load and adapt to changes in the execution environment. In this paper, we design an approach to efficiently recover from faults encountered by these programs. Specifically, we focus on localized recovery of the task space in the presence of fail-stop failures. We present an approach to efficiently track, under work stealing,...
Stencil computation arises from a broad set of scientific and engineering applications and often plays a critical role in the performance of extreme-scale simulations. Due to its memory-bound nature, it is challenging to optimize stencil computation kernels on modern supercomputers, which offer relatively high computing throughput but relatively low data-moving capability. This work serves as...
General purpose computing using GPUs is becoming increasingly popular because of GPUs' extremely favorable performance/price ratio. Like standard processors, GPUs also have a memory hierarchy, which must be carefully optimized for in order to achieve efficient execution. Specifically, modern NVIDIA GPUs have a very small programmable cache, referred to as shared memory, accesses to which are nearly...
The GPU has been generally accepted as an efficient accelerator in the field of high performance computing (HPC). On some heterogeneous systems, multiple GPUs are installed on each computing node. To make things more complicated, these GPUs may even have different architectures. Therefore, it is a challenge to efficiently schedule tasks and data on heterogeneous systems. In this paper, we present DoSFoG,...
Traditional Support Vector Regression (SVR) solvers require a user-specified penalty (regularization) parameter as input and typically model the training data with the maximum a posteriori (MAP) principle. The resultant point estimates can be seriously affected by inappropriate regularization, outliers and noise, especially when training online. In this paper, we address the aforementioned problems by...
Support vector machine (SVM) is a supervised method widely used in statistical classification and regression analysis. SVM training can be solved via the interior point method (IPM), which has the advantages of low storage, fast convergence and easy parallelization. However, it still faces challenges in training speed and memory use. In this paper, we propose a parallel primal-dual...
Encryption algorithms are applied in a variety of fields, and their security depends heavily on the computational infeasibility of exhaustive key-space search. The RC4 algorithm is used extensively for stream encryption. However, the traditional serial RC4 algorithm suffers from a large computational load and slow computation speed, which means a great challenge...
Recent years have witnessed fast growth in the use of mobile applications (a.k.a. "apps"). Detecting similar apps is a basic problem in the app ecosystem. It is not only beneficial to app search and recommender systems, but also helpful for people to discover new apps. State-of-the-art studies defined several app similarity functions based on the meta-information of apps, such as descriptions...
Polarized hyperspectral remote sensing combines hyperspectral remote sensing with polarization. However, research on polarized hyperspectral remote sensing started late and the available data is far from sufficient. This paper presents the simulation of a simple scene, consisting of a canopy and a wall, based on hyperspectral and polarized models. When analyzing the wall's secondary scattering effects on the surrounding...
In the present study, core-shell Al2O3 was synthesized by a hydrothermal method and employed as a support for the preparation of Ni/Al2O3 catalysts via an impregnation method. Toluene thermal reforming was investigated in a fluidized bed reactor using these core-shell Ni/Al2O3 catalysts. The catalysts were characterized with TEM, BET, XRD and H2-TPR techniques. Compared with the catalysts supported...
Driven by their cost-effectiveness and power efficiency, GPUs are being increasingly used to accelerate computations in many domains. However, developing highly efficient GPU implementations requires a lot of expertise and effort. Thus, tool support for tuning GPU programs is urgently needed, and more specifically, low-overhead mechanisms for collecting fine-grained runtime information are critically...
As inherent optical properties (IOPs) are directly related to the constituents in the water, water quality can be reflected by the fundamental IOPs, namely the absorption and scattering coefficients. These values can be derived by analytically inverting the remote sensing spectral reflectance. In this paper, the relations between the remote sensing reflectance and water quality information are...
Auto-tuning has emerged as an important practical method for creating highly optimized code. However, the growing complexity of architectures and applications has resulted in a prohibitively large search space that precludes empirical auto-tuning. Here, we focus on the challenge to auto-tuning presented by applications that require auto-tuning of not just a small number of distinct kernels, but a large...
GPU hardware and software have been evolving rapidly. CUDA versions 1.1 and higher started supporting atomic operations on device memory, and CUDA versions 1.2 and higher started supporting atomic operations on shared memory. This paper focuses on parallelizing applications involving reductions on GPUs. Prior to the availability of support for locking, these applications could only be parallelized...
General purpose computing using GPUs is becoming increasingly popular because of GPUs' extremely favorable performance/price ratio. Besides application development using CUDA, automatic code generation for GPUs is also receiving attention. Like standard processors, GPUs also have a memory hierarchy, which must be carefully optimized for in order to achieve efficient execution. Specifically, modern...
The emerging cloud environments are well suited for storage and analysis of large datasets, since they can allow on-demand access to resources. However, developing high-performance implementations of data analysis tasks is a challenging problem. In our prior work, we have developed a middleware called FREERIDE (FRamework for Rapid Implementation of Data mining Engines). FREERIDE is based upon the...
Tensor contractions are generalized multidimensional matrix multiplication operations that occur widely in quantum chemistry. Efficient execution of tensor contractions on GPUs requires tackling several challenges, including index permutation and small dimension sizes that reduce thread block utilization. In this paper, we present our approach to automatically generate CUDA code to execute...
This paper presents an optimal distributed AAA authorization architecture that assigns end users different network resources or services according to their identities and the attributes and roles they have been granted. To improve performance, the paper gives a detailed analysis of its security issues and proposes a lightweight key agreement mechanism, including three kinds...