Memory access limits the performance of stream processors. Exploiting the reuse of data held in the Stream Register File (SRF), an on-chip storage, reduces the number of memory accesses. Current stream compilers attempt reuse only for simple stream references, those whose start and end are known. Compiler analysis from outside of stream processors does not directly enable the...
In this paper, we propose a biased support vector machine (Biased-SVM) with a self-constructed Universum (termed U-BSVM) to solve the PU learning problem. We first treat the PU problem as an imbalanced binary classification problem by labeling all the unlabeled inputs as negative with noise; then, inspired by the Universum-SVM (U-SVM), we introduce the Universum data set, which is constructed from the...
In this paper, we propose to apply the nonparallel support vector machine (NPSVM) to the positive and unlabeled learning problem (PU learning problem), in which only a small number of positive examples and a large number of unlabeled examples are available. Like Biased-SVM, NPSVM treats the unlabeled set as the negative set with noise, but NPSVM is modified so that the first primal problem is constructed such that all the...
Memory accesses limit the performance of stream processors. The stream compiler exploits the reuse of records distributed on different ALU clusters by introducing inter-cluster communications, which decreases the program performance. The paper presents the Stream Transpose (ST) approach to exploit such reuse. The approach, by reorganizing the data, puts data that have been distributed on neighboring...
Recent experimental studies reveal that FinFET devices commercialized in recent years tend to suffer from more severe NBTI degradation than planar transistors, necessitating effective techniques that keep FinFET-based processors operating durably. We propose to address this problem by exploiting the device heterogeneity and leveraging the slower NBTI aging rate manifested on the planar...
Code optimization improves program performance through program analysis and program transformation, which transforms the program into an equivalent form. The basis of optimization is data flow analysis and control flow analysis. The paper first analyzes the characteristics of Mgrid and the kernel Resid routine, including architecture analysis, data flow analysis, and dependence analysis, which is the...
In this paper, we introduce an optimized deep learning architecture with flexible layer structures and fast matrix operation kernels on parallel computing platforms (e.g. NVIDIA's GPU). Carefully designed layer-wise strategies are applied to integrate different kinds of deep architectures into a uniform neural training-testing system. Our fast matrix operation kernels are implemented in deep architecture's...
In recent years, modern graphics processing units have been widely adopted in high performance computing to solve large-scale computation problems. The leading GPU manufacturers Nvidia and ATI have introduced a series of products to the market. While sharing many similar design concepts, GPUs from these two manufacturers differ in several aspects of the processor cores and the memory subsystem. In...
Graphics Processing Units (GPUs) have emerged as a promising platform for parallel computation. With a large number of processor cores and abundant memory bandwidth, GPUs deliver substantial computation power. While providing high computation performance, a GPU consumes high power and needs sufficient power supplies and cooling systems. It is essential to institute an efficient mechanism for evaluating...
We present a comprehensive study on the performance and power consumption of a recent ATI GPU. By employing a rigorous statistical model to analyze execution behaviors of representative general-purpose GPU (GPGPU) applications, we conduct insightful investigations on the target GPU architecture. Our results demonstrate that the GPU execution throughput and the power dissipation are dependent on different...
The support vector machine is a powerful supervised learning algorithm that has been successfully applied in many fields, including text and image recognition and medical diagnosis. Optimization of the kernel and its parameters, formally known as model selection, is a crucial factor in achieving a good tradeoff between bias and variance. To automate model selection of support vector...
The radial basis function network (RBFN), which is commonly used in classification problems, has two parameters, a kernel center and a radius, that can be determined by unsupervised or supervised learning. However, it has the disadvantage of assuming that all the independent variables have equal weights. Thus the contour lines of the kernel function are circular, but in fact, the influence of...
For the stereo vision system of an apple harvesting robot, fruit recognition based on the least squares support vector machine (LS-SVM) and calibration based on binocular vision are proposed in order to obtain the location information of apples, including depth. First, vector median filtering and opening and closing operations are employed; then feature vectors, H and S components in HIS color model and shape features,...
The Graphics Processing Unit (GPU), with many light-weight data-parallel cores, can provide substantial parallel computational power to accelerate general purpose applications. However, this computing capacity cannot be fully utilized by memory-intensive applications, which are limited by off-chip memory bandwidth and latency. Stencil computation has abundant parallelism and low computational intensity...
In the robot vision system of the apple harvesting robot, the key task is to recognize and locate the apple. To address recognition problems such as a high error rate, heavy computation and long processing time, a new recognition method based on the support vector machine (SVM) is applied to improve recognition accuracy and efficiency. First, a vector median filter is used to remove noise from the color images of apple fruit...
The discrete form of the fractional Fourier transform (FRFT) is difficult to obtain by directly sampling the continuous FRFT, because the kernel function of the continuous FRFT oscillates drastically and the oscillation amplitude differs markedly across orders of the FRFT. The discrete FRFT has been intensively investigated recently and many definitions of the discrete FRFT...
To improve the learning and generalization ability of the machine-learning model, a new compound kernel that takes into account the degree of similarity between sample space and feature space is proposed. In this paper, we apply a support vector machine with the new compound kernel to a speaker-independent, medium-vocabulary speech recognition system for isolated Chinese words, and compare the speech...
Stream processors, with the stream programming model, have demonstrated significant performance advantages in the domains of signal processing, multimedia and graphics applications. In this paper we examine the applicability of a stream processor to 2-D Jacobi iteration, which is widely used to solve partial differential equations, an important class of scientific programs. We first map 2-D Jacobi iteration...
Isomap is one of the widely used low-dimensional embedding methods. However, in many scenarios, the data arrive sequentially and their effect accumulates. Isomap algorithms cannot incorporate new data, because all data must be available when the geodesic distances are estimated. In this paper we propose an incremental learning Isomap algorithm, which takes the approximate geodesic distance...
Researchers are increasingly concerned with the efficiency of scientific applications on the Imagine stream processor. One of the obstacles is that the programming language of Imagine does not target scientific computing. This paper introduces a program transformation algorithm that automatically transforms loops into stream programs executed on Imagine. The optimization for memory accessing is also...