Prefix scan (or simply scan) is an operator that computes all the partial sums of a vector. A scan operation results in a vector where each element is the sum of the preceding elements in the original vector up to the corresponding position. Scan is a key operation in many important problems, such as sorting, lexical analysis, string comparison, and image filtering, among others. Although there are libraries...
The cost of data movement has always been an important concern in high performance computing (HPC) systems. It has now become the dominant factor in terms of both energy consumption and performance. Support for expression of data locality has been explored in the past, but those efforts have had only modest success in being adopted in HPC applications for various reasons. However, with the increasing...
Reading and writing data efficiently from storage systems is necessary for most scientific simulations to achieve good performance at scale. Many software solutions have been developed to decrease the I/O bottleneck. One well-known strategy, in the context of collective I/O operations, is the two-phase I/O scheme. This strategy consists of selecting a subset of processes to aggregate contiguous pieces...
This paper shows that Julia provides sufficient performance to bridge the performance gap between productivity-oriented languages and low-level languages for complex memory intensive computation tasks such as graph traversal. We provide performance guidelines for using complex low-level data structures in high productivity languages and present the first parallel integration on the productivity-oriented...
The Go language lacks built-in data structures that allow fine-grained concurrent access. In particular, its map data type, one of only two generic collections in Go, limits concurrency to the case where all operations are read-only; any mutation (insert, update, or remove) requires exclusive access to the entire map. The tight integration of this map into the Go language and runtime precludes its...
Despite various cloud technologies that have parallelized and scaled up big data analysis, they mostly target text data, which is easy to partition and thus easy to map over a cluster system. Therefore, their parallelization does not necessarily cover scientific structured data such as NetCDF, or it needs additional, user-provided tools to convert the original data into specific formats. To facilitate...
Processing In Memory (PIM), the concept of integrating processing directly with memory, has attracted considerable attention because it can help overcome the throughput limitation caused by data movement between the CPU and memory. The challenge, however, is that it requires the programmers to have a deep understanding of the PIM architecture to maximize the benefits such as data locality and...
The adoption of a programming language is positively influenced by the breadth of its software libraries. Chapel is a modern and relatively young parallel programming language. Consequently, not many domain-specific software libraries exist that are written for Chapel. Graph processing is an important domain with many applications in cyber security, energy, social networking, and health. Implementing...
High-performance distributed memory applications often load or receive data in a format that differs from what the application uses. One such difference arises from how the application distributes data for parallel processing. Data must be redistributed from how it was laid out by the producer to how the application needs the data to be laid out amongst its processes. In this paper, we present a large-scale...
The central notion of this paper is that of contracts for concurrency, allowing one to capture the expected atomicity of sequences of method or service calls in a concurrent program. The contracts may be either extracted automatically from the source code, or provided by developers of libraries or software modules to reflect their expected usage in a concurrent setting. We start by extending the so-far...
dynStruct is an open source structure recovery tool for x86 binaries. It uses dynamic binary instrumentation to record information about memory accesses, which is then processed off-line to recover structures created and used by the binary. It provides a powerful web interface which not only displays the raw data and the recovered structures but also allows this information to be explored and manually...
The most common errors in programs written in C/C++ are memory-interaction errors. This work aims to develop software tools that detect memory usage errors occurring during program execution more efficiently. For these purposes, memory analysis algorithms are discussed in the context of symbolic execution. The experimental research conducted has found...
Graph processing is used in many fields of science such as sociology, risk prediction, or biology. Although analysis of graphs is important, it also poses numerous challenges, especially for large graphs which have to be processed on multicore systems. In this paper, we present a PGAS (Partitioned Global Address Space) version of the level-synchronous BFS (Breadth First Search) algorithm and its implementation...
We present DASH, a C++ template library that offers distributed data structures and parallel algorithms and implements a compiler-free PGAS (partitioned global address space) approach. DASH offers many productivity and performance features such as global-view data structures, efficient support for the owner-computes model, flexible multidimensional data distribution schemes and interoperability with...
Python is a high level language that is used by scientists for numeric computations. However, the performance of the language can be a hindrance when scaling to larger data sets, requiring some operations to be rewritten in a lower level language. To address this problem, we propose two libraries to allow numeric Python code to be optimized incrementally, requiring minimal changes. Here we describe...
We present further work on SciSpark, a Big Data framework that extends Apache Spark's in-memory parallel computing to scale scientific computations. SciSpark's current architecture and design includes: time and space partitioning of high-resolution geo-grids from NetCDF3/4; a sciDataset class providing N-dimensional array operations in Scala/Java and CF-style variable attributes (an update of our prior...
DASH is a realization of the PGAS (partitioned global address space) model in the form of a C++ template library without the need for a custom PGAS (pre-)compiler. We present the DASH NArray concept, a multidimensional array abstraction designed as an underlying container for stencil- and dense numerical applications. After introducing fundamental programming concepts used in DASH, we explain how these...
Supercomputing platforms are expected to have larger failure rates in the future because of scaling and power concerns. The memory and performance impact may vary with error types and failure modes. Therefore, localized recovery schemes will be important for scientific computations, including failure modes where application intervention is suitable for recovery. We present a resiliency methodology...
The DOE Extreme-Scale Technology Acceleration Fast Forward Storage and IO Stack project is going to have significant impact on storage systems design within and beyond the HPC community. With phase two of the project starting, it is an excellent opportunity to explore the complete design and how it will address the needs of extreme scale platforms. This paper examines each layer of the proposed stack...
Applications over sparse matrices and graphs often rely on efficient matrix representations that exploit the nonzero structure of the sparse representation. In some cases, this structure varies within the matrix, e.g., some portions are more dense and others are very sparse. For such matrices, hybrid algorithms are commonly used in sparse linear algebra and graph libraries, which employ multiple representations...