Search results

Items from 81 to 100 out of 838 results

chapter

Evaluating and Optimizing the NERSC Workload on Knights Landing

Taylor Barnes, Brandon Cook, Jack Deslippe, Douglas Doerfler, et al.

2016 7th International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS) > 43 - 53

NERSC has partnered with 20 representative application teams to evaluate performance on the Xeon-Phi Knights Landing architecture and develop an application-optimization strategy for the greater NERSC workload on the recently installed Cori system. In this article, we present early case studies and summarized results from a subset of the 20 applications highlighting the impact of important architecture...

chapter

Highly Scalable Near Memory Processing with Migrating Threads on the Emu System Architecture

Timothy Dysart, Peter Kogge, Martin Deneroff, Eric Bovell, et al.

2016 6th Workshop on Irregular Applications: Architecture and Algorithms (IA3) > 2 - 9

There is growing evidence that current architectures handle cache-unfriendly applications such as sparse math operations, data analytics, and graph algorithms poorly. This is due, in part, to the irregular memory access patterns demonstrated by these applications, and to how remote memory accesses are handled. This paper introduces a new, highly-scalable PGAS memory-centric system architecture...

chapter

Performance Comparison of CGRA and Mobile GPU for Light-Field Image Processing

Yuttakon Yuttakonkit, Yasuhiko Nakashima

2016 Fourth International Symposium on Computing and Networking (CANDAR) > 174 - 180

Recently, many approaches apply light-field image processing on smartphones and wearable devices. A Graphics Processing Unit (GPU) is commonly used to exploit parallelism in such image processing. However, the access pattern in the light-field application is sparser than in typical stencil applications and does not use all data in a cache line. Furthermore, the data requests to multiple locations...

chapter

RNS-Based Data Representation for Handling Multiple-Precision Integers on Parallel Architectures

Konstantin Isupov, Vladimir Knyazkov

2016 International Conference on Engineering and Telecommunication (EnT) > 76 - 79

In most computer programs and general-purpose computing environments, the precision of any calculation is limited by the word size of the computer. However, for some applications, such as cryptography, this precision is not sufficient. In these cases, it is necessary to use multiple-precision numbers. Operations on such numbers in most computer software are implemented by third party libraries that...

chapter

IACM: Integrated adaptive cache management for high-performance and energy-efficient GPGPU computing

Kyu Yeun Kim, Jinsu Park, Woongki Baek

2016 IEEE 34th International Conference on Computer Design (ICCD) > 380 - 383

Hardware caches are widely employed in GPGPUs to achieve higher performance and energy efficiency. Incorporating hardware caches in GPGPUs, however, does not immediately guarantee enhanced performance and energy efficiency due to high cache contention and thrashing. To address the inefficiency of GPGPU caches, various adaptive techniques (e.g., warp limiting) have been proposed. However, relatively...

chapter

GPU based implementation of spatial fuzzy c-means algorithm for image segmentation

N. Aitali, B. Cherradi, A. El Abbassi, O. Bouattane, et al.

2016 4th IEEE International Colloquium on Information Science and Technology (CiSt) > 460 - 464

In this paper, a meaningful parallel implementation of spatial fuzzy c-means (SFCM) is presented. It has the advantage of being a powerful tool of classical fuzzy c-means. The main aim of this work is to significantly reduce both its complexity and its execution time. This technique is inspired by the technological progress of GPU hardware. The studies we have conducted...

chapter

A high-flexibility and energy-efficient application-specific cryptography VLIW processor for symmetric cipher algorithms

Wei Li, Xiaoyang Zeng, Longmei Nan, Tao Chen, et al.

2016 13th IEEE International Conference on Solid-State and Integrated Circuit Technology (ICSICT) > 1281 - 1284

In this paper, a high-flexibility and energy-efficient reconfigurable symmetric cryptographic processor architecture is presented, which is based on very-long instruction word (VLIW) structure. By analyzing basic operations and storage characteristics of symmetric ciphers, the application-specific instruction-set system for symmetric ciphers is proposed. Eleven kinds of reconfigurable cryptographic...

chapter

Dynamic Flow Rules in Software Defined Networks

Qing Wei, David Perez-Caparros, Artur Hecker

2016 Fifth European Workshop on Software-Defined Networks (EWSDN) > 25 - 30

Software Defined Networking (SDN) architecture enables centralized control of the forwarding behavior of individual network elements. While SDN brings many well-known benefits, such as manageability and adaptability, it also poses some challenges. Scalability becomes an issue in highly dynamic, large scale networks, where the forwarding rules of single elements must be updated at a high pace by a...

chapter

Extracting behaviour from an executable instruction set model

Brian Campbell, Ian Stark

2016 Formal Methods in Computer-Aided Design (FMCAD) > 33 - 40

Presenting large formal instruction set models as executable functions makes them accessible to engineers and useful for less formal purposes such as simulation. However, it is more difficult to extract information about the behaviour of individual instructions for reasoning. We present a method which combines symbolic evaluation and symbolic execution techniques to provide a rule-based view of instruction...

chapter

Outline of a Thick Control Flow Architecture

Martti Forsell, Jussi Roivainen, Ville Leppanen

2016 International Symposium on Computer Architecture and High Performance Computing Workshops (SBAC-PADW) > 1 - 6

The recently invented thick control flow (TCF) model packs together an unbounded number of fibers, thread-like computational entities, flowing through the same control path. This promises to simplify parallel programming by partially eliminating looping and artificial thread arithmetics. In this paper we outline an architecture for efficiently executing programs written for the TCF model. It features...

chapter

A Benchmark on Multi Improvement Neighborhood Search Strategies in CPU/GPU Systems

Eyder Rios, Igor M. Coelho, Luiz Satoru Ochi, Cristina Boeres, et al.

2016 International Symposium on Computer Architecture and High Performance Computing Workshops (SBAC-PADW) > 49 - 54

In combinatorial optimization problems, the neighborhood search (NS) is a fundamental component for local search based heuristics. It consists of selecting a solution from a high cardinality set of neighbor solutions, by means of operations called moves. To perform this search, NS algorithms usually adopt two main approaches: selecting the first or best improving move. The Multi Improvement (MI) strategy...

chapter

A Processor Workload Distribution Algorithm for Massively Parallel Applications

Serge Midonnet, Achille Wattelar

2016 International Symposium on Computer Architecture and High Performance Computing Workshops (SBAC-PADW) > 25 - 30

The Directed Acyclic Graph (DAG) is a standard model used to describe tasks that execute according to precedence constraints and that allows intra-task parallelism. This model is well suited to camera-based applications where multiple treatments must be executed in parallel according to the camera input; such applications are found, for example, in self-driving cars or image recognition via convolutional neural...

article

Providing Balanced Mapping for Multiple Applications in Many-Core Chip Multiprocessors

Di Zhu, Lizhong Chen, Siyu Yue, Timothy M. Pinkston, et al.

IEEE Transactions on Computers > 2016 > 65 > 10 > 3122 - 3135

This paper addresses the problem of balancing the on-chip packet latencies in a chip multi-processor (CMP), which is simultaneously executing multiple applications. Specifically, this paper presents a balanced application-to-core mapping algorithm that aims to minimize the maximum on-chip packet latency of all running applications. The paper starts by formulating the balanced mapping problem for CMPs...

chapter

Accelerating Multicore Architecture Simulation Using Application Profile

Keiji Kimura, Gakuho Taguchi, Hironori Kasahara

2016 IEEE 10th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC) > 177 - 184

Architecture simulators play an important role in exploring frontiers in the early stages of the architecture design. However, the execution time of simulators increases with an increase in the number of cores. The sampling simulation technique that was originally proposed to simulate single-core processors is a promising approach to reduce simulation time. Two main hurdles for multi/many-core are preparing...

chapter

Modeling the Energy-Time Performance of MIC Architecture System

Lavanya Ramapantulu, Thy Dao, Dumitrel Loghin, Nam Thoai, et al.

2016 IEEE 24th International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS) > 85 - 94

Many Integrated Core (MIC) architecture systems are becoming increasingly popular for HPC applications as they have the dual-advantage of accelerating vector processing and a general-purpose programming model. One of the key challenges for energy-efficient execution on MIC architecture systems is to determine time and energy-efficient configurations among a large system configuration space. Given...

chapter

Towards parallel implementation of associative inference for cogent confabulation

Zhe Li, Qinru Qiu, Mangesh Tamhankar

2016 IEEE High Performance Extreme Computing Conference (HPEC) > 1 - 6

The superb efficiency and noise resilience of human cognizance come from the extensive, highly associative memory. For example, it is easy for humans to recognize occluded or incomplete text images based on their context. Associative inference in the neocortex system is a concurrent process. Serial implementation of this concurrent process not only hinders its performance, but also limits the quality...

chapter

Automatic Thread-Block Size Adjustment for Memory-Bound BLAS Kernels on GPUs

Daichi Mukunoki, Toshiyuki Imamura, Daisuke Takahashi

2016 IEEE 10th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC) > 377 - 384

The performance of a CUDA kernel often depends on the number of threads per thread-block (thread-block size), and the optimal configuration differs according to the graphics processing unit (GPU) hardware and the given data size to the kernel. In particular, in linear algebra libraries such as Basic Linear Algebra Subprograms (BLAS), most routines support a wide range of problem sizes and various...

chapter

Identifying representative regions of parallel HPC applications: a cross-architectural evaluation

Alexandra Ferreron, Radhika Jagtap, Roxana Rusitoru

2016 IEEE International Symposium on Workload Characterization (IISWC) > 1 - 2

As high performance computing (HPC) systems reach exascale proportions, the cost of simulation in time and resources increases. Tools for selecting representative parts of parallel applications to reduce simulation cost are widespread, e.g., BarrierPoint achieves this by analysing abstract characteristics such as basic blocks and reuse distances. However, architectures new to HPC will have a limited...

chapter

Multiple core PLC CPU with tight thread synchronization

Adam Milik, Miroslaw Chmiel, Edward Hrynkiewicz

2016 International Conference on Signals and Electronic Systems (ICSES) > 253 - 258

The paper presents the architecture of a PLC CPU consisting of multiple cores enabling parallel processing of control algorithms. Control programs consist of many program fragments that are suitable for parallel execution. The proposed architecture is constructed from independent logic and arithmetic units. They share common data memories of respective types. In order to enable tight coupling of processing...

chapter

ANMLzoo: a benchmark suite for exploring bottlenecks in automata processing engines and architectures

Jack Wadden, Vinh Dang, Nathan Brunelle, Tommy Tracy II, et al.

2016 IEEE International Symposium on Workload Characterization (IISWC) > 1 - 12

High-performance automata-processing engines are traditionally evaluated using a limited set of regular expression rulesets. While regular expression rulesets are valid real-world examples of use cases for automata processing, they represent a small proportion of all use cases for automata-based computing. With the recent availability of architectures and software frameworks for automata processing,...

Data set:
ieee
Keywords:
COMPUTER ARCHITECTURE
INSTRUCTION SETS

Content availability

Available (823)
None (15)

Publication type

book (730)
article (108)

Keywords

HARDWARE (206)
GRAPHICS PROCESSING UNITS (204)
REGISTERS (179)
KERNEL (174)
PARALLEL PROCESSING (124)
GRAPHICS PROCESSING UNIT (114)
COMPUTATIONAL MODELING (112)
GPU (107)
BENCHMARK TESTING (91)
CUDA (83)
MICROPROCESSOR CHIPS (82)
OPTIMIZATION (79)
SYNCHRONIZATION (68)
ALGORITHM DESIGN AND ANALYSIS (59)
FIELD PROGRAMMABLE GATE ARRAYS (58)
PIPELINES (58)
EMBEDDED SYSTEMS (54)
COPROCESSORS (50)
MULTIPROCESSING SYSTEMS (50)
PROGRAMMING (50)
GPGPU (48)
MICROPROCESSORS (44)
PROGRAM PROCESSORS (44)
PERFORMANCE EVALUATION (42)
PARALLEL ARCHITECTURES (40)
ACCELERATION (39)
CLOCKS (36)
MULTITHREADING (36)
RANDOM ACCESS MEMORY (35)
VLIW (35)
MESSAGE SYSTEMS (34)
THROUGHPUT (34)
RUNTIME (32)
SYSTEM-ON-CHIP (31)
BANDWIDTH (29)
MULTI-THREADING (29)
DECODING (28)
MATHEMATICAL MODEL (27)
PARALLEL PROGRAMMING (27)
SOFTWARE (27)
CACHE STORAGE (24)
FPGA (24)
LOGIC DESIGN (24)
PROTOCOLS (24)
CENTRAL PROCESSING UNIT (22)
COMPUTERS (22)
INSTRUCTION SET ARCHITECTURE (22)
PIPELINE PROCESSING (22)
PROCESSOR SCHEDULING (22)
PROGRAM COMPILERS (22)
SERVERS (22)
COMPUTER GRAPHIC EQUIPMENT (21)
CONTEXT (21)
ENCODING (21)
PARALLEL COMPUTING (21)
DATA MODELS (20)
DIGITAL SIGNAL PROCESSING (20)
VECTORS (20)
JAVA (19)
REAL TIME SYSTEMS (19)
EMBEDDED SYSTEM (18)
LIBRARIES (18)
MICROARCHITECTURE (18)
SCHEDULING (18)
SOFTWARE ARCHITECTURE (18)
HARDWARE-SOFTWARE CODESIGN (17)
REAL-TIME SYSTEMS (17)
RECONFIGURABLE ARCHITECTURES (17)
SPARSE MATRICES (17)
ANALYTICAL MODELS (16)
CRYPTOGRAPHY (16)
DYNAMIC SCHEDULING (16)
INSTRUCTION SET (16)
OPENMP (16)
PROCESS CONTROL (16)
TIMING (16)
APPLICATION SPECIFIC INTEGRATED CIRCUITS (15)
DATABASES (15)
DELAY (15)
HIGH PERFORMANCE COMPUTING (14)
INDEXES (14)
MONITORING (14)
PERFORMANCE (14)
REDUCED INSTRUCTION SET COMPUTING (14)
ENERGY CONSUMPTION (13)
POWER DEMAND (13)
SCALABILITY (13)
ASIP (12)
EQUATIONS (12)
INTEGRATED CIRCUIT DESIGN (12)
LINUX (12)
MULTICORE (12)
MULTICORE PROCESSING (12)
SPACE EXPLORATION (12)
ACCURACY (11)
COMPLEXITY THEORY (11)
DATA MINING (11)
DESIGN SPACE EXPLORATION (11)