Silent data corruption (SDC) poses a great challenge for high-performance computing (HPC) applications as we move to extreme-scale systems. Mechanisms have been proposed that are able to detect SDC in HPC applications by using the peculiarities of the data (more specifically, its “smoothness” in time and space) to make predictions. However, these data-analytic solutions are still far from fully protecting...
System monitoring is an established tool to measure the utilization and health of HPC systems. Usually, system monitoring infrastructures make no connection to job information and do not utilize hardware performance monitoring (HPM) data. To increase the efficient use of HPC systems, automatic and continuous performance monitoring of jobs is an essential component. It can help to identify pathological...
Because data collection in HPC systems happens on the nodes and is easily related to the job running on the node, tools presenting the data and subsequent analyses to the user generally present them at the job level. Our position is that this is the wrong level of abstraction and thus limits the value of the analyses, often dissuading users from using any of the offered tools. In this paper we present...
The rise of graph analytic systems has created a need for ways to measure and compare the capabilities of these systems. Graph analytics present unique scalability difficulties. The machine learning, high performance computing, and visual analytics communities have wrestled with these difficulties for decades and developed methodologies for creating challenges to move these communities forward. The...
In this study we use a triangular basis function set to solve a second-kind fuzzy integral equation, which can be converted to a system of two integral equations in the crisp case. We also apply a collocation method to solve the equation approximately.
Parallel applications are highly irregular and high performance computing (HPC) infrastructures are very complex. The HPC applications of interest herein are timestepping scientific applications (TSSA). Often, TSSA involve the repeated execution of multiple parallel loops with thousands of iterations and irregular behavior. Dynamic loop scheduling (DLS) techniques were developed over time and have...
Cloud computing enables end users to execute high-performance computing applications by renting the required computing power. This pay-for-use approach enables small enterprises and startups to run HPC-related businesses with a significant saving in capital investment and a short time to market. When deploying an application in the cloud, the users may a) fail to understand the interactions of the...
Performance monitoring is essential for all computing systems, especially high-performance computing systems. These systems are sensitive to errors and failures, which can lead to data loss and severely impact organizations. Consequently, resource information in these systems (e.g., CPU usage, memory usage, disk I/O usage, etc.) during operation must be collected through the system monitoring...
The interconnection network has a large influence on total cost, application performance, energy consumption, and overall system efficiency of a supercomputer. Unfortunately, today's routing algorithms do not utilize this important resource most efficiently. We first demonstrate this by defining the dark fiber metric as a measure of unused resource in networks. To improve the utilization, we propose...
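The abstract defines the dark fiber metric only loosely, as "a measure of unused resource in networks." As a purely hypothetical illustration (not the paper's definition), one could treat it as the fraction of directed links that carry no traffic under a fixed routing; the function below and its inputs are assumptions:

```python
# Hypothetical toy: "dark fiber" as the fraction of directed links
# that no route traverses under a given (static) routing.
def dark_fiber_fraction(links, routes):
    """links: iterable of (u, v) directed links in the network.
    routes: iterable of paths, each a list of nodes visited in order."""
    used = set()
    for path in routes:
        used.update(zip(path, path[1:]))   # directed links this route traverses
    links = set(links)
    return len(links - used) / len(links)

# A 4-node bidirectional ring with a single route 0 -> 1 -> 2
# leaves 6 of 8 directed links dark, i.e. a fraction of 0.75.
ring = [(0, 1), (1, 2), (2, 3), (3, 0), (1, 0), (2, 1), (3, 2), (0, 3)]
```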
In this paper we introduce a novel, dense, system-on-chip many-core Lenovo NeXtScale System® server based on the Cavium THUNDERX® ARMv8 processor, designed for performance, energy efficiency, and programmability. The THUNDERX processor was designed to scale up to 96 cores in a cache-coherent, shared-memory architecture. Furthermore, this hardware system has a power interface board (PIB) that measures...
About ten years ago, we presented the results of an effort to identify the "right metric" for efficient supercomputing at this workshop, The Workshop on High-Performance, Power-Aware Computing. In this paper, we review the advances that the community has made in this area of research. The intention of this ten-year retrospective is two-fold: (1) to acknowledge the past work through a historical...
With each technology improvement, parallel systems get larger, and the impact of interconnection networks becomes more prominent. Random topologies and their variants received more and more attention lately due to their low diameter, low average shortest path length and high scalability. However, existing supercomputers still prefer torus and fat-tree topologies, because a number of existing parallel...
Deduplication has become essential in disk-based backup systems, but there have been few long-term studies of backup workloads. Most past studies either were of a small static snapshot or covered only a short period that was not representative of how a backup system evolves over time. For this paper, we collected 21 months of data from a shared user file system; 33 users and over 4,000 snapshots are...
As supercomputer systems scale up, the performance gap between the compute and storage systems increases dramatically. The traditional speedup metric measures only the performance of the compute system. In this paper, we first propose a speedup metric that takes the I/O constraint into account. The new metric unifies computing and I/O performance, and evaluates the practical speedup of a parallel application under...
Monitoring of High Performance Computing clusters is currently geared towards providing system administrators the information they need to make informed decisions about the resources used in the cluster. However, this emphasis leaves out end users, who utilize the cluster resources for projects and programs and are not given information on how their workflow is impacting the cluster...
A detailed understanding of HPC applications' resource needs and their complex interactions with each other and with HPC platform resources is critical to achieving scalability and performance. Such understanding has been difficult to achieve because typical application profiling tools do not capture the behavior of codes under the potentially wide spectrum of actual production conditions and because...
While HPC system monitoring is a necessary and accepted practice, applications are still basically opaque in the production environment. For better HPC platform management and utilization, especially as platforms push towards exascale size, HPC applications need to be more transparent in their execution in the production environment. PROMON is a framework for application monitoring in the production...
As we move towards the Exascale era of supercomputing, node-level failures are becoming more commonplace; frequent checkpointing is currently used to recover from such failures in long-running science applications. While compute performance has steadily improved year on year, parallel I/O performance has stalled, meaning checkpointing is fast becoming a performance bottleneck. Using current...
The paper considers techniques for the measurement and calculation of security metrics that take into account attack graphs and service dependencies. The techniques are based on several assessment levels (topological, attack-graph, attacker, event, and system levels) and important aspects (zero-day attacks, cost-efficiency characteristics). They allow understanding of the current security situation,...