Search results for: Ana Gainaru

Items from 1 to 8 out of 8 results

chapter

Reducing Waste in Extreme Scale Systems through Introspective Analysis

Leonardo Bautista-Gomez, Ana Gainaru, Swann Perarnau, Devesh Tiwari, et al.

2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS) > 212 - 221

2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS)

Resilience is an important challenge for extreme-scale supercomputers. Today, failures in supercomputers are assumed to be uniformly distributed in time. However, recent studies show that failures in high-performance computing systems are partially correlated in time, generating periods of higher failure density. Our study of the failure logs of multiple supercomputers shows that periods of higher...

chapter

Scheduling the I/O of HPC Applications Under Congestion

Ana Gainaru, Guillaume Aupy, Anne Benoit, Franck Cappello, et al.

2015 IEEE International Parallel and Distributed Processing Symposium > 1013 - 1022

2015 IEEE International Parallel and Distributed Processing Symposium (IPDPS)

A significant percentage of the computing capacity of large-scale platforms is wasted because of interference incurred by multiple applications that access a shared parallel file system concurrently. One solution to handling I/O bursts in large-scale HPC systems is to absorb them at an intermediate storage layer consisting of burst buffers. However, our analysis of Argonne's Mira system shows...

chapter

Improving the Computing Efficiency of HPC Systems Using a Combination of Proactive and Preventive Checkpointing

Mohamed Slim Bouguerra, Ana Gainaru, Leonardo Bautista Gomez, Franck Cappello, et al.

2013 IEEE 27th International Symposium on Parallel and Distributed Processing > 501 - 512

2013 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)

As the failure frequency is increasing with the component count in modern and future supercomputers, resilience is becoming critical for extreme-scale systems. The association of failure prediction with proactive checkpointing seeks to reduce the effect of failures on the execution time of parallel applications. Unfortunately, proactive checkpointing does not systematically avoid restarting from...

chapter

Fault prediction under the microscope: A closer look into HPC systems

Ana Gainaru, Franck Cappello, Marc Snir, William Kramer

2012 International Conference for High Performance Computing, Networking, Storage and Analysis > 1 - 11

2012 SC - International Conference for High Performance Computing, Networking, Storage and Analysis

A large percentage of computing capacity in today's large high-performance computing systems is wasted because of failures. Consequently, current research is focusing on providing fault tolerance strategies that aim to minimize the effects of faults on applications. By far the most popular technique is the checkpoint-restart strategy. A complement to this classical approach is failure avoidance, by which...

chapter

Taming of the Shrew: Modeling the Normal and Faulty Behaviour of Large-scale HPC Systems

Ana Gainaru, Franck Cappello, William Kramer

2012 IEEE 26th International Parallel and Distributed Processing Symposium > 1168 - 1179

2012 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)

HPC systems are complex machines that generate a huge volume of system state data called "events". Events are generated without following a general consistent rule, and different hardware and software components of such systems have different failure rates. Distinguishing between normal system behaviour and faulty situations relies on event analysis. Being able to detect quickly...

chapter

Framework for Mapping Data Mining Applications on GPUs

Ana Gainaru, Emil Slusanschi

2011 10th International Symposium on Parallel and Distributed Computing > 71 - 78

2011 10th International Symposium on Parallel and Distributed Computing (ISPDC)

Data mining algorithms are expensive by nature, and with today's dataset sizes they are becoming even slower and harder to use. Previous work has focused on parallelizing data mining algorithms on different architectures, and more recently, applications are starting to take advantage of the massive computation power and high bandwidth offered by GPUs. However, there has been almost no...

chapter

Modeling and tolerating heterogeneous failures in large parallel systems

Eric Heien, Dan LaPine, Derrick Kondo, Bill Kramer, et al.

2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC) > 1 - 11

2011 SC - International Conference for High Performance Computing, Networking, Storage and Analysis

As supercomputers and clusters increase in size and complexity, system failures are inevitable. Different hardware components (such as memory, disk, or network) of such systems can have different failure rates. Prior works assume failures equally affect an application, whereas our goal is to provide failure models for applications that reflect their specific component usage. This is challenging because...

chapter

Toolkit for automatic analysis of chat conversations

Ana Gainaru, Stefan Daniel Dumitrescu, Stefan Trausan-Matu

2010 8th International Conference on Communications > 99 - 102

2010 8th International Conference on Communications (COMM)

Text-based tools, such as instant messaging, forums, or blogs, enable online collaboration in communities all over the world. Data sources that are more conversational in nature tend to have a typical, time-driven structure that existing text analysis tools don't take into consideration. In this paper we propose a toolkit for information extraction from chat sessions. The system is optimized for conversational...

Keywords

FAULT TOLERANCE (5)
CORRELATION (4)
DATA MINING (4)
ALGORITHM DESIGN AND ANALYSIS (2)
ANALYTICAL MODELS (2)
CHECKPOINTING (2)
COMPUTATIONAL MODELING (2)
FAULT DETECTION (2)
FAULT TOLERANT SYSTEMS (2)
LARGE-SCALE HPC SYSTEMS (2)
PREDICTION ALGORITHMS (2)
PREDICTIVE MODELS (2)
RESILIENCE (2)
SIGNAL ANALYSIS (2)
AUTOMATIC ANALYSIS (1)
BLOGS (1)
BURST BUFFERS (1)
CHAT ANALYSIS (1)
CHAT CONVERSATIONS (1)
CHAT SESSIONS (1)
CLUSTERING ALGORITHMS (1)
COLLABORATIVE WORK (1)
COMPUTER ARCHITECTURE (1)
CONVERSATIONAL STRUCTURES (1)
DATA MINING APPLICATIONS (1)
DATA SOURCES (1)
FAILURE PREDICTION (1)
GPU (1)
GRAPHICS PROCESSING UNIT (1)
HARDWARE (1)
HPC APPLICATION PERFORMANCE (1)
I/O CONGESTION (1)
I/O SCHEDULER (1)
INFORMATION ANALYSIS (1)
INFORMATION EXTRACTION (1)
INTERFERENCE (1)
INTROSPECTIVE SYSTEMS (1)
ITEMSETS (1)
KNOWLEDGE MANIPULATION (1)
LARGE SCALE HPC SYSTEMS (1)
LARGE-SCALE SYSTEMS (1)
MATHEMATICAL MODEL (1)
MULTILEVEL CHECKPOINTING (1)
NATURAL LANGUAGE PROCESSING (1)
ONLINE COLLABORATION (1)
ONLINE COMMUNITIES/TECHNICAL COLLABORATION (1)
OPTIMIZATION (1)
PARALLELIZATION (1)
PROGRAM PROCESSORS (1)
RANDOM ACCESS MEMORY (1)
SEMANTIC CLOSENESS (1)
SILENT DATA CORRUPTION (1)
SOFT ERRORS (1)
SUPERCOMPUTERS (1)
TEXT ANALYSIS (1)
TEXT PROCESSING (1)
TEXT-BASED TOOLS (1)

INFONA - science communication portal
