Fault Tolerance for Scalable Applications

Checkpointing Protocols for Parallel Message-Passing-Systems

by Bernd Bieker (Author)
©2003 Monographs 216 Pages

Summary

Parallel and distributed systems make it possible to tackle «grand challenge» problems. Because such high-performance computing systems are complex and today's simulations run for a long time, the probability of a failure during a program run cannot be neglected. This work considers fault tolerance, specifically user-transparent checkpointing. The analysis is performed using simulations, and real implementations are deployed to verify the results. The aim is to give a simple approximation of the overhead generated by checkpointing protocols. In addition, it is shown in which situations more complex checkpointing protocols are worthwhile compared with very simple approaches.

Details

Pages
216
Year
2003
ISBN (Softcover)
9783899759006
Language
English
Keywords
Fault tolerance, parallel computer, computer science, multicomputer system, high-performance computing, scalability, checkpoint
Published
München, 2003. 216 pp.

Biographical notes

Bernd Bieker (Author)

Bernd Bieker studied Electrical Engineering at the University of Illinois at Urbana-Champaign and at the Universität-GH Paderborn, where he earned his master's degree. There, and later at the Universität zu Lübeck, he did research on fault tolerance. Since then he has worked for IT companies in the banking, telecommunications, and utility sectors.
