Fault-tolerant systems
There are many applications in which the reliability of the overall system must be far higher than the reliability of its individual components. In such cases, designers devise mechanisms and architectures that allow the system to either completely mask the effects of a component failure or recover...
Main author: | |
---|---|
Corporate author: | |
Other authors: | |
Format: | E-book |
Language: | English |
Published: | Amsterdam ; Boston : Elsevier/Morgan Kaufmann, c2007. |
Edition: | 1st edition |
Subjects: | |
View at Biblioteca Universitat Ramon Llull: | https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009627157806719 |
Table of Contents:
- Foreword; Preface; Acknowledgements; About the Authors; 1 Preliminaries; 1.1 Fault Classification; 1.2 Types of Redundancy; 1.3 Basic Measures of Fault Tolerance; 1.3.1 Traditional Measures; 1.3.2 Network Measures; 1.4 Outline of This Book; 1.5 Further Reading; References; 2 Hardware Fault Tolerance; 2.1 The Rate of Hardware Failures; 2.2 Failure Rate, Reliability, and Mean Time to Failure; 2.3 Canonical and Resilient Structures; 2.3.1 Series and Parallel Systems; 2.3.2 Non-Series/Parallel Systems; 2.3.3 M-of-N Systems; 2.3.4 Voters; 2.3.5 Variations on N-Modular Redundancy
- 2.3.6 Duplex Systems; 2.4 Other Reliability Evaluation Techniques; 2.4.1 Poisson Processes; 2.4.2 Markov Models; 2.5 Fault-Tolerance Processor-Level Techniques; 2.5.1 Watchdog Processor; 2.5.2 Simultaneous Multithreading for Fault Tolerance; 2.6 Byzantine Failures; 2.6.1 Byzantine Agreement with Message Authentication; 2.7 Further Reading; 2.8 Exercises; References; 3 Information Redundancy; 3.1 Coding; 3.1.1 Parity Codes; 3.1.2 Checksum; 3.1.3 M-of-N Codes; 3.1.4 Berger Code; 3.1.5 Cyclic Codes; 3.1.6 Arithmetic Codes; 3.2 Resilient Disk Systems; 3.2.1 RAID Level 1; 3.2.2 RAID Level 2
- 3.2.3 RAID Level 3; 3.2.4 RAID Level 4; 3.2.5 RAID Level 5; 3.2.6 Modeling Correlated Failures; 3.3 Data Replication; 3.3.1 Voting: Non-Hierarchical Organization; 3.3.2 Voting: Hierarchical Organization; 3.3.3 Primary-Backup Approach; 3.4 Algorithm-Based Fault Tolerance; 3.5 Further Reading; 3.6 Exercises; References; 4 Fault-Tolerant Networks; 4.1 Measures of Resilience; 4.1.1 Graph-Theoretical Measures; 4.1.2 Computer Networks Measures; 4.2 Common Network Topologies and Their Resilience; 4.2.1 Multistage and Extra-Stage Networks; 4.2.2 Crossbar Networks
- 4.2.3 Rectangular Mesh and Interstitial Mesh; 4.2.4 Hypercube Network; 4.2.5 Cube-Connected Cycles Networks; 4.2.6 Loop Networks; 4.2.7 Ad Hoc Point-to-Point Networks; 4.3 Fault-Tolerant Routing; 4.3.1 Hypercube Fault-Tolerant Routing; 4.3.2 Origin-Based Routing in the Mesh; 4.4 Further Reading; 4.5 Exercises; References; 5 Software Fault Tolerance; 5.1 Acceptance Tests; 5.2 Single-Version Fault Tolerance; 5.2.1 Wrappers; 5.2.2 Software Rejuvenation; 5.2.3 Data Diversity; 5.2.4 Software Implemented Hardware Fault Tolerance (SIHFT); 5.3 N-Version Programming; 5.3.1 Consistent Comparison Problem
- 5.3.2 Version Independence; 5.4 Recovery Block Approach; 5.4.1 Basic Principles; 5.4.2 Success Probability Calculation; 5.4.3 Distributed Recovery Blocks; 5.5 Preconditions, Postconditions, and Assertions; 5.6 Exception-Handling; 5.6.1 Requirements from Exception-Handlers; 5.6.2 Basics of Exceptions and Exception-Handling; 5.6.3 Language Support; 5.7 Software Reliability Models; 5.7.1 Jelinski-Moranda Model; 5.7.2 Littlewood-Verrall Model; 5.7.3 Musa-Okumoto Model; 5.7.4 Model Selection and Parameter Estimation; 5.8 Fault-Tolerant Remote Procedure Calls; 5.8.1 Primary-Backup Approach
- 5.8.2 The Circus Approach