M1: Dependable systems and fault tolerance
https://academy.nit-institute.com/wp-content/uploads/2024/08/FTS-M1.jpg
Introduces system dependability and its building blocks (availability, reliability, safety, maintainability, integrity). The lecture gives an overview of the main concepts of fault tolerance: physical, informational, and temporal fault tolerance, with software fault tolerance treated as a specific consideration.
For the semi-autonomous vehicle, participants will discuss each term’s meaning, provide examples of failures, and identify techniques for fault prevention, removal, forecasting, and various fault tolerance approaches.
2 hours 30 minutes
M2: Static redundancy
https://academy.nit-institute.com/wp-content/uploads/2024/08/FTS-M2.jpg
Introduces fault tolerance techniques based on physical system architecture and the addition of redundant components in a static (always active) configuration. The lecture recaps the key random-failure metrics (failure probability, reliability, reliability modeling via reliability block diagrams (RBDs), failure rates and e-systems calculation), then covers the most common static redundancy configurations and the calculations for the reliability improvement of systems modeled this way.
For the given system, determine the system’s reliability and recalculate it with additional static redundancy for components.
2 hours 30 minutes
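As a minimal sketch of the kind of calculation the M2 exercise involves, the following Python snippet evaluates series, parallel, and 2-out-of-3 (TMR) reliability block diagrams. It assumes independent components, a constant failure rate (exponential model), and a perfect voter; the function names and numeric values are illustrative and not taken from the course material.

```python
import math

def component_reliability(failure_rate, hours):
    """R(t) = exp(-lambda * t), assuming a constant failure rate."""
    return math.exp(-failure_rate * hours)

def series(*reliabilities):
    """Series RBD: every block must work."""
    return math.prod(reliabilities)

def parallel(*reliabilities):
    """Parallel RBD: at least one block must work."""
    return 1.0 - math.prod(1.0 - r for r in reliabilities)

def tmr(r):
    """2-out-of-3 majority voting with identical blocks and a perfect voter."""
    return 3 * r**2 - 2 * r**3

if __name__ == "__main__":
    r = component_reliability(failure_rate=1e-4, hours=1000)  # hypothetical values
    print("single:", r)
    print("two in series:", series(r, r))
    print("two in parallel:", parallel(r, r))
    print("TMR:", tmr(r))
```

Comparing the single-component value with the parallel and TMR values shows the reliability gain that static redundancy is expected to bring under these assumptions.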
M3: Dynamic redundancy
https://academy.nit-institute.com/wp-content/uploads/2024/08/FTS-M3.jpg
The lecture discusses reconfiguring the system at runtime in the presence of faults or errors. It lays out the mathematical modeling for such configurations, including stochastic processes (Markov chains), and shows how they can model the dynamic behavior of a system experiencing failures, for example when spare components are activated or the system is repaired.
For the given system, adjust the reliability considering the individual component failure rates.
2 hours 30 minutes
M4: Standby systems
https://academy.nit-institute.com/wp-content/uploads/2024/08/FTS-M4.jpg
Standby systems are a specific application of dynamic redundancy that uses hot and cold spare components, each with its own implementation practices. The lecture gives the calculations needed to model standby systems and discusses the resulting reliability gains.
Participants will analyze how fault tolerance improvements were applied in a selected system from an industry of their choice, including the feasibility of implementing static redundancy mechanisms and the effects on reliability.
2 hours 30 minutes
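The difference between hot and cold spares in M4 can be illustrated with a short Python sketch. It assumes two identical units with a constant failure rate, perfect failure detection and switching, and a cold spare that cannot fail while dormant; the function names and values are hypothetical.

```python
import math

def single_unit(failure_rate, hours):
    """Reliability of one unit, R(t) = exp(-lambda * t)."""
    return math.exp(-failure_rate * hours)

def hot_standby(failure_rate, hours):
    """Spare runs from t = 0 and ages like the primary (1-out-of-2 parallel)."""
    r = single_unit(failure_rate, hours)
    return 1.0 - (1.0 - r) ** 2

def cold_standby(failure_rate, hours):
    """Spare is powered only after the primary fails (perfect switching):
    R(t) = exp(-lambda * t) * (1 + lambda * t)."""
    x = failure_rate * hours
    return math.exp(-x) * (1.0 + x)

if __name__ == "__main__":
    lam, t = 1e-3, 1000  # hypothetical failure rate per hour and mission time
    print("single:", single_unit(lam, t))
    print("hot standby:", hot_standby(lam, t))
    print("cold standby:", cold_standby(lam, t))
```

Under these idealized assumptions the cold spare gives the higher reliability, because it does not age before it is switched in; imperfect switching would reduce that advantage.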
M5: Repairable systems
https://academy.nit-institute.com/wp-content/uploads/2024/08/FTS-M5.jpg
The repair process and its key terminology and variables (maintenance probability, repair rate) are introduced. Systems are modeled with repair, and the lecture shows how the repair process can achieve the required system availability. Availability and repair are closely related concepts; together they provide the means to calculate the availability and reliability of a system and to relate the two in the fault tolerance context.
Participants will evaluate the allowed average repair time and complete the calculations according to the given parameters.
2 hours 30 minutes
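The M5 exercise on the allowed average repair time can be sketched as follows. The snippet assumes the standard steady-state relation A = MTTF / (MTTF + MTTR) for a single repairable unit with constant failure and repair rates; the function names and numbers are illustrative only.

```python
def steady_state_availability(mttf_hours, mttr_hours):
    """A = MTTF / (MTTF + MTTR), equivalently mu / (lambda + mu)."""
    return mttf_hours / (mttf_hours + mttr_hours)

def max_allowed_mttr(mttf_hours, required_availability):
    """Largest mean repair time that still meets the availability target.
    Obtained by solving A = MTTF / (MTTF + MTTR) for MTTR."""
    return mttf_hours * (1.0 - required_availability) / required_availability

if __name__ == "__main__":
    mttf = 990.0  # hypothetical mean time to failure in hours
    print("availability with 10 h repair:", steady_state_availability(mttf, 10.0))
    print("allowed MTTR for A = 0.99:", max_allowed_mttr(mttf, 0.99))
```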
M6: Physical fault tolerance applications (Project 1)
https://academy.nit-institute.com/wp-content/uploads/2024/08/FTS-M6.jpg
Physical fault tolerance concepts (static and dynamic redundancy and repair) are now applied to a specific example of a system from the relevant industry (e.g. automotive).
Participants are required to apply various fault-tolerance techniques to increase the dependability of the vehicle domain controller. The project is worked out in groups.
10 hours
M7: Information fault tolerance and block codes
https://academy.nit-institute.com/wp-content/uploads/2024/08/FTS-M7.jpg
Introduces the basic information fault tolerance concepts used to tolerate errors that occur during information exchange, e.g., in the communication channel. Channel coding is introduced together with binary block codes, which add redundant data to the transmitted data so that errors caused by faults in the communication channel can later be detected or corrected.
Participants will model the reliability of typical communication channels in electronic systems, and see how errors affect the overall system failure rate, then reassess by applying basic error-correction codes.
2 hours 30 minutes
M8: Linear block codes
https://academy.nit-institute.com/wp-content/uploads/2024/08/FTS-M8.jpg
Deepens the information fault tolerance concepts with a specific class of codes whose properties simplify encoding and decoding, covering generator and parity-check matrices as well as syndrome decoding.
Each group will construct either an encoder or decoder with error detection for a system, using Hamming codes to tolerate bit errors, and then test and verify their solution with provided data.
2 hours 30 minutes
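A minimal Python sketch of the encoder and syndrome decoder described in M8, using a systematic Hamming(7,4) code. The particular parity submatrix P, the bit ordering (four data bits followed by three parity bits), and the function names are assumptions made for illustration; the course may use a different but equivalent layout.

```python
# Generator matrix G = [I4 | P], parity-check matrix H = [P^T | I3].
P = [(1, 1, 0), (1, 0, 1), (0, 1, 1), (1, 1, 1)]
# Column k of H is the syndrome produced by a single error in bit k.
H_COLUMNS = P + [(1, 0, 0), (0, 1, 0), (0, 0, 1)]

def hamming_encode(data):
    """Append three parity bits to four data bits."""
    parity = [sum(d * row[j] for d, row in zip(data, P)) % 2 for j in range(3)]
    return list(data) + parity

def syndrome(word):
    """Compute s = H * word^T over GF(2)."""
    return tuple(sum(b * col[j] for b, col in zip(word, H_COLUMNS)) % 2
                 for j in range(3))

def hamming_decode(word):
    """Correct at most one bit error, then return the four data bits."""
    word = list(word)
    s = syndrome(word)
    if s != (0, 0, 0):
        word[H_COLUMNS.index(s)] ^= 1  # flip the bit the syndrome points to
    return word[:4]

if __name__ == "__main__":
    codeword = hamming_encode([1, 0, 1, 1])
    corrupted = codeword.copy()
    corrupted[2] ^= 1  # inject a single bit error
    print(codeword, corrupted, hamming_decode(corrupted))
```

Because all seven columns of H are distinct and nonzero, every single-bit error maps to a unique syndrome, which is what makes the lookup in `hamming_decode` possible.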
M9: Polynomial codes
https://academy.nit-institute.com/wp-content/uploads/2024/08/FTS-M9.jpg
A specific class of codes is introduced, together with the mathematical foundations needed to understand these codes and their application to channel encoding and decoding.
Each group will construct either an encoder or decoder with error detection for a system, using polynomial codes to tolerate bit errors, and then test and verify their solution with provided data.
2 hours 30 minutes
M10: Cyclic codes and CRC
https://academy.nit-institute.com/wp-content/uploads/2024/08/FTS-M10.jpg
Cyclic codes, a specific class of polynomial codes, have properties that allow straightforward construction of the code and easy implementation of encoding and decoding in simple hardware accelerators. A specific class of systematic cyclic codes allows the calculation of CRC checksums, which are widely used to detect errors caused by communication, unreliable storage media, etc.
Participants will design a fault-tolerant system using a cyclic code, including generator polynomial selection, encoding/decoding processes, and LFSR simulation.
2 hours 30 minutes
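The CRC calculation mentioned in M10 can be sketched in Python as bitwise polynomial division over GF(2), which is the operation an LFSR performs in hardware. The generator polynomial x^3 + x + 1 (bits 1011) and the function names are chosen only for illustration; they are not prescribed by the course.

```python
def crc_remainder(bits, generator):
    """Remainder of bits(x) * x^r divided by generator(x) over GF(2),
    where r is the degree of the generator. Bits are MSB first."""
    r = len(generator) - 1
    work = list(bits) + [0] * r
    for i in range(len(bits)):
        if work[i]:  # leading bit set: subtract (XOR) the generator here
            for j, g in enumerate(generator):
                work[i + j] ^= g
    return work[-r:]

def crc_encode(bits, generator):
    """Systematic encoding: message followed by its CRC checksum."""
    return list(bits) + crc_remainder(bits, generator)

def crc_check(codeword, generator):
    """Recompute the checksum over the message part and compare."""
    r = len(generator) - 1
    return crc_remainder(codeword[:-r], generator) == codeword[-r:]

if __name__ == "__main__":
    generator = [1, 0, 1, 1]  # x^3 + x + 1, illustrative choice
    message = [1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0]
    codeword = crc_encode(message, generator)
    print("codeword:", codeword, "valid:", crc_check(codeword, generator))
    codeword[5] ^= 1  # inject a single bit error
    print("after error, valid:", crc_check(codeword, generator))
```

This sketch only detects errors; choosing a generator polynomial for a real system depends on the message length and the error patterns that must be caught.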
M11: Information fault tolerance applications (Project 2)
https://academy.nit-institute.com/wp-content/uploads/2024/08/FTS-M11.jpg
Presents a real industry example in which information fault tolerance is applied together with physical fault tolerance concepts. In an assignment, students follow their own system example: they estimate the required channel code, construct it, and demonstrate the dependability gains for their system.
Participants will need to implement information fault tolerance concepts on a vehicle domain controller to improve overall system dependability.
10 hours
M12: Temporal fault tolerance
https://academy.nit-institute.com/wp-content/uploads/2024/08/FTS-M12.jpg
The lecture addresses time-based methods, such as repeat and compare, automatic repeat request (ARQ), sliding window, and the like, which repeat an operation to combat transient faults that lead to erroneous outputs. The lecture also recaps all fault tolerance methods covered in the course, summarizing their applicability to the important fault tolerance phases (error detection and correction, damage assessment and confinement, error recovery and fault treatment).
none
1 hour
Final exam
none
none
none
1 hour 30 minutes