Risk exists in every industry and every profession. It’s understanding those risks and taking the right steps that help you thrive.
That’s what a fault tree analysis can help you with.
It’s one of many approaches to root cause analysis, and it helps you prevent problems from happening or recurring.
In this guide, you’ll learn what fault tree analysis is, how to do it, and much more.
Let’s dive in.
What Is a Fault Tree Analysis?
Fault Tree Analysis (FTA) is a deductive risk assessment technique that starts from the top down and is used to identify the root causes of system failures.
You start the analysis with the problematic event, which can be anything from damaged equipment to injuries to lowered throughput. You then break the event down into the factors that could have contributed to it and connect those factors using what are known as logic gates.
Fault tree analysis makes it possible to understand how individual failures, which may be negligible alone, can work together to create major problems. This deeper understanding makes it easier to create effective preventive measures.
History and Development
FTA was first developed in 1962 by Bell Laboratories for the U.S. Air Force to analyze safety risks in the Minuteman missile system. From there, it became popular because of its clear structure. This led to it being widely used in high-risk operations.
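Today, it’s a standardized methodology for risk assessment and mitigation. Published guidance includes IEC 61025, the international standard for fault tree analysis, and NUREG-0492, the U.S. Nuclear Regulatory Commission’s Fault Tree Handbook. Note that MIL-STD-1629A covers Failure Mode, Effects, and Criticality Analysis (FMECA), a related but separate technique.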
Key Benefits
Using FTA provides several advantages in risk management and system design. Here are a few important ones.
- Risk identification and prioritization. Visually mapping failure pathways shows which risks exist and which matter most.
- Decision-making clarity. It makes the relationships between events clear so you can take proactive preventive measures.
- Regulatory compliance. It provides a structured method for documenting risk assessments.
- Cost-effective risk mitigation. It shows the most pressing problems so you can allocate resources to them properly.
Components of a Fault Tree Diagram

Top Event (System Failure or Undesired Event)
The Top Event is the main failure or problem you’re looking at. It resides at the top of the fault tree diagram and is the first step in the analysis.
It varies based on what you’re trying to accomplish. It could be a lack of efficiency, total system failure, or something else. It’s usually described in broad strokes like ‘brake system failure’.
Your goal in FTA is to identify all possible causes leading to this top event and determine how to mitigate them.
Intermediate Events (Contributing Failures)
Intermediate events are failures that contribute to the major event you’re analyzing. They’re not the root cause but are still important. Think of the intermediate events as the links between the major system failure and the root cause (known as the basic event).
For a brake system failure, the intermediate event could be leaking brake fluid. The thing that causes the brake fluid to leak is likely the root cause – not the leaking brake fluid.
Basic Events (Root Causes)
Basic events are the root causes of the top event and sit at the bottom of the fault tree. They vary, but common ones include human error, material defects, and poor maintenance.
If you eliminate a basic event, the top event becomes less likely. For example, if the brake fluid is leaking because the pipe has a hole, replacing the pipe should close off that path to brake system failure.
Logic Gates (Connecting Events and Establishing Relationships)
Logic gates define the relationships between different failure events in a fault tree. They determine how multiple failures interact to trigger a higher-level event. The two most commonly used logic gates in FTA are:
AND Gate: Multiple Failures Must Occur Together
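An AND gate states that all of its input events must occur for the higher-level event to happen. If any one of them is absent, the higher-level event does not occur. The AND gate is used when a failure only results from a combination of events, such as a primary system and its backup both failing.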
Example: In an electrical system failure, an AND gate might connect:
- “Main Power Failure”
- “Backup Generator Malfunction”
Here, both must fail at the same time for the system failure to occur.
OR Gate: Any One of the Failures Can Lead to the Higher-Level Event
An OR gate states that any one of the contributing failures can cause the higher-level event. This gate is used when multiple failure paths exist, each capable of triggering that event on its own.
Example: In a server crash scenario, an OR gate might connect:
- “Overheating”
- “Software Bug”
- “Hard Drive Failure”
If any of these failures happen, the server crash occurs.
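The two gates behave like the boolean operators of the same name. As a minimal sketch, assuming each input is simply True when that failure has occurred (the function names and the sample inputs are illustrative, not part of any FTA standard):

```python
def and_gate(*inputs: bool) -> bool:
    """The output event occurs only if every input event has occurred."""
    return all(inputs)


def or_gate(*inputs: bool) -> bool:
    """The output event occurs if at least one input event has occurred."""
    return any(inputs)


# Electrical system: main power is down, but the backup generator still works.
system_failure = and_gate(True, False)

# Server: no overheating, a software bug is present, the hard drive is fine.
server_crash = or_gate(False, True, False)

print(system_failure)  # False
print(server_crash)    # True
```

With the AND gate, the working generator is enough to prevent the system failure. With the OR gate, the single software bug is enough to cause the crash.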
Transfer Symbols
Transfer symbols are used to simplify complex fault tree diagrams by linking related events across different sections. You can use them to break down larger fault trees into component parts. This makes management easier and clearer.
Transfer symbols typically indicate when a sub-tree is referenced elsewhere in the diagram, avoiding redundancy.
Example: If multiple subsystems in a factory rely on a “Main Power Supply”, you can use a transfer symbol to show that failure in one area is linked to another part of the system without redrawing the same logic repeatedly.
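One way to picture a transfer symbol in code is as a reference to a sub-tree that is defined once and pointed to from several places. The sketch below is only illustrative: the dictionary layout, the two subsystem names, and the causes listed under the power supply are all hypothetical.

```python
# Shared sub-tree, defined once. Its inputs are hypothetical examples.
main_power_supply = {
    "event": "Main Power Supply Failure",
    "gate": "OR",
    "inputs": ["Utility outage", "Tripped main breaker"],
}

# Two hypothetical factory subsystems "transfer in" the same sub-tree
# by reference instead of copying its logic.
packaging_line = {
    "event": "Packaging Line Stops",
    "gate": "OR",
    "inputs": [main_power_supply, "Sealer jam"],
}
cooling_system = {
    "event": "Cooling System Stops",
    "gate": "OR",
    "inputs": [main_power_supply, "Coolant pump failure"],
}

# A change to the shared sub-tree is visible from both subsystems.
main_power_supply["inputs"].append("Transformer fault")
print(cooling_system["inputs"][0]["inputs"])
```

Because both subsystems hold the same object, the power supply logic only has to be maintained in one place, which is the point of a transfer symbol in a diagram.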
How to Perform a Fault Tree Analysis
1. Define the Top Event
The top event is where all other information flows from and is the focus of the fault tree. Again, the top event can vary widely and is up to you to determine. The main thing to consider is how you define and measure it.
If you are analyzing an industrial manufacturing plant, your top event might be “Unexpected Production Line Shutdown.”
2. Identify Contributing Factors
After you’ve defined the top event, list the intermediate events that contribute to it. Keep in mind that these aren’t the root causes but may stem from them. Then list the basic events (the root causes) under each intermediate event. You may need root cause analysis techniques such as the 5 Whys or fishbone diagrams to find them.
Example: For an unexpected production line shutdown, possible contributing factors might include:
- Power Supply Failure (Intermediate Event)
  - Main power grid failure (Basic Event)
  - Backup generator malfunction (Basic Event)
- Equipment Malfunction (Intermediate Event)
  - Overheating motor (Basic Event)
  - Conveyor belt misalignment (Basic Event)
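If you want to keep this list in a form a program can work with, one option is a small tree structure. The following is a sketch under assumptions: the `Event` class and the `outline` helper are hypothetical names invented for this example, and only the event names come from the list above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Event:
    name: str
    kind: str  # "top", "intermediate", or "basic"
    children: List["Event"] = field(default_factory=list)


def outline(event: Event, depth: int = 0) -> List[str]:
    """Return the tree as indented text lines, one line per event."""
    lines = ["  " * depth + f"- {event.name} ({event.kind})"]
    for child in event.children:
        lines.extend(outline(child, depth + 1))
    return lines


shutdown = Event("Unexpected Production Line Shutdown", "top", [
    Event("Power Supply Failure", "intermediate", [
        Event("Main power grid failure", "basic"),
        Event("Backup generator malfunction", "basic"),
    ]),
    Event("Equipment Malfunction", "intermediate", [
        Event("Overheating motor", "basic"),
        Event("Conveyor belt misalignment", "basic"),
    ]),
])

print("\n".join(outline(shutdown)))
```

Each basic event sits under the intermediate event it contributes to, which is the same parent-child relationship the diagram will show in the next step.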
3. Construct the Fault Tree Diagram
You have the top event defined and the contributing factors identified. The next step is to create the fault tree diagram, which you’ll use for deeper analysis. Use logic gates to show how the failures work together to create the top event you’re investigating.
- Use an AND gate when multiple failures must occur together for the higher-level event to happen.
- Use an OR gate when any single failure can independently cause the higher-level event.
- Use transfer symbols to link common failures across different parts of the system.
At this stage, the fault tree serves as a structured map of failure pathways, helping you see how different issues interact within the system.
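To make the idea of a "map of failure pathways" concrete, here is a rough sketch that encodes the production line example with gates and checks whether the top event occurs for a given set of failed basic events. The gate choices (AND for the power supply, OR everywhere else) and the tuple representation are assumptions made for this sketch. Your own system decides which gate belongs where.

```python
# A node is either a basic-event name (str) or a (gate, inputs) tuple.
# The gate choices below are assumptions made for this sketch.
POWER_SUPPLY_FAILURE = ("AND", ["Main power grid failure", "Backup generator malfunction"])
EQUIPMENT_MALFUNCTION = ("OR", ["Overheating motor", "Conveyor belt misalignment"])
LINE_SHUTDOWN = ("OR", [POWER_SUPPLY_FAILURE, EQUIPMENT_MALFUNCTION])


def occurs(node, failed):
    """Return True if the event at `node` occurs, given the set of failed basic events."""
    if isinstance(node, str):
        return node in failed
    gate, inputs = node
    results = [occurs(child, failed) for child in inputs]
    return all(results) if gate == "AND" else any(results)


# The grid goes down but the generator works: the line keeps running.
print(occurs(LINE_SHUTDOWN, {"Main power grid failure"}))  # False

# A single overheating motor is enough to stop the line.
print(occurs(LINE_SHUTDOWN, {"Overheating motor"}))  # True
```

Trying different combinations of failed events like this is a quick way to check that the gates you drew match how the system really behaves.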
4. Analyze the Fault Tree
Some scenarios are more likely or more impactful than others and need to be prioritized. Look at the fault tree you’ve developed and determine how likely each failure is to happen. Assign a failure probability to each basic event and calculate the overall probability of the top event occurring. You can take a page out of Failure Mode and Effects Analysis (FMEA) for this.
You can also use techniques such as:
- Qualitative analysis – Identifying weak points in the system without numerical probability calculations.
- Quantitative analysis – Using failure rate data and probability calculations to determine the likelihood of the top event.
- Minimal cut sets – Identifying the smallest combinations of failures that would trigger the top event.
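Minimal cut sets can be found by hand on a small tree, but the idea is easy to sketch in code. The example below uses a simple recursive expansion on the production line tree. The tuple representation and the gate choices are assumptions made for illustration, and dedicated FTA tools use more efficient algorithms for large trees.

```python
from itertools import product

# A node is a basic-event name (str) or a (gate, inputs) tuple.
# Gate choices are assumed for this sketch.
TREE = ("OR", [
    ("AND", ["Main power grid failure", "Backup generator malfunction"]),
    ("OR", ["Overheating motor", "Conveyor belt misalignment"]),
])


def minimize(sets):
    """Drop any cut set that strictly contains another cut set."""
    unique = set(sets)
    return [s for s in unique if not any(other < s for other in unique)]


def cut_sets(node):
    """Return the minimal cut sets of `node` as a list of frozensets."""
    if isinstance(node, str):
        return [frozenset([node])]
    gate, inputs = node
    child_sets = [cut_sets(child) for child in inputs]
    if gate == "OR":
        # Any cut set of any input is a cut set of the OR gate.
        combined = [cs for sets in child_sets for cs in sets]
    else:
        # AND: take one cut set from each input and merge them.
        combined = [frozenset().union(*combo) for combo in product(*child_sets)]
    return minimize(combined)


for cs in sorted(cut_sets(TREE), key=lambda s: (len(s), sorted(s))):
    print(sorted(cs))
```

For this tree, the two equipment failures are each a cut set on their own, while the power supply failures only form a cut set together. Single-event cut sets are the single points of failure and usually deserve attention first.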
Example: If a conveyor belt misalignment has a failure probability of 0.01 and an overheating motor has a probability of 0.05, and the two failures are independent, the probability that both occur together (an AND gate) is 0.01 × 0.05 = 0.0005 (or 0.05%).
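The same arithmetic can be written as two small functions, one per gate. This is a sketch that assumes the input events are statistically independent. If they share a common cause, these formulas no longer hold. The function names are illustrative.

```python
from math import prod


def and_probability(probabilities):
    """P(all inputs occur), assuming the inputs are independent."""
    return prod(probabilities)


def or_probability(probabilities):
    """P(at least one input occurs), assuming the inputs are independent."""
    return 1 - prod(1 - p for p in probabilities)


p_misalignment = 0.01
p_overheating = 0.05

both = and_probability([p_misalignment, p_overheating])    # about 0.0005
either = or_probability([p_misalignment, p_overheating])   # about 0.0595

print(f"AND gate: {both:.4f}")
print(f"OR gate:  {either:.4f}")
```

Note how much the gate matters: with the same two inputs, the OR gate output is over a hundred times more likely than the AND gate output.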
5. Develop Mitigation Strategies
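Based on your analysis, you now develop strategies to reduce, eliminate, or control the risks associated with the top event. The goal is to break failure pathways and reduce the probability of critical failures. One way to do this is to add redundancy, so that a failure that used to cause the higher-level event on its own (an OR gate input) only does so when its backup also fails (an AND gate).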
Mitigation strategies include:
- Design Improvements – Modifying system components to increase reliability.
- Redundancy and Backup Systems – Adding alternative power sources or backup components.
- Preventive Maintenance – Regular inspections and servicing to detect early signs of failure.
- Training and Standard Operating Procedures (SOPs) – Ensuring that human errors are minimized through proper protocols.
By continuously refining your FTA and updating it with new data, you ensure that your system remains resilient, reducing the chances of failure and improving overall operational efficiency.
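To decide where mitigation effort pays off most, you can estimate how much the top event probability would drop if each basic event were eliminated. The sketch below does this for the production line example. Treat it as an illustration only: the gate choices are assumed, the probabilities for the grid and the generator are hypothetical, all events are assumed independent, and the short event names are shorthand for this example.

```python
from math import prod

# A node is a basic-event name (str) or a (gate, inputs) tuple (assumed gates).
TREE = ("OR", [
    ("AND", ["grid", "generator"]),
    ("OR", ["motor", "conveyor"]),
])

# motor and conveyor use the values from the example in step 4;
# grid and generator are hypothetical values.
p = {"grid": 0.02, "generator": 0.10, "motor": 0.05, "conveyor": 0.01}


def probability(node, p):
    """Probability of the event at `node`, assuming independent basic events."""
    if isinstance(node, str):
        return p[node]
    gate, inputs = node
    values = [probability(child, p) for child in inputs]
    if gate == "AND":
        return prod(values)
    return 1 - prod(1 - v for v in values)


baseline = probability(TREE, p)

# How much does the top event probability drop if one basic event never happens?
reduction = {name: baseline - probability(TREE, {**p, name: 0.0}) for name in p}
ranked = sorted(reduction, key=reduction.get, reverse=True)

print(f"baseline: {baseline:.4f}")
for name in ranked:
    print(f"{name}: -{reduction[name]:.4f}")
```

With these numbers, the overheating motor comes out on top, so preventive maintenance on the motor would do more than further work on the already redundant power supply. Rerun the calculation whenever you update the tree or the failure data.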
Conclusion
A fault tree analysis is more involved than many other risk assessment and mitigation methods. In return, it provides more comprehensive results.
When using this method, it’s important to start small to get a better understanding of it and then roll it out to more important systems. This guide has given you a solid foundation to start with. The rest is up to you.
Let me know what you think in the comments, and don’t forget to share.