September 16, 2020 | Mark Paradies

Equipment Failure Analysis [Root Cause Failure Analysis]

Equipment Failure Analysis/Root Cause Failure Analysis

Why Equipment Continues to Fail

Everyone probably agrees that the failure to find and fix the causes of equipment failures leads to additional failures and reliability issues. So, if people agree, why do we continue to have failures to find and fix the root causes of equipment failures? The answer … poor equipment failure analysis/root cause failure analysis.

That was an issue I explored in the late 1990s. I discussed the problem with some very smart people. One of those was Heinz Bloch, noted equipment reliability expert.

The outcome of our discussion was that many maintenance professionals, maintenance managers, and equipment reliability engineers did not use a systematic approach to troubleshooting and root cause analysis.

What did they do?

  • Replace parts or entire units (trial and error)
  • Guess at answers using their experience
  • Guess at answers based on the advice of others (perhaps vendors)
  • Use substandard techniques like 5-Whys

This leads to a failure to properly troubleshoot the failure and a failure to find the failure’s root causes.

Without detecting the reasons for and root causes of the failure, the actions taken usually put the equipment back together with the reasons for the failure still in place. Thus, another repeat failure is bound to occur.

Sometimes this has happened so many times that people come to believe that the equipment is just unreliable. After all, it has always been this way. People believe there is no way to improve reliability.

Building an Equipment Failure Analysis/Root Cause Failure Analysis System

Heinz and I didn’t accept the answer that reliability improvement was “too hard.” Heinz knew effective techniques for equipment troubleshooting. And I knew advanced root cause analysis techniques. We decided to put them together into a systematic process to troubleshoot and find the root causes of equipment reliability issues. And System Improvements licensed Heinz Bloch’s troubleshooting techniques to create the system.

The year was 1997. It took us several years of technique development, course development, software development, and testing before we had a proven system. It combined the best of Heinz’s troubleshooting techniques with the advanced TapRooT® Root Cause Analysis System to create a new way to find and fix the root causes of equipment issues.

The use of this system in the field proved that effective troubleshooting and root cause analysis could improve plant/process performance by improving equipment reliability. But as Admiral Rickover said:

Good ideas are not adopted automatically. They must be driven into practice with courageous impatience. Once implemented they can be easily overturned or subverted through apathy or lack of follow-up, so continuous effort is required.

Perhaps that’s why even 20 years later there are many who still don’t know about these advanced tools or others who have forgotten to apply what they have learned.

The Process of Equipment Failure Analysis/Root Cause Failure Analysis

There are six basic steps to the process of equipment failure analysis/root cause failure analysis. They are:

Step 1: What Happened

Step 2: Troubleshooting

Step 3: Causal Factors

Step 4: Root Causes

Step 5: Corrective Actions

Step 6: Repairs and Improvement

We combined these into an equipment failure analysis flowchart…

The complete process is outlined in Using Equifactor® Troubleshooting Tools and TapRooT® Root Cause Analysis to Improve Equipment Reliability.

Let’s look at each step and the reason for each step to see why all these steps are needed.

Step 1: What Happened

People often make the mistake of trying to correct problems (corrective actions) or find root causes BEFORE they understand what happened. Why do they do this? Because they think they already know the answer. They have “seen this before.” Thus, they think they don’t have to ask about what happened.

Unfortunately, they are usually wrong. Why? Because there are many possible causes and they know only a few.

So, one of the early things I learned about root cause analysis is:

You need to understand what happened BEFORE you can understand why it happened.

Thus, in all our root cause analysis processes we start with the development of a SnapCharT®, a visual flow chart of what happened. Developing a SnapCharT® helps the person investigating the problem understand what happened before they start their root cause analysis.

Below is an example of an equipment-failure-related SnapCharT®…

I don’t know how many times I’ve been asked to help with a stalled investigation (they just can’t find the root causes). I start drawing a SnapCharT®, and it helps the team ask questions they previously overlooked and find new evidence that leads to the identification of important root causes that really need fixing.
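The idea of recording what happened before asking why can be illustrated with a simple ordered list of events. What follows is only a minimal Python sketch: it is not the SnapCharT® format or the TapRooT® software, and the `Event` class, its fields, the `open_questions` helper, and the sample pump events are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Event:
    """One step in the sequence of what happened (hypothetical structure)."""
    description: str
    evidence: List[str] = field(default_factory=list)  # facts that support this event
    is_incident: bool = False  # marks the event being investigated


def open_questions(timeline):
    """Return the events that have no supporting evidence recorded yet."""
    return [e.description for e in timeline if not e.evidence and not e.is_incident]


# Hypothetical sequence of events for a pump problem, for illustration only.
timeline = [
    Event("Pump rebuilt during maintenance"),
    Event("Pump returned to service", ["Operator log entry"]),
    Event("Pump delivers insufficient capacity", is_incident=True),
]

print(open_questions(timeline))
```

Events with no evidence attached are the places where an investigator would still need to ask questions, which is the kind of gap that laying out the sequence tends to expose.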

Step 2: Troubleshooting

This is where Heinz Bloch’s expertise came in. He had developed a systematic troubleshooting system (a set of tables) that provided a consistent way to understand what was causing the failure of a piece of equipment. Below is the flow chart of how to apply the tables…

Equifactor® Troubleshooting Guide

For the pump SnapCharT® example, the symptom was insufficient capacity. For that symptom, Heinz developed the following table of possible causes…

Troubleshooting Table

Thus, you could develop a troubleshooting procedure to either select or eliminate each of these potential causes and discover the cause of the problem.

In the example, the impeller was discovered to be installed backwards. But other possible causes were checked and eliminated first, before the team decided to disassemble the pump to check the impeller.
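The select-or-eliminate approach described above can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the Equifactor® procedure: the `troubleshoot` function is hypothetical, and the list of possible causes is a set of placeholders, not the contents of Heinz Bloch’s table. Only the final cause, the backwards impeller, comes from the example in this article.

```python
def troubleshoot(possible_causes, checks):
    """Check each possible cause in order.

    Returns (confirmed_cause, eliminated_causes). confirmed_cause is None
    if every listed cause was eliminated.
    """
    eliminated = []
    for cause in possible_causes:
        if checks[cause]():
            return cause, eliminated
        eliminated.append(cause)
    return None, eliminated


# Placeholder causes for "insufficient capacity" (NOT the actual table entries).
# They are ordered so that checks needing no disassembly come first.
possible_causes = [
    "suction strainer clogged",
    "speed too low",
    "impeller installed backwards",
]

# Each check stands in for an inspection or measurement done in the field.
checks = {
    "suction strainer clogged": lambda: False,
    "speed too low": lambda: False,
    "impeller installed backwards": lambda: True,
}

cause, eliminated = troubleshoot(possible_causes, checks)
print("Confirmed:", cause)
print("Eliminated:", eliminated)
```

Keeping the list of eliminated causes matters as much as the confirmed one, because it records what was ruled out and why the pump had to be opened.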

After the troubleshooting is complete, the information gained is added to the SnapCharT®…

With this knowledge, you are ready to decide: is there something more to learn by performing a root cause analysis, OR should we just stop here and perform repairs?

In other words, do we just put the impeller on the right way and rebuild the pump, OR is there something we need to learn about why the pump impeller was installed backwards and how we could prevent it in the future?

If you decide to learn more, you are ready to define the Causal Factors.
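The decision point after troubleshooting can be sketched as a small piece of Python. The six step names come from the list earlier in this article; the `Step` enum, the `next_step` function, and the `more_to_learn` flag are hypothetical names chosen for illustration, and the real flowchart may contain detail this sketch leaves out.

```python
from enum import Enum


class Step(Enum):
    WHAT_HAPPENED = 1
    TROUBLESHOOTING = 2
    CAUSAL_FACTORS = 3
    ROOT_CAUSES = 4
    CORRECTIVE_ACTIONS = 5
    REPAIRS_AND_IMPROVEMENT = 6


def next_step(current, more_to_learn=True):
    """Return the step that follows `current`, or None when the process is done.

    After troubleshooting, `more_to_learn` decides whether to continue into
    root cause analysis or go straight to repairs.
    """
    if current is Step.TROUBLESHOOTING and not more_to_learn:
        return Step.REPAIRS_AND_IMPROVEMENT
    if current is Step.REPAIRS_AND_IMPROVEMENT:
        return None
    return Step(current.value + 1)


# Walk the full path, assuming there is more to learn after troubleshooting.
step = Step.WHAT_HAPPENED
while step is not None:
    print(step.name)
    step = next_step(step)
```

Calling `next_step(Step.TROUBLESHOOTING, more_to_learn=False)` skips directly to repairs, which corresponds to simply reinstalling the impeller correctly and rebuilding the pump.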

Step 3: Causal Factors

We know what the incident is, but what were the problems that led to the incident? Those are Causal Factors. We define a Causal Factor as:

Causal Factor:
A mistake, error, or failure that directly leads to
(or causes) an Incident (the circle on the SnapCharT®) or
fails to mitigate the consequences of the original error.

For more about defining Causal Factors, see the post about our new Causal Factor Worksheets. There are four types of Causal Factor Worksheets, one for each of these types of problems:

  1. Equipment failure-related incidents.
  2. Safety-related incidents.
  3. Quality-related incidents.
  4. Patient Safety-related incidents.

Once you identify the Causal Factors, each Causal Factor will be analyzed to find the Causal Factor’s root causes.
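The rule that every Causal Factor gets its own root cause analysis can be shown with a short Python sketch. Everything here is an assumption made for illustration: the function names, the two Causal Factors, and the single root cause are hypothetical and are not taken from the Root Cause Tree® or from a real investigation.

```python
def analyze_causal_factors(causal_factors, find_root_causes):
    """Analyze each Causal Factor separately and collect its root causes."""
    results = {}
    for factor in causal_factors:
        results[factor] = find_root_causes(factor)
    return results


def unanalyzed(results):
    """Return the Causal Factors that still have no root cause identified."""
    return [factor for factor, causes in results.items() if not causes]


# Hypothetical findings, for illustration only.
findings = {
    "Impeller installed backwards": [
        "rebuild instructions did not show impeller orientation",
    ],
}

# One factor that led to the incident and one that failed to mitigate it.
causal_factors = [
    "Impeller installed backwards",
    "Low flow not noticed after restart",
]

results = analyze_causal_factors(causal_factors, lambda f: findings.get(f, []))
print(unanalyzed(results))
```

A Causal Factor that comes back with an empty list is a signal that the analysis is not finished, not that the factor has no root cause.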

Step 4: Root Causes

How to use the TapRooT® Root Cause Tree® Diagram to find the root causes of each of the Causal Factors is covered in previous articles. For a white paper that explains how the Root Cause Tree® works, use the form below.


Step 5: Corrective Actions

Once you know the root causes, the next step is to fix them. To do this you use the Corrective Action Helper® Guide (book) or Module (software). The guide provides suggestions to develop effective corrective actions for each root cause on the Root Cause Tree®. Again, using the Guide to develop corrective actions is described in the white paper above.

Step 6: Repairs and Improvement

The last step is to fix the equipment and implement corrective actions.

What Makes this Equipment Failure Analysis Process Unique?

Four things make this version of Equipment Failure Analysis/Root Cause Failure Analysis unique:

  • The SnapCharT® for organizing and displaying what happened.
  • Heinz Bloch’s knowledge incorporated into the Equifactor® Troubleshooting Tables.
  • The TapRooT® Root Cause Tree® Diagram for finding the root causes of human performance and equipment-related problems.
  • The Corrective Action Helper® Guide/Module for developing effective corrective action.

Without these tools, people are dependent on their own knowledge alone. Most people don’t have:

  • detailed equipment troubleshooting knowledge
  • detailed human factors knowledge
  • out-of-the-box ideas for correcting problems

Plus, Equifactor® and TapRooT® make the process systematic and repeatable. This is a process you can teach mechanics and engineers and get consistent, reliable results that work to improve equipment reliability.

Finally, the process helps people understand that many problems that are thought to be equipment problems are actually human performance problems. And when they apply the TapRooT® Tools, they can find and fix the root causes of these human performance problems that they previously didn’t recognize.

How Can You Learn This Process for Equipment Failure Analysis/Root Cause Failure Analysis?

First, you could read the book, Using Equifactor® Troubleshooting Tools and TapRooT® Root Cause Analysis to Improve Equipment Reliability. But a better way is to attend the 2-Day Equifactor® Equipment Troubleshooting & TapRooT® Root Cause Analysis Training. Here is the course outline:

Course Outline

DAY ONE

  • Introductions
  • Understanding What Happened – SnapCharT® Basics
  • SnapCharT® Exercise
  • Collecting Information
  • Failure Modes and Failure Agents
  • Equifactor® Troubleshooting Tables
  • Human Errors
  • Process Troubleshooting
  • Example: Troubleshooting a Seawater Pump

DAY TWO

  • TapRooT®/Equifactor® Software Introduction
  • Identifying Causal Factors
  • Introduction to the Root Cause Tree®
  • Exercise: Walking Through the Root Cause Tree®
  • Exercise: Teams Find Root Causes Using the Root Cause Tree®
  • Change Analysis
  • Change Analysis Exercise
  • Final Exercise: Solving a Major Equipment Issue

The course materials include the book, Using Equifactor® Troubleshooting Tools and TapRooT® Root Cause Analysis to Improve Equipment Reliability, a TapRooT® Root Cause Tree®, the Root Cause Tree® Dictionary, and the Corrective Action Helper® Guide, a $129 value. The book includes a complete set of Heinz Bloch’s troubleshooting tables.

Upon completion of the course, attendees will receive a certificate of completion and a 90-day subscription to TapRooT® VI Software, our dynamic cloud-based software that computerizes the Equifactor® and TapRooT® Techniques.

Here’s a link to the dates and locations of upcoming public courses:

Or you can sponsor a course at your site for your troubleshooting and equipment failure analysis experts. Call us at 865-539-2139 for more information. Or CLICK HERE to contact us.

But don’t wait. Unreliable equipment is expensive in manufacturing processes. Every day you delay is thousands of dollars wasted.
