August 12, 2015 | Barb Carr

What, Why and then Fix… There is No Other Sequence for Root Cause Analysis

I heard the quote above in a movie years ago, in which a group of scientists built a supercomputer to solve the “Ultimate Question of Life.” The supercomputer’s “ultimate answer” was “42.” This answer meant nothing to the community because nobody knew the ultimate question behind the ultimate answer.

When it comes to effective root cause analysis and problem solving, are you jumping to the “ultimate why” or the “ultimate fix” without truly knowing the “ultimate what” behind the problem?

What matters is not how many questions you ask or how many solutions you throw at a problem. What matters is how you define the scope of the problem to be solved, what you learn when you find out what happened while the problem occurred, what you ask based on what you learn, and how you fix what you find.

The sequence for good problem solving (find out what happened, find out why “the what” happened, and then fix what you find) sounds simple, right? Then why do so many people skip this critical sequence? A recent investigation failure that I observed comes to mind. Note that before you start any root cause analysis, you should always begin by defining the problem that needs to be analyzed.

The problem scope for that investigation failure was to understand why an incident repeated after a team had completed its incident investigation and implemented the corrective actions it developed.

What are the probable costs of not analyzing an incident?

1. Hazards to people, equipment, processes or customers are not identified and therefore are not removed, isolated or mitigated.

2. The next related incident has a worse outcome, such as:

a. A loss of life, injury or other harm to people
b. Damage to the environment
c. Equipment run to unplanned failure
d. Loss of a process or production system
e. Loss of clients due to repeat defects and failures
f. Government or other independent agency involvement

3. A backlog of incidents and incident rework, including a backlog of corrective actions.

Below are some of the facts I collected about the repeat incident failure that I observed:

1. The investigation team had a natural tendency to take shortcuts by using experience-based guessing to reduce investigation time.

The thinking goes: if you already “know” the whys of a problem, or you already know the solution you want to implement, then you do not need to verify what happened.

This team’s shortcuts became “longcuts.” Guessing and expert-driven tunnel vision led them to collect the wrong evidence and select the wrong whys. The result was wasted time and poor corrective actions that did not lessen or mitigate the problems that caused the original incident.

2. Many of the team members had learned poor problem-solving skills in earlier “well-meaning” problem-solving training: 5-Whys, Ishikawa Diagrams and Brainstorming Solutions.

Items one and two above reinforce each other and are easily adopted by expertise-driven problem solvers; call them co-enablers. These methods feel good because they support your own experience, and they are quick and easy to learn and use. However, these tools assume that all the right experts are sitting in the room, all the right people went out to look at the problem, and no guesses or assumptions were made. That is not the case in most problem-solving situations.

A good root cause analysis process does not replace the need for a company’s process experts, workers or managers. Instead, it should draw good information from these people in an unbiased and effective manner. It should also ensure that good corrective actions are developed, implemented, verified and validated.

The problems identified above encouraged the company’s problem solvers to deviate from the effective root cause analysis sequence of what, why and then fix, and that deviation caused this team’s incident to repeat.

So what happens when investigators follow the “Ask Why First” method instead of trying to learn what happened first?

1. Investigators tend to draw on their own experience first and quickly try to fit that experience to the problem being analyzed. This is the first stage of failure: guessing. Never assume that what you think happened is what really occurred during a problem. Also, if you have never experienced the problem before, you will have no experience to fit it to.

2. Investigators often throw multiple “possible” root cause options at a previously “known” problem. The more causes the merrier, right?

Actually, no. Every cause you throw at a problem that is not based on the facts of the incident means more time spent collecting information, which wastes time. Often you decide which cause is most important to you before you know the facts, and then skip collecting any other “unimportant” information.

3. Depending on the problem-solving training they previously received, investigators often drive evidence collection with linear, brainstormed why questions (5-Whys style).

This increases the probability of delaying, if not outright ignoring, viable evidence. The process also tends to drive you toward finding just one “real root cause,” which is a critical error. After all, even a fire, like any other problem you may investigate, has more than one ingredient and cause. This approach can also produce “tunnel vision” aimed at finding the “most important” or “rootiest root cause.”

Let’s look at the “Fix the Problem First” method. Many well-meaning problem-solving methods state that solving the problem is more important than finding all the whys or whats of the problem. Management doesn’t care how you fix the problem as long as you solve it, right? What could go wrong if we just brainstorm a solution first and bypass the whole finding-a-cause thing?

1. The investigation tends to focus on quickly putting things back to normal and stabilizing the environment for damage control. In reality, this is not problem solving; it is called triage. In triage, you quickly assess the issue, make a best-guess solution and then put that guess into action. Reducing the time to a solution is vital in triage.

There is a need for triage and immediate actions. However, triage should not take the place of good problem solving, because it breeds a “Broke-Fix” mentality instead of understanding the problem well enough to prevent it from occurring again.

2. If you have a fix in mind, you have an agenda. That agenda looks for evidence to validate the selected fix and tends to filter out other important issues.

The level of the organization chart that drives the solution can also set the boundaries for what the investigators at that site may discuss and address at the employee level. This often prevents getting all the facts and limits what is allowed to be changed.

So how does starting with “What happened” first during problem solving prevent the issues listed above?

1. Identifying what happened before and after the problem occurred, with proper detail and supporting evidence, reduces the room for assumption-led decisions.

2. Writing down what happened makes it easier to clearly identify conflicting statements from interviews and gaps in the process being investigated.

3. Writing down what happened allows you to identify what worked right. This helps validate good processes and demonstrates that you’re using a root cause analysis process that looks for the good, the bad and the missing best practices. This is good for morale and increases the probability of effective and sustainable corrective actions.

4. You now have good documentation to help you find out why the problem occurred and why the fix is justified. This documentation can reduce the number of corrective actions rejected by managers and regulators.

5. You are now using a good root cause process to figure out not only why the problem occurred but also why actions or inactions failed to mitigate the problem or made it worse.

6. At the end of the day, your initial gut feeling about what happened, why it happened and how to fix it is either substantiated or rejected based on facts, not emotions.

The sequence of What, Why and then Fix… There is No Other Sequence for good Root Cause Analysis.

For extra credit after reading this TapRooT blog article, let me know which movie the ultimate answer “42” came from and what the ultimate question really was.

You can also learn more about effective TapRooT® Root Cause Analysis by attending one of my classes. We can talk about the movie over coffee or a soda and make a SnapCharT® for why the world was going to be destroyed for a new galactic highway.