Root Cause Analysis: How Much Is Enough?
Is Less More?
In the past two months I’ve had two TapRooT® Users say that a manager, not trained in TapRooT®, decided that they were spending too much time finding the root causes of problems. After all, before any investigation, the answers looked obvious to the managers. Why couldn’t the investigators just document what the manager already saw, ask “Why” five times, and come up with a simple, low-cost fix?
Of course, I started thinking about what management needs to know about root cause analysis, and why they should realize that their “simple” idea is a recipe for disaster. Answers may be obvious but WRONG. Good answers to the right small problems keep big problems from happening, and that’s what keeps their company from having the next Deepwater Horizon, Buncefield, or Three Mile Island. Do they want their company/site to be the next synonym for a disaster?
Then I started thinking …
What is enough?
How much effort is reasonable?
Are we giving management what they are paying for?
How can they tell?
What is Simple?
“The answer should be as simple as possible, but not simpler.”
So, what is the right amount of root cause analysis? That’s a great question!
To answer that question, one must consider the purpose of incident investigations.
The fundamental purpose of an incident investigation is to stop future serious accidents.
The Heinrich Pyramid shows the 300 reasons we investigate “no-injury accidents”: in Heinrich’s ratio, 300 no-injury accidents sit beneath every 29 minor injuries and one major injury. Those no-injury accidents have the potential to become injury accidents or fatalities. This same theory can be applied to quality incidents, hospital sentinel events, or any other major “bad thing” that is preceded by smaller events that foreshadow the big problem and give you a chance to improve before disaster strikes.
When management doesn’t see the purpose of investigating smaller problems to stop the big ones, they don’t know why anything but the smallest effort is needed.
Some examples might help.
Let’s start with the explosion at the BP Texas City refinery. It killed 15 people, shut down one of the biggest refineries in the US for a year, and resulted in the biggest OSHA fine ever. But it could have been prevented. There were prior incidents. BP could have learned from them. If investigations commensurate with the risk had been performed and effective corrective actions had been implemented, the blowdown drum and stack would have been replaced with an adequately sized drum and flare. Procedures would have been usable and used. Rules for bringing on an extra operator during startups would have been followed. And human factors problems on the control boards would have been corrected. The prior incidents were there. All they needed was management to ask for advanced root cause analysis and effective fixes of smaller incidents to stop the major accident.
What about Three Mile Island? They did not learn from prior incidents. The nuclear industry had become complacent. They believed their own press releases about “high performance organizations.” An accident couldn’t happen at a nuclear plant. Too much defense in depth. The operators didn’t even believe it DURING the meltdown. A meltdown just could not happen. Hubris – an interesting phenomenon.
Then there is the last flight of the Concorde. A prior incident had shown the potential for a fuel tank puncture. But instead of learning and preventing an accident (even if it meant grounding the Concorde), engineers made calculations that showed the big accident couldn’t happen. Yet, it did. Previous corrective actions had been inadequate. Calculations can’t stop debris.
It Can’t Happen Here
In spite of the dozens of examples of accidents that could have been prevented by applying advanced root cause analysis to prior incidents and near-misses, some managers still look to save money on incident investigations. They must have read that:
A Penny Saved is a Penny Earned
But Ben Franklin also wrote:
Penny Wise and Pound Foolish
A Stitch in Time Saves Nine
An Ounce of Prevention is Worth a Pound of Cure
Management must be kidding themselves if they think, “It can’t happen here.”
Your job is NOT to lull management into a sense of security. You must be ever vigilant and keep them aware that accidents are constantly trying to happen. Only a good defense of great root cause analysis of incidents and near-misses (that have the potential to cause major accidents) along with proactive audits & assessments targeted at high-risk activities can stop disasters. That’s why investing in advanced root cause analysis – TapRooT® – is completely cost justified.
Keep ROI High
Once management understands the reason for the investment in root cause analysis, you need to keep reminding them by explaining the accidents you prevent with each corrective action that is implemented. Explain the ROI (return on investment) that comes from preventing major accidents. Keep them aware that continued vigilance is the price of accident-free operations. It never stops.