CNC invited System Improvements, Inc. back to Trinidad to teach 1-Day Refresher and 2-Day Root Cause Analysis Courses this week. It was a pleasure for Mark Olson and me to work with their energetic and passionate team.
Here are a few pictures from the 2-Day TapRooT® Incident Investigation and Root Cause Analysis Course:
USA Today published an article about the recent crash that killed ex-Senator Ted Stevens and the hazards of flying to remote locations in Alaska.
The story mentioned several reasons for improving safety in Alaska but missed one. What is the one they missed? TapRooT®.
Back in 2002 we licensed The Medallion Foundation to teach TapRooT® and use it to investigate aviation accidents in Alaska.
Then in 2003, we licensed the FAA in Alaska to use TapRooT® for accident investigations.
Now they cooperate in their investigative efforts to improve aviation safety in Alaska.
How has TapRooT® helped improve Alaska aviation safety? Attend the TapRooT® Summit and find out. Dennis Ward, Executive Director of the Medallion Foundation and a certified TapRooT® Instructor, will present “Improving Performance by Analyzing Multiple Aviation Accidents for Common Causes” in the Investigation, Troubleshooting, and Root Cause Analysis Track. His talk explains the use of TapRooT® to find deeper meaning from the analysis of multiple accidents.
This is part of The Medallion Foundation’s efforts to improve the safety culture of the aviation industry in Alaska. Their web site has the following information:
“The Medallion Foundation is a non-profit organization promoting aviation safety through systems enhancements by providing management resources, training, and support to the aviation community. Our mission of reducing aviation accidents is fostered by research, analysis, education, auditing, and advocacy of Safety Management Systems and higher flight-training standards.”
It also says:
“The Medallion Foundation provides specific training classes, one-on-one company mentoring, and auditing in conjunction with and supplemental to the Five-Star / Shield programs. Courses such as System Safety, Safety Officer, Flight Risk Management, and TapRooT® Root Cause Analysis are offered as prerequisites for the Star Programs.”
OK … I added the emphasis on TapRooT®. But hearing how Dennis used TapRooT® to find significant Generic Causes of accidents from their root cause analysis will help you understand why I put emphasis on using TapRooT® as a fundamental part of any improvement program.
One of the biggest trends in quality improvement was the term “The Cost of Poor Quality” (COPQ) tied with “Zero Defects”, with COPQ financial models popping up in many Fortune 500 companies. In the safety world there was a similar drive with the term Cost of Compensation tied with “Zero Injuries” and OSHA-driven recordables to be tracked.
The Quality Iceberg
The Safety Iceberg
Yet the focus for both safety and quality was led by lagging visible indicators. In other words, good or bad, the findings arrive too late. You march your troops with the “Zero Defects” and “Zero Injuries” flags raised and once you reach your destination you turn around and see who and what equipment you have left.
Now don’t get me wrong, identifying and being able to comprehend the end damage is a vital part of the process and unfortunately not realized by some. It is just NOT where you should focus your drive and effort.
So what now you may ask? “Build quality in… do not inspect quality in!”
The phrase above often falls on deaf ears because it is misunderstood. “If you do not assess the quality of your work, then how do you know if it is to standards?” people would ask. “I have to trust everybody’s work?” In the safety world the phrase “Safety must be part of every action we do,” is often trumpeted. But how?!
Start with these 3 steps first:
1. First things first, Quality and Safety are NOT silos and they should work together. Setting up a task that can be worked efficiently, correctly and safely by employees is a combined goal, and these SHOULD NOT be competing goals.
To save money, many companies do not cross-train employees from different departments. Why not, if it makes sense? For example, while many of our clients started using TapRooT® Root Cause Analysis in their safety departments first, the more people saw the process used, the more operations and facilities came on board for the same training.
Now this cross-training concept also works in the opposite direction. As the quality department leaders started working with the safety department, quality tools from Stakeholder Analysis to Force Field Analysis were also shared with the safety department. After all, inside all world class companies are different departments that are all part of the same company with one goal.
2. Building Quality and Safety into a process starts in the beginning stages of planning but can be recovered after the employees try to use an existing process (it just costs more time and money!).
When our clients use our Root Cause Analysis process to investigate defects and incidents, it soon becomes apparent that the opposite of each one of our root causes is a best practice that can be implemented proactively.
While most Quality Experts are excellent at mapping out front end value streams, process maps and spaghetti maps, there is often a gap in knowledge of research and industry best practices in human engineering, communication, procedures, training and work direction. So if you were a Quality Professional and had access to multiple experts in front of you every day, would you utilize them? Here is a small list of courses that can give you best practice access: Best Practice Courses
3. No process, no matter how well designed, is perpetually stable, and it must be audited/assessed periodically based on risk for unknown and known changes. Note: this is not the same thing as “inspecting in quality”!
This is one of the most misunderstood ingredients relating to Inspections.
If you have a hold point inspection that must be completed by an Independent Inspector BEFORE a task can be completed or a part received or shipped, you are admitting that you have a high risk potential that is not capable of being completely mistake proofed.
– OR-
You have a process or task where you have not truly identified the human and equipment behaviors with their associated Root Causes, and have decided that it is worth spending the extra money and time to inspect instead of fixing the problem. You refuse to build in quality.
Now this is not saying that you should not target high risk tasks proactively and continually audit or assess these areas to ensure nothing has changed or is different. This type of inspection must still occur.
“China rushed to keep an oil spill from reaching international waters Tuesday, while an environmental group tried to assess if the country’s largest reported spill was worse than has been disclosed.
Crude oil started pouring into the Yellow Sea off a busy northeastern port after a pipeline exploded late last week, sparking a massive 15-hour fire. The government says the slick has spread across a 70-square-mile (180-square-kilometer) stretch of ocean.”
TQM, TQC, TOC, PDCA, Six Sigma, Lean, Lean Sigma, MBO, 8D… just to mention a few Quality Programs many in the world of Quality have been exposed to…. but is it the name of or the effectiveness of the process that makes a good Quality Improvement Program? Seems like a silly question until you have lived in the world of change.
…..”Six Sigma is not the same as TQM”
…. “Lean Six Sigma is definitely better than Six Sigma”
…. “Is it a Lean Project or a Six Sigma Project?”
Each new buzz was normally preceded by a period of frustration, low morale and a loss of money followed by blame or a feeling of hopelessness. Often employees were also taught the term “empowerment,” which led to suggestions with no follow-up by management. Each time a new process with a new name was introduced, we would “throw the baby out with the bath water.” So a new name was also perceived by many as reinventing the wheel in the name of rebuilding an Effective Quality Program.
So why reinvent the wheel? Why not forget the name, identify the strengths and weaknesses of your current quality program processes and improve what really needs to be improved? This is the proper way to spend your money and time for the best return on investment and acceptance of your employees.
So the burning platform, pain and frustration felt by many in charge of ensuring quality processes sustain, is still a current issue addressed by many professionals that I met at ASQ World Conference this year. They were not arguing on whether it was 8D or Lean Six Sigma. The good thing is that many are realizing that numerous tools and processes previously divided into opposing teams can be combined without a large new program investment.
With that said, one area of common interest for many at the ASQ Conference was Root Cause Analysis. The interest was not in how to calculate significance or sigma level because most there could calculate these with their eyes closed. The interest was in how to reduce bias, widen root cause perspectives and add more qualitative substance behind the numbers. There were two Root Cause Booths at the conference. Guess whose booth had the most traffic? The TapRooT® Booth, where we were able to share a portion of our process that could easily be combined with all the current processes listed above to gain more value and quality sustainability.
Every other week on this blog, I will dig a little deeper into current Quality Program frustrations. To help guide these posts to your quality needs, please chime in and post your issue of the week.
The newly developed SFDA (Saudi Food and Drug Authority) located in Riyadh, Saudi Arabia, has taken the lead in medical oversight of conformity; not only by creating a Medical Devices Sector, but also by ensuring that their Medical Device team has a thorough understanding of human error and equipment failure and has the best tool to investigate it with, TapRooT® Root Cause Analysis.
Here are a few pictures taken during the onsite 2-Day TapRooT® Incident Investigation and Root Cause Analysis, 1-Day TapRooT®/Equifactor® Equipment Troubleshooting & Root Cause Failure Analysis, Stopping Human Error, and 1-Day Evidence Gathering Courses held in June.
If you look closely you can see that they are using the new individual software… (another user test to make sure it is ready to go out to all users)
“In the five years before the Deepwater Horizon exploded, federal investigators documented nearly 200 safety and environmental violations in accidents on platforms and rigs in the Gulf of Mexico, describing a stunning array of hazards that resulted in few penalties.
Workers plunged dozens of feet through open unmarked holes. Welding sparked flash fires. Overloaded cranes dropped heavy loads that smashed equipment and pinned workers. Oil and drilling mud fouled Gulf waters. Compressors exploded. Wells blew out.
And yet, in their investigations of nearly 400 offshore incidents, Minerals Management Service officials failed to travel to one-third of the accident scenes, collected only 16 fines and did not investigate every blowout as their own rules require.”
Do you think that lack of regulation contributed to the BP/Transocean Deepwater Horizon blowout?
We have had plenty of time for comments and I posted my idea yesterday … So here is a link to the blog post and comments so that you can catch up with the debate:
The details of this disappearing data will be interesting because it is a crucial part of the sequence of events and critical to understanding the decisions that were made.
[Comment From Greg Hellman, BNA: ]
OSHA has placed an injury and illness prevention program rule on its agenda for the first time. Could such a rule address musculoskeletal disorders in some way?
Monday April 26, 2010 1:27 Greg Hellman, BNA
1:27
David (OSHA):
The i2p2 standard is not a substitute for other OSHA standards. It provides a mechanism to achieve the culture change needed in this country to effectively address workplace safety and health issues. It will be the employer’s responsibility to identify all hazards in their workplace, which may include ergonomics, falls, amputations, electrocutions, work-related respiratory disease (such as occupational asthma), etc. The i2p2 standard simply provides a mechanism for employers to identify hazards; however, the control of those hazards will be required by existing OSHA standards and the general duty clause, as is currently the case.
I understand Teen (Under 18) OSHA restrictions for high risk industry jobs and tasks but should the safeguards and other corrective actions be any less just because an adult is performing the task?
Review the case studies here: http://www.osha.gov/SLTC/teenworkers/realstories.html
While reviewing, replace the word Teen with Adult.
The problem I saw is that many of the failed or missing safeguards could have killed anyone, teen or not. Do teens, especially in fast food restaurants, get exposed to more uncontrolled hazards because teens will do anything they are asked to do without the training or experience to know better?
Often one dealing with many industries hears this phrase…
“Risk is just part of the job! It’s not going to get done unless someone does it!”
Here are a couple of excerpts from a recent article in the New York Times comparing two mines 200 miles apart.
“Coal mining carries inherent risks. But the numerous and very public violations and fatalities at Massey-owned mines over the years may leave the impression that all mines are run this way — that all mines leave coal shafts open and fail to exhaust methane properly. They do not. A comparison between Massey’s safety practices and those of other operators in the coal industry shows sharp differences, helping to explain why Massey mines led the list of those warned by federal regulators that they could face greater scrutiny because of their many violations.”
“A unit of the TECO Coal Corporation operates a mine with the all-business name of E3-1. Like Upper Big Branch, it is nonunion. It has fewer employees, produces three-quarters the amount of bituminous coal, uses an arguably riskier method of mining — and, its operators say, emits 25 percent more methane a day.”
“The differences in safety practices between TECO and Massey are often stark. Where TECO workers rigorously inspect the mine for safety problems before every shift, Upper Big Branch has had dozens of violations related to pre-shift examinations, some for failing to conduct them at all, others for not documenting that they had been done. All TECO miners get weeks of safety training, but in September an inspector ordered dozens of Massey miners out of Upper Big Branch because they lacked proper training.”
“Several years ago, TECO fired a mine foreman for failing to rehang a ventilation curtain that had fallen to the mine floor and contributed to a fire. At Upper Big Branch, inspectors more than once found curtains improperly hung or lying on the mine floor, a practice workers said was routine and encouraged because the plastic sheets get in the way of equipment.”
Finally this quote…
“Many of the miners suspected they knew a major source of the gas buildup: a coal shaft, unused for years, that passed down through several old mines before reaching theirs. According to a longtime foreman at the mine, who provided previously undisclosed details of its operation, the shaft was never properly sealed to prevent the methane above from being sucked into Upper Big Branch.
Instead, the foreman said, rags and garbage were used to create a poor man’s sealant, which he said allowed methane to permeate the mine, displacing much-needed oxygen.”
The title of this article could not be any further from the truth, but I hear it all the time in online root cause forums.
…… Yes, it is too late to prevent an incident that has already occurred.
….. Yes, after an incident we want to find the root causes that allowed a human or equipment behavior to occur or get worse.
….. Yes, we build corrective actions based on the incident analysis to prevent another incident from repeating….. What?
So you mean applying best practices (the opposite of the root causes found) will prevent future incidents and forces you to look forward to make sure they worked?
So where else can I apply root causes (absence of best practices)?
….. Designing processes and products. (Why create future root causes for someone else to figure out by accident?)
….. Setting up and assessing your training and hiring program. (So is it a Worker Selection, Supervision, and/or maybe a Training Root Cause?)
….. Revamping your audit or assessment program. (Where are the human factors questions in your audit?)
This important concept is actually why we teach proactive root cause analysis to our clients. To learn more, e-mail info@taproot.com and tell them Chris has a root cause to pick with you.
It got me wondering … “Can the choice of a root cause analysis tool increase or decrease the amount of blame in an investigation?”
I’ve thought about this a lot in the past – so I have some definite ideas. But I thought I’d post the question here to see what others have to say. Then I’ll chime in later.
Couldn’t decide if this was a Monday Accident or a Friday Joke. But when people are pulled from the wreckage by firefighters, it’s probably not a joke!
The Associated Press reported that a Piper Cherokee ran out of gas. The pilot tried to land on a county highway and didn’t quite make it (not a successful landing).
As they say … Any landing you can walk away from is a successful landing.
In this case the pilot and the passenger were taken away by helicopter ambulance and an ambulance. Thus this was NOT a successful landing.
It seems that having enough gas to reach your destination would be part of a pre-flight checklist. If I were investigating this accident, I would be looking for the mistakes that were made that caused the gas to run out (calculation errors, just didn’t look, thought someone else was taking care of the gas, inaccurate readings, wrong fuel tank lined up, unusual fuel usage not noticed, …).
What can you learn from this accident? Even simple accidents need root cause analysis. Just because the pilot ran out of gas doesn’t mean that we know the root cause (pilot error anyone?). There is a lot of investigating before we are ready to say why the number of successful landings ≠ the number of successful take-offs.
“Failure to test a cement casing at an oil well in the Timor Sea was a root cause of a blowout that caused Australia’s worst offshore oil spill, an inquiry has heard.“
It sure seems like there were many more “root causes” to me and that the analysis should have led to root causes that were much more in-depth. And it would be a big help if there was a SnapCharT® to help identify all of the Causal Factors.
While performing your PROACTIVE TapRooT® Root Cause Analysis, you observe a person loading a pallet with 10′ L x 6″ dia. 30-pound metal pipes by himself. He lifts 30 pipes an hour 3 times a day from a waist-high rack to a pallet placed on timbers at floor level. This task used to be performed by two loaders before recent layoffs, so you go to the Root Cause category of Excessive Lifting and see these two questions in the Root Cause Tree Dictionary:
* Was the issue related to excessive lifting or force to move an object?
* Did the task require repetitive motion (lifting, twisting, bending, etc.) that led to a musculoskeletal problem?
Since this is a Proactive Assessment there are no issues yet, so you are asking what is the worst issue that could occur by the lifting movements above? Now what does excessive mean? What would excessive lifting, twisting and bending be? We could bring in an external Ergonomic Expert… or can we use a simple calculation ourselves first?
NIOSH 1991 Lifting Calculator. Centers for Disease Control and Prevention (CDC), National Institute for Occupational Safety and Health (NIOSH).
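If you want to see what that calculator is doing, here is a minimal Python sketch of the revised NIOSH (1991) lifting equation in US customary units. The 30-pound load comes from the scenario above; every distance, the twist angle, and the frequency and coupling multipliers in the usage example are hypothetical placeholders, because the post does not give them. In practice you would measure the distances on the floor and look up the frequency and coupling multipliers in the NIOSH tables. The names `Lift`, `recommended_weight_limit`, and `lifting_index` are made up for this illustration.

```python
from dataclasses import dataclass

# Load constant of the revised NIOSH lifting equation, in pounds.
LOAD_CONSTANT_LB = 51.0


@dataclass
class Lift:
    load_lb: float               # weight of the object lifted
    horizontal_in: float         # hands to midpoint between the ankles
    vertical_in: float           # height of the hands above the floor
    travel_in: float             # vertical distance the load travels
    asymmetry_deg: float         # twist angle away from straight ahead
    frequency_multiplier: float  # looked up in the NIOSH frequency table
    coupling_multiplier: float   # looked up in the NIOSH coupling table


def recommended_weight_limit(lift: Lift) -> float:
    """Return the Recommended Weight Limit (RWL) in pounds."""
    # Distances under 10 inches are treated as 10 inches.
    h = max(lift.horizontal_in, 10.0)
    d = max(lift.travel_in, 10.0)
    # Each multiplier drops to zero outside the range the equation covers.
    hm = 10.0 / h if h <= 25.0 else 0.0
    vm = (1.0 - 0.0075 * abs(lift.vertical_in - 30.0)
          if 0.0 <= lift.vertical_in <= 70.0 else 0.0)
    dm = 0.82 + 1.8 / d if d <= 70.0 else 0.0
    am = (1.0 - 0.0032 * lift.asymmetry_deg
          if 0.0 <= lift.asymmetry_deg <= 135.0 else 0.0)
    return (LOAD_CONSTANT_LB * hm * vm * dm * am
            * lift.frequency_multiplier * lift.coupling_multiplier)


def lifting_index(lift: Lift) -> float:
    """Return load / RWL; values above 1.0 suggest increased risk."""
    rwl = recommended_weight_limit(lift)
    return float("inf") if rwl == 0.0 else lift.load_lb / rwl


if __name__ == "__main__":
    # Hypothetical measurements for the end of the lift, at the floor-level pallet.
    pipe_lift = Lift(
        load_lb=30.0,               # from the scenario
        horizontal_in=15.0,         # assumed
        vertical_in=6.0,            # assumed height of the pallet on timbers
        travel_in=30.0,             # assumed drop from the waist-high rack
        asymmetry_deg=45.0,         # assumed twist from rack to pallet
        frequency_multiplier=0.9,   # placeholder, not a table lookup
        coupling_multiplier=0.95,   # placeholder, not a table lookup
    )
    print(f"RWL: {recommended_weight_limit(pipe_lift):.1f} lb")
    print(f"Lifting Index: {lifting_index(pipe_lift):.2f}")
```

Treat the result as a screening number, not a verdict. A Lifting Index above 1.0 is a reason to look harder at the task (or bring in that Ergonomic Expert), and the individual multipliers show which part of the lift is costing you the most.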
As you start doing these calculations, you should also see another Root Cause under Human Engineering start becoming very apparent: Arrangement / placement.
A question that comes to mind from the Root Cause Dictionary is:
* Did poor arrangement, placement, or situation of equipment, displays, or controls contribute to an issue?
So with these newfound calculators and a better understanding of just a little bit of the Root Cause Tree Dictionary, is this task a risk or not?
“You observe a person loading a pallet with 10′ L x 6″ dia. 30-pound metal pipes by himself. This task used to be performed by two loaders before recent layoffs.”
The Associated Press reported that Chief Electrician’s Mate John G. Conyers suffered a severe electrical shock and was later pronounced dead at Sharp Coronado Hospital.
The AP reported that the Chief was conducting “routine work” when he was killed.
Normally, Chiefs are supervising, not performing, work. And there is nothing “routine” about working with electricity aboard a ship. Complacency (routine) with electricity on a ship is a deadly combination.
One of my early shipboard jobs in the Navy was being the Electrical Division Officer aboard USS Arkansas (a nuclear powered cruiser). One of the first “performance improvement” programs I ever attempted was to re-instill respect for electricity and get 100% compliance with our lock-out/tag-out program to isolate and check dead all sources of voltage during electrical maintenance work.
People who work with any hazard (for example, electricity), tend to become complacent over time. I’m not sure if this happened on the USS Ronald Reagan, but it certainly is a problem that every manager/supervisor who supervises people who work with a hazard has to confront head-on.
Also, supervisors can frequently be tempted to do work and even take shortcuts to get a job done. This takes them out of their role to supervise a job and make sure it is done safely and puts them into a dangerous situation where no one is looking over their shoulder to make sure the job is done safely. Once again, I have no evidence that this happened aboard the USS Ronald Reagan, but I’ll be interested in what the eventual accident report has to say.
What can we learn from this fatality BEFORE the investigation is even completed?
First, TapRooT® Users would be getting a complete picture of WHAT happened before they started analyzing WHY it happened. As you can see from my background, there are several problems that I would automatically look for. But, TapRooT® requires the investigator to look at the evidence first before starting the root cause analysis. They have to have a good, complete, accurate, detailed SnapCharT® before they identify the accident’s Causal Factors and find each Causal Factor’s root causes.
Second, TapRooT® Users have a systematic root cause analysis technique, called the Root Cause Tree®, that helps them be sure to check for the many different potential root causes of a problem (Causal Factor). The tree helps guide them to areas they may not have thought of to investigate before. It helps the investigator get beyond blame to find real, fixable root causes that, when fixed, can prevent future accidents.
Third, once the root causes are identified, TapRooT® has a module called the Corrective Action Helper® that helps the investigator develop effective corrective actions. This helps the investigator and management develop corrective actions that might be “outside the box” as far as their experience with corrective actions is concerned.
If you are a TapRooT® User, you have already learned these lessons (but it is good to have them reinforced).
If you are NOT a TapRooT® User, get to a TapRooT® Course NOW! Investigating smaller accidents, incidents, and near misses, as well as using the TapRooT® techniques proactively, can help you avoid major accidents and keep your employees safe.
For more TapRooT® information, including success stories from TapRooT® users, see:
Part 2, as promised, is a discussion on our TapRooT® Users and Friends LinkedIn Group. This begins with a question asked by Jason Laws, a plant manager and client. Join us if you want to get into this conversation or even just to contact Jason directly.
“Common Sense, the Root Cause Tree and a perceived recent lack in the up and coming work force that I have noticed”
My Production Supervisor asked me the other day if there was a place in the root cause tree for Common Sense. I actually said, I didn’t think so. That when we come across “a common sense” causal factor the root causes are usually identified in a Management Systems, Training, and Procedures…. I may really be wrong there….I hate to think it would be in work direction and I am running into more and more unqualified candidates.
Where I have struggled recently is with this very idea. Some things, it would never have occurred to me that we would need to drill training down to that level.
(It was common to police up your work site at the end of a job. When cutting you always cut away, use the right tool for the right job, there is very little in the world that is fit to bang on other than nails, use a chalk line and plumb bob to put up a line of pipe supports, place the labels on the totes level and neatly, check the breaker when the pump won’t start, ….These are just the ones that have come to mind but the list continues.) [ I don't put in don't dead head or run a pump dry. I've been doing this too long to expect that.]
That does bring me to one point I have tried. That is to Poka-Yoke or “Error Proof” things. All pumps go in with a Power Monitor shut off now. You can’t run it dry or dead head it.
Still, I am with my Production Supervisor…and have had the same conversation with my Maintenance Director. Is there a place for Common Sense in the root cause tree? Am I the only one? Is the work force changing? Has Nintendo killed the opportunity to get the basic knowledge I and others did with chores, play, hobbies and jobs when we were young? If so, what can be done? If the answer is drill spac, training and procedures deeper down into the core knowledge, how do you know how far, and how do you identify knowledge that you take for granted that really isn’t?
Sorry, if that was a bit of a ramble, but the Production Supervisor really got me curious.
ah…back to the when I was young, I walked up hill to and from work and pushed double the product you youngin’s push out and with no mistakes!
First off, Jason, you are right: many of the new employees of today have different skill sets than us old folks…. of course they would tell us it was “common sense” not to upgrade your software without….etc… AFTER we locked up our computer. After all, didn’t we know this was not compatible with this computer.. duh!
At the same time the craftsman-apprentice relationship from years back no longer exists in many industries. Often it is the junior employee training the junior employee. The senior experienced employee is too busy fixing things to train anyone and often retires without documenting what s/he knows from experience.
The thought that any worker selection process, training process, or mistake-proofing remains stable and does not need to be flexible is a myth. Look at job descriptions: many are outdated, impacting the hiring process and training process.
First attack at the problem:
1. Identify the core skills needed by the employee to perform the core critical tasks for her/his job. Look up AMOD/ DACUM
2. Identify where the employees actually get the needed training. Often training programs get stuck looking at just missed appointments and regulatory required training, thus losing contact with how the training impacts operations. (Where did the senior workers get their knowledge?)
3. Review the employee’s supervisor’s skills and training as well. Often new managers are hired based on needing to have a degree but never get the technical training listed above. The employee then asks the supervisor, “Is this good enough?” How would s/he know?
4. If the training program is outdated (or just broken), then temporarily bring in a knowledgeable mechanic who has retired and let them help revamp the new program with hands-on training.
So if the employee needs a mechanical aptitude to perform certain jobs, then why was s/he not tested prior to hiring? After all, what happened to the unskilled in years past if s/he could not meet the aptitude need? S/he was either trained or kicked out the door.
After all, if common sense were the answer, you would not need the root cause tree either. So GOAL (go out and look) to find what the core skills and tasks are and then ensure that these requirements are met. Also see what you can learn from the new employees as well.
Posted 1 month ago
Response from: Kenneth Reed, Senior Associate and TapRooT® Instructor
You’re right, Jason. There is no Root Cause labeled “common sense NI” anywhere on the Root Cause Tree®. Just like there is no “attention to detail NI” or “operator error.” Although they initially seem like root causes, in reality they are just a convenient way to shift blame.
For example, if I told you the Root Cause was “common sense NI,” what would be your Corrective Action? How do you fix “common sense?” You can’t! Just like you can’t fix “inattention to detail” or ” operator error.” Therefore, we would default to poor Corrective Actions like, “Counsel the employee on using common sense when using a knife.” Completely useless Corrective Action, with almost no hope for better performance.
Instead, we need to look a little deeper at the problem. This is what Chris was alluding to above. Why did the operator slice his hand open? Was it really just a common sense problem? Or is there something we as management can do to prevent this issue?
That’s where the 15 questions, the Dictionary®, and the Root Cause Tree® come in. We need to ask ourselves the questions on the tree to dig deep enough into the problem. Instead of asking, “why didn’t this guy use common sense when cutting that wire, and cut away from himself?”, maybe we should ask:
- Was the worker fatigued, impaired, upset, bored, distracted, or overwhelmed?
- Was he using the right tool? Did we provide him with the right tool?
- Was the right person performing this job?
- Was this job really required in the first place?
- Do supervisors ever watch their people do this particular job? Why not?
- Would a supervisor have stopped this evolution before an injury occurred? If so, why didn’t he? If not, why not?
- Was the worker properly trained for this task?
- Since I’m sure the worker did not intend to cut himself, what led him to think doing the job in this manner was OK?
I could go on, but you get the point. When you find yourself saying, “This was just a dumb person, not using common sense, just a simple human error that I have no control over,” it’s time to step back and let the system work for you. Let the Root Cause Tree® and Dictionary® help you ask the right questions.
I also know that sometimes we think that people should already know these things. There are 2 possibilities:
1. The person really didn’t know (to cut away from himself)
- Therefore, this is a training issue
2. The person DID know, but chose to do it anyway.
- This is when my discussion above comes into play.
Hope this helps a little.
Posted 1 month ago
Response from Jason:
Thanks Chris and Ken. One thing I have been trying to do, and encouraging my people to do (though finding the resources is always the challenge) is to use TapRooT® in audit mode.
I have worked the tree through these issues and developed corrective actions to account….mainly training, human engineering and Management systems.
My frustration comes from the fact that I just haven’t seen or anticipated the lack of knowledge in the first place to head it off at the pass. I am not even sure some of these issues would have occurred to me if I was putting together an audit SnapCharT®.
Thinking on this thread, maybe the broader use of CHAPs might catch some of this. In a resource starved environment, I am trying to bring the tools I have to the best and most efficient use.
So, with GOAL. Maybe an Audit SnapChart®, the 15 questions, a CHAP and the Dictionary® I prevent some of these.
The struggle that remains is to overcome the blind spot of assumptive experience and figure out what needs to be trained for in the first place. What are the things we take for granted that really aren’t.
Once again. Thanks guys. I appreciate the feedback.
Music to my ears, Jason… “proactive CHAP”. When people are first introduced to the Critical Human Action Profile (CHAP), they look for critical steps in a task that, if skipped, done wrong, or done in the wrong sequence, could have caused the incident or made it worse. A proactive audit can look for steps that are critical to safety and process.
As far as the “blind spot for assumptive experience”, this is a generic issue as you have described it. So what system should be controlling the hazard of having unskilled employees on the shop floor (or in the field)?
Steps of the process:
1. Company or Contractor Human Resources hire employees that have the skills and capabilities to perform their assigned core tasks.
Problem: The metrics HR is usually measured by for the hiring process are retention and number of new employees. No tie is made to direct labor and rework costs.
2. The training department has a structured training program that uses classroom and hands-on training for the core tasks (process and regulatory).
Problem: Training is often measured by number of missed appointments and upkeep of regulatory training. No tie is made to direct labor and rework costs.
3. Shops have floating experts identified for employees who need a little help.
Problem: The new are training the new. The senior employees are too busy to help.
So ask your HR department and your training department how they know that they have been successful when hiring and training a person. Most likely it will not be tied to operations ROI.
Have senior employees attend training with new employees to help all do right.
Look at your critical jobs and tasks to determine what skills and capabilities should be covered for each person and then use GOAL to identify what is missing.
Issue: Employees did not receive their pay stubs on pay day.
- Why? Because the printing system failed the day before pay day.
- Why? Because the system could not recover from a hardware fault.
- Why? Because the system uses outdated hardware that has no automatic redundant backup.
- Why? Because the system hasn’t been replaced as it hasn’t been identified as a high enough priority to allocate budget to its replacement in the current economic climate.
- Why? Because the organization does not have an enterprise planning methodology that weighs the risks of current operational systems failing versus the criticality of these systems and the impact of such a failure.
In 2010, I would like to share lessons learned from TapRooT® Users’ investigations.
If you have investigated an accident, incident, near-miss, equipment problem, or quality issue using TapRooT® and would be willing to share lessons learned that you think apply outside your company, please send me the incident and your description of the lessons learned.
Sharing your generic lessons learned is a great way to help save lives across industries and around the world.
If you are concerned about legal issues, these investigations can be de-identified (no company information) so that there is nothing to worry about.
Generic lessons learned can be about techniques to perform investigations, new ideas for safeguards, new potential corrective actions, root causes that are common in different industries, or any solutions/lessons that you think might help others.
Yesterday, I posted an article that discussed the advantages of using checklists in the medical profession (see this link). I thought I’d talk a little more about checklists, and how the use of checklists shows up on the Root Cause Tree®.
Let’s look at a reactive incident, where someone made a mistake while performing a common yet labor-intensive evolution. For example, a mechanic was starting up an expensive compressor, and he forgot one step, causing serious damage to the machine. He has done this evolution several times and is familiar with the equipment, but this time, one step out of 16 was missed. This is a very typical example, and your analysis must take into account many different possibilities. Happily, TapRooT® walks you through the analysis to make sure you don’t forget to check everything. The Root Cause Tree® and Dictionary® will have you check many potential problems (fatigue, equipment design, work environment, supervision, etc). However, I’d like to concentrate on:
“When should we expect to have a checklist in place?”
Looking at the Corrective Action Helper® under “No Procedure,” we get some ideas concerning when a step-by-step checklist makes sense.
- First of all, if a new checklist had been in place, would performance have improved? Would this mistake have been prevented? Sometimes, the task is so obvious that having a checklist would not fix anything. If this is the case, don’t write a silly checklist just for the sake of having a checklist. For example, if someone forgets to wear his seatbelt, I highly doubt that putting a checklist in the cab of the truck telling the driver when and how to put on a seatbelt is going to make any difference. This is an obvious evolution, and other corrective actions (audits of seatbelt compliance, proper rewards for wearing seatbelts, consistent enforcement of the rule) will be much more effective.
Additionally, situations where other factors have made it easy for the operator to make the mistake (poorly designed equipment, excessively fatigued workers, etc) probably need these other issues addressed first, and then evaluate whether a procedure would also help.
- The Corrective Action Helper® also states that a checklist makes sense for high-risk, high-consequence tasks that must be performed correctly every time and require considerable short-term memory. Starting this expensive compressor is an example where checklist use should be considered. Other cases include: (a) tasks where documentation of the work is required, (b) extremely infrequent evolutions, and (c) tasks that must be performed under stress, like emergencies (both aircraft engines shut down due to a bird strike?).
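For readers who like to see this kind of reasoning written down, here is a rough Python sketch of the questions above. It is only my reading of the points in this post, not the actual logic of the Corrective Action Helper®. The `Task` fields, the `checklist_advice` function, and the wording of the results are all hypothetical names made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical description of a task under review (illustration only)."""
    name: str
    checklist_would_have_helped: bool    # would a checklist have prevented the mistake?
    other_factors_unresolved: bool = False  # e.g. poor equipment design, fatigue
    high_consequence: bool = False       # high risk, must be right every time
    heavy_memory_load: bool = False      # many steps to hold in short-term memory
    documentation_required: bool = False
    infrequent: bool = False
    performed_under_stress: bool = False


def checklist_advice(task: Task) -> str:
    """Walk the questions from the post in order and return a suggestion."""
    # Question 1: if the task is so obvious that a checklist changes nothing,
    # look at other corrective actions instead.
    if not task.checklist_would_have_helped:
        return "no checklist: use other corrective actions"
    # Other factors that made the mistake easy should be addressed first.
    if task.other_factors_unresolved:
        return "fix other factors first, then re-evaluate"
    # Question 2: the situations where the post says a checklist makes sense.
    if (
        (task.high_consequence and task.heavy_memory_load)
        or task.documentation_required
        or task.infrequent
        or task.performed_under_stress
    ):
        return "consider a checklist"
    return "checklist optional"


compressor_startup = Task(
    name="Start up expensive compressor (16 steps)",
    checklist_would_have_helped=True,
    high_consequence=True,
    heavy_memory_load=True,
)
seatbelt = Task(name="Put on seatbelt", checklist_would_have_helped=False)

print(checklist_advice(compressor_startup))
print(checklist_advice(seatbelt))
```

The order of the checks matters here: the sketch asks whether a checklist would have changed anything before it looks at risk or frequency, which mirrors the order of the bullets above.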
What do you do when you have checklists, but people aren’t using them? You are now under the enforcement root cause, and the Corrective Action Helper® has a load of great information on how to tackle this problem.
What are your thoughts on checklists? Do you have examples of checklists that helped? What about a checklist that was completely useless at your facility? Let me know what you think!
When can an accident teach us something about investigating accidents? When the accident helps us understand the human brain and its limitations.
A story in Wired Magazine titled: “Accept Defeat: The Neuroscience of Screwing Up” explains how scientists often disregard information that conflicts with their “hypothesis” and how this is caused by the way the human brain is wired. I recommend reading the article to better understand this phenomenon.
But how does this relate to accident investigation? Here’s the answer…
Root cause analysis systems based on the theory of cause-and-effect require the investigator to develop a hypothesis and then look for evidence to prove or disprove the hypothesis. The theory of cause-and-effect requires the investigator to already understand the cause-and-effect relationships they are looking for. Thus, they can only find cause-and-effect relationships that they already understand.
However, their brain, according to the research in the article, automatically keeps them from seeing evidence counter to their hypothesis or outside their experience.
That is why cause-and-effect root cause analysis techniques frequently have widely different results when used by different individuals looking at similar evidence. Each individual sees the “evidence” the way they want to see it to support their theory of the accident’s cause.
TapRooT® is not built on this cause-and-effect theory. Instead, it is based on unfiltered review of the evidence leading the investigator to develop a detailed explanation of what happened before they start to analyze why it happened. The evidence isn’t collected to verify a hypothesis. Rather, it is collected to expand the investigator’s knowledge and understanding.
Also, instead of depending on the investigator’s knowledge of cause-and-effect, TapRooT® has built-in expert systems to help the investigator see causes that may be beyond their current knowledge of the cause-and-effect relationships of the incident being investigated. These built-in expert systems help the investigator side-step their brain’s built-in simplifying mechanisms and find causes that they might not have originally suspected (or even understood).
Of course, any investigator can stubbornly hold to preconceived notions, but TapRooT® doesn’t fall into the “scientist’s trap” that this article talks about. It naturally helps investigators go beyond their preconceived ideas and previous experience.
That’s an important lesson learned!
If you don’t care about the brain-science behind why TapRooT® works and other root cause analysis techniques fail, that’s OK! Don’t worry … You don’t have to be a neuroscientist to use TapRooT®. We’ll teach you how to use TapRooT® in a 2-Day, 3-Day, or 5-Day Course and then you can take advantage of the advanced science that is invisible to the user but is built into the TapRooT® System.
What?!? You haven’t learned TapRooT®? Then now is the right time to get to a course and experience how TapRooT® can help you find root causes that you previously would have overlooked and develop corrective actions that you and your management will agree are much more effective. Don’t wait! Sign up for a course at:
I can’t help but think that the company comments sound a lot like the comments from BP’s management after the Texas City refinery explosion.
Of course, management is right … Crew errors did lead to the crash. Planes seldom fall from the sky without crew errors. (Although there are notable exceptions.)
The real question is WHY: what are the root causes of the crew errors and bad behaviors?
Often, management doesn’t want to talk about these problems because fixing them requires changes in things that management controls:
- Training
- Procedures
- Human Engineering
- Work Direction
- Management Systems
- Communications
- Quality Control
Now for comparing the Colgan Air Crash and the BP Texas City explosion … When you start looking into the details, there are lots of similarities:
- Crew fatigue
- Not following procedures and shortcutting checklists
- Operators/pilots making basic mistakes
- Inexperienced crews responding to the unexpected
Let’s wait for the NTSB report before we jump to conclusions, but I’d recommend that management dig deeper when they look at blame as the cause of an accident.
Perhaps it’s time for Toyota to go beyond the simple root cause analysis of 5-Whys and start using advanced root cause analysis for these more difficult issues (or for all issues).
If you need to learn why 5-Whys should NOT be your preferred root cause analysis tool, see this article:
All high performance systems need root cause analysis. They can use it reactively when things go wrong and proactively to keep things from going wrong.
Long ago we had our first “network reliability” people start using TapRooT® to improve network reliability. Our first customer was a company that supplied high reliability computer systems for financial transactions. The next one ran a high reliability telecommunications network that included 911 call systems.
It seems they may need some advanced root cause analysis training in London.
Why?
They had to shut down trading on the London Stock Exchange because of a “technical glitch.”
“LONDON: The London Stock Exchange PLC halted trading for three-and-a-half hours on Thursday after a technical glitch prevented some customers from connecting to its systems.
The LSE, Europe’s oldest independent exchange, said taking trade offline was the only way to ensure a fair and orderly market after customers reported the connectivity problems in early trading.
The exchange is still looking into the root cause of the embarrassing outage – the second significant technical problem in just over a year – and said it was too early to judge the extent of the effect on trade or lost business.”
And this isn’t the first time this has happened. The story also said:
“Just over a year ago, the LSE experienced its worst outage in almost a decade when a software glitch was blamed for a 7-hour shutdown that angered customers on one of the busiest days of the year on world equity markets.
On that day in September 2008, the shutdown left many clients unable to cash in on a worldwide stock market boom that followed the U.S. government bailout of mortgage giants Fannie Mae and Freddie Mac.”
Here is my Thanksgiving posting. I post it every year, lest we forget.
In America, today is a day to get together with family and friends and reflect on our blessings – which are many!
One of my ancestors, Peregrine White, was the first child born to the Pilgrims in the New World.
During November of 1620, Peregrine’s mother, Susanna, gave birth to him aboard the ship Mayflower anchored in Provincetown Harbor. His father, William, died that winter – a fate shared by about half of the Pilgrim settlers.
The Pilgrims faced death and the uncertainty of a new, little explored land. Why? To establish a place where they could worship freely.
With the help of Native Americans that allied with and befriended them, they learned how to survive in this “New World.” Today, we can be thankful for our freedom because of the sacrifices that these pioneers made to worship god in a way that they chose without government control and persecution.
Another interesting history lesson about the Pilgrims was that they initially decided that all food and land should be shared communally. But after the first year, and almost starving to death, they changed their minds. They decided that each family should be given a plot of land and be able to keep the fruits of their labors. Thus those that worked hardest could, in theory, reap the benefits of their extra labor. There would be no forced redistribution of the bounty.
The result? A much more bountiful harvest that everyone was thankful for. Thus, private property and keeping the fruits of one’s labor led to increased productivity, a more bountiful harvest, and prosperity.
Is this the root cause of Thanksgiving?
This story of the cause of Thanksgiving bounty is passed down generation to generation in my family. But if you would like more proof, read the words of the first governor of the Plymouth Colony, William Bradford:
“And so assigned to every family a parcel of land, according to the proportion of their number, for that end, only for present use (but made no division for inheritance) and ranged all boys and youth under some family. This had very good success, for it made all hands very industrious, so as much more corn was planted than otherwise would have been by any means the Governor or any other could use, and saved him a great deal of trouble, and gave far better content. The women now went willingly into the field, and took their little ones with them to set corn; which before would allege weakness and inability; whom to have compelled would have been thought great tyranny and oppression.”
William Bradford, Of Plymouth Plantation 1620-1647, ed. Samuel Eliot Morison (New York: Knopf, 1991), p. 120.
I saw two articles by Ronda Levine recommending “5-Whys” as a preferred root cause analysis technique and giving a 5-Why example. Here’s the example from the article:
After determining that the quality management project would focus on the cause of the image bleed on the t-shirts (because almost 2/3 of the t-shirts produced by her company show this defect), Brenda begins to ask “why” to determine the cause of the problem. At the top of a sheet of paper, she writes “2/3s of t-shirts produced bleed through the material from a severity range of barely noticeable to highly noticeable.”
Underneath this, she writes, “Why?”
“The t-shirt fabric is too thin.” This first response can’t be possible, because the company carefully researched the fabric and the ink for the project to ensure the materials would work. So, she looks for an alternate cause and comes up with:
“The ink isn’t drying fast enough.”
“Why not?” She asks the question, again, to get closer to the root cause of the problem.
“Because the presses are using too much ink.” If this is the answer, it would also solve another problem the company has been experiencing, the blurring of images printed on 1/3 of all shirts produced.
Another potential problem at this stage could be that the ink ordered wasn’t correct for the project. However, Brenda checks the inventory logs and finds that this isn’t the case.
“But why are the presses using too much ink?”
“Because the presses haven’t been properly calibrated.”
It seems as though this last answer is a contender. Brenda sits down with her project team and constructs a plan for changing the calibration on the machines.
Seems like this 5-Why example only has three whys. And is “the presses haven’t been properly calibrated” really a root cause? Seems like a Causal Factor (and just a single Causal Factor at that) in TapRooT®.
It seems like a better approach would have been to draw a generic SnapCharT® for the printing, quality control, and delivery process and then analyze all the Causal Factors rather than just one. But you would have to be TapRooT® Trained to understand how this process would work.
I left a comment at the posting referring people back to some of the previous articles I’ve written. Like this one:
I hate to be so negative, but if somebody doesn’t point out bad advice, many will start using 5-Whys thinking that it is a good idea. Therefore, I’m going to keep pointing out this “BAD ADVICE” until everyone knows the problems with 5-Whys and Cause-and-Effect.