Author Archives: Ken Reed

Equipment Failure: Crane Gearbox Failure

Posted: September 30th, 2016 in Accidents, Equipment/Equifactor®

Equipment Failure - Broken Gear

While performing a lift using a tower crane, a failure of the gearbox caused about 1,000 lbs of hook and rigging gear to fall to the ground, narrowly missing workers in the area. Here is the report.

The investigation revealed several issues, most relating to proper inspections of the gearboxes to identify defective gears.  While again this appears to be a straight equipment failure, we would also want to know:

  • How did the deficient gear end up in the gearbox (it was the wrong material)?
  • Are we looking for repeat failures (this had happened before)?
  • How close were the workers in the vicinity?
  • What was the preventative maintenance plan for this gearbox?  Was it required by the vendor?

There are lots of other directions a good investigation will lead you.

 

Equipment Failure? Delta Airlines “Computer” Failure

Posted: September 6th, 2016 in Accidents, Equipment/Equifactor®

equipment failure 2

Last month, Delta Airlines experienced an equipment failure that caused their reservation system to shut down. Media reports indicate close to 2,000 flights were canceled. This is only a few weeks after Southwest Airlines experienced a similar computer failure, causing numerous flight delays and cancellations.

Reports continue to indicate that this was an equipment failure, due to a small fire in a power supply in their server room.  Here is their description:

“Monday morning (August 8) an uninterrupted power source switch experienced a small fire which resulted in a massive failure at Delta’s Technology Command Center. This caused the power control module to malfunction, sending a surge to a transformer outside of Delta, resulting in the loss of power. The power was stabilized and power was restored quickly. But when this happened, critical systems and network equipment didn’t switch over to backups. Around 300 of about 7,000 data center components were discovered to not have been configured appropriately to avail backup power. In addition to restoring Delta’s systems to normal operations, Delta teams this week have been working to ensure reliable redundancies of electrical power as well as network connectivity and applications are in place.”

Keep in mind that the “uninterrupted power source switch” is actually known as an “uninterruptible” power supply (UPS).  This normally swaps you over to another power source if your primary source fails.  You may have a simple UPS on your computer systems at the office, providing battery backup while power is restored.  In Delta’s case, their UPS system attempted to switch over, but configuration issues prevented a significant number of their devices from actually shifting over.

Additionally, other reports indicate that the reservation system is an extremely antiquated system, linked into other airlines’ (also extremely antiquated) systems.  They have all patched together and upgraded their individual systems to the point that it is almost impossible to upgrade any further; it really requires a complete replacement, which would be EXTREMELY difficult and expensive to do while the system is still being used for current reservations.

So while this is discussed by the airlines as an equipment failure, I think there are more than likely multiple causal factors, of which only one (the initiating problem) was a burned up component.  Without knowing the details, we can see several Causal Factors:

  • A UPS caught fire
  • This small fire caused a large surge and widespread power loss
  • Other equipment was not properly configured to shift to backup power
  • There is no backup in the event of a loss of the primary reservation system
  • The reservation computer system has not been upgraded to modern standards

I always question when a failure is classed as “equipment failure.”  Unless the equipment failure is an allowed event (Tolerable Failure), it is much more likely that humans were heavily involved in the failure, with the broken equipment as only a result.

Cheap Root Cause Analysis?

Posted: August 30th, 2016 in Performance Improvement
cheap; root cause analysis

Cheaper is rarely better

I saw this picture today, and I thought about how people often make decisions based entirely on direct cost. For some things, we make a deliberate analysis of the long-range costs, benefits, and applicability of a product. For example, when we buy a new car, we might decide to pass on a particular brand of vehicle based on what we know (or at least, what we have heard) about the quality of that vehicle, its features, its reliability, and the fit to what we need. Remember the Yugo? I don’t see many of them on the road any more!

And yet, when we are making the decision on the best way to improve the health and safety of our coworkers, we seem to jump right on, “What does it cost?” Someone in Australia told me just the other day about an RCA “methodology” that is being evaluated for use in some of their critical improvement programs. I asked why they were considering this other method. Does it work better? The answer made me sad. “Well, no, we don’t think it works as well as TapRooT®. Using [this other system] gives pretty ambiguous results, depending on who is using it. But it’s a little bit cheaper, so my manager wants to go that way.”

This doesn’t seem to make a lot of sense to me. If you can save 10%, but get poor results, what have you actually saved? I encourage you to look at the costs of even a single “simple” incident, and then bounce that against a few percent savings in poor-quality training.  I think you’ll find that the initial savings are lost in the noise.

You wouldn’t buy a car based solely on price. I encourage you to take the same due diligence when you are selecting an RCA program that has the potential to save the lives of your teammates.

WANTED: New Equifactor® Equipment Troubleshooting Tables

Posted: August 29th, 2016 in Equipment/Equifactor®

equipment troubleshooting table

We’re pretty excited about the new TapRooT® VI software service that we released this year. It has some terrific features that are a definite upgrade to the older Version 5 software.

As part of the conversion over to TapRooT® VI, we did an in-depth review of the Equifactor® equipment troubleshooting tables. We found we were able to streamline those tables to make them even easier to use. We dropped some redundant items, standardized some of the terminology, and generally made them easier to use. Additionally, TapRooT® VI allows you to take the items from the Equifactor® table and drop them right onto your SnapCharT®. It’s a feature we’ve been asked about for quite a while, and the TapRooT® VI architecture finally let us add this enhancement.

I am currently looking for new ideas for tables you would be interested in seeing added to Equifactor®. What general categories of equipment would you like to see developed and added to the system? Some we might be able to do; some aren’t really very conducive to putting into a table format. For example, I was asked to develop tables to troubleshoot PLC problems. While this would be great, there are unfortunately hundreds of different models and types of PLCs out there, and a simple set of tables would be really tough to do.

Another idea was for hydraulic system troubleshooting. Again, this might be too broad a category. However, I am researching the possibility of doing more specific tables on things like hydraulic cylinders and motors. These might be specific and generic enough that we can put together a useful set of tables.

So what would you like to see? Let me know, and I’ll be happy to take a look.

Medical Errors: Are You Preventing Pressure Ulcers?

Posted: August 26th, 2016 in Medical/Healthcare

Medical Error Prevention

My wife was in a cast a few years ago. After about a day, she noticed it was itchy on the bottom of her foot, near her big toe. We didn’t think anything of it (never in a cast before). When we went in for a checkup after a few days, she told the doctor. They pulled off the cast and found a blistery area on the bottom of her foot. It was caused by a slight pressure from a bump in the cast, which cut off blood flow to that small area on the ball of her foot. It ended up being pretty minor (big blister the size of a half dollar), and it healed up just fine.

I was amazed to find out that pressure injuries like this can be fairly common after only a few hours in a stationary position, for example, during surgery. They can turn out to be very painful and potentially disfiguring. DO NOT, under any circumstances, Google for pictures of pressure ulcers!

Here is a guide on how the medical community can help prevent pressure ulcers. It is meant to be a proactive means of looking for opportunities to prevent or detect the circumstances and risk factors associated with perioperative pressure injuries.

Hand Hygiene: Patient Safety Through Infection Control

Posted: August 24th, 2016 in Medical/Healthcare, Performance Improvement

Hand Hygiene_Patient Safety Through Infection Control

I remember my mom telling me to “wash my hands before supper.” It is something that we all should know how to do, yet it is vitally important in the medical community.

How hard can it be to wash your hands? If I told you to “Wash your hands before changing that bandage,” how would you do it? What soap would you use? How do you dry your hands afterwards? At what point in the procedure do you actually have to wash your hands? As you can see, there are lots of opportunities to make a mistake and cause a problem, unless you have the answers to these questions.

Hand Hygiene: A Handbook for Medical Professionals is an about-to-be-released book on how to properly handle infection control in a variety of circumstances.  It puts all of these lessons learned into a single reference for a professional to figure out the right way (and the wrong way) to prevent the spread of infections between patients.

The Joint Commission Summary of Sentinel Events – 2Q 2016

Posted: August 22nd, 2016 in Medical/Healthcare

Clamp

 

Here’s a summary for reported sentinel events for the 2nd quarter of this year, compiled by The Joint Commission. It also compares some of the data against previous years.
It is almost impossible to make accurate comparisons on this data, since all reports are voluntary and, as stated in the report:

Data Limitations: The reporting of most sentinel events to The Joint Commission is voluntary and represents only a small proportion of actual events. Therefore, these data are not an epidemiologic data set and no conclusions should be drawn about the actual relative frequency of events or trends in events over time.

Without knowing who is reporting, who is not reporting, how these numbers are compiled or arrived at, how the problem types are assigned, etc., I’m having a tough time viewing the data in an objective light.

While the data is interesting, I’m not sure how this data is used.  Can anyone give me an example of how the data in this summary might be used?

Severe Pump Damage Due To Inadequate Analysis

Posted: August 22nd, 2016 in Equipment/Equifactor®

Cavitation

Here is a great example of damage to large pumps resulting from a poor understanding of the operating environment. When coupled with inferior manufacturing techniques, rapid failure of critical equipment can occur.

Pump Life Cycle Cost Analysis – Numbers Matter!

Posted: August 15th, 2016 in Equipment/Equifactor®

LCC Graphic

When designing spacecraft, there is a humorous (yet amazingly accurate) list of laws to keep in mind to ensure you are not going down the wrong path when developing spacecraft and their associated systems. Akin’s Laws of Spacecraft Design are a set of well-known nuggets that can be adapted to everyday life. But the one that I want to mention here is Akin’s Law #1:

1. Engineering is done with numbers. Analysis without numbers is only an opinion.

When you are looking at your pumping systems, and trying to decide on the best maintenance or repair strategy for a particular pump failure, you may have several options. For example:

– Should I just replace the failed pump with another identical pump?
– Should I replace it with a more efficient design?
– Is the current pump optimized for the system design?
– What other options are available for this repair?
– Why did it fail in the first place?

Iceberg

Life Cycle Cost analysis can be done after almost any failure to help you decide on the best repair strategy.  This analysis includes things like the costs of the initial purchase, installation and commissioning costs, energy and operation costs, and maintenance costs.  You can perform a relatively accurate cost comparison for various repair / replacement options so that you can make an educated decision on the best course of action.  Pump Life Cycle Costs: A Guide to LCC Analysis for Pumping Systems is the result of a collaboration between the Hydraulic Institute, Europump, and the US Department of Energy’s Office of Industrial Technologies (OIT).  It is definitely worth a few minutes to read through this and get a basic understanding of how to calculate the LCC of a particular installation or repair.
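
To make the comparison concrete, here is a minimal sketch of the arithmetic in Python. It simply adds up the cost elements listed above (plus downtime) for two repair options over a fixed number of years. Every dollar figure, the class name PumpOption, and the 10-year horizon are hypothetical values chosen for illustration; they do not come from the guide. The sketch also ignores discounting of future costs, so treat it as a rough first pass rather than a full LCC calculation.

```python
from dataclasses import dataclass


@dataclass
class PumpOption:
    """One repair/replacement option. All dollar amounts are hypothetical."""
    name: str
    purchase: float              # initial purchase cost
    install_commission: float    # installation and commissioning cost
    energy_per_year: float       # annual energy cost
    operation_per_year: float    # annual operating cost
    maintenance_per_year: float  # annual maintenance cost
    downtime_per_year: float     # annual cost of downtime and lost production

    def life_cycle_cost(self, years: int) -> float:
        """Undiscounted total: one-time costs plus recurring costs over `years`."""
        one_time = self.purchase + self.install_commission
        recurring = (
            self.energy_per_year
            + self.operation_per_year
            + self.maintenance_per_year
            + self.downtime_per_year
        )
        return one_time + recurring * years


# Two made-up options for the same failed pump.
identical = PumpOption("identical replacement", 10_000, 3_000, 12_000, 1_000, 2_000, 4_000)
efficient = PumpOption("more efficient design", 16_000, 4_000, 9_000, 1_000, 1_500, 2_000)

for option in (identical, efficient):
    print(f"{option.name}: ${option.life_cycle_cost(10):,.0f} over 10 years")
```

With these made-up numbers, the option with the higher purchase price ends up cheaper over 10 years, because the recurring energy and downtime costs outweigh the up-front difference. Your own numbers may tell a different story, which is exactly why you run them.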

There were a couple of take-aways for me, neither of which was particularly surprising, yet both of which are important to keep in mind:

  1.  Energy consumption is often one of the larger cost elements and may dominate the LCC, especially if pumps are run more than 2000 hours per year.
  2.  The cost of unexpected downtime and lost production is a very significant item in the total LCC and can rival the energy costs and replacement parts costs in its impact.

Pie Chart

This drives home the importance of a good root cause analysis to ensure that your failures (and therefore your downtime) are minimized, as the costs of these failures can rapidly skew the entire LCC analysis.  Don’t live with repeat or avoidable failures.

 

Using TapRooT® for Simple Investigations

Posted: August 9th, 2016 in Investigations, Root Cause Analysis Tips, Uncategorized

Investigation
It is almost a no-brainer to perform a complete, extensive root cause analysis for high-risk, high-consequence incidents. There are many reasons for this:

– Required by regulators or law
– Required by company policy
– Perceived higher return on investment

However, companies often default to less developed (and therefore less accurate) analyses for lower risk, lower consequence problems. For example, almost everyone will perform a TapRooT® investigation when there is a serious injury; this is a high-consequence incident, and preventing it in the future is perceived to have the highest ROI.  But what about a near miss?  Or maybe someone tripped over an air line on the floor, dropping a repair part and damaging it?  Most companies will either not perform any investigation, or they will default to “easy” methods (5-Why’s, etc.).  Why spend any time on these “simple” incidents?  Let’s just do a quick “analysis” and move on?

While I completely understand this thought process, there are some serious flaws in this thinking.

  1.  Low ROI.  While a particular incident may not have caused a large loss, this does not mean it automatically deserves no attention.  Maybe tripping over the air line only caused $800 in damage this time.  But what about the other issues that have been caused by poor housekeeping in the past?  What if the person had tripped and fallen over the edge of a platform?  Making a quick assumption like this can allow you to miss potentially serious issues when taken together.  Performing a poor analysis will lead to repeats of the problem.
  2. Poor results of “quick” RCA methods.  Keep in mind that a quick method probably means that you did not gather any information.  You are therefore performing an “analysis” without any data to analyze.  If your analysis method takes 5 minutes, you have probably just wasted 5 minutes of your time.  If you’re going to perform an RCA, make sure it gets to useable and consistent answers.
  3. TapRooT® is only for the big stuff.  This thought often frustrates me.  It is true that you will not perform a TapRooT® investigation in 5 minutes.  However, any method that purports to give you magic answers in a few minutes is not being honest.  See #2 above.  However, that does NOT mean that TapRooT® must take days of your time.  For simple investigations, the results of a TapRooT® investigation may be found in just an hour or so.

So, how do we use TapRooT® for lower risk or low consequence problems?  This year, we have modified the TapRooT® methodology to allow you to use the steps of the process that you need to perform a great investigation on simple problems.  This updated process isn’t really new; it just codifies how we’ve taught you to use TapRooT® in the past for these simpler problems.  We make the process more efficient and give you the opportunity to optionally skip some of the steps.

Here is the new process flow for low to medium risk incidents:

Flowchart with no paragraphs
 

There are some important points that I wanted to highlight about this new process flow:

  1. You always start with a SnapCharT®.  There is no way to perform any type of analysis unless you first gather some information.  Again, any other process that advocates performing an analysis on the information you received in a quick phone call is not a real analysis.  The SnapCharT® ensures you have the right information to actually look for root causes.
  2. There is an off-ramp right at the beginning.  Once you’ve gathered information in a SnapCharT®, you can then make an intelligent decision as to whether this problem has the potential to uncover significant problems.  You may find, after building your SnapCharT®, that this really was an extremely low potential problem, with minimal consequences.  You will then stop the analysis at that point, put simple corrective actions in place to fix what you found, and then document the problem for later trends.  That’s it.  While most investigations will continue on with the rest of the process, there are some issues that do not require any further analysis and don’t deserve any further resources.
  3. For most investigations, you will continue by identifying Causal Factors, and run those Causal Factors through the Root Cause Tree®.  No different than before.
  4. For these simpler problems, it probably is not worth the effort of looking for generic causes.  We have made this step optional.  If you feel the problem has the potential to be more widespread, you can continue to look for generic issues; otherwise, go straight to corrective actions.
  5. Low to medium risk incidents probably do not need the resources you would normally expend writing full SMARTER corrective actions.  We encourage you to write corrective actions based on the guidance in the Corrective Action Helper®, but writing fully SMARTER fixes is probably not necessary.

For more serious incidents, we would still use the full 7-Step TapRooT® Process that you are familiar with.  However, for lower risk or lower consequence problems, this abbreviated process flow is much easier to use, allowing you to more quickly work through a TapRooT® investigation.  Why use 5-Why’s and get poor results (as expected) just to “save time,” when you can use the simplified TapRooT® process to get MUCH better answers with less effort than before?

The 2-Day TapRooT® Root Cause Analysis Course now covers this simpler method of performing TapRooT® investigations.  Attendees will still be able to perform investigations on any incident, but we stress this more efficient process flow.

Choose a course and register here!

Tips for Maintaining your Air Compressor

Posted: August 9th, 2016 in Equipment/Equifactor®

Compressor Maintenance

A new air compressor can be a significant investment at your facility. While most people assume that they are performing adequate maintenance on their equipment, I am often surprised by how many companies are not performing or tracking even the most basic maintenance.
Here are some fairly simple yet important tips on maintaining your air compressors, courtesy of Ingersoll Rand. Bounce these tips against your preventative maintenance plan and see if you’re fully covered.

Bearing Failures: Keep Them Clean!

Posted: August 1st, 2016 in Equipment/Equifactor®

Contaminated Bearing

According to the chart below, almost half of all pump bearing failures are due to lubricant contamination.  In the chart, you can probably add the “Corrosion” cause to this, since bearing corrosion is most likely due to a poorly sealed bearing.

Failure Chart

Credit: SKF

Heinz Bloch has written a great article on the importance of keeping up with bearing seal technology. He notes that only 10% of rolling-element bearings ever reach their expected end of life. While we seem to put a lot of effort into ensuring we have the right bearings with the proper lubrication, we then do a poor job of maintaining those bearings. Imagine if your bearings actually lasted until the calculated end of life!

Heinz Bloch will be leading 2 sessions at our Global TapRooT® Summit in San Antonio this week. I always look forward to his talks!!

Tappan Zee Crane Collapse: What We Know

Posted: July 25th, 2016 in Accidents, Equipment/Equifactor®

Crane boom collapse

Last week’s collapse of the 235 foot boom on a crane building the new Tappan Zee bridge is still under investigation. There are apparently 3 separate investigations in progress, and as expected, not much information has been released.

The boom came down across all lanes of traffic on the old (still active) portion of the bridge. Amazingly enough, there were only 4 minor injuries, and it caused direct damage to a single vehicle. If you’ve ever driven across that bridge (I was on it just 30 days before the incident), you understand how lucky we were not to have any fatalities.

What we know so far:

– There was almost no wind, and this has been eliminated as a cause.
– The crane was being used to drive piles into the river bottom using a 60 ton vibratory hammer.
– There is a “black box” on the crane which will supply data on the boom angle, weight, etc.
– The operator says he knows what caused it (it wasn’t him).
– This is a new model crane with several safety features designed to eliminate human error.
– This is the only crane of this model being used on the project.
– The crane operator is licensed, with over 30 years of experience.

Tappan Zee Before

This seems to be a good start to an investigation. And as expected, there are a lot of questions (and “expert” opinions) about what happened.  Some of the questions that might be asked:

  • Was the crane properly inspected and certified?
  • What was the condition of the vibratory hammer?
  • Was there any sense of urgency that may have caused someone to make a mistake?  The contract specified a $120,000 per day fine if the project finished late.
  • Was there an adequate review and approval of the safe zone around the crane operation?

It’s important not to just ask the hard questions, but also to give the hard answers.  For example, one option that could have been in place (20/20 hindsight) would be to close the operating section of the bridge during construction.  While this would definitely have been 100% safer, does it actually make sense to do this?  Were there adequate safeguards in place to allow continued use of the old span?  The answers here might be yes, and it was perfectly appropriate to operate the old bridge during construction.  I’ve seen hundreds of construction projects that have cranes in near proximity to the public.  In fact, almost every downtown construction project has the potential to cause injury to the public if a crane collapses.  Some of the criticism I’ve seen written about this accident (“Why wasn’t the old span closed during this construction project?”) is too simplistic for the real world.  The real question should be, “Were there adequate safeguards put in place for the level of risk imposed by this project?”  We don’t know the answers yet, but just asking these questions in an unbiased investigation can provide useful information.

Crane Collapse

It appears that there is plenty of information available to the investigators. I’m very interested to see the results after the investigations are complete.

Water Hammer – What is it, and how can we prevent equipment damage?

Posted: July 20th, 2016 in Equipment/Equifactor®

water_hammer

If you’ve ever heard your pipes rattle in your house after flushing the toilet, you’ve experienced water hammer. While this is just a noisy occurrence in your home, it can cause major damage in industrial situations.
We talk about water hammer during our 5-Day TapRooT® course as a great root cause analysis example. It’s a fairly easy concept on the surface, but it’s actually a fascinating phenomenon. I found this great article that discusses the causes of water hammer and describes some ideas to keep in mind that can prevent or at least mitigate the consequences.

Equipment Failure? No, the Sloth Did It!!

Posted: July 6th, 2016 in Equipment/Equifactor®, Jokes

Worker: But boss, I swear I didn’t shut that valve!

Boss:   Well, who do you think shut it? Aliens? Gremlins?

Apparently, it was just a sloth!

https://www.youtube.com/watch?v=Unby4b8zhpA

Gear Coupling Troubleshooting and Reliability

Posted: June 28th, 2016 in Equipment/Equifactor®

Coupling

Gear couplings have been around for a long time. And yet there are still frequent equipment failures due to improper selection, use, and maintenance of couplings.
Keep in mind that a coupling problem can manifest itself in subtle ways. A broken coupling is pretty obvious. However, you could see symptoms such as:
– Increased vibration readings in the equipment
– Overheating of shaft bearings
– Unusual resonances in your vibration data
– Overload and overheating of motors

The Equifactor® module of the TapRooT® VI software service has some great troubleshooting tables, one of which is focused on gear couplings. Once you determine that you have a coupling issue, you can look up the symptoms you are seeing and determine what could be causing that symptom.

Coupling

I also found a nice article describing problems you might have with a coupling, and how to maintain the reliable operation of a gear coupling. Take a look and let me know what you think.

Have a Plan! Using the TapRooT® Tools to Plan Your Investigation

Posted: June 22nd, 2016 in Investigations, Root Cause Analysis Tips, TapRooT

Sometimes, it seems like the toughest part of an investigation is figuring out how to get started. What’s the first step? Where am I headed? Who do I need to talk to? What questions should I ask?

Unfortunately, most systems kind of leave you hanging.  They assume that you’re some kind of forensic and investigation expert, with years of psychological and interviewing training already under your belt.  Like your only job at your company is to sit around and wait for a problem to occur so that you can perform an investigation!

Luckily, TapRooT® has some great tools that are designed to walk you through an investigation process.  We have recently tweaked this guidance to make it even easier to quickly progress through the investigation.  Some of the tools are used for every investigation; some are used only in specialized circumstances when you need additional help gathering information.

Let’s first take a look at the required tools.

Mandatory Tools

SnapCharT®:

One of the first things you need to do is get a good understanding of exactly what happened.  Instead of just grabbing a big yellow legal pad and scribbling down random thoughts, you will use the SnapCharT® to build a visual representation and timeline of what actually occurred.  By putting your thoughts down on the timeline, you can more easily see not only what you already know, but also what you still need to find out.  It helps you figure out what questions to ask and who to ask.  Building your SnapCharT® is ALWAYS the first step in your investigation for just this reason.  There is no reason to go into the interview process if you don’t already have a basic understanding of what happened and what questions you need to ask.  It’s really amazing to see a group of people start building a SnapCharT®, thinking they already have a good understanding of the issues, and watch them suddenly realize that they still need to ask a few pointed questions to truly understand the problem.

Root Cause Tree®:

Most TapRooT® users know that the Root Cause Tree® is used during the root cause analysis steps in the process.  However, this tool is a treasure trove of terrific questions and guidance that can be used while building your SnapCharT®.  In conjunction with the Dictionary®, it contains a comprehensive list of interview questions; the same questions that a human performance expert would ask if they were performing this same investigation.  You’ll need the answers to these questions once you get to the root cause analysis phase.  Why not “cheat” a little bit and ask these questions right up front while building your SnapCharT®?

The tools I listed above are used during EVERY investigation.  However, in certain circumstances, you may need some additional guidance and data-gathering tools to help build your SnapCharT®.  Let’s look at the non-required tools.

Optional Tools

Change Analysis:  This is a great tool to use to help you ask thought-provoking questions.  It is used when either something is different than it used to be, or when there is a difference between two seemingly identical circumstances.  The Change Analysis tool helps you determine what would have normally made the situation operate correctly, and (this time) what allowed the problem to show up under the exact circumstances of the incident.  It is actually an extremely easy tool to use, and yet it is very powerful.  I find this to be my most-used optional tool.  The results of this analysis are now added to your SnapCharT® for later root cause analysis.

Critical Human Action Profile (CHAP):  Sometimes, you need help understanding those “dumb” mistakes.  How can someone be walking down the stairs and just plain fall down?  The person must just be clumsy!  This is a great time to use CHAP.  It allows you to do an in-depth job task analysis, understanding exactly what the person was doing at each step in the task.  What tools were they using (and supposed to be using)?  How did we expect them to perform the individual steps in the task?  This tool forces you to drill down to a very detailed analysis of exactly what the person was doing, and also should have been doing.  The differences you find will be added to your SnapCharT® to help you understand EXACTLY what was going on.

Equifactor®:  If your investigation includes equipment failures, you may need some help understanding the exact cause of the failure.  You can’t really progress through the root cause analysis unless you understand the physical cause of the equipment problem.  For example, if a compressor has excessive vibration, and this was directly related to your incident, you really need to know exactly why the vibration was occurring.  Just putting “Compressor begins vibrating” on your SnapCharT® is not very useful; you have to know what led to the vibration.  The Equifactor® equipment troubleshooting tables can give your maintenance and reliability folks some expert advice on where to start looking for the cause of the failure.  These tables were developed by Heinz Bloch, so you now have the benefit of some of his expertise as you troubleshoot the failure.  Once you find the problem (maybe the flexible coupling has seized), you can add this to your SnapCharT® and look at the human performance issues that were likely present in this failure.

The TapRooT® System is more than just the Root Cause Tree® that everyone is familiar with.  The additional tools provided by the system can give you the guidance you need to get started and progress through your investigations.  If you need some help getting started, the TapRooT® tools will get you going!  Learn more in our 2-day TapRooT® Incident Investigation and Root Cause Analysis Course.

Which Pump is Best? Evaluating Pump Curves

Posted: June 13th, 2016 in Equipment/Equifactor®


There are quite a few variables that must be taken into account when selecting the correct pump for a particular application. For centrifugal pumps, the pump curves for a specific pump contain a lot of data. Here are some ideas to help you decide which pump would be best for a particular application, based on the pump curve for various pumps.

Equipment Maintenance and Troubleshooting – Calculating pump run time and duty cycle

Posted: June 9th, 2016 in Equipment/Equifactor®


When discussing pump maintenance, we often forget about the electrical side of the equation. Mechanics think about the mechanical side and let the electricians worry about the power side. However, it is critical that we take a more holistic view of the entire pump system to make sure we’re not exceeding manufacturer specs when we are using our equipment.

There are several measures we need to keep in mind when we look at equipment lifetime calculations. For example:
– Number of start/stop cycles in a given period
– Run time after starting
– Overall duty cycle

I read this interesting article about why these items are important.  The author also had a calculator spreadsheet that helps you figure out appropriate run times for pumping out a sump or tank.  That calculator is here.
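To make those three measures concrete, here is a minimal sketch of the arithmetic for a simple sump. It is not the linked calculator, and it rests on assumptions I am making for illustration: inflow is constant, the pump runs at a fixed rate, it starts at a high-level switch and stops at a low-level switch, and `sump_volume_gal` is the volume between those two switches. The function and parameter names are hypothetical.

```python
from typing import NamedTuple


class PumpCycle(NamedTuple):
    run_time_min: float      # minutes the pump runs each cycle
    off_time_min: float      # minutes the sump takes to refill
    duty_cycle: float        # fraction of time the pump is running
    starts_per_hour: float   # start/stop cycles per hour


def pump_cycle(sump_volume_gal: float, inflow_gpm: float, pump_rate_gpm: float) -> PumpCycle:
    """Estimate run time, duty cycle, and starts per hour for a level-controlled sump pump.

    Assumes constant inflow and a fixed pump rate (illustrative simplifications).
    """
    if sump_volume_gal <= 0 or inflow_gpm <= 0:
        raise ValueError("volume and inflow must be positive")
    if pump_rate_gpm <= inflow_gpm:
        raise ValueError("pump rate must exceed inflow, or the sump never empties")

    # Pump off: the sump fills at the inflow rate.
    off_time = sump_volume_gal / inflow_gpm
    # Pump on: the level drops at (pump rate - inflow), since water keeps arriving.
    run_time = sump_volume_gal / (pump_rate_gpm - inflow_gpm)
    cycle_time = off_time + run_time

    return PumpCycle(
        run_time_min=run_time,
        off_time_min=off_time,
        duty_cycle=run_time / cycle_time,
        starts_per_hour=60.0 / cycle_time,
    )


# Example with made-up numbers: 100 gal between switches, 20 gpm in, 60 gpm pump.
result = pump_cycle(sump_volume_gal=100, inflow_gpm=20, pump_rate_gpm=60)
print(result)
```

With these made-up numbers the pump runs 2.5 minutes, rests 5 minutes, and starts 8 times an hour. Those are the figures you would compare against whatever run-time and starts-per-hour limits the pump and motor manufacturer actually publish.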

Electrical Equipment Troubleshooting – Don’t Be Scared!

Posted: May 31st, 2016 in Equipment/Equifactor®


I found this article about troubleshooting electrical failures in heavy equipment. It offers some concise nuggets of info that I thought were pretty interesting. In many cases, troubleshooting an electrical fault is more a case of figuring out “What is working?” as opposed to “What is broke?”.

Equifactor® Equipment Troubleshooting Basics

Posted: May 25th, 2016 in Equipment/Equifactor®, Root Cause Analysis Tips


Equifactor® is designed to be used to help your equipment maintenance and reliability people figure out the root causes of mechanical or electrical equipment failures.

I thought I’d take the opportunity to take us back to the basics for a moment. I’d like to describe how the Equifactor® Equipment Troubleshooting module of TapRooT® is designed to be used.

What is Equifactor®?

When performing a root cause analysis using TapRooT®, it is critical that you gather the right information for the problem at hand.  This can be safety information, environmental procedures, policies and work instructions for a particular task, etc.  It is usually pretty obvious what types of data you need for the type of investigation you’re performing.

Sometimes, additional TapRooT® data-gathering tools are required for specific types of problems.  Equifactor® is one of those tools.  It is designed to be used to help your equipment maintenance and reliability people figure out the root causes of mechanical or electrical equipment failures.

Why use Equifactor®?

During your investigation, you may find that one of your problems relates to an equipment malfunction.  For example, you might find that a compressor is vibrating above expectation.  You can put this fact into your SnapCharT®, but now what?  What do you do with this piece of information?  To get past this point in the SnapCharT®, you really need the answer from your troubleshooting team:  “Why is the compressor vibrating?”  Unfortunately, if you knew that, you wouldn’t need to put the question on your SnapCharT® in the first place!  You need to know the physical cause of the vibration in order to progress to a more detailed SnapCharT® with Causal Factors.

Equifactor® in detail

This is where Equifactor® comes in.  To help your equipment experts figure out the physical cause of the vibration, they will probably rely on their experience and local manuals for troubleshooting advice.  They’ll look at the possible causes they are familiar with, and hopefully find the problem.  However, we can’t rely on hope.  What happens when they check the items they are familiar with, and the problem is not found?  This is when they can turn to the Equifactor® troubleshooting tables for help.  The tables give a comprehensive list of possible causes of compressor vibration.  Your experts can review these tables to identify all the possible causes that apply to your compressor, and then use that list of possible causes to devise a detailed troubleshooting plan to identify the issue.  These tables give your maintenance team some great guidance on things to look at during their troubleshooting.  These items are quite often things that they have never seen before, and therefore did not think to look for.

Equifactor® – a TapRooT® Tool

Once your team finds the physical cause of the compressor vibration (for example, maybe the wrong coupling bolts were used, throwing off the balance of the machine), we’re not done.  Equifactor® is NOT a separate, independent tool.  It is designed to be used as a data-gathering tool for your TapRooT® investigation.  Therefore, the problem that was found (wrong coupling bolts) is now added to the original SnapCharT®, and we can now move forward with our normal TapRooT® investigation.  I’m pretty sure the bolts didn’t magically install themselves; a human was involved.  We can now discover the human performance issues that led the mechanics to use the wrong bolts.  We continue adding information to our SnapCharT®, until we can run all of the Causal Factors (one of which will probably be, “Mechanics assembled the coupling using the wrong bolts”) through the Root Cause Tree®.  We can now apply effective corrective actions to the problem.  Instead of blaming the mechanic (“Counselled the mechanic on the importance of using the authorized repair parts during coupling assembly”), we can now target our corrective actions at the reason the mechanic used the wrong bolts (correct bolts not available, common use of “parts bins” to repair equipment, wrong part number on repair order, etc.).

Equifactor® is a terrific tool to assist your maintenance and reliability folks in finding the physical cause of a machinery problem.  It is a tool to assist you in performing your TapRooT® investigation when an equipment problem is part of that investigation.  Learn to use these tables to save you time and effort when troubleshooting your equipment issues.

LEARN MORE about Equifactor®.

CONTACT US about a course.

Common Wind Turbine Equipment Failure Modes

Posted: May 19th, 2016 in Equipment/Equifactor®

Turbine failure

I found this article discussing some common failure modes for wind turbines. While not completely new, it does give you some things to consider when performing maintenance on turbine equipment.

Heavy Equipment Maintenance Tips

Posted: May 13th, 2016 in Equipment/Equifactor®


I saw this entry today, highlighting some great ideas on maintaining your heavy equipment. I think what caught my eye was the very first tip: “Stay on top of large machinery operator training.” Any plan to keep your equipment operating at top performance must include the operators and maintenance personnel. It doesn’t matter if you have the very best maintenance plans and schedules if the operators don’t understand how to properly start, operate, and secure the equipment. And maintenance techs must also be properly trained; otherwise, the best preventative maintenance plan will be poorly implemented.
Training of your staff should ALWAYS be a top priority!

Medical Errors – 3rd Leading Cause of Death in the US

Posted: May 4th, 2016 in Accidents, Current Events, Medical/Healthcare

Medical Death Chart

Wow. Quite an eye-opening Washington Post article describing a report published in the BMJ. A comprehensive study by researchers at the Johns Hopkins University has found that medical mistakes are now responsible for more deaths in the US each year than Accidents, Respiratory Disease, and Strokes. They estimate over a quarter million people die each year in the US due to mistakes made during medical procedures. And this does NOT include other sentinel events that do not result in death.  Researchers include in this category “everything from bad doctors to more systemic issues such as communication breakdowns when patients are handed off from one department to another.”  Other tidbits from this study:

  • Over 700 deaths each day are due to medical errors
  • This is nearly 10% of all deaths in the US each year

What’s particularly alarming is that a study conducted in 1999 showed similar results.  That study called medical errors “an epidemic.”  And yet, very little has changed since that report was issued.  While a few categories have gotten better (hospital-acquired infections, for example), there has been almost no change in the overall numbers.

I’m sure there are many “causes” for these issues.  This report focused on the reporting systems in the US (and many other countries) that make it almost impossible to identify medical error cases.  And many other problems are endemic to the entire medical system:

  • Insurance liabilities
  • Inadequate reporting requirements
  • Poor training at many levels
  • Ineffective accountability systems
  • between patient care and running a business

However, individual health care facilities have the most control over their own outcomes.  They truly believe in providing the very best medical care to their patients.  They don’t necessarily need to wait for national regulations to force change.  They often just need a way to recognize the issues, minimize the local blame culture, identify problems, recognize systemic issues at their facilities, and apply effective corrective actions to those issues.

I have found that one of the major hurdles to correcting these issues is a lack of proper sentinel event analysis.  Hospitals are staffed with extremely smart people, but they just don’t have the training or expertise to perform comprehensive root cause analysis and incident investigation.  Many feel that, because they have smart people, they can perform these analyses without further training.  Unfortunately, incident investigation is a skill, just like other skills learned by doctors, nurses, and patient quality staff, and this skill requires specialized training and methodology.  When a facility is presented with this training (yes, I’m talking about TapRooT®!), I’ve found that they embrace the training and perform excellent investigations.  Hospital staff just need this bit of training to move to the next level of finding scientifically-derived root causes and applying effective corrective actions, all without playing the blame game.  It is gratifying to see doctors and nurses working together to correct these issues on their own, without needing some expensive guru to come in and do it for them.

Hospitals have the means to start fixing these issues.  I’m hoping the smart people at these facilities take this to heart and begin putting processes in place to make a positive difference in their patient outcomes.


Rail Accidents: It’s the Entire System that Matters

Posted: May 2nd, 2016 in Accidents, Equipment/Equifactor®, Investigations


On April 3rd, an Amtrak passenger train collided with a backhoe that was being used by railroad employees for maintenance.  Two maintenance workers were killed, and about 20 passengers on the train were injured.  For those who are not familiar with the railroad industry, I wanted to discuss a system that was in place that was designed to help prevent these types of incidents.

Many trains are being back-fitted with equipment and software that is collectively known as positive train control (PTC).  These systems include sensors, software, and procedures that are designed to help the engineer safely operate the train.  PTC is designed to allow for:

  • Train separation and collision avoidance
  • Speed enforcement
  • Rail worker safety

For example, as the train approaches a curve that has a lower speed limit, a train with PTC would first alert the engineer that he must reduce speed, and then, if this doesn’t happen, automatically reduce the speed or stop the train as necessary to prevent exceeding tolerance.  Another example is that, if maintenance is known to be occurring on a particular section of track, the train “knows” it is not allowed to be on that particular section, and will slow / stop to avoid entering the restricted area.  The system can be pretty sophisticated, but this is the general idea.

Notice that I described the system as a series of sensors, software, and procedures that make up PTC.  While we can put all kinds of sensors and software in place, there are still procedures that people must follow for the system to operate properly.  For example, in order to know about worker safety restrictions on a particular piece of track, there are several things that must happen:

  • The workers must tell the dispatcher they are on a specific section of track (there are very detailed procedures that cover this).
  • The dispatcher must correctly tell the system that the workers are present.
  • The software must correctly identify the section of track.
  • The communications hardware must properly communicate with the train.
  • The train must know where it is and where it is going.
  • The workers must be on the correct section of track.
  • The workers must be doing the correct maintenance (for example, not also working on an additional siding).
  • If used, local temporary warning systems must be operating properly.  For example, there are devices that can be worn on the workers’ bodies that signal the train, and that receive a signal from the train.
  • Proper maintenance must be performed on all of the PTC hardware and software.

As you can see, just putting a great PTC system in place involves more than just installing a bunch of equipment.  Workers must understand the equipment, its interrelation with the train and dispatcher, how the system is properly initialized and secured, the limitations of the PTC system, etc.  People are still involved.
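One way to picture the list of conditions above is as a chain where every link must hold. The short sketch below is only an illustration of that idea, not a description of how any real PTC system is implemented. The check names are hypothetical labels I made up to paraphrase each item in the list.

```python
# Hypothetical check names, each paraphrasing one condition from the list above.
# Illustration only; this is not how a real PTC system is built.
WORKER_PROTECTION_CHECKS = (
    "workers_reported_location_to_dispatcher",
    "dispatcher_entered_workers_in_system",
    "software_identified_track_section",
    "communications_reached_train",
    "train_knows_position_and_route",
    "workers_on_reported_section",
    "workers_doing_reported_maintenance",
    "temporary_warning_devices_working",  # mark True if no such devices are in use
    "ptc_hardware_and_software_maintained",
)


def failed_links(status):
    """Return every check not confirmed True; a missing entry counts as failed."""
    return [name for name in WORKER_PROTECTION_CHECKS if status.get(name) is not True]


def workers_protected(status):
    """The barrier holds only if every link in the chain holds."""
    return not failed_links(status)


# Everything done right, versus a single procedural slip by the dispatcher.
all_good = {name: True for name in WORKER_PROTECTION_CHECKS}
one_slip = dict(all_good, dispatcher_entered_workers_in_system=False)

print(workers_protected(all_good))  # True
print(failed_links(one_slip))       # ['dispatcher_entered_workers_in_system']
```

Notice that most of the links are human actions or procedures, not hardware. A single missed step is enough to defeat the barrier even when every sensor and radio works perfectly.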

For the Washington Amtrak crash, we know that there was a PTC system in place.  However, I don’t know how it was being employed, whether it was working properly, whether all the procedures were being followed, etc.  I am definitely not trying to apportion any blame, since I’m not involved in the investigation.  However, I did want to point out that, while implementation of PTC systems is long overdue, it is important to realize that these systems have many weak points that must be recognized and understood in order to have them operating properly.

Humans will almost always end up being the weak link, and it is critical that the entire system, including the human interactions with the system, be fully accounted for when designing and operating the system.  Proper audits will often catch these weak barriers, and proper investigations can help identify the human performance issues that are almost certainly in play when an accident occurs.  By finding the human performance issues, we can target more effective corrective actions than just blaming the individual.  Our investigations and audits have to take the entire system into account when looking for improvements.
