Category: Equipment/Equifactor®

Damage to Motiva’s New Crude Unit Seems Like an Excellent Opportunity for Advanced Root Cause Analysis

July 17th, 2012

Corrosion caused by a valve that leaked caustic into a relatively new crude unit (which was off line for some quick repairs) will keep the unit down for perhaps a year. Here’s the story:

Human error? Equipment failure? Bad operating or maintenance practices? Unexpected corrosion? None of these are root causes. To find the root causes you need a systematic process like Equifactor® and TapRooT® to troubleshoot equipment problems and dig down to the real, fixable root causes of the problems.

For more information about TapRooT®, see:

Time for Fireworks Root Cause Analysis?

July 6th, 2012

The Wall Street Journal reported that there was a fireworks malfunction in San Diego on the 4th. It seems all the fireworks went up at once. Here’s a video of what it looked like:

The WSJ says that Garden State Fireworks Inc. issued a statement blaming the mishap on a technical malfunction.

Time for fireworks root cause analysis?

One more idea … was the theme “Fast & Furious Six” or “Shock and Awe”?

Here’s an interview with the owner of Garden State Fireworks (the company that did the display):

Good news … no one was injured.

Might have been the biggest finale ever! (And the only one that was at the start of the show!)

UK Rail Accident Investigation Branch Reports on a Rail Accident Equipment Failure

July 3rd, 2012

Here’s the Summary from the UK RAIB:

Detachment of a Cardan Shaft at Durham Station – 10 April 2011


On 10 April 2011, at around 12:30 hrs, a cardan shaft fell from an empty class 142 passenger train travelling through Durham station at 75 mph (120 km/h). The train ran for a distance of approximately 2 miles (3.2 km) before being stopped. A member of the public standing on a platform suffered a minor injury from ballast thrown up as the cardan shaft fell onto the track; the train suffered damage, including loss of diesel fuel.

The immediate cause of the detachment was the complete fracture of a final drive input shaft. The input shaft fractured because a seized input bearing generated a large amount of frictional heat between the shaft and bearing. The input shaft was locally heated to a temperature at which its strength was reduced so that it could no longer carry its normal loading.

The RAIB established that the seizure of the bearing was due to the setup of the bearings during overhaul which resulted in a lack of end float in the bearings when in operation. The final drive failure was not detected by the checks which were in place to identify the onset of such failures. The detached cardan shaft was not retained by its safety loops.

The RAIB has made six recommendations to Northern Rail and owners of class 14x vehicles. Two recommendations relate to reviewing the end float and alignment requirements for the class 14x final drives and ensuring that any changes to the setup of safety critical components are validated. One recommendation covers the detection of impending final drive failures. The fourth recommendation relates to the final drive post-overhaul testing and the fifth covers the provision of key design information to overhaul and maintenance contractors. The final recommendation relates to the completion of the review work associated with the events in the immediate aftermath of the accident.

For the complete report, see:

Monday Accident and Lessons Learned: When High Reliability Systems Fail

June 25th, 2012

What if you had a system with two regular power supplies, two back-up power supplies (diesels), and a battery back up with a separate diesel to keep it charged?

Wow! This should be highly reliable, right?
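
As a back-of-the-envelope sketch (not from the post, and with made-up failure probabilities), here is why stacking independent supplies looks so reliable on paper, and why one shared (common-cause) failure mode can dominate anyway:

```python
# Minimal sketch of redundant power supply reliability.
# All probabilities below are assumed, illustrative numbers --
# they are NOT data from the incident being discussed.

p_utility = 0.01   # assumed chance a utility feed is down when needed
p_diesel  = 0.05   # assumed chance a back-up diesel fails to start/run
p_battery = 0.02   # assumed chance the battery/UPS string fails

# Naive model: every supply fails independently, so the site is dark
# only if ALL of them fail at the same time.
p_all_independent = (p_utility ** 2) * (p_diesel ** 2) * p_battery
print(f"Naive 'all independent' outage probability: {p_all_independent:.2e}")

# More realistic: add one common-cause event (say, a switchgear fault or a
# procedure error) that defeats every layer at once.
p_common_cause = 0.001  # assumed
p_total = p_common_cause + (1 - p_common_cause) * p_all_independent
print(f"With a single common-cause failure mode: {p_total:.2e}")
```

The independence assumption is what makes the paper number look spectacular; a single shared failure mode is usually what the root cause analysis needs to find.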

Read about how this system failed here:


Now here’s the question …

What did they miss in their “root cause analysis”?

I think they had great troubleshooting.

They even had actions to address generic problems.

But I don’t think they found the root causes of the “cloud failure” incident.

What do you think? Leave your comments here…

Equipment Troubleshooting: Taming Those Misbehaving Motors

May 18th, 2012

The points in “Taming Those Misbehaving Motors” (Thomas H. Bishop, P.E.), from the May 2012 edition of Maintenance Technology, can help even the most seasoned maintenance pros become better equipment troubleshooters.

“Given the number of motors in most plants, it’s not surprising that they sometimes misbehave.  While maintenance professionals are typically well equipped to tame the unruly motors that come their way, they’re occasionally puzzled by the following three behaviors” (Learn 3 behaviors and troubleshooting tips).

Need to find the root causes of equipment and machinery problems at your plant? Learn more about our Equifactor® Equipment Troubleshooting & Root Cause Failure Analysis course  (view info).


Root Cause Analysis Tip: Best Practice Sharing #2 – TapRooT® Summit

April 25th, 2012

In today’s Root Cause Analysis Tip, Phil Goodman shares his TapRooT® best practice at our 2012 Global TapRooT® Summit.

Today is Part 2 of 12. Click here for Part 1.

Next week, hear Jeff Cooper of Boart Longyear share his TapRooT® best practice.

Time for Equifactor®? Maybe Past Time!

January 25th, 2012


Here’s the text that came with the picture … don’t know if it is true …

Here are some photos of what happens when bearings overheat in the transmissions of these monster windmills.

To date no gear oil has been invented to withstand the pressures produced within these transmissions.

Most recently, the government gave Dow-Corning a big grant to work on it.

Previously, many others had tried and failed.

As they age there will be many more bearing failures.



Hard to believe that every wind turbine will fail due to inadequate gear lubrication.

I had heard that many wind turbines are not getting proper maintenance.

Wonder what Equifactor® has to say about this?

Investigation of Fatal Elevator Accident in New York Continues – Maintenance Work May Be the "Cause"

January 24th, 2012

The New York Times reported that Robert LiMandri, the Commissioner of the Buildings Department in New York City, said:

We know that there was work being done right before the unfortunate event, and we do believe that is a contributing cause, or the cause.

He also said:

We know for sure that those events directly before this unfortunate accident clearly are part of our investigation.

Suzanne Hart was killed when the elevator suddenly shot upwards as she boarded.

The story also says that the roughly 60,000 elevators in New York produced 53 accidents in the previous year.

Great Human Factors: Wrong Tools, Bad Access by Design, Per “Ingenuity” or All of the Above?

January 19th, 2012

As an ex-aircraft mechanic and a “sometimes gotta work on my own car” mechanic, I have in the past borrowed or made some of the tools pictured below. The questions remain:

Wrong Tool?

Bad Access by Design?

Mechanic’s Ingenuity?

Or a little bit of them all?

Finally, ever have one of your modified tools bite you back?  Share your stories in the comment section.



Oil cooler line wrench #2

Drinking Water Emergency at Port Hope Caused by Pump Impeller Problems

December 27th, 2011

How can bad equipment reliability cause a crisis? Imagine losing the water supply at your house or business for an extended period.

It seems that all five impellers on the five pumps at the Port Hope, Ontario, water plant failed due to corrosion.

The previous impellers lasted 67 years without failure, but the new pumps at a new plant commissioned in 2005 only made it until 2011. The first impeller inspection wasn’t even scheduled until 2012.

For complete details, see these stories: “Upkeep issues ruled out as Port Hope water emergency cause”

And if you want to learn more about troubleshooting pump problems, attend the TapRooT®/Equifactor® Equipment Troubleshooting and Root Cause Analysis Course. CLICK HERE to see the public course schedule for 2012.

Monday Accident & Lessons Learned: Make Sure You Remove the Grounding Strap Before You Energize the Switchgear!

November 21st, 2011

Pictures sent to me by a TapRooT® User of an unfortunate accident …


Monday Accident & Lessons Learned: Bad Maintenance Practices Lead to Failed Train Wheel Set and Derailment

October 31st, 2011

Do your maintenance folks “make it work”?


Looks like “just make it work” was a cause of this accident.

See the accident report from the UK Rail Accident Investigation Branch:

Blackberry Outage – Is a Three Day Outage on a High Reliability Business Application OK?

October 13th, 2011

Many people count on their Blackberries to run their business. They get concerned about even a one hour outage. But the most recent outage has been going on for three days.

Here’s a quote from a recent Forbes story about the unexpected outage:

In a Wednesday afternoon conference call for reporters, RIM’s Chief Technology Officer for software, David Yach, said the company is working “around the clock” to fix the service issues. Though RIM says it is still investigating the root cause of the problem, Yach expressed certainty that the global outage stemmed from the failure of a single “core switch” in Europe and was not the result of a network breach or hack. Since RIM provides back-end service support for all BlackBerrys, the company operates multiple nodes and switches around the world for routing data.

This failure caused a backlog that overwhelmed the system.
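
To see how a single failed switch can turn into a multi-day event, here is a rough, hypothetical backlog calculation. The message rates below are assumptions for illustration, not RIM’s actual traffic figures:

```python
# Hypothetical sketch of backlog growth during a degraded-capacity outage.
# All throughput numbers are made-up, illustrative values.

arrival_rate = 1_000_000      # assumed messages per minute offered to the system
degraded_capacity = 600_000   # assumed throughput while the failed switch is bypassed
normal_capacity = 1_800_000   # assumed throughput once full service is restored

outage_minutes = 3 * 24 * 60  # roughly three days of degraded service

# Messages arrive faster than they can be delivered, so the backlog grows
# linearly for the whole outage.
backlog = (arrival_rate - degraded_capacity) * outage_minutes
print(f"Backlog at the end of the outage: {backlog:,} messages")

# Even after repair, only the *spare* capacity drains the backlog.
drain_rate = normal_capacity - arrival_rate
recovery_minutes = backlog / drain_rate
print(f"Extra time to clear the backlog: {recovery_minutes / 60:.0f} hours")
```

The point of the sketch: once the backlog exists, users keep seeing delays long after the broken component is fixed, which is why a “single core switch” failure can look like a multi-day outage.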

Does this sound like reliability issues you face?

Could they have avoided these issues with some proactive application of root cause analysis?

We’ll watch what comes out in future press reports.

Lightning NOT the Root Cause of Amazon Data Center Outage

August 17th, 2011

The Inquirer published this article:

Lightning did not cause Amazon datacentre outage

Interesting to see the root cause analysis of a computer reliability problem being discussed.

First, we could argue if “lightning” could be a root cause. But let’s save that argument for some other time.

What I found interesting in this article was that they eliminated a potential cause and then went on to look further.

Looks like it is a power supply reliability root cause analysis. The first step in this process is evidence collection and troubleshooting of the “cause” of the failure.

Since they don’t know the reason that the transformer exploded, finding a root cause is going to be difficult.

It would be interesting to see the process used in this engineering analysis, which is the start of the evidence collection and evaluation that feeds into the root cause analysis.

Next, the article goes on to discuss problems with the load transferring to backup diesel generators. This would be a second causal factor that needs to be analyzed (troubleshooting and root cause analysis).

The approach for corrective action was mentioned in the article:

– more redundancy and more isolation to its PLCs, in order to prevent failures from spreading,
– a new “environmentally friendly” backup PLC
– improved load balancing
– drastically shorter recovery times

All this will be accomplished “… as soon as possible.”

Of course these corrective actions aren’t very specific (they would not meet the SMARTER criteria in TapRooT®), but they are just a list from an article. Perhaps the company’s corrective actions are more detailed.

Also, it is interesting to see additional safeguards being suggested before the failure of the current safeguards is understood.

For cloud computer users, let’s hope a successful root cause analysis with effective corrective action is completed so that future outages can be minimized.

37 Bodies Recovered from Two Mine Accidents in the Ukraine

August 1st, 2011

The Associated Press reported that the bodies of all the miners killed in two mine accidents in the Ukraine had been recovered.

One accident was related to an explosion of methane gas (26 killed) and the other was related to the failure of an elevator (11 killed).

Press Release from the UK Rail Accident Investigation Branch: Investigation into the derailment of a Bure Valley Railway passenger train near Brampton, Aylsham, Norfolk, 30 May 2011

June 16th, 2011

Image of incursion of the derailed bogie into the passenger compartment
(by courtesy of the Bure Valley Railway)

The RAIB is investigating an accident that occurred when the 14:40 hrs passenger train from Wroxham to Aylsham derailed close to the village of Brampton, near Aylsham, in Norfolk.  The train was running on the Bure Valley Railway, a tourist railway, with a track gauge of 350 mm (15 inches).  The train consisted of seven coaches and a brake van and was hauled by a steam locomotive.  It was staffed by two crew members and was carrying 61 passengers.

The train is believed to have been travelling at about 16 mph (26 km/h) when the end of an axle under the second coach fractured, derailing its leading bogie.  During the derailment the other, undamaged, wheelset, fitted to the derailed bogie, forced its way through the wooden floor of the coach into a passenger compartment.

There were no reported injuries as a result of the accident and most of the passengers walked through to Aylsham to complete their journeys.  The remainder were transported by road.

The RAIB’s preliminary examination of the site confirmed that axle failure was the cause of the derailment.  There was no evidence that the maintenance of the track, or the operation of the train were factors contributing to the accident.

The RAIB’s further investigation activities will focus on the failure of the axle and will be independent of any investigation by the safety authority (the Office of Rail Regulation).

Unexpected Power Failure Costs RackSpace $3.5 Million in Refunds

June 15th, 2011

When the reliability of your “cloud” depends on a server farm’s power, a power outage can be a major incident.

If you are a web hosting service, unreliability can cost you customers. To try to keep your customers, you give refunds when a service outage happens. RackSpace announced in an SEC filing that it will pay $3.5 million in refunds (service credits) due to a recent loss of service after a power problem and failure of back-up power.

So even in the “cloud”, equipment and power reliability are important.

What can we learn? That root cause analysis is important in all sorts of industries. Repeat problems (this isn’t the first power reliability issue) cause unhappy customers. Better to solve reliability problems the right way by addressing their root causes.

How Much Does an Accident Cost? The Fine Was £150,000 …

May 24th, 2011

The UK Health & Safety Executive posted a press release about a chemical plant in Rye, East Sussex, UK, that was fined £150,000 after a spill of waste solvents.

The initial tank failure that started the release was caused by internal corrosion of the tank.

For more information, see:

Wind Turbine Accident: Installation? Maintenance? One of a Kind Accident?

April 12th, 2011

Saw an interesting AP article about a wind turbine accident in North Dakota. The rotor and blades of a wind turbine had crashed to the ground.

Scott Winneguth, Director of Engineering for Iberdrola Renewables, said the accident was “very out of the ordinary … a singular event.”

He also said:

I can assure you, for the near term, that we will check for bolt integrity and misalignment on a much more frequent basis than our normal activity would entail.

The first statement and the second statement make no sense when taken together. If this was a one-off event, why change their standard practices? Also, why change the standard practices for just a short period of time?

The article also said:

Winneguth said the 70 turbines in the Rugby project were subsequently inspected and each of their 3,360 bolts checked. Seven bolts on four of the turbines were replaced as a precaution.

Seven more bolts replaced???

Duncan Koerbel, an executive for the turbine’s manufacturer, Suzlon Wind Energy Corp, said that the cause of the misalignment was “not known” and he “was not sure” how long the problem took to develop.

Does this sound like they need better troubleshooting and root cause analysis? It does to me!

If you need better equipment troubleshooting and root cause analysis, consider attending the 3-Day TapRooT®/Equifactor® Equipment Troubleshooting and Root Cause Analysis Course. We have courses coming up in:

Doha, Qatar
New Orleans
Edmonton, AB
Birmingham, UK
Knoxville, TN
Brisbane, Australia
Midland, TX

For more information and course dates, see:
