TapRooT® Blog Category Archive

June 04, 2007

Monday Accident & Lessons Learned: What's the Root Cause of this Damage to the Undersea Pipeline Coating?

Watch the video and then vote by commenting on the root cause of this damage to an undersea pipeline protective coating.

Shark Movie (video)

Posted by Mark at 12:35 AM | Comments (0)

May 04, 2007

How "Minor" Mechanical Failures Lead to Major Accidents

How many times have you had an equipment failure occur, only to have the operators tell you:

Oh, yeah, it never has worked right.

Many would say this is a nuisance issue, sometimes costing a little extra for repeat repairs, but not worth a full investigation. At the Summit, Ken Reed presented how implementation of this philosophy is a roll of the dice, sometimes resulting in disastrous consequences.

Click below to receive a download of Ken's handout at this presentation.

Minor Equipment Failures

Posted by barbara at 11:20 AM | Comments (0)

Rickover's Legacy - Safety & Equipment Reliability - Secrets of the Nuclear Navy's Success

Try to build a nuclear power plant in someone's back yard, and you'll witness how communities come together to fight something they perceive as extremely dangerous. And yet, at the submarine base in Groton, CT, there may be as many as 18 nuclear reactors within a quarter mile of each other, in various stages of operation and maintenance. Why is there no public outcry over this "dangerous" situation? Admiral Rickover has put in place a program that has endured over 30 years. In his Summit presentation, "Rickover's Legacy - Safety & Equipment Reliability," Ken Reed told us what made Admiral Rickover's program endure, while the civilian program has, until quite recently, languished.

Click below for a download of Ken's presentation.

Rickover

Posted by barbara at 10:57 AM | Comments (0)

7 Step Method for Electronic Troubleshooting

Equipment troubleshooting is an art. It requires logic, focus, and system expertise to successfully conduct equipment fault analysis and repair. Troubleshooting electrical and electronic devices takes this one step further. Ken Reed's presentation at the Summit, "7 Step Method for Electronic Troubleshooting," reviewed troubleshooting strategies that can be employed when faced with electronic equipment failures. Click the link below to download a copy of Ken's presentation.

7-Step

Posted by barbara at 10:05 AM | Comments (0)

May 02, 2007

Combining Process Mapping of Procedures with Job Safety Analysis

Hazard Identification and Risk Analysis are internationally recognized pro-active safety tools that can take an organization's safety program to the next level. Job Safety Analysis is one way to translate that hazard and risk information into task specific steps that help the employee recognize and avoid the risks inherent in certain jobs. The challenge has always been to provide an end product that is useful enough that employees not only understand it and use it but also know how to revise it when situations change.

Dan Stevenson's Summit presentation, "Combining Process Mapping of Procedures with Job Safety Analysis," outlined one organization's successful effort to develop immediately available procedures for all critical tasks. These procedures provide usable information about the risks of the task and the tools needed to reduce the likelihood of personal injury or damage to equipment. This Procedure Mapping process proactively uses the familiar TapRooT® tools and introduces other simple tools to provide a very user-friendly process that is easily used by all employees.

Click the link below for a presentation of Dan's 6-step overview of this process.

~ Barbara

Stevenson.Dan-1

Posted by barbara at 03:10 PM | Comments (0)

April 19, 2007

I Don't Need No Stinking Interlock!

A report in the Occupational Health and Safety eNewsletter for this week had a great article on nail gun injuries (article). I have used a nail gun to shingle my Dad's house, and it had an interlock that prevented me from pulling the trigger until the foot of the gun was pushed against the roof. I've seen those guns where the professionals just pull the trigger and bounce the gun against the roof, but I assumed that these guns were not available to the regular consumer.

Apparently, I was wrong, and so were the nearly 15,000 people in 2005 who were injured by nail guns. When I Googled for a picture of a nail gun injury, I had so many to choose from that it took a few minutes to work through them. Not a good sign.

Nail Gun Knee.jpg

nail-skull.jpg

How do you fix equipment problems like this? First, you must recognize the hazard. This means you need to be informed about what hazards are associated with the equipment. A nail gun may seem obvious (NAILS!!!), but what about the air discharge ports shooting dirt into your eyes? What about damaged airlines pressurized to 150 psi? And where do you use this thing? On a ladder? On the roof? Are you now off-balance using this thing?

Once you recognize the hazards, you must now figure out what safeguards are available. The trigger interlock is a prime example. Sure, you might look cool bouncing that nail gun across the roof, but it isn't quite as cool when it bounces across your foot. And considering the number of times a consumer uses a piece of equipment like this, how much time is really saved over the life of the nail gun? A couple of hours? Balance this "time savings" against the risks involved, and I hope the answer is clear.

The nail gun is only an example. Take a hard look at the risks you are taking with equipment, both at home and at work. Are you clearing jams in the machine while it is running because "it will only take a minute" and "we've always done it that way"? Maybe we should figure out why the machine is jamming in the first place (probably not designed to jam!).

Risk versus time savings. It's always a gamble, and the increased risk rarely works out in your favor.

Posted by kenreed at 08:23 AM | Comments (0) | TrackBack

April 11, 2007

Instrumentation Failures at BP Texas City

BP Trailers.jpg

After reading the final Chemical Safety Board report on the BP Texas City Refinery explosion, it is obvious that there were almost too many problems to count. Management problems, a non-existent safety culture, procedural non-compliance... the list goes on and on. Take a look at everything we teach and emphasize during a 5-Day course, and none of that was done.

Just trying to get your arms around the problems is tough. I decided to take a subset of the problems and take a look at only the issues related to instrumentation and equipment malfunction. Even this had an enormous amount of data involved, and trying to perform an accurate, comprehensive root cause analysis with the information presented in the report is not possible. However, it is obvious that equipment problems made up a large number of causal factors in this accident.

This incident is an extreme example of allowing your equipment failures to run your facility or business. Many of the actions taken by operators, supervisors, and the management team were dictated by the operational state of the equipment and instrumentation. Procedures were changed on the fly because the entire process did not work as it was initially designed. Actions were taken by operators and supervisors based on faulty indications. Some actions were not taken at all because some indications were never present. Work orders were not generated for known problems, and known problems were accepted as normal.

Many of the equipment problems were actually minor in nature. Fixing some of them would have been easy, but many reasons were given for allowing them to exist:
Not enough money
Not enough time (schedule pressure)
People were too busy
Work-arounds were already in place
Work order system was ineffective

Yet, none of these were that difficult to fix. They were allowed to happen, with full knowledge of the issues at all levels, from the operators up through senior management.

At the TapRooT® Summit later this month (information here), I will be giving a presentation on how "Minor Mechanical Failures Lead to Major Accidents." A good portion of this discussion will pull examples from the BP refinery explosion. Come join in the discussion, and see how easy it is to get yourself into these types of situations. Even better, see how easy it can be to keep yourself out of these situations.

Posted by kenreed at 10:40 PM | Comments (0) | TrackBack

March 28, 2007

Equipment Failure "Root Causes"

I was recently sent an article that described how a company, using "root cause analysis", was able to determine the cause of a recurring motor bearing failure. This particular failure was recurring about every 2 years. After hooking up vibration monitoring gear, they were able to determine that the bearing was nearing failure again. Investigation uncovered that the failure was due to an uninsulated bearing (wrong part) being installed in the motor, allowing circulating electric currents to damage the bearing. Therefore, the reported root cause was that the wrong bearing was installed.

This type of problem would show up as a Possible Cause using an Equifactor analysis. But TapRooT users know they can't stop there. "Mechanics installed wrong bearing" would then be put back into our SnapCharT, and a whole range of questions arises. What was the experience and training level of the mechanics? Did the procedure specify the correct bearing? Was procedure use required? Why did this bearing fail numerous times before an analysis was performed? Did the manufacturer communicate this possible problem to their customers? Is the correct bearing easy to identify in the supply room? Were supervisors required to inspect the new installation?

All these questions become obvious once you have a workable SnapCharT that shows the glaring holes in your investigation. With these holes filled, a meaningful TapRooT analysis can now be conducted.

Posted by kenreed at 11:27 PM | Comments (0) | TrackBack

February 14, 2007

Correctly using a checklist

When a critical job must be done correctly the first time, every time, a checklist is often implemented. Checklists, when used properly, can make it much more likely that a particular job is completed as it was intended, with no mistakes.

Even when a checklist is in place, mistakes are still sometimes made. The problems with a checklist can take several forms:

1. The checklist was not used at all
2. The checklist was used, but had technical inaccuracies or confusing steps (there is a long list of possible root causes covered here)
3. The checklist was not used as intended

Let's talk about option #3 above. This problem could occur if:
- The operator completes several steps before checking them off ("checkoff misused").
- The operator had several actions to perform in a single step (">1 action / step").
- The operator lost his place in the checkoff, or forgot what steps had been ordered or reported as complete.

The last one is really the only one that is difficult to correct by making changes to the procedure itself. Losing one's place in a well-written procedure is a human error that may not have anything to do with the checklist itself. In the Navy, we developed a unique method of combatting this error. We used the highly-technical name, "the circle and X method." Basically, it was used as follows:

1. When the order to conduct a step was given ("Open valve CH-1"), a circle was drawn around the step number. If there were multiple actions within the step ("Open valves CH-1, CH-2, and CH-3"), a circle was drawn around the individual action.
2. When that step was reported as complete ("Supervisor, CH-1 is open"), an "x" was placed through the circle. It was now OK to move to the next step.

To prevent our procedures from getting destroyed due to multiple uses, we inserted a plastic document protector over that page, and used a grease pencil to make the circles and x's. It was then erased at the end of the procedure and used again.

This method was used when there was a supervisor giving orders to operators. If the orders were given over a phone circuit, and there was a local supervisor present, he would also use this method in his local procedure to keep track of what had been ordered and completed. The local supervisor would not "x" his step until it had been completed and reported over the phones.
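For those who track procedures electronically, the circle and X bookkeeping described above can be sketched as a small state tracker. This is only an illustration, not a Navy or TapRooT® tool: the class and method names are made up for this example, and it assumes each individual action is entered as its own item so that every action gets its own circle and x.

```python
from enum import Enum


class Mark(Enum):
    BLANK = "blank"      # step not yet ordered
    CIRCLED = "circled"  # step ordered, not yet reported complete
    CROSSED = "crossed"  # step reported complete (x through the circle)


class ChecklistTracker:
    """Hypothetical tracker for circle-and-X marks on an ordered checklist."""

    def __init__(self, steps):
        self.steps = list(steps)
        self.marks = [Mark.BLANK] * len(self.steps)

    def order(self, index):
        """Circle a step when the order to conduct it is given."""
        # It is only OK to move on once every earlier step has its x.
        if any(m is not Mark.CROSSED for m in self.marks[:index]):
            raise ValueError("an earlier step is still open")
        if self.marks[index] is not Mark.BLANK:
            raise ValueError("step was already ordered")
        self.marks[index] = Mark.CIRCLED

    def report_complete(self, index):
        """Put an x through the circle when the step is reported complete."""
        if self.marks[index] is not Mark.CIRCLED:
            raise ValueError("step was never ordered")
        self.marks[index] = Mark.CROSSED

    def current_place(self):
        """Return the index of the first step without an x, or None if done."""
        for i, mark in enumerate(self.marks):
            if mark is not Mark.CROSSED:
                return i
        return None
```

The useful part is `current_place()`: a supervisor who gets interrupted can always see which step has been ordered but not yet reported complete, which is exactly the "lost his place" problem the grease pencil solves on paper.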

This method is not at all cumbersome if procedure use is required anyway. It is a good method of minimizing the opportunity for supervisors to make that honest mistake during the conduct of a procedure. It is also important that this policy is defined, so that everyone understands it and conducts it the same way every time. In the Navy, the Engineer had this policy clearly stated in his standing orders.

If you're having human error problems even when using well-written checklists, consider this method to remove yet one more opportunity for human error to occur.

Posted by kenreed at 12:16 PM | Comments (0) | TrackBack

January 31, 2007

Process Troubleshooting Tables

I just finished teaching the third day of the Equifactor® course here in Galveston, and one of my students (thanks, Jason!) brought up a great use for the custom table function in the software.

Normally, the custom tables are used to add your own equipment troubleshooting experience to the software. This will allow you to use TapRooT® to conduct root cause analysis of specific gear that may not already be included in the supplied tables. But what about process troubleshooting? For example, maybe your facility produces pure water. There are many problems you could have with this production process:

Low yield
Out of spec contaminant levels
Low pH
High pH
Poor clarity
Odor

These sound like "Symptoms", don't they?

You could make a custom troubleshooting table called (Process) Pure Water Production. Under that, you would list the symptoms of unsatisfactory performance, such as (Symptom) Poor Clarity. Under each symptom, list all the possible causes of that symptom (clogged filter, exhausted resin, flow too high, etc).
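If it helps to picture the layout, the process, symptom, and possible cause hierarchy can be sketched as a nested structure. This is not the Equifactor® software's actual table format; the variable and function names are invented for this illustration, and only the Poor Clarity causes come from the example above. The other symptoms are left empty for you to fill in from your own plant experience.

```python
# Hypothetical in-memory model of a custom troubleshooting table.
# Layout: process -> symptom -> list of possible causes.
custom_tables = {
    "Pure Water Production": {   # (Process)
        "Poor Clarity": [        # (Symptom)
            "Clogged filter",
            "Exhausted resin",
            "Flow too high",
        ],
        # Remaining symptoms from the list above, causes still to be added.
        "Low yield": [],
        "Out of spec contaminant levels": [],
        "Low pH": [],
        "High pH": [],
        "Odor": [],
    }
}


def possible_causes(tables, process, symptom):
    """Return the possible causes recorded for a symptom (empty if none)."""
    return list(tables.get(process, {}).get(symptom, []))


print(possible_causes(custom_tables, "Pure Water Production", "Poor Clarity"))
```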

There is no reason to limit your custom tables to only specific equipment problems. This is a unique method of using the custom table feature in Equifactor®. It is yet another method of using the TapRooT Root Cause Analysis system to troubleshoot process quality issues.

Posted by kenreed at 08:26 PM | Comments (0) | TrackBack

January 17, 2007

Arc Flash Injuries

A recent report from a DOE site discussed an injury sustained by an electrician operating a high-voltage switch on the front panel of a 480V switchboard.  A ground fault in the downstream circuit caused a fire-ball to exit the vents on the switchboard when the switch was shut, causing serious burns and eye injury.  The report stressed that, if the electrician had been wearing proper PPE (flame-retardant shirt, safety glasses), his injuries would have been much less severe.

First, I’d like to stress that I don’t disagree with that particular finding.  Wearing proper PPE is an important safeguard in any potentially hazardous process.  However, buried near the end of the report, other findings were mentioned:

– By the way, the ground detector that was installed didn’t work.

– By the way, preventive maintenance was not scheduled for the ground detector.

– By the way, the ground detector was the wrong type.

– By the way, PPE requirements were not posted on the panel as required.

– By the way, work control procedures already in place were not followed.

Arc Flash PPE

These are the kinds of things that you can easily find with a frequent, comprehensive proactive audit plan.  It is good to see that these problems were found by the investigating team.  It would have been even better if they had been found by an audit team.  Compare your work practices with industry standards, such as NFPA 70E: Electrical Safety in the Workplace. Take a look at your high-voltage, hazardous operations, and see if you have the right controls in place.  Everything may seem to be operating just fine, until an unknown failure pushes the deficiencies in work control to your attention.  Don’t wait for the fire-ball to find you.

Posted by kenreed at 05:02 PM | Comments (0) | TrackBack

January 16, 2007

Mark's Computer Hard Disk Goes Clunk

You know you had a bad day when:

1. Your hard disk crashes.

2. You replace the hard disk and try to load your "daily" automatic backup and it's blank.

3. You load your last "complete" automatic backup (about three weeks old) and it is blank too.

4. You finally find a backup that's not blank and it is 4 months old.

5. You send the broken hard disk off to the people who are the experts in disk recovery and they say - "Sorry - it's unrecoverable."

That's what happened to me last week.

Luckily, I have a wonderful staff and they are helping me recover.

But for now, I won't be posting on the blog or responding to routine e-mails.

Ken and Barbara will be keeping you up-to-date on the blog.

Barbara will be answering my e-mails.

And I will be on a one month sabbatical to get the new version of the TapRooT® Book completed.

By the way, the book writing process is going well and I will post some samples on this blog as we progress.

As for our "investigation" of the failure, it is complete and the corrective actions have been implemented. Next time, there will be a backup and an additional backup. And they will be checked as working appropriately on a weekly basis.
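For anyone who wants to automate that weekly check, here is a minimal sketch. It is not the script we actually use: the function name, the file paths, and the 7-day threshold are all assumptions for illustration. It only catches the failures described above (a backup that is missing, blank, or months old).

```python
import os
import time


def check_backup(path, max_age_days=7, now=None):
    """Return a list of problems found with one backup file (empty = OK)."""
    if not os.path.isfile(path):
        return [f"{path}: backup file is missing"]

    problems = []
    if os.path.getsize(path) == 0:
        problems.append(f"{path}: backup file is blank (0 bytes)")

    # Compare the file's last-modified time against the allowed age.
    now = time.time() if now is None else now
    age_days = (now - os.path.getmtime(path)) / 86400
    if age_days > max_age_days:
        problems.append(f"{path}: backup is {age_days:.0f} days old")
    return problems


if __name__ == "__main__":
    # Hypothetical locations for the backup and the additional backup.
    for backup in ["/backups/daily.bak", "/backups/offsite.bak"]:
        for problem in check_backup(backup):
            print(problem)
```

A non-empty, recent file is still not proof that the backup will restore, so a periodic test restore is worth adding on top of a check like this.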

Posted by Mark at 01:15 PM | Comments (0)

January 10, 2007

Proactive Equifactor

Most TapRooT® equipment reliability professionals are using Equifactor® to help troubleshoot their equipment failures, but how can you use Equifactor® as a proactive tool? Here are some great ideas.

When a new piece of equipment arrives, how should it be installed? Use the Equifactor® troubleshooting tables to discover the common failure modes of your new gear. You may remember that baseplate design (materials, resonant frequency rejection, etc) should be considered when mounting that new pump.

The best time to decide on PdM requirements is prior to installation. Getting the vibration monitoring transducers and temperature detectors wired in is much more difficult once the machine is installed. Use Equifactor® to see what can fail, and then decide on how you will monitor for these types of failures. Get your PdM requirements figured out early.

Once it's installed, it is important to operate it correctly. While the vendor manual may discuss start-up and shutdown procedures, it may not consider your unique operating environment or specific uses. Will your machine have a duty-cycle drastically different from the original design? Is the environment extremely dusty? Equifactor® will remind you to look at these factors.

What logs should you maintain? How often should you take these logs? Look at common failure modes to help verify you are monitoring the right parameters.

Vendor manuals give good guidelines concerning preventive maintenance periodicities, but you may also want to use RCM methods to streamline these requirements. Equifactor® can help here, too, making sure the maintenance requirements will cover all probable failure modes.

For already-installed equipment, use Equifactor® to conduct an audit of your gear (possibly in conjunction with an RCM determination or an FMEA). This may open up ideas you hadn't previously thought about.

Be creative in your use of Equifactor®. It is an excellent tool when performing a root cause analysis of equipment failures, but you don't have to wait until it breaks to fix it!

Posted by kenreed at 07:28 AM | Comments (0) | TrackBack

December 20, 2006

Equifactor® Equipment Troubleshooting & Root Cause Analysis Courses

Are you looking to improve the performance of your equipment? Are you tired of seeing the same equipment failures happening over and over again? Would you like to get out of the "break and fix" cycle at your facility? Try applying the easy to use Equifactor® Equipment Troubleshooting module of the TapRooT® Root Cause Analysis System to your machinery problems. Equifactor®, in concert with the rest of the TapRooT® System, will help you find and fix the real root causes of those failures, instead of just fixing the same symptoms time and time again.

We will be holding Equifactor® courses throughout the world in 2007:

Jan 29-31 - Galveston, TX
Feb 26-28 - Halifax, Nova Scotia, Canada
Apr 11-13 - Gothenburg, Sweden
April 23-24 - San Antonio, TX (Pre-Summit course)
May 9-11 - Calgary, Alberta, Canada
June 13-15 - Aberdeen, Scotland
Jul 25-27 - Dallas, TX
Aug 8-10 - Lake Tahoe, NV
Nov 19-21 - Edmonton, Alberta, Canada
Nov 28-30 - New Orleans, Louisiana

Get those travel budgets in place now! And don't forget about the Summit in San Antonio, TX April 25-28 (during the Fiesta!!).

Posted by kenreed at 08:32 AM | Comments (0) | TrackBack

November 02, 2006

Maintenance at Texas City

I know Mark has already made his stance known on the BP refinery disaster (I refuse to call it an "accident"). As I read the report, one thing came through loud and clear:

Costs were cut in maintenance and infrastructure upgrades to save money at the expense of all else:
“BP implemented a 25% cut on fixed costs from 1998 to 2000 that adversely impacted maintenance expenditures and infrastructure at the refinery,” (the CSB Chairman) said. Maintenance spending fell throughout the 1990’s at the then-Amoco refinery, and following the merger with BP further cuts were imposed...“Large majorities of the survey respondents reported significant maintenance backlogs that were harming safety. Disturbingly, most employees agreed that production and budget compliance gets recognized and rewarded before anything else at Texas City.”

These costs were cut by the original owners (they were getting ready to sell, why put money into maintenance and upgrades?), and then continued by the new management (even though they must have known what was going on). I don't know their rationalization at this point, but you don't need 20/20 hindsight or sophisticated root cause analysis to see what the results of this kind of budget control will do to an older facility.

Unfortunately, many of us face these same decisions every day. Many of us work at aging facilities, with older and outdated equipment. Do I let the old gear operate poorly, do I conduct extensive repairs to the equipment, or do I decide to invest in new, modern, up-to-date technology? This is not always an easy call. How do I make this decision?

First of all, you must be able to make a business case for whichever decision you make. You must compare the cost of maintaining the status quo to the costs and savings possible with the anticipated upgrade. The costs include unanticipated downtime, spare parts that may no longer be available, maintenance department salaries and overtime, excessive power usage, etc. The benefits can be ease of operation, safer equipment, less required preventive and corrective maintenance, energy efficiency, fewer environmental concerns, etc.

But this business case must include more than just dollars. Safety and risk mitigation must be included in the decision. No matter how much money you think you will save, if you are putting your workers at risk, that is the new bottom line.

And it is not just upper management who has the responsibility here. Although they may have the ultimate responsibility on paper, it is the front-line supervisors that can and must make the difference. Being told to just suck it up is NOT acceptable when equipment has deteriorated to the point that safe operation is a roll of the dice. You must stand up and be heard, more than once if that's what it takes. You guys know the real conditions at your facilities, and you cannot accept dangerous conditions.

I am not so naive that I think this is easy. But I am enough of a realist to know that it CAN AND MUST BE DONE.

Posted by kenreed at 10:06 AM | Comments (0) | TrackBack

October 25, 2006

NRC Equipment Failure Summaries

The NRC has some great reading on their website. In the library section they have 4 statistical studies on the failure results of various pieces of equipment over the past 20 years. They have looked at Pumps, Diesel Generators, Circuit Breakers, and Motor Operated Valves. For example, they have looked at pump failures from 1980-2000, listing the "proximate causes" and the "coupling factors" associated with these pump failures. You can see these reports for yourself here.

There is a lot of good data there (over 200 pages worth for the pumps), giving a statistical analysis of the contributing factors to these failures. Some statistics on the pump failures:

39% were due to Internal Component failures (includes dirt, lubrication, wear and tear), which they attribute to inadequate maintenance.
24% were Design problems (error in specs, incorrect calculations, mounting design).
20% were categorized as Human Error (incorrectly following procedures, poor procedures, inadequate training, accidental action).

These categories add up to 83%. And, after reading these, it is obvious that these are all human performance problems. The other 17% were attributed to Other (setpoint drift), External Environment, and Unknown. A high percentage of these are most likely also due to human error.
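The arithmetic behind those figures is simple enough to check with a quick sketch. The percentages are the ones quoted above; the variable names are just for illustration.

```python
# Shares of pump failures by category, in percent, as quoted above.
pump_failure_shares = {
    "Internal Component": 39,
    "Design": 24,
    "Human Error": 20,
}

named_total = sum(pump_failure_shares.values())
# Whatever is left covers Other, External Environment, and Unknown.
remainder = 100 - named_total

print(f"Three main categories: {named_total}%")
print(f"Other, External Environment, Unknown: {remainder}%")
```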

This drives home the point that very few equipment failures are due to the equipment just wearing out. Few pieces of gear make it to the end of life region on the Weibull curves, and even the random failures are not due to statistically calculated material failures, but due to the incorrect performance of people. The maintenance tech, operator, inspector, or designer almost always contributes to or initiates the failure.

The NRC does not list the root causes that they determined for these failures. However, a telling example of their conclusions can be seen on page 33 of the pump report, which blames one particular pump failure as being due to "operator inattention to detail." I can almost read the corrective action for this: "Conduct training with all operators, emphasizing the importance of reading and following all written procedures." In more common words, "Tell the operators to be more careful."

Posted by kenreed at 07:30 AM | Comments (0) | TrackBack

October 18, 2006

Department of Energy Maintenance Best Practices

Interested in a web site with maintenance best practices from DOE sites? See:

http://www.efcog.org/bp/maintenance.htm

Now that you've had a look at the DOE web best practice site, consider attending a conference with a whole track about maintenance improvement and best practices - the TapRooT® Summit.

Equipment Reliability & Maintenance Best Practices Track of the TapRooT® Summit
(Track Leader - Ken Reed)

1. How "Minor" Mechanical Failures Lead to Major Accidents (Wednesday, 10:30-12: Ken Reed)

2. 7 Step Method for Electronic Troubleshooting (Wednesday, 1-2:20: Ken Reed)

3. Equipment Failure Show & Tell (Wednesday, 2:40-3:55: Ken Turnbull & Steve Swarthout)

4. Rickover's Legacy - Safety and Equipment Reliability - Secrets of the Nuclear Navy's Success (Thursday, 9:10-10:20: Ken Reed)

5. Developing an Equipment Troubleshooting Strategy (Thursday, 10:40-12: Ken Reed)

6. Equipment Reliability Best Practices (Thursday, 1-2:20: Ken Reed, Facilitator)
Two Speakers:
SKF Speaker - To Be Determined
TapRooT® User - To Be Determined

7. Lessons from The Crime Scene - Evidence Preservation for Accident Investigation (Thursday, 2:40-3:55: Ken Reed)

8. Proactive Use of Equifactor® to Improve Equipment Reliability (Friday, 9:15-10:25: Steve Swarthout)

Also, Summit attendees can attend a special pre-Summit 2-Day TapRooT®/Equifactor® Equipment Troubleshooting and Root Cause Analysis Course and save $200 off the course tuition when they also sign up for the Summit.

Posted by Mark at 03:41 AM | Comments (0)

October 11, 2006

Failure Codes

A recent audit of NASA contractors found that the root causes of many failures were being coded improperly, causing many to be improperly tracked and corrected. For example, when a wire harness was taped instead of clamped, the code "Operational Degradation" was used instead of "Workmanship." In another example, a finding of "Excessive Corrosion and Rework Damage" was coded as "Environmental Damage", but no code was assigned that covered the "rework" problem.

Why would these codes be used improperly? Several reasons may exist:
- There is unclear guidance as to how to apply the cause codes
- The codes are used for multiple purposes. For example, in the cases above, the cause codes are used to apply corrective actions and to assign monetary award levels based on the type of code. Seems pretty likely that someone (who is trying to obtain the award bonus) may "err" conservatively when assigning a cause code!

When your maintenance techs are performing maintenance, they are often required to assign a cause code of some type to identify why a repair was required. What motivations are in place to make your techs put in the right code? Is there a policy in place to determine the code?

Using Equifactor® in conjunction with TapRooT®, the ambiguity disappears. It is no longer up to the whim of an individual with unknown motivations to assign a root cause. TapRooT® assigns root causes based on the information from human performance experts, with little room for bias. By using Equifactor® with TapRooT®, you can obtain consistent root causes that make your results trendable, and therefore useful.

Posted by kenreed at 08:54 PM | Comments (0) | TrackBack

September 27, 2006

Ken's on Vacation - No Maintenance/Equipment Reliability Root Cause News Today!

Ken Reed - our Equifactor® Guru - is on vacation.

Sorry - no maintenance/equipment reliability root cause news today.

But if you are headed for the 5-Day Course in Groton next week, you'll see Ken there (he's teaching it).

But please stay tuned for more maintenance and equipment reliability improvement ideas next Wednesday.

And for your enjoyment, here's a picture of an equipment failure...

Collapsed Sphere Copy

Posted by Mark at 03:10 PM | Comments (0)

September 26, 2006

Send Me Your Safety, Quality, Production, Maintenance, and Environment Incident Pictures (even near misses)

Dscn1812

Did you see the one step away from death video and picture that I posted last Tuesday?

I saw an incident (near-miss) and recorded it.

Now others can learn from it and use it in their safety meetings to raise awareness about fall protection and proper work practices.

If you see something that just isn't right ... for example, a:

- quality problem
- safety problem
- near-miss
- production upset
- maintenance issue
- equipment failure
- environmental release
- or any other "event"

Take a picture or a video and send it to me at "info@taproot.com" and I'll post it here to share it with others.

If you want to remain anonymous, just let me know and I won't use your name or your company's name with the posting.

By passing along pictures of problems you can help others save lives, save jobs (by improving quality and preventing operating and maintenance problems), and save the environment (by preventing accidental releases).

And please feel free to use the pictures, videos, and other information from this blog to make performance better at your site.

And if you want to improve performance systematically, attend a TapRooT® Course and the TapRooT® Summit.

Thanks for your help.

Mark

Posted by Mark at 11:16 AM | Comments (3)

September 07, 2006

ALASKA PIPELINE FAILURE

BP Pipeline.jpg
More data has been released concerning the Alaskan oil pipeline leak that shut down a major portion of the Prudhoe Bay oil field on the Alaskan North Slope. It appears that BP changed their pipeline PM requirements based on the history of failures, then did not check to see if this new schedule was working correctly. After years of running a pig through the pipeline at fairly close intervals to clean the pipes, they decided to stop the cleaning and only conduct spot ultrasonic testing of the piping instead. Another pig inspection and cleaning was slated for next year (9 year interval), but leaks were found last month that required the shutdown. The company now plans to replace 16 miles of deficient piping.

Anytime a major change is made to a PM schedule, many risks must be considered. In hindsight, it may be easy to say that BP poorly anticipated the consequences of their change in maintenance strategy, but how do you mitigate these possible consequences?

It would be nice to know what the possible modes of failure are when changing (or initially developing) a maintenance plan. Once these failure modes are known, you can tailor your preventive maintenance to target the most likely (or catastrophic) failures.

Equifactor® to the rescue! When you initially install a new piece of gear, why not take a look at Equifactor® to determine how your machine might fail? Set up your PM schedules to target these failure modes, and get rid of those PM's that are not relevant to your piece of gear. This will allow you to funnel your maintenance dollars toward the areas actually needing the maintenance. Consider Equifactor® to be an important tool in your RCM toolbox.

Another consideration for developing (or changing) your maintenance strategy is the possible consequences of a failure. Drastic changes in a critical system may not be advisable. Smaller changes, or additional checks, might need to be instituted to catch costly failures resulting from changes to the maintenance plan.

Posted by kenreed at 11:47 AM | Comments (2) | TrackBack

August 16, 2006

TapRooT® Down Under

I've been away for a few weeks, visiting our Aussie neighbors. Capability Resources, one of our new partners in Australia, sponsored a 3-Day Equifactor® course in Singleton, NSW, just a few hours north of Sydney. The course was a great success, and I had the opportunity to meet equipment operators from several different segments of the mining and drilling industries.

Amazingly enough, the southern hemisphere has many of the same attributes we see here in the north: highly safety-oriented, hard-working people with great attitudes (and cool regional accents!). Unfortunately, they also have many of the same problems: equipment operated beyond its intended design, people making honest mistakes while trying to do the "right" thing, poorly analyzed "root causes" of equipment failures, and "break and fix" repair strategies.

Equifactor® and TapRooT® are an excellent complement to the Australian mining industry. Their high-volume and high-tech mining operations demand highly reliable equipment operation, and Equifactor® is an obvious choice for their troubleshooting toolboxes.

Thanks to Greg, Peter, and Ross for a truly enjoyable trip. I look forward to working with them again in Singapore in November!

Posted by kenreed at 08:30 AM | Comments (0) | TrackBack

August 15, 2006

Why is Manufacturing Leaving the US? ... And What Can Be Done To Stop the Trend

Many say the end of manufacturing in the US is the natural and inevitable result of a global economy. They say manufacturing, which is heavily labor dependent, will seek the cheapest labor.

But this is NOT the whole story. Most manufacturing is as capital dependent as it is labor dependent. And with more automation every day, labor costs are less of a factor than they once were. Instead, I propose that 3 other factors are just as important:

• The Cost of Expensive Regulations
• Too Little Investment in Improvement
• Equipment Unreliability

First, the US regulatory burden, especially unnecessarily expensive environmental regulations, is almost non-existent in third world countries.

Second, US manufacturers, in an attempt to cut costs, have failed to invest in problem solving technology like advanced root cause analysis. Thus problems that could have been solved to cut costs happen over & over again while manufacturers implement ineffective, wasteful fixes.

Third, the cost of unreliable equipment at facilities is an unrecognized source of expense that magnifies labor costs. If manufacturers had more reliable equipment, productivity would improve (people wouldn't waste time waiting around for frequent repairs).

The solution for two of these problems isn't difficult or expensive. The second and third problems can be solved by using TapRooT® and Equifactor®. Call Ken Reed at SI (865-539-2139) or e-mail him by using this web site.

Posted by Mark at 08:03 AM | Comments (2)

August 10, 2006

More Links to BP Pipeline Corrosion Story

(old pipeline photo - click to enlarge)

Here are some more interesting links about the BP Pipeline Corrosion Story...

BP statement:

http://www.bp.com/genericarticle.do?categoryId=2012968&contentId=7020594

Another BP press release:

http://www.bp.com/genericarticle.do?categoryId=2012968&contentId=7019988

US News questions pipeline industry practices:

http://hosted.ap.org/dynamic/stories/O/OIL_FIELD_SHUTDOWN?SITE=DCUSN&SECTION=BUSINESS&TEMPLATE=DEFAULT

AP story - old story on previous pipeline leaks:

http://www.wtop.com/?nid=111&sid=864072

More leak news and clearer understanding of how past leaks may have led to complete shutdown now:

http://business.scotsman.com/latest.cfm?id=1142752006

Here is a nutty conspiracy theory story on timing of shutdown. Some people will find corporate evil even when their facts make no sense (stop selling 400,000 barrels per day when the price of oil is at record heights to make MORE money on oil?). Obviously this guy didn't read stories about the previous leaks and understand the pressure BP is under not to have more leaks. Does he have a personal axe to grind or does sensationalism just make his site more readable? Maybe that's why his bio page says he's a persona non grata in the US with the US media. Here's the link to the fringe conspiracy theory:

http://www.gregpalast.com/british-petroleums-smart-pig#more-1474

Posted by Mark at 01:45 AM | Comments (0)

Bloomberg Provides More Info on BP Pipeline Problems

Click on this link for more info on BP's pipeline closure.

Posted by Mark at 12:34 AM | Comments (0)

August 09, 2006

Corrosion Impacts US Economy

BP shuts down a pipeline that feeds 400,000 barrels per day of crude oil to the Trans-Alaska pipeline and crude prices shoot up 3%. This could translate to a 5¢ to 10¢ per gallon increase at the pump.

We all know that the price of oil affects the economy, but who would have guessed that equipment reliability (system integrity) on a 20 mile long pipe would impact the whole US (and thereby the global) economy?

As production capacity and the world economy get more closely linked, the ability to keep everything functioning smoothly (reliably) becomes more and more important. That's why tools like Equifactor® and TapRooT® are used by industry leaders to troubleshoot and understand the root causes of equipment problems and improve equipment reliability.

To learn more about equipment troubleshooting and root cause analysis, attend a TapRooT®/Equifactor® Course. For more information see:

3-Day TapRooT®/Equifactor® Equipment Troubleshooting & Root Cause Failure Analysis
San Antonio, TX October 11-13, 2006
Edmonton, Alberta November 20-22
Dubai, UAE November 28-30
Charleston, SC December 6-8

Don't wait until reliability problems at your plant impact the global economy. Attend a course and learn how the combination of TapRooT® and Equifactor® can help you use advanced equipment troubleshooting and root cause analysis to improve equipment reliability.

Posted by Mark at 12:27 AM | Comments (0)

August 02, 2006

SKF SPONSORS 2-DAY PUBLIC EQUIFACTOR® COURSE IN EUROPE IN SEPTEMBER AND 3-DAY PUBLIC EQUIFACTOR® COURSE IN DUBAI IN NOVEMBER

SKF is sponsoring two more Equifactor® Courses in 2006.

2-Day Equifactor® - UTRECHT, The Netherlands - September 14-15, 2006

The first is a 2-Day Equifactor® Equipment Troubleshooting and Root Cause Analysis Course that will be held at SKF's facilities in Utrecht, The Netherlands. This is a special version of the Equifactor® Course that concentrates on Equipment Troubleshooting, understanding what happened using SnapCharT®, and using the Root Cause Tree® to identify the root causes of human errors and equipment failures. This course includes an Individual User version of the TapRooT® Software for each attendee (a $1495 value without the class). This software includes computerized versions of Heinz Bloch's equipment troubleshooting tables. Course attendees also get a copy of Heinz Bloch's book, Machinery Failure Analysis and Troubleshooting (a $115 value), and a copy of the TapRooT® Book (a $195 value). The course cost is $1690 US Dollars (or $1190 US Dollars for people attending from a company that has a TapRooT® Software License). E-mail info@taproot.com to register for this special course.

3-Day Equifactor® - DUBAI, UAE - November 27-29, 2006

The second is a 3-Day Equifactor® Equipment Troubleshooting and Root Cause Analysis Course that will be held at SKF's facilities in Dubai, UAE. This 3-Day version of the Equifactor® Course includes everything in the regular 2-Day TapRooT® Course plus a day that concentrates on Equipment Troubleshooting. This course includes an Individual User version of the TapRooT® Software for each attendee (a $1495 value without the class). This software includes computerized versions of Heinz Bloch's equipment troubleshooting tables. Course attendees also get a copy of Heinz Bloch's book, Machinery Failure Analysis and Troubleshooting (a $115 value), and a copy of the TapRooT® Book (a $195 value). The course cost is $1890 US Dollars (or $1390 US Dollars for people attending from a company that has a TapRooT® Software License). E-mail info@taproot.com to register for this special course.

Posted by Mark at 01:12 AM | Comments (0)

July 19, 2006

Failed Rudder - Hard Turn - Injured Passengers - Ship Back to Port - Root Cause???

Some initial reports indicate that a mechanical failure of the rudder may have been the cause of an accident aboard the Crown Princess (operated by Princess Cruise Lines).

How do they troubleshoot equipment failures?

How do they analyze root causes?

I would bet there were previous near-misses that weren't thoroughly investigated. I would also bet that if these near-misses' root causes had been corrected, the cruise line could have prevented these injuries and this public relations fiasco.

For a more detailed initial report see:

http://www.cnn.com/2006/US/07/18/cruise.return/index.html

Posted by Mark at 02:52 PM | Comments (0)

Latest Equifactor(R) Newsletter

The latest edition of the Equifactor Minute has been released. If you did not receive it, please email me here. You can download a copy of the newsletter here.

Topics in this edition include:
- TapRooT(R) Summit
- The Art of Maintaining Stand-by Pumps
- Equifactor(R) Software Improvements
- Up-coming Equifactor(R) Course Schedule
- Recent Blog Entries.

Let me know if you have anything you would like to see in up-coming editions of the newsletter.

Posted by kenreed at 01:39 PM | Comments (0) | TrackBack

July 14, 2006

HPRCT Conference Summary

Mark and I just got back from the 12th Annual Human Performance, Root Cause, Corrective Action and Trending Conference held in Charleston, SC this week. If you are not familiar with this conference, it is set up mainly for those in the nuclear industry and covers recent trends in those areas. It was a very well-organized conference, with a lot of great ideas and opportunities for networking with anyone interested in advanced programs covering these topics. I gave 2 talks:

Why Don't People Follow The Rules
Evidence Preservation for Equipment Failure Troubleshooting

One common issue that I see throughout industry (including the nuclear industry) is the tendency to dive into equipment troubleshooting before a solid, usable troubleshooting plan is in place. I have discussed this topic in other venues, and I have found that it is common in most industries, including mining, paper production, petrochemical, and power generation. I have attached a copy of my talk here. Please take a look at it, and decide how you are combatting this problem.

Posted by kenreed at 01:38 PM | Comments (0) | TrackBack

July 12, 2006

Department Of Energy Free Software

The US Department of Energy has made available several software programs to help facilities decide on the most energy-efficient equipment for their specific applications. For example, the MotorMaster+ software contains a database of over 32,000 motors (both domestic and international), describing the "best fit" motor for your application and showing potential energy savings. Other software available includes:

AIRMaster+
Chilled Water System Analysis Tool (CWSAT)
Combined Heat and Power Application Tool (CHP)
Fan System Assessment Tool (FSAT)
MotorMaster+ 4.0
MotorMaster+ International
NOx and Energy Assessment Tool (NxEAT)
Plant Energy Profiler for the Chemical Industry (ChemPEP Tool)
Process Heating Assessment and Survey Tool (PHAST)
Pumping System Assessment Tool 2004 (PSAT)
Steam System Tool Suite

These tools can be downloaded from their website or you can order a CD from the same site.

With the volume of recent literature discussing the advantages of energy efficiency in your facilities, these tools may help in your decision-making processes.

Posted by kenreed at 08:00 AM | Comments (0) | TrackBack

July 05, 2006

Where are the savings?

Often, with limited resources, we are trying to find ways to convince management that a particular system will make a measurable gain in productivity. For example, implementing an equipment vibration monitoring system can save the company money in various ways:
- Planning repairs around scheduled maintenance
- Limiting overtime for emergency repair
- Limiting emergency shipment of replacement parts
- Eliminating waste on restarts
- Eliminating waste due to the original equipment failure

These gains, of course, must be balanced against the cost of implementing the changes. For example, the cost of implementing the PdM strategy above will include:
- Buying vibration monitoring sensors
- Installing cabling for the sensors
- Training personnel to use the system
- Analyzing the results
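
To make that balance concrete, here is a minimal sketch of the comparison. Every dollar figure and category name below is a hypothetical placeholder chosen for illustration, not data from any real facility; substitute your own annual numbers.

```python
def net_annual_benefit(savings, costs):
    """Return total savings minus total costs (same currency, same period)."""
    return sum(savings.values()) - sum(costs.values())


# Hypothetical annual savings in US dollars, one entry per gain listed above.
savings = {
    "repairs_planned_around_scheduled_maintenance": 40_000,
    "avoided_emergency_overtime": 15_000,
    "avoided_emergency_parts_shipping": 5_000,
    "avoided_restart_waste": 10_000,
    "avoided_failure_waste": 20_000,
}

# Hypothetical annual costs in US dollars, one entry per cost listed above.
costs = {
    "vibration_sensors": 30_000,
    "sensor_cabling": 10_000,
    "personnel_training": 8_000,
    "results_analysis": 12_000,
}

benefit = net_annual_benefit(savings, costs)
print(f"Total savings:      ${sum(savings.values()):,}")
print(f"Total costs:        ${sum(costs.values()):,}")
print(f"Net annual benefit: ${benefit:,}")
```

A negative result from a calculation like this is a signal to look at which savings never materialized, which is often where repeat failures are hiding.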

So why do nearly 80% of PdM implementations either fail outright or show very little savings? The answer: a PdM strategy that does not include a proven root cause analysis technique will continue to have the same problems show up over and over again. Wouldn't it be nice if we could:

- Detect impending equipment failure
- Determine why the failure is occurring
- Schedule corrective maintenance on both the equipment and the root cause of the failure for completion during a scheduled maintenance period
- Never see this same failure again

Using a PdM system to detect and correct failures is only half the answer. The final strategy MUST include correction of the underlying reason for the failures. This is where the TapRooT(R) Root Cause Analysis system and the Equifactor(R) Equipment Troubleshooting module, melded with an effective PdM technique, will quickly recover the unrealized savings.

Posted by kenreed at 01:35 PM | Comments (0) | TrackBack

June 25, 2006

Public 2-Day Equifactor(R) Course at SKF's Facilities in Utrecht City, The Netherlands, on September 14-15, 2006

Interested in troubleshooting equipment failures and analyzing their root causes?

Live in Europe?

Then you should start planning to attend the special 2-Day TapRooT(R)/Equifactor(R) Equipment Troubleshooting & Root Cause Analysis Course that is being sponsored by SKF (and held at their facilities) in Utrecht City, The Netherlands on September 14-15, 2006.

This course isn't on our registration pages yet, so if you want to register now you will have to call our offices in the US at 865-539-2139 or use the e-mail link.

The cost for this 2-Day Equifactor(R) Course is $1690 US Dollars.

This price includes:

  1. An individual user version of the TapRooT(R) Software (priced at $1495 without the course).
  2. Heinz Bloch's book - Machinery Failure Analysis and Troubleshooting (a $115 value) - and
  3. The TapRooT(R) Book (a $195 value) by Mark Paradies & Linda Unger.

That's $1805 of software and books in a course that only costs $1690. What a deal!

Space in the course is limited so register by phone or e-mail today.

Posted by Mark at 04:05 PM | Comments (0)

June 23, 2006

Reliability issues

I often get questions that sound something like this...
"I've been trying to sift through the bewildering amount of information out there on equipment reliability. Is there a "relatively simple" method that lends itself toward machine reliability issues?"
In other words, is there a simple method available that will make my machinery more reliable?

That's a pretty tall order. There are many directions you can go, and you really need to narrow down your area of concern. Are you looking for:
- Predictive Maintenance capabilities
- Theory behind equipment failures (Weibull curves, etc)
- Equipment Troubleshooting aids
- Root Cause Analysis
- Etc, etc, etc

I can definitely help with the root cause analysis problems prevalent in industry today. I work with companies throughout the country on accident root cause analysis, and I see equipment reliability and failure issues all the time. More often than not I see companies that do an adequate job of applying predictive maintenance techniques to track when a piece of gear is failing, but they rarely try to find out why it is failing. That is, until a catastrophic failure forces them to perform a root cause analysis of the incident.

I do not feel that equipment troubleshooting and root cause analysis should be separated. If you are troubleshooting your equipment, it means you had a failure of some type. Are you satisfied with repairing the symptom and putting it back in service, or do you want to find out why you are troubleshooting in the first place? The same problem is almost guaranteed to happen again, unless the actual cause of the failure is discovered and corrected. This goes to the heart of equipment reliability issues seen in almost every industry segment.

Using Equifactor(R) in combination with the rest of the TapRooT(R) system will provide you with a method of looking beyond mere symptom correction. The combination of a systematic troubleshooting tool with a world-class root cause analysis system provides an extremely effective yet easy-to-use tool for finding out why your equipment fails.

Don't be satisfied with finding and correcting symptoms. Use Equifactor(R) to define the problems with your equipment, then apply TapRooT(R) to find out why you have the problem in the first place.

Posted by kenreed at 07:56 AM | Comments (0) | TrackBack

June 07, 2006

The Art of Maintaining Stand-by Pumps

I want to maintain the highest possible reliability of 2 parallel centrifugal pumps. One is the operating pump, and the other is a stand-by pump, required only as a back-up in case the running pump fails. What is the best run-time strategy to maximize the reliability of the pumps?

My first thought was, "50:50, of course!" That way the wear and tear on the pumps is spread out over both pumps, doubling the effective lifetime of the equipment. Seems reasonable to me.

Unfortunately, if you are using this strategy, there is a good chance you are significantly accelerating the wear on the pumps, resulting in increased downtime!! Read all about it...

One assumption that has to be made: The pumps are using mechanical seals. Pumps with packing glands normally are wetted by the working fluid. These pumps will probably require a set cycle schedule for packing maintenance. This has been the strategy (weekly pump shifts) for packed pumps for years, and it hasn't necessarily changed for mechanically sealed pumps.

So why is a 50:50 run strategy bad?
First, the major wear and failure factor when considering a mechanical seal is the number of start-stop cycles, not overall run time. Starting and stopping the pumps solely for equal run time puts enormous stress on the seals.
Next, 50:50 introduces many more failure modes than are present in a standby pump.
Finally, with perfectly even wear, both pumps (theoretically) should fail at about the same time. Not the ideal situation for an emergency standby pump!

It seems, then, that the fewer start-stop cycles, the better. Ideally, as far as mechanical seal wear goes, the stand-by pump should never be started, maintaining it in pristine condition, ready to take over on the loss of the duty pump.
The problem with this, however, is that you no longer have confidence that the failure modes specific to the stand-by pump (fail to start, failure to reach full capacity) are not present.

A good compromise is a 90:10 ratio. For example, run the duty pump for 8 weeks, then run the standby pump for a full 8-hour shift. Then SHIFT BACK TO THE DUTY PUMP. This has several advantages:
- You have confidence the pump will run when needed.
- It will prove it can reach full load capacity.
- It can be scheduled around your normal PdM periodicities. For example, conduct thermal and vibration analysis of the standby pump at the 2-month point, killing 2 birds with one stone.
- Most people shift pumps weekly, which is a total of 104 starts or stops for the 2 pumps over the course of a year. The 90:10 strategy lowers this to only 10 or 12 total.
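
As a rough illustration of why the changeover schedule matters, the sketch below counts changeovers per year for the two schedules. It assumes a 52-week year and that every changeover means starting one pump and stopping the other; both assumptions, and the function name, are mine. How a "start or stop" is tallied changes the totals, so the results need not match the rounded figures quoted above exactly.

```python
WEEKS_PER_YEAR = 52           # assumption: simple 52-week year
EVENTS_PER_CHANGEOVER = 2     # assumption: one pump starts, the other stops


def changeovers_per_year(weeks_per_cycle, changeovers_per_cycle):
    """Number of pump changeovers in a year for a repeating schedule."""
    return WEEKS_PER_YEAR / weeks_per_cycle * changeovers_per_cycle


# Weekly swapping: one changeover every week.
weekly_changeovers = changeovers_per_year(weeks_per_cycle=1,
                                          changeovers_per_cycle=1)

# Duty/standby schedule: roughly every 8 weeks, shift to the standby pump
# for one shift and then shift back, which is two changeovers per cycle.
standby_changeovers = changeovers_per_year(weeks_per_cycle=8,
                                           changeovers_per_cycle=2)

for name, count in [("weekly swap", weekly_changeovers),
                    ("duty/standby", standby_changeovers)]:
    events = count * EVENTS_PER_CHANGEOVER
    print(f"{name}: {count:.0f} changeovers, {events:.0f} start/stop events")
```

Whatever counting convention you prefer, the point survives: the duty/standby schedule cuts seal-stressing start-stop cycles to a fraction of what weekly swapping produces.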

Some facilities have actually color coded their pumps. The duty pump is green, and the standby pump is red. When somebody sees the red pump running, they can now question why we are not in the "reliable" line-up. This forces the operators to immediately report failures of the duty pump.

This philosophy may not "feel right", but there is plenty of data to back it up.

Again, this assumes that the prevalent failure mode is seal failure. It also assumes that there are not other extenuating circumstances requiring pump shifting. For example, maybe you have a history of false brinelling of the standby pump bearings if the pump is idle for xx weeks.

Take a look at your strategy. You may find you are able to increase your equipment availability, reduce downtime, and limit repair costs, just by adjusting your pump switching schedule.

Posted by kenreed at 07:50 AM | Comments (0) | TrackBack

May 31, 2006

Equifactor(R) and the SnapCharT(R)

How important is the SnapCharT(R) in the Equifactor(R) process? The TapRooT(R) system teaches that the first step is developing a good SnapCharT(R), and then gathering more detailed information from there.

But what happens when you develop your SnapCharT(R), analyze your failure, find a lot of information about the failure, but you don't know where to put it in the SnapCharT(R)? Let me give you an example.

A reciprocating compressor has failed, with a very high vibration in evidence. Your troubleshooting has exhausted the "easy" stuff from your Equifactor(R) analysis (speed incorrect, lubrication system inadequate), and you have now been forced into a teardown of the compressor. Upon inspection, you find:
- heavily fractured piston
- a loose piston rod nut
- metal fragments in the cylinder valves
- water condensed on cylinder surfaces
Now, where do you put these items on the SnapCharT(R)? TapRooT(R) teaches that we normally construct a SnapCharT(R) in the order of occurrence; that is, insert the information in the spot in the SnapCharT(R) at the time at which it occurred. However, in this case, when did these new datapoints occur? Did a slug of water enter the cylinder and cause the problems? Did the connecting rod nut become loose due to the failure, or did the loose nut cause the failure? These pieces of data should go in the SnapCharT(R) in the order that they occurred, but in this case, when did these events occur?

A solution to this problem requires several steps.
1. If at all possible, eliminate one of the possible causes by further analysis. In this example, if there is a dehydrator just prior to the inlet, verify it is working properly and is not saturated. If it appears to be working correctly, the water probably did not enter the cylinder here. It may have condensed after opening the cylinder, especially if the cylinder is actively cooled.
2. For what is left (in this case, loose nut or failed piston), you may be forced to just leave these conditions after the incident on the SnapCharT(R). This is not a cardinal sin. In fact, it is much worse to force a condition before the incident by guessing and put it in the wrong place. You now have conclusions being drawn on incorrect information. For example, if you say the nut came loose first and force it before the incident, you have effectively made this the cause of the piston failure when in reality it may not have been.
3. For whatever is left, analyze the most likely causes of these conditions. For example, what could cause the piston to fail, assuming the connecting rod nut was not loose? Is the compressor being operated above its rated capacity? In fact, in this scenario, you would find that the cylinders have been re-bored to increase throughput by 25% above vendor spec. Would we have found this if we had assumed the nut had come loose first?

In general, your SnapCharT(R) should be developed in order of occurrence. However, especially with equipment problems, this may be difficult to accomplish. Don't force your SnapCharT(R) just to be more aesthetically correct, if this will introduce errors into your analysis.
This type of problem can also be seen in incidents involving drug testing (when was the drug ingested?) or autopsies.

Posted by kenreed at 08:10 AM | Comments (3) | TrackBack

May 24, 2006

To RCFA or not to RCFA...

Recently, there has been some debate as to the priority of conducting a root cause failure analysis of equipment failures, as compared to implementing a Reliability Centered Maintenance program. Which one gives you the most bang for the buck? If you only have money and resources to do one of the two, which one should you choose?

First of all, I don't believe you can completely separate an RCM system from an RCFA system. One of the cornerstones of the RCM process is determining what PM's can be modified or disposed of by analyzing what your past equipment performance indicates is required. However, this approach requires that the past failures conform to an analysis which can assume that the equipment is operating as it is designed. Unfortunately, most failures (over 80% by most conservative estimates) are not due to end of life, equipment design criteria, but due to "unknown" or "random" failures. In terms of the RCM process, these definitions may fit, but in reality, they are only unknown or random because we haven't conducted an effective RCFA to determine what caused the failure.

Opposing this, however, is the need to determine what needs an RCFA conducted. If an asset does not have the right maintenance program (and most maintenance programs do not meet the minimum standard required by RCM), the result is a significant number of failures caused by the wrong or no preventive maintenance. This mass of failures tends to mask those that are real defects or human error. Remember, a significant portion of the failures are a direct result of the preventive maintenance you are performing in the first place.

So what do you do? As an overview:

1. One of the first things to do is quickly rationalize and review your PM program to get rid of most of the "poor maintenance practice" failures. Shoot for getting your PM program up to snuff as quickly as you can.

2. This will now allow you to focus on RCFA when the human-related defects are more visible. Again, these failures can account for a significant number of your failures.
** As a side note, it seems to be assumed by many experts that an RCFA is not used to determine problems or issues with the maintenance process itself. However, this is exactly where the RCFA process can be used to determine what caused the failure, whether it was due to improper machinery operation (human error) or an unnecessary PM (ALSO human error!)**

3. Once this is done, you can again shift back to a reasonable analysis of your PM system and implement a workable RCM strategy that is based on inherent equipment reliability and its relationship to preventive maintenance. Otherwise, your RCM system will end up being based on human-error failures rather than equipment-related design or PM failures.

Give these steps a try if you are trying to figure out where to start your RCM implementation. Remember, defect elimination or RCA work accounts for twice the business benefit of implementing improved maintenance strategies by themselves.

Posted by kenreed at 01:29 PM | Comments (0) | TrackBack

May 17, 2006

Equipment Reliability Success Stories

This may be hard to believe, but the 2007 TapRooT(R) Summit planning is almost complete! Mark has put together an awesome program, with 10 tracks guaranteed to appeal to everyone. The Equipment Reliability and Maintenance Best Practices Track in particular is looking better than ever. We've got some great topics:

- How "Minor" Mechanical Failures Lead to Major Accidents
- 7-Step Troubleshooting Method for Electronic Troubleshooting
- Developing and Adding Custom Tables to Equifactor(R)
- Equipment Reliability Best Practices
- Lessons From the Crime Scene - Evidence Preservation for Accident Investigation
- Proactive Use of Equifactor(R) to Improve Equipment Reliability

I'd like you to take special note of the Equipment Reliability Best Practices topic. I'd like to use your examples of how you have used Equifactor(R) to improve the reliability and operability of your equipment. How much time and effort have you saved using Equifactor(R) to quickly and efficiently troubleshoot your expensive machinery? Let me know if you are interested in presenting your company's experiences during the Summit. This is a great opportunity to "pat yourself on the back," while at the same time helping others with their problems.

Posted by kenreed at 09:00 AM | Comments (0) | TrackBack

May 10, 2006

Return on Investment: Lessons from MARCON 2006

I recently attended a maintenance reliability conference in Knoxville (MARCON 2006) sponsored by the University of Tennessee's Maintenance Reliability Center. I was struck by a common theme that came up over and over during the 3 days of the conference. Maintenance managers seem to be at odds with some of their more senior management concerning program implementation. You KNOW that the root cause analysis process you are trying to implement is the right way to go. You KNOW that the proactive analysis techniques you have recently learned will help both your maintenance department and your company. You KNOW that Equifactor(R) will significantly reduce your equipment troubleshooting time and MTTR. Then why is there so much resistance to these changes from your senior management?

In reality, it is usually our fault. We have not been very good at recognizing what our managers are focussed on. Remember, they have an obligation to the shareholders, and in most cases, those obligations revolve around PROFIT. So that is what WE must focus on when presenting our ideas to management. Your presentations should include how your changes will affect the bottom line. Some suggested ways to do this:

1. Calculate your return on investment. For equipment reliability improvements, this has historically been at least a 6:1 benefit ratio.
2. Show them how much production you are losing because of unscheduled equipment down-time. Include overtime calculations, production line start-up time, replacement parts cost, and waste generated during start-up.
3. Show how much more production can result from more efficient equipment operation. This unit per hour calculation adds up quickly.

These techniques must result in an actual dollar amount. This will quickly grab the attention of the manager who is responsible for approving your new ideas for using Equifactor(R) as part of your troubleshooting toolbox.
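
One way to arrive at that dollar amount is sketched below. The function names, parameters, and every number in the example are hypothetical placeholders, and the formula is a simple illustration of the items listed above rather than an established costing method; adapt it to how your plant tracks production and labor.

```python
def downtime_cost(hours_down, startup_hours, units_per_hour, margin_per_unit,
                  overtime_hours, overtime_rate, parts_cost, startup_waste_cost):
    """Estimate the cost of one unscheduled equipment failure.

    Lost production covers both the outage and the line start-up time.
    """
    lost_production = (hours_down + startup_hours) * units_per_hour * margin_per_unit
    overtime = overtime_hours * overtime_rate
    return lost_production + overtime + parts_cost + startup_waste_cost


def benefit_ratio(annual_benefit, investment):
    """Return the benefit-to-investment ratio (for example, 6.0 means 6:1)."""
    return annual_benefit / investment


# Hypothetical single failure: 8 hours down plus 2 hours of start-up.
cost_per_failure = downtime_cost(
    hours_down=8, startup_hours=2, units_per_hour=200, margin_per_unit=5.0,
    overtime_hours=20, overtime_rate=60.0,
    parts_cost=3_500, startup_waste_cost=800,
)

# Hypothetical program: avoids 6 such failures a year for a $20,000 investment.
annual_benefit = 6 * cost_per_failure
ratio = benefit_ratio(annual_benefit, investment=20_000)

print(f"Cost per failure: ${cost_per_failure:,.0f}")
print(f"Annual benefit:   ${annual_benefit:,.0f}")
print(f"Benefit ratio:    {ratio:.2f}:1")
```

Presenting the per-failure cost alongside the ratio lets a manager see both the size of the problem and the payback in the same units they report to shareholders.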

Posted by kenreed at 01:12 PM | Comments (0) | TrackBack

May 04, 2006

SKF Bearing Customer Support Engineers Attend 3-Day Equifactor(R) Class


I'm in Gothenburg, Sweden, teaching a 3-Day TapRooT(R)/Equifactor(R) Equipment Troubleshooting and Root Cause Failure Analysis Course that is co-sponsored by SKF. More than half of the students in the course are SKF manufacturing, customer service, and application design engineers - a really smart bunch of people. It's day 2 of the 3-day course.

It really is humbling teaching root cause analysis to people who know so much about bearings and equipment. But as usual, the TapRooT(R) System helps even experts do better root cause analysis.

The TapRooT(R) System "expands the universe" of potential ideas for correcting serious problems by giving experts additional ideas of potential root causes. And I see this happen in all kinds of industries all around the world.

I think that's why it is such a good job being a TapRooT(R) Instructor. TapRooT(R) comes through in every case to help experienced and inexperienced investigators find root causes that they previously would have overlooked.

Another strength of the TapRooT(R) System that comes across in this course is the ability to analyze the causes of equipment problems that are caused by the human performance of the operators, installers, or maintainers. Often machinery experts know an amazing amount about the engineering of the machine. But they really appreciate the root cause analysis assistance they get when analyzing human performance problems.

Tomorrow they get the Equifactor(R) part of the class, and I really look forward to the "lightbulbs turning on" as they realize how advanced root cause analysis and advanced equipment troubleshooting fit together.


Posted by Mark at 09:56 AM | Comments (0)

May 03, 2006

Proactive Equifactor(R)

Maybe this is a use of Equifactor(R) that you've never thought of before...

You have been assigned to a team that has been tasked with installing a new air compressor that will be used as part of a new manufacturing process. Your specific job is to determine what preventive and predictive maintenance activities you will require for the new compressor. Where do you get this type of information?

You can start with the manufacturer's recommendations, but we all know this is a pretty coarse set of guidelines. Wouldn't it be nice if you knew how the compressor could possibly fail before it was installed? You could then design your PM and PdM requirements to look for these failure pathways.

As a TapRooT(R) user, you remember that Equifactor(R) is normally used to troubleshoot specific equipment failures and aid in your root cause analysis. But what if we use Equifactor(R) to list ALL POSSIBLE equipment faults, and then design our monitoring systems to look for these faults? In fact, you decide you can use Equifactor(R) to:
1. Determine what vibration monitoring is required to look for the listed symptoms
2. List what preventive maintenance may be required to prevent the failures from occurring
3. Update the operating procedures to keep the equipment operating conditions away from common failure modes
4. Design an operator training program to teach your people to look for specific incipient failure conditions
5. Include the gear in your company's lube oil analysis program
6. Conduct an RCM analysis to see what maintenance is really required, based on known Equifactor(R) failure modes

and you are just getting warmed up!
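One way to picture the "list all possible faults, then design monitoring to look for them" idea is as a simple coverage check. The sketch below is only an illustration: the fault names and monitoring activities are hypothetical placeholders, not entries from the Equifactor(R) tables, and the function name is invented for this example.

```python
def uncovered_faults(fault_list, monitoring_plan):
    """Return the faults that no planned activity is looking for.

    fault_list: possible equipment faults (e.g. from a troubleshooting table).
    monitoring_plan: maps each PM/PdM activity to the faults it can detect.
    """
    covered = set()
    for detectable in monitoring_plan.values():
        covered.update(detectable)
    # Keep the original order so the gaps read like the source table.
    return [fault for fault in fault_list if fault not in covered]


if __name__ == "__main__":
    # Hypothetical compressor faults and activities, for illustration only.
    faults = ["bearing wear", "oil contamination", "valve leakage"]
    plan = {
        "vibration monitoring": ["bearing wear"],
        "lube oil analysis": ["oil contamination", "bearing wear"],
    }
    for fault in uncovered_faults(faults, plan):
        print(f"No activity currently looks for: {fault}")
```

Any fault that comes back from the check is a candidate for a new PM task, a procedure change, or an operator training item.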

It is tough to set up your CBM system to try to minimize your maintenance workload if you don't know what maintenance is required in the first place. Take a look at how you would normally set up these types of programs for new equipment, and see how Equifactor(R) can be used as yet one more tool to make your life easier.

Posted by kenreed at 10:20 PM | Comments (1) | TrackBack

May 01, 2006

Monday Accident & Lessons Learned - Aviation Equipment Failures - Another Example of a Mechanical Failure Starting an Even Larger Failure

Attached (click on the continuation link below) is a report from an aviation failure on a small plane (not a jet).

This is another example of a small mechanical failure (a generator failure) that could have led to a larger failure (loss of the plane and loss of life of the crew and passengers).

What is the lesson I think you should learn?

That equipment reliability is a key part of system performance and SAFETY.

Safety professionals should help maintenance and equipment reliability folks find the root causes of equipment problems by using TapRooT(R). That's why safety folks (in addition to equipment reliability and maintenance professionals) should attend Equifactor(R) Training.

For general Equifactor(R) information see:

http://www.equifactor.com/

For 3-Day TapRooT(R)/Equifactor Equipment Troubleshooting and Root Cause Failure Analysis Training see:

http://www.taproot.com/courses.php?d=3

LEARN FROM THE EXPERIENCES OF OTHERS.... BEECH 100

Incident: Multiple Electrical Systems Failures

1. En-route from --- to --- (First Officer Flying Pilot) at 9000 ft msl, about 30 Miles north of ---. The Left Gen tripped off line, and momentarily came back on. The Volt Ammeter showed the Left Gen accepting a load. Then the process repeated, with the Left coming back on-line. For a third time, the Left Gen tripped and this time did not reset itself, nor would it reset manually.
2. During the next few minutes, the following were noted: a right Gen load of approximately .45, a left Gen load of zero, no ability to reset the left Gen, a red light in the gear handle, failure of the pressurization system, failure of the left fuel gauge, failure of the number one Comm radio, failure of the transponder’s mode C, a red “Computer” flag on the left ADI, failure of the #1 inverter, the inability to lower the flaps and gear, and failure of the Primary Pitch trim. ATC (APP) was notified of the initial Communications Radio problems, and informed us of the Transponder Mode C failure. No assistance was requested at that time.
3. About 30 miles out of --- a call was made to the Company requesting the assistance of Maintenance. Due to the limitations of one Comm radio, numerous changes back and forth were required.
4. The aircraft was slowed to about 130 KIAS in the vicinity of ---, and the gear selected down. Nothing happened.
5. The Emergency Gear extension procedure checklist was reviewed, then followed. During this period approximately sixty strokes of the (emergency) gear handle were applied. Seeing no green gear lights, and realizing that the electrical failure may have affected those lights, we tested them and they failed to illuminate. We concluded they would not illuminate. This meant we could not stop pumping at the normal indication (three green) as taught in Ground School.
6. Flaps Approach were selected. The flaps did not move as observed from the cockpit.
7. The gear position was uncertain. Given that, and the multiple systems failures, an Emergency was declared with Tower.
8. A fly-by of the tower was conducted. A regional airliner at the hold line for Rwy 35 suggested the gear looked normal. The Tower reported a “bowed appearance”. We proceeded to the East of the Airport and called Company for about the fourth time.
9. The passengers received a preliminary briefing of the difficulties, and were told we were working closely with maintenance and ATC.
10. The emergency gear handle was pumped about twelve (12) more strokes, and resistance was met. Pumping ceased.
11. Another low pass down the runway was made. Tower reported gear appeared down. Company called again for further consultation with MX Personnel.
12. The passengers were briefed again. Brace positions were reviewed, and coats were passed forward to act as a cushion for passenger seated on the couch adjacent to the bulkhead.
13. An audible signal for assuming the brace position was agreed upon (the tone generated by cycling of the FSB sign).
14. Multiple systems were secured (lights, which had failed anyway, and Bleed Air), and the checklist for an aborted landing with inability to stop on the runway was reviewed for action items. The F/O was briefed as to his actions, and as to those the Captain would accomplish.
15. A power-on, zero-flap landing with a Ref speed of 110 KIAS was made. The gear appeared normal, and the aircraft exited Rwy 35 at Delta Taxiway, where it was shut down, and the passengers deplaned to a safe location at the edge of the taxiway.
#1 Generator bearing failure followed by 325 amp current limiter failure.

This would have been a challenging failure: 1) in low IFR, 2) at night, 3) in icing conditions, or 4) at altitude as oxygen masks are not connected when the depressurization starts, and the emergency descent calls for gear down (not possible) creating a longer time lapse in getting down.

Posted by Mark at 01:35 AM | Comments (0)