Consider the following scenario from the submarine USS HYPOTHETICAL. Following a meticulous valve line-up and a thorough pre-evolution brief, a team of hardened professionals commenced that most thrilling of housekeeping evolutions, the blowing of sanitary tanks overboard. Within seconds of actuating a valve that would admit pressurized air to the tanks, these steely-eyed operators were shocked by blood-curdling screams of terror from the middle level passageway. As jets of liquefied human filth erupted from every deck drain in the forward compartment, it became apparent that their valve line-up might not have been so meticulous after all.
Once the blow was secured and working parties were assembled to clean up the mess, some unlucky division officer (the DCA, in this case) was tasked with finding out what the hell happened, and developing a course of action to ensure that it never happened again. Just another day in the life; no submarine tour is complete without at least one eruption of poop-volcano. Fortunately, the answer was easily extracted from the evolution supervisor: the new guy, MMFN Operator, had fouled up the valve line-up. The solution: disqualify MMFN Operator from valve operations pending an exhaustive training upgrade.
DCA was thrilled to have found the answer, so that he could quickly get back to important work like turning little red dots into little green dots in CTQS. Much to his disappointment, the Engineer (his boss) did not accept the easy answer. Being a good nuke, Eng recognized that there were far too many questions still unanswered. Why had the Operator opened the wrong valves? Had he been trained correctly? If so, did he deviate from his training? Why? “Go back and try again,” Eng demanded. “Ask why five times.”
Eng was instinctively exercising the patterns of something called Root Cause Analysis, an industrial management technique the Navy has borrowed from the ultra-efficient assembly lines of Japan’s automotive industry. The point of Root Cause Analysis is to treat diseases rather than symptoms. Properly applied, it prevents little problems from turning into big problems.
A quick thumb-rule to condense a convoluted analytical process into something actually usable is the Toyota Motor Corporation’s original “5 Whys” method. Toyota recognized that while it’s possible to ask an infinite series of “whys” in analyzing a problem, root causes generally emerge around five levels deep. As an interesting aside, the “5 Whys” method was a core element of the Toyota Production System, which eventually evolved into Lean Manufacturing techniques. Lean was later combined with a process improvement discipline known as Six Sigma to become Lean Six Sigma, which is now widely taught in engineering management programs. Neat.
So what happened in our example? DCA used the “5 Whys” method:
Problem: We blew shit all over middle level.
Because the Operator opened the wrong valves.
Because he couldn’t read the valve labels. (Now we’re getting somewhere!)
Because they are temporary cardboard labels, and the writing has become illegible.
Because they’ve been temporary cardboard labels for as long as anyone can remember, and have been absorbing oil and water for years.
Because nobody inspects this space, and there is no process in place to get permanent labels on these valves.
It’s obvious that we could keep going like this forever, but by the fifth “why” we have unearthed the fundamental problems. We obviously need to look at our Zone Inspection program, and make sure we have an effective process to fix little deficiencies like valve labels. We also need to find out why the Operator was turning valves he couldn’t read. In this case, it turned out that the labels had been illegible for weeks, and the more experienced guys had been operating from memory—unacceptable. When a new guy came along, he tried to do what his role models had done, but didn’t have the experience to back it up, resulting in several new members in the Order of the Speckled Trout.
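For anyone who would rather see the chain written down than argued over in a critique, here is a minimal sketch of the same analysis as a small Python data structure. Nothing here is an official Navy or Toyota tool: the `WhyChain` class, its method names, and the cleaned-up wording of each cause are all made up for illustration. The idea is only that each "because" gets recorded in order, so every link stays visible as a candidate for corrective action instead of just the first one.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class WhyChain:
    """Hypothetical helper: a problem plus the ordered answers to 'why?'."""

    problem: str
    causes: list[str] = field(default_factory=list)

    def ask_why(self, because: str) -> WhyChain:
        # Record the next answer and return self so calls can be chained.
        self.causes.append(because)
        return self

    def deepest_cause(self) -> str:
        # The last recorded answer; falls back to the problem if nobody asked why.
        return self.causes[-1] if self.causes else self.problem

    def render(self) -> str:
        # One indented line per level, so the depth of each cause is obvious.
        lines = [f"Problem: {self.problem}"]
        for depth, cause in enumerate(self.causes, start=1):
            lines.append(f"{'  ' * depth}Why #{depth}: because {cause}")
        return "\n".join(lines)


# The DCA's analysis from above, paraphrased (wording is illustrative).
chain = (
    WhyChain("sanitary tanks blew into middle level")
    .ask_why("the Operator opened the wrong valves")
    .ask_why("he couldn't read the valve labels")
    .ask_why("the temporary cardboard labels have become illegible")
    .ask_why("they have been soaking up oil and water for years")
    .ask_why("nobody inspects this space or tracks permanent labels")
)

print(chain.render())
```

Note that `deepest_cause()` is only the end of the chain, not the single answer. Every entry in `chain.causes` is a place where a fix might belong.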
It’s not necessary to identify a single root cause—in this case there were several points of failure, all of which had the potential to grow into worse problems if allowed to fester. By identifying and addressing them early, we can prevent them from turning into injuries or damaged equipment, or worse, a bad grade on a ship’s inspection. If we had accepted the easy answer—disqualify the new guy—we would just end up fighting these same issues again.
When It Goes Wrong
Anyone who has worked within 100 yards of a nuclear reactor has felt the asspain of Root Cause Analysis gone amok. Any good nuclear command will have a culture that is pathologically incapable of accepting “shit happens and I’ll try harder.” We have systems to prevent things from going wrong, so when things do go wrong it must indicate flaws in our system. From the standpoint of the greater organization, this is a great way to reinforce an adaptive learning culture with high engineering standards.
From the standpoint of the guy on the deckplate, it can be incredibly frustrating. Sometimes shit actually does just happen with no clear explanation. Sometimes an exhaustive new policy is not necessary to correct the problem, as appropriate mechanisms already exist—in other words, sometimes the solution really is to try harder.
The important thing is to fight the temptation to embrace a wrong answer in lieu of no answer. This is harder than it sounds at the end of a five-hour critique on Saturday afternoon. It could easily mean standing up to your bosses, because they may be desperate to close out the critique with a positive report to their bosses that yes, we have identified the problem and are implementing a solution. Embrace the truth always, even if the truth is that you will never know what happened.
“When fact, supposition and speculation, which have been used interchangeably, are properly separated, you will find that the known facts are so meager it is almost impossible to tell what was happening aboard Thresher.”
Adm. Hyman G. Rickover