Trouble is easily overcome before it starts.
Tao Te Ching – Chapter 64
Troubleshooters are naturally curious about prevention. With sweat on your brow and grease under your fingernails (or the digital equivalent thereof), your mind instinctively wonders: “Does this have to happen again?” Clients may like a quick fix and a return to business as usual, but they will appreciate you on a deeper level if you can show them how to stop a problem from recurring. This is the realm of awkward hugs (in exchange for problems bested, I’ve gotten a few of these).
The script for a preventative remedy seems straightforward. First, identify the cause of a malfunction. But hold on, what does it really mean for something to be the cause of an incident? That might seem like a silly question, but I had an important realization about causes while writing my last article about selfies and showboating: both the physical processes that lead to an accident and the measures to prevent them are often, and confusingly, referred to as “causes.” As you’ll see, they are different concepts, and understanding the distinction is critical for both your repair and prevention efforts.
Ripped From the Headlines
To show you how the two meanings of the word “cause” are conflated in popular usage, I did a quick scan of the news. It didn’t take long for me to find examples of the confusion between accident processes and prevention methods. Here are a few quotes that I pulled from recent news stories. While reading, pay close attention to the word “caused” (and the synonymous phrase “resulted in”):
- “The Ionia County Sheriff’s office responded to an accident on Wednesday morning that was most likely caused by fog.” WILX News 10, July 25, 2018
- “Unenforced policies, lack of communication and ineffective traffic rules resulted in last year’s double-fatality accident at SSR Mining Inc.’s Marigold Mine, according to an investigations report from the Mine Safety and Health Administration.” Elko Daily Free Press,
- “A tug operator eventually pleaded guilty to involuntary manslaughter, acknowledging that the accident was caused largely due to his use of a cellphone and laptop while steering the barge.” USA Today, July 20, 2018
Visibility, communication, and the presence of distractions were obviously relevant circumstantial factors in these incidents. Perhaps improvements in these areas would have prevented these accidents. However, advanced students are probably wondering: has there ever been a catastrophe that happened despite good visibility, clear communication, or focused operators? (Answer: yes.) And further, if accidents can occur with or without these factors, how can we point at something and say confidently, “We’ve found it! Forever and always, this is the cause!”?
In the headlines above, many of the listed causes are statements of things missing or not done within the accident’s context: there was a lack of visibility, a failure to communicate, traffic rules weren’t followed, etc. Fog is simply water droplets suspended in the air. Even the one time I tried the foul-tasting fernet, I never saw a weather phenomenon grab a steering wheel and lead a car into a ditch. Suspended water droplets are not a supernatural force, so of course we know what is meant when someone says that “fog caused an accident”: namely, that it’s difficult to steer an automobile safely when you can’t see. This implies that something was missing that might have stopped the accident. What was it? Adequate visibility.
Here’s the problem: thinking of causes in this way involves a semantic sleight of hand, because these are actually pitches for prevention. When strongly worded like this, a “cause” appears to the reader as authoritative and definitive, perhaps even singular. The underlying message is: “Sleep well tonight, reader; we’ve figured this one out.” In doing so, we skip over the accident processes and boldly project into the future, relying on subjective, value-laden opinions about the best way to prevent a certain type of accident. That’s because avoidance involves an economic calculation: the result of choosing among alternatives in the face of scarcity.
Causes: Inevitable and Finite
1 b: something that brings about an effect or a result
Let’s deal first with the sense that a cause is that “something,” referenced in the dictionary definition above, which results in a given effect. When investigating an accident or malfunction, we’re talking about that particular set of circumstances and actions under which an incident would be certain.
There’s a relentless march of logic at work here: a ship ramming into a reef at full speed will result in a dent (or worse). Contacting the rocks with sufficient force will cause the hull to tear. Applying the brakes with completely worn pads will score the rotors. Metal touching metal causes telltale lines on the surface of the rotors. Computer code that attempts to access restricted memory results in a segfault; a program that tries to interact with RAM that is off-limits causes the operating system to shut it down. Just like the sun rising in the east and setting in the west, these specific pre-conditions and actions produce inevitable results.
However, you can probably spot the troubleshooting dilemma that this type of surety presents: if something is inevitable, then nothing can stop it! Preventative measures can’t be marshaled at this stage of the analysis because we’re dealing with immutable truths of reality. You can’t change the physics of steel contacting rock, nor change the logic of adding 2 and 2 to get 4.
Even if you could change the laws of physics, you wouldn’t necessarily want to “troubleshoot” at this fundamental level of nature. Imagine “fixing” shipwrecks by simply declaring that steel can’t be damaged by rock. First, it would take magical powers. Second, there would be unintended consequences: presumably, it would be impossible to stop any ship!
The ways that steel and rock interact are dependent upon physical laws, and their predictability is something you rely upon. That reality might not be favorable when the context is a ship’s hull and a reef. But, if you’re quarrying stone, you’d like your steel drill to actually have an effect on the rock. After all, if you’re going to sweat all day in the baking sun while swinging a hammer, it would be nice for it to matter. A solid object exerting a force upon another solid object can be a benefit or a hazard, depending on the context.
Finally, the causes we are called upon to discover don’t necessarily originate in the fixed laws of the universe; they could simply be the rules under which you choose to work. For example, an operating system isn’t required to jealously guard protected memory. Some do; some don’t. A protected memory scheme is a choice made by software architects and adopted voluntarily; it’s not a Law of the Universe you’ll find described in a book by Stephen Hawking.
If you’re a coder developing a product designed to work on a particular platform, the rules of the operating system are equivalent to a physical law like gravity. In the case of a software project, creating a new operating system just to run your code would probably be too costly and limit the appeal of your product. Therefore, the rules of the OS are something you’ll just have to accept.
The Infinite Varieties of Preventative Measures
The archetype of causality research was: where and how must I interfere in order to divert the course of events from the way it would go in the absence of my interference in a direction which better suits my wishes? In this sense man raises the question: who or what is at the bottom of things?
Ludwig von Mises, Human Action
Preventative measures can’t alter the laws of physics or logic; change can’t be effected at this level of reality. Therefore, the aim of prevention is to intervene before an inevitable process can get rolling. That is, we want to keep our ship’s hull from ever making contact with that rocky reef. We want the brake pads to be the only surface that touches the rotor. We want our computer program to stay within the bounds of the memory allocated to it, thus ensuring that the operating system will let it run in peace.
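To make the memory example concrete, here is a minimal sketch in Python, assuming a plain list stands in for the memory allocated to a program. Python raises an `IndexError` rather than having the operating system kill the process, but the shape is the same: once the bad access is attempted, the outcome is fixed, so prevention has to happen beforehand. The function names `read_unchecked` and `read_checked`, and the choice to fall back to a default value, are hypothetical illustrations, not a prescribed remedy.

```python
def read_unchecked(buffer: list[int], index: int) -> int:
    # No intervention: an index at or past len(buffer) makes the failure
    # inevitable, and Python raises IndexError (our stand-in for a segfault).
    return buffer[index]


def read_checked(buffer: list[int], index: int, default: int = 0) -> int:
    # Prevention: test the index *before* the access, while the outcome
    # can still be changed. Negative indices are rejected too, treating
    # the list as a block of memory with fixed bounds.
    if 0 <= index < len(buffer):
        return buffer[index]
    return default


memory = [10, 20, 30]
print(read_checked(memory, 1))  # prints 20
print(read_checked(memory, 5))  # prints 0 instead of failing
```

Returning a default is only one of many possible interventions; the check could just as well log a warning or refuse the request. Which one fits depends on the constraints of the situation.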
Once we’ve identified the thing we want to avoid, prevention enters the realm of infinite possibilities. Take our ship and rocky reef, for example. There are countless measures we could take to prevent a shipwreck: always maintain a certain distance from the shore, install a sonar warning system that detects obstacles, vow to navigate only in good weather or during daylight, consistently maintain the steering and propulsion systems so they are always ready to make course corrections, create a detailed map of hazards, etc. We could even keep the ship safely anchored in the harbor at all times!
That’s a long list of options for this particular problem, and you can probably think of many more. Still, you might suspect that I’m exaggerating when I say that preventative measures are endless. However, merely by adding time as a variable to the mix, you can easily see that the opportunities to intervene are boundless.
Imagine a timeline, ending at the point where an accident was inevitable (t0), stretching back forever in time:
t-∞ … t-5 → t-4 → t-3 → t-2 → t-1 → t0
Each mark on our timeline represents the passage of one unit of time. I’ve used the NASA T-minus countdown style, which shows the amount of time remaining before an event happens. The units on your incident timeline could be seconds, hours, days, weeks, or even fortnights! The scale doesn’t matter, because any division of time we can pick is infinitely divisible. To prevent an accident, all you need to do is change the course of events before they become a certainty. By definition, that means you could have intervened anywhere on the timeline before the end. You could have chosen to interfere at t-1, t-5, t-20, t-35, t-35.1, t-35.2, t-35.3, etc. to infinity.
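The claim that intervention points never run out can be checked with exact arithmetic. The minimal sketch below treats the marks t-36 and t-35 as the numbers -36 and -35 (an assumption about how to model the timeline) and keeps halving the gap between them. The helper `intervention_points` is hypothetical; it uses the standard library’s `fractions.Fraction` so that no rounding ever makes two points collide.

```python
from fractions import Fraction


def intervention_points(earlier: int, later: int, count: int) -> list[Fraction]:
    # Return `count` distinct times strictly between two timeline marks.
    # Each new point is the midpoint of the remaining gap, and a midpoint
    # always exists, so the loop works for any `count` you choose.
    lo, hi = Fraction(earlier), Fraction(later)
    points = []
    for _ in range(count):
        mid = (lo + hi) / 2
        points.append(mid)
        hi = mid  # keep subdividing toward the earlier mark
    return points


for t in intervention_points(-36, -35, 4):
    print(float(t))  # prints -35.5, then -35.75, -35.875, -35.9375
```

Real interventions are limited by reaction time and resources, so not every one of these points is a practical opportunity; the sketch only shows that the timeline itself never runs out of them.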
When you combine the when (the timing of an intervention) with the power of our imaginations to choose the how, it’s easy to see that the raw number of possibilities for preventing any accident is without end. That might make you feel better, knowing that you’ve got plenty of choices for prevention. It might also make you feel overwhelmed! Take comfort in the fact that, while preventative measures are indeed infinite, only a small subset of options will simultaneously meet your particular situation’s constraints (practical, legal, economic, etc.).
A ship in harbor is safe, but that is not what ships are built for.
John A. Shedd
In the news snippets above, the reporters suggested that not driving in fog, better communication, and paying attention while towing a barge would have prevented those accidents. However, because options for prevention are without bound, you could easily have suggested alternative “causes” with headlines like:
- “Operating a car causes car accident.”
- “Humans present at a mine result in deaths.”
- “Two boats using the same river, at the same time, causes a collision.”
In other words, you could solve the problem of mine accidents killing people by—not having people near the mine at all (maybe one day robots will do this dangerous work…). Driving accidents could be prevented by—wait for it—not driving! Boats running into each other on a river could be solved by only allowing one boat at a time to use the waterway (that is, have other boats be absent as a matter of policy).
Within the range of alternative prevention methods, many will be impractical, unpopular, unenforceable, costly, unacceptable to the prevailing culture, etc. However, that doesn’t make them any less effective. As a troubleshooter, it’s important to recognize when value judgements have been injected into an analysis. If you understand that prevention methods are infinite, you can ask “Out of the many, why are we focusing on these few?”
When considering how to stop the next accident from occurring, what is promoted as acceptable will be intertwined with the current social, cultural, and political context. Given that setting, there will often be a debate among competing interests over whose vision for prevention will prevail. Nuclear power is a good example: if you believe that electricity from this source cannot be generated safely, then preventative methods will focus on prohibition (i.e., the absence of a nuclear power plant is the best way to ensure a meltdown never happens). Likewise, the owners, operators, employees, and consumers of nuclear power will each have their own interests and stances on prevention.
I re-watched Jaws recently, that venerable first blockbuster, and was reminded how it dramatizes the competition of interests with differing visions of prevention. Of course, I remembered there was a big shark involved somehow and the famous ad-lib “You’re gonna need a bigger boat.” What I had forgotten about was the tension between the stewards of the beach town over the correct means to stop the next shark attack. At first, Amity Island’s police chief Martin Brody and mayor Larry Vaughn have very different ideas, both about the nature of the threat and the remedy. In one heated argument, Mayor Vaughn defends the economic interests of the town, saying “Look, we depend on the summer people for our very lives…”; Brody counters with “Larry, we’re going to have to close the beaches!” You don’t have to go full-on X-Files or Alex Jones, but when a particular line of prevention is being promoted or closed off (particularly within a political context), it’s always a good idea to ask “Cui bono?”
When it comes to conceptualizing prevention, an easy place to start is with absence. Traffic patterns are a real-world example of this principle in action: whether on a road, at an airport, or in a harbor, it’s desirable to have vehicles travel in the same direction when in close proximity. After enough head-on collisions and rush hour jams, our distant ancestors figured out that a simple social convention could create an absence of vehicle-on-oncoming-vehicle hazards. The origin of one-way traffic patterns goes back to at least ancient Roman civilization (“…a key part of the traffic system at Pompeii is the use of one-way streets…”). When I visited the Panama Canal, I found the principle in use even at this feat of “modern” engineering. Traffic along the waterway flows in only one direction at a time, because the canal has some narrow parts that make bi-directional movement dangerous (“ships move in one direction at a time due to safety constraints to cross the Culebra Cut.”).
My guess is that one-way movement is as old as humanity itself, arising organically from the problems of sharing a common route of travel. Even at Thanksgiving, we pass the food around the table in only one direction. (Oh, the torture when the mashed potatoes start with the person next to you and head in the opposite direction!) These schemes are so widespread because the principle is easy to understand: let protocol lead to the absence of hazards.
Positives and Negatives
When it comes to troubleshooting, a “cause” can have a positive or negative meaning. I’m not going to fight the common usage, but merely want you to think about both sides. The first sense features the word as a positive concept, that “something” that is present and directly brings about an effect. The other meaning is about what was missing: the policies or procedures that weren’t followed, the lack of situational awareness by an operator, the ignorance of a better way, etc.
Understanding the physical and logical processes that underlie a bad event is the key to stopping it from recurring. Once the precise mechanisms become clear (hull hitting rock, restricted memory being accessed, etc.), they become the focal point of your prevention efforts. You can then brainstorm the myriad ways to tweak reality before the inevitable happens, choosing both the means and the timing.
Another conclusion we can draw from a low-level analysis of how failures occur is that accidents are never accidental. Mishaps, large or small, emanate from predictable and ultimately knowable facts of reality; the only unexpected thing about them is the feeling of being caught unawares (I touched on this in Failure Most Foul: “Isn’t it interesting that we live in a world where it’s certain that every machine will eventually break down, and yet our experience of those failures is one of surprise?”).
Accidents involve long chains of causation, going as far back as you care to look. Understanding that logical progression and then deciding how to interfere are two related, but ultimately different, things. As your awareness of the individual links grows, so too will your ability to choose prevention methods that merit your precious time and scarce intervention resources.
- Header image: Railroad Wreck. 1922. [Photograph] Retrieved from the Library of Congress, https://www.loc.gov/item/2016833122/.