I have often stopped and wondered: “Could all the strategies in The Art Of Troubleshooting be condensed down to a single troubleshooting script?” Is there One Recipe to Rule Them All?
Many people I’ve talked to about troubleshooting have asked the same question. So, I went and figured it out. If you want the script, here it is:
The Universal Troubleshooting Recipe
- Find the problem.
- Fix it.
Okay…but it’s not really that useful, is it? Sorry, that’s the best I can do. I’ve tried to expand this beyond a 2-step process, but it gets too specific too quickly, and there are just too many “if this happens, then do this” clauses. For example, take “duplicating the problem”: pursuing this strategy is often a necessary step toward discovering the cause of an issue. Or not: some problems can be duplicated with 100% reliability without bringing you much closer to a solution (a car that hasn’t run in 50 years, for example, will be very reliably broken). Given enough time and resources, it’s theoretically possible to duplicate any problem. However, it would be a waste of resources to single-mindedly pursue that strategy to the exclusion of others that could provide a quicker resolution.
Similarly, you might be able to “copy one that works,” but you might not have a working one on hand, in which case that strategy is a dead end. Rebooting or restoring the default settings may magically resuscitate a machine in just a few minutes; I’ve witnessed this hundreds of times. But I’ve also seen this tactic have no effect. Other problems may require a rigorous data collection program executed over weeks or months to identify and fully understand the underlying issue. Sitting above any given strategy is also the possibility of not fixing something at all, which has its own considerations; presumably, the criteria for deciding to forgo troubleshooting would also need to be included in our Universal Troubleshooting Recipe. Finally, economics will have the last word: some paths may be very effective but can’t be considered because of their cost.
I think you get the point: no single troubleshooting strategy produces results in every situation. Therefore, I think a universal theory of troubleshooting can’t be reduced to a single recipe on a card; it’s more like a whole box of recipe cards. When it comes to food, you choose a recipe based on what you’re craving and what ingredients are on hand. If you have apples and want something sweet, apple pie is a good choice. If you’re making dinner and you have a hunk of beef and some vegetables, then a beef stew is a tasty match. Choosing the right recipe for your ingredients and occasion is the key to good cooking: you can’t make apple pie with beef, and a beef stew made from apples is going to taste like…something’s missing.
Within a discipline or industry, there will be opportunities to develop a single troubleshooting recipe. If you worked on only one make and model of car every day, you’d inevitably hone your routine for maximum efficiency, keeping only the most effective strategies and placing them in a set order. That refined routine might use only a narrow subset of the ideas presented in The Art Of Troubleshooting, with the rest being superfluous. I guess the cooking analogy would be a chef who only works with a single main ingredient: if you only had apples and made the same apple pie every day, there wouldn’t be much use in learning how to fillet a fish or debone a ham.
However, life isn’t that predictable: the problems you’ll need to troubleshoot will come in an infinite variety of forms and won’t care about your set routines. In other words, be prepared to handle not only apples, but also pears, bananas, fish, potatoes, beans, rice, and anything else that can be eaten! In that way, choosing a troubleshooting strategy that complements your problem is just like cooking: the problem is the ingredients and the strategy is the recipe. Depending on the context, a given machine failure may call for duplication, a change of sequence, a reboot, plugging something in, or something completely different. Mix and match as needed and voilà!
Until the particulars are known, you can’t say for sure which strategy is going to be best suited for a given situation. This is why I believe troubleshooting is an art: deciding which strategy is most appropriate will be a judgement call, aided by your experience and intuition. The key is familiarity and flexibility: intimately knowing the core troubleshooting strategies will allow you to select the one you think is going to have the biggest payoff. Add to that a flexible mindset which compels you to switch strategies if your first choice isn’t working. That is, if your vision of a pie doesn’t work out, maybe those apples could go in the stew after all. Bon appétit!
Fitting Into A Structured Approach
I believe that my methods, as formulated in the strategies, are the quickest way to a resolution for the vast majority of failures. However, there may be some situations that require a very rigorous approach to troubleshooting. I’ve worked on some extremely complex, intermittent failures that have benefited (or would have benefited!) from a formal process. For these rare cases, my strategy and “question-based” approach (as employed in my one-page Universal Troubleshooting Guide) may not be enough to satisfy your need for structure.
By now, I’ve seen a fair number of generic problem-solving processes. In addition to the reasons above, I want to address them because you might be curious, or your organization might promote their use. Here’s one such example of a “structured troubleshooting approach” from Amir Ranjbar’s book Troubleshooting and Maintaining Cisco IP Networks:
Ranjbar’s method has the following elements in this order:
Step 1. Defining the problem
Step 2. Gathering facts
Step 3. Analyzing information
Step 4. Eliminating possibilities
Step 5. Proposing a hypothesis
Step 6. Testing the hypothesis
Step 7. Solving the problem
(Troubleshooting and Maintaining Cisco IP Networks, pg. 41)
In the abstract, this is a great way to think about troubleshooting. For very tough problems you might need to be this rigorous, documenting your progress through these steps as you inch towards a solution. However, just like my 2-step process above, it’s a bit too general to be helpful as a quick troubleshooting recipe for an actual problem. Which facts do I gather? What possibilities do I eliminate? How do I choose a hypothesis to test? It’s going to take some thought to translate these steps into specific actions.
You may be wondering how my methods relate to a generic problem-solving process like the one above. The answer is that the strategies I present are a kind of shorthand: they include all of the above steps, often in combination, in an implicit way. For instance, let’s examine the “What’s changed?” strategy within the framework of Ranjbar’s process. The brief summary of this strategy is to find recent changes to a machine (or its environment), with the idea that one is causing a failure. Discovering the changes incorporates steps 2-3 (“gathering facts” and “analyzing information”). Choosing one, rolling it back, then observing the result combines steps 4-6 (“eliminating possibilities,” “proposing a hypothesis,” and “testing the hypothesis”). The power of a strategy like “What’s changed?” is that it takes the generic approach and fills in the blanks, swiftly nudging you in a direction that has been very profitable for others.
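For readers who troubleshoot software, the “What’s changed?” loop described above can be sketched in Python. This is a minimal illustration under stated assumptions, not a definitive implementation of the strategy: the `Change` record, its `rollback` and `reapply` callables, and the `failure_present` check are hypothetical names, and the sketch assumes the list of changes has already been gathered and ordered (steps 2-3), that each change can be undone independently, and that the failure can be checked on demand.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Change:
    """Hypothetical record of one recent change to a machine or its environment."""
    description: str
    rollback: Callable[[], None]  # undo this change
    reapply: Callable[[], None]   # redo it if undoing didn't help


def whats_changed(changes: List[Change],
                  failure_present: Callable[[], bool]) -> Optional[Change]:
    """Roll back suspect changes one at a time and observe the result.

    Assumes `changes` is the output of steps 2-3 (gathering facts and
    analyzing information), already ordered from most to least suspicious.
    Each rollback is a hypothesis ("this change caused the failure") that is
    tested by checking whether the failure is still present (steps 4-6).
    """
    for change in changes:
        change.rollback()
        if not failure_present():
            return change   # hypothesis confirmed: leave the change rolled back
        change.reapply()    # possibility eliminated: restore it and try the next
    return None             # no single recent change explains the failure
```

Reapplying each change that turns out to be innocent keeps the machine in a known state, so every test differs from the broken baseline by exactly one change. The sketch also assumes a single culprit; failures caused by two changes interacting would need a different approach.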
- Amir Ranjbar, Troubleshooting and Maintaining Cisco IP Networks (Indianapolis: Cisco Press, 2010), pp. 32, 41.