A breakdown is a deviation from a machine’s operational state; the process of fixing something involves moving from that broken state back to the ideal one. What is the ideal state? Well, you definitely know it by the desired outcome: to have the machine perform like it did before the failure, to carry on doing useful work. However, that’s a far cry from being able to describe what should be happening inside a system to realize that goal. Understanding a machine on this level makes comparing “broken” and “working” possible—the difference between the two will point the way to the appropriate remedy. Knowing how a system works is like having a map and compass for your repair: “broken” is where you are and “working” is where you want to go.
Where Am I?
Before we get into a discussion of moving towards a machine’s ideal state, we should dwell on the importance of knowing its current state. You may say it’s “broken,” but that’s not very specific. Sticking with our map analogy, point-to-point navigation requires knowing both your current location and the desired destination. A heading is relative to where you are right now. If you want to get to San Francisco and you are in Los Angeles, you go north; if you’re in Seattle and you want to get to Frisco, then you need to head south.
However, imagine being blindfolded and dropped off in the middle of a forest. From this unknown starting point, it’s not enough to know that your destination is San Francisco. Which way is it? You wouldn’t know, so your first order of business would be to answer the question “Where am I?” Don’t worry, unlike some fraternities, being blindfolded and finding your way home from a strange place isn’t required to join the Troubleshooters Guild. Unless you want to…it sounds like a great way to build some character, and everyone needs at least one good cocktail party story.
Discovering the current state of a system is a common thread that winds through many of the strategies in The Art Of Troubleshooting. Therefore, I’ll just briefly review some of the best tactics:
- Inspect: use your eyes, ears, and nose.
- Indicator Lights/Error Messages: when the machine is trying to tell you what’s wrong (its current status), please pay attention.
- Built-in Diagnostics: can you ask the machine how its day is going?
- Gauges/Probes: some machines come with gauges to tell you the state of various internal parameters. If you’ve embarked on your own data collection project, then maybe you’ve added your own. Either way, check these out.
- Logs/Records: for digital devices, there is typically a place where the system will record its goings-on. For mechanical machines, the logs may be analog, but they’re just as useful. It would be a shame if someone had already noted the problem you’re experiencing and you wasted your time duplicating their efforts.
In addition to these ideas, there’s a very useful method from the medical profession called a “review of systems.” In a review of systems, a doctor goes from head to toe, asking you about each part of your body. Is your eyesight good? Do you have any trouble hearing? Any digestion problems? The goal is to solicit information about any recent changes or troubling conditions systematically. Going over the body in this way ensures that no important detail is missed.
In the same way, you can add a review of systems to your troubleshooting routine. If you’re thorough, it really is the ultimate way to know the “current state.” Your review can cover a specific machine or an entire factory. Whenever we had a customer complaint at Discovery Mining, I would divide my team up and do a brief review of systems. There were several major components powering our web site, each of which could be the source of trouble: the network, the database, our Internet connection, the web servers, etc. Whenever resources allowed, I liked to check in with each of these parts and make sure they were operating within acceptable limits. When my obsession with data collection finally began to bear fruit, these reviews of systems became very easy for my team. Each subsystem had its own set of easily accessible, continuously updated graphs that made a full inspection possible within just a few minutes.
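For readers who think in code, here is a minimal sketch of what an automated review of systems might look like. It is not how my team did it; the metric names, the acceptable ranges, and the sample readings are all hypothetical, chosen only to illustrate the idea of walking through every subsystem and comparing its current state against known limits.

```python
from dataclasses import dataclass


@dataclass
class Check:
    """One question in the review: is this metric within acceptable limits?"""
    subsystem: str
    metric: str
    low: float
    high: float


def review_of_systems(checks, readings):
    """Walk every check in order and report readings that are missing or out of range."""
    findings = []
    for check in checks:
        value = readings.get((check.subsystem, check.metric))
        if value is None:
            # A missing reading is itself a finding: we don't know the current state.
            findings.append(f"{check.subsystem}: no reading for {check.metric}")
        elif not (check.low <= value <= check.high):
            findings.append(
                f"{check.subsystem}: {check.metric}={value} "
                f"outside [{check.low}, {check.high}]"
            )
    return findings


# Hypothetical limits for the subsystems named in the text.
CHECKS = [
    Check("network", "packet_loss_pct", 0.0, 1.0),
    Check("database", "query_latency_ms", 0.0, 200.0),
    Check("internet connection", "uplink_utilization_pct", 0.0, 80.0),
    Check("web servers", "cpu_load_pct", 0.0, 85.0),
]

# Hypothetical readings; the Internet connection reading is deliberately absent.
readings = {
    ("network", "packet_loss_pct"): 0.2,
    ("database", "query_latency_ms"): 950.0,
    ("web servers", "cpu_load_pct"): 40.0,
}

findings = review_of_systems(CHECKS, readings)
for line in findings:
    print(line)
```

The point of the design is the head-to-toe order: every subsystem gets asked its question, even the ones you don't suspect, and a missing answer is flagged rather than silently skipped.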
Description And Operation
If you happen to be the designer of a system, then you should know how it’s supposed to work. However, in our world of mass-produced goods, it’s rare that a machine was crafted by your own two hands. But you’re in luck, because the manufacturer will often provide you with a metaphorical map and compass. Among these resources, you’ll find manuals, schematics, blueprints, service bulletins, troubleshooting trees, and how-tos. These materials should be consulted first: the complexity of some of today’s machines is astounding, so the knowledge of the people who were responsible for their design is indispensable. Years of careful thought may have gone into a product’s evolution, leading to counter-intuitive engineering tradeoffs that are difficult to understand without the same context as the original designer. Don’t be surprised by these things when you could be informed of them in advance.
Some industries are better than others when it comes to providing the details of “how it should work.” I find the automotive industry to be particularly rich in documentation: I’ve looked at auto shop manuals and the level of description provided for even the smallest components can be impressive. Cars are expensive and important, so the professional troubleshooting industry that serves automobile owners is generally well equipped. In other contexts, manufacturer resources can be scant. Back when I was building computers, I purchased many parts that came with only a sentence or two regarding normal operation, usually on a blurry photocopy that was barely legible. In these cases, you may need to contact the manufacturer’s service department to get the necessary details to help you troubleshoot.
The last option, the hardest, is figuring out how something is supposed to work with just the machine as your source of information. Results will vary greatly, depending on how complicated the machine is and your level of expertise. Given enough resources (primarily time), every machine can be reverse engineered. Those engaged in historical restorations (e.g., classic cars) or those who are tasked with maintaining very old systems will eventually have to resort to the intelligent guesswork of trial and error. In “Duplicate The Problem,” I noted how the fix-it knowledge of mass-produced systems decays over time: documentation is lost, manufacturers go out of business, and human know-how withers as whole industries are upended in the creative destruction of capitalism. When that happens, you’ll need to fill in the gaps by yourself.
Finally, we should take a moment and recognize the invaluable role that education plays in understanding how “it’s supposed to work.” In my interviews with great troubleshooters, many of them cited reading manuals and taking classes as the foundation of their skills. This “general education” is an essential building block to understanding the systems under your care, even if most of what you learn isn’t applicable immediately. Can you really say you’re prepared to help without self-study or formal education? Rich Kral, a veteran HVAC repairman, would say otherwise:
“I think when you first get into a trade, you need to read the manual completely. After awhile, you can get to the point where you scan it… But I do believe that if you want to take care of a piece of machinery for a customer, you should know how it works. Don’t wait until you get a service call and someone wants their air conditioning fixed and you’re sitting up there [on the roof] reading the book.”
Expectations Versus Normal Operation
How something should work is not just about schematics and manuals; it can also be a matter of expectations. Whenever you interact with a new machine, you draw upon all your previous experiences, leveraging your assumptions for how it ought to function. These assumptions are very useful: after you learn how to drive a particular car, you can use that experience to help you drive any car. However, not all machines are created equal: how one works may not carry over to others in its class.
Sometimes, a repair is just figuring out the difference between someone’s expectations and how a machine was designed to work. A great example of this came up in one of the interviews I conducted for The Art Of Troubleshooting. Seasoned auto mechanic Dan McCormick related this encounter with a customer:
You have to know the system. We had a person come in and say, “I was driving down the road and the car starts chiming!” I had to think about that for a while…the car’s chiming?! Did you have the direction light on? He says, “No.” Are you sure you didn’t have the direction light on? “Well, I don’t think so.” That was one of the features of this particular car: if you left the direction light on for more than 3/10 of a mile, the computer would pick it up and start chiming to let you know you left it on. The customer went for a drive, intentionally leaving the direction light on, and it chimed the same way. He came back and admitted that’s what was happening! You see, a problem to him was actually just normal operation.
This story resonates with me: many times I’ve been called in to “fix” things that weren’t broken. The “repair” was simply to align someone’s expectations with the ways of the machine. Attentive listening, combined with system knowledge, can save a lot of time and prevent you from searching for problems that don’t exist. Once again, the mind is mightier than the wrench.