Follow The Chain

Rusty chain
Any weak links?
(image: swiftsure525, license: CC BY-ND 2.0)

Many systems labor along a linear path and therefore lend themselves to a troubleshooting strategy I call: “follow the chain.” These “chained systems” are everywhere—most machines have at least one component that falls in this category. That’s because the essence of work, digital or analog, is transformative: you take an input, move it around, make additions or subtractions, and ultimately change it in a useful way. Because so many machines follow this A → B → C model, it’s only natural that there is a corresponding troubleshooting strategy that mirrors this form.

To prime your mind for this topic, let’s first look at some examples of chained systems:

1) The Krispy Kreme doughnut production line: fryer → sugar glazer → cooling tunnel → my stomach.

Krispy Kreme Doughnut Production Line
(image: Steve Jurvetson, license: CC BY 2.0)

2) A pathway in the electrical grid: generating plant → substation → your home.

Switchyard at the TVA's Wilson Dam
(image: The Library Of Congress, license: Public Domain)

3) The pipes and water heater that deliver hot water in your home: cold water supply → water heater → kitchen faucet.

(image: johncarljohnson, license: CC BY 2.0)

Within transformational chains like these, the typical problem scenario is that either a station (place where work is done) or a conduit (pathway that moves material) will fail. Station failures may result in a “dumb passthrough” situation where material is still transmitted, albeit unchanged. For instance, if a water heater is malfunctioning, water may still get to the faucet, but it will be cold. Or, think about a network firewall that fails and stops filtering data, instead passing all Internet traffic along to your computer. A final tragic example: if that mesmerizing sugar glazer at your local Krispy Kreme runs out of glaze, the doughnuts will get to the end of the production line, though not with their addictive magic coating. Noooooooo!

Whatever the failure condition, you’ll first want to examine the result at the end of the chain and work your way back from there. You’ll typically see one of two symptoms:

  • The output at the end of the chain will be flawed.
  • There will be no output.

Since either a station or a conduit can be a point of failure, you should look for ways to isolate and test them independently. You can also measure the flow through the system by installing probes at key points along the transformational chain. Isolation and testing will tell you what needs to be replaced, while taking measurements will tell you what needs to be isolated and tested. So, the “follow the chain” strategy boils down to these two tactics:

  1. Isolate and test each component of the chain.
  2. Measure the flow through the system by installing probes.
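
Here is a minimal Python sketch of those two tactics, using the doughnut line from earlier. Everything in it is made up for illustration: the station functions, the dictionary that stands in for a doughnut, and the deliberately broken glazer that acts as a “dumb passthrough.” The probes are simply a record of what went into and came out of each station.

```python
from typing import Callable, Dict, List, Tuple

Doughnut = Dict[str, bool]
Station = Tuple[str, Callable[[Doughnut], Doughnut]]
Reading = Tuple[str, Doughnut, Doughnut]


def fryer(d: Doughnut) -> Doughnut:
    return {**d, "fried": True}


def broken_glazer(d: Doughnut) -> Doughnut:
    # Hypothetical failure: out of glaze, so the doughnut passes through unchanged.
    return dict(d)


def cooling_tunnel(d: Doughnut) -> Doughnut:
    return {**d, "cooled": True}


def run_with_probes(chain: List[Station], item: Doughnut) -> List[Reading]:
    """Send the item through every station, recording (name, input, output)."""
    readings = []
    for name, work in chain:
        result = work(item)
        readings.append((name, item, result))
        item = result
    return readings


def find_passthroughs(readings: List[Reading]) -> List[str]:
    """Walk back from the end of the chain, listing stations that changed nothing."""
    return [name for name, before, after in reversed(readings) if before == after]


chain = [("fryer", fryer), ("glazer", broken_glazer), ("cooling tunnel", cooling_tunnel)]
readings = run_with_probes(chain, {"fried": False, "glazed": False, "cooled": False})
suspects = find_passthroughs(readings)
print(suspects)
```

In this toy run, the probe readings point straight at the glazer, the only station whose output matches its input. That is the station to isolate and test.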

Point Break-down

Most audio/video setups are chain-like systems and provide a great illustration of how to test both stations and conduits. The main parts usually include source components (DVD players, cable TV decoders, video game systems, etc.) and intermediate devices (switchers/selectors, pre-amplifiers, amplifiers, etc.), whose signal eventually winds up at an output device (TV, speakers, etc.). In our model, these are the stations in the chain, because they transform the audio or video signal in some way. On the way to your eyes and ears, these signals travel over wires (speaker wire, RCA/coaxial/HDMI cables, etc.) or radio waves. These are the conduits in our model.

Let’s look at the “follow the chain” strategy in action. In this example, we’ll examine a typical home video setup with 2 source components. But first, let’s set the scene for our crisis: we’re unable to watch our favorite movie Point Break, starring the incomparable Keanu Reeves. Using the principles of measurement and isolation, we’ll take advantage of standardized connectors and cables to find the weak link somewhere in this chain: Point Break Pure Adrenaline Edition DVD → DVD Player → TV.

We first recognize that the DVD disc is itself part of the chain. Therefore, we take the Point Break Pure Adrenaline Edition DVD over to a friend’s house to see if it will work over there. Four hours later, we can confirm that it does. “Four hours later…?”, you ask. Listen, once you get rolling with a Keanu Reeves flick, you have no choice but to watch the movie the whole way through. Plus, there’s a ton of special features included in the Pure Adrenaline Edition, including 8 deleted scenes (!) and a featurette titled “It’s Make Or Break.”

Here’s what our setup consists of:

Follow The Chain, Audio-Video System #1
Diagram: a TV with two inputs, two source components, and perhaps the greatest movie of all time. A crisis builds as we realize that we can’t view this masterpiece. We’ll have to troubleshoot…
(image: The Art of Troubleshooting, license: © Jason Maxham)

In this system, the cable that connects the DVD player to the TV (Cable A, in red) is the same type as the cable used to connect the video game console to the TV (Cable B, in blue). Likewise, the two inputs on the TV are identical and will both accept this same type of cable. This is a crucial realization, because it will allow us to swap inputs and cables later on. Before we get any further, let’s make a list of all our components and what we know about them:

Part                                      Status
Point Break Pure Adrenaline Edition DVD   OK
TV Display                                ?
TV Input #1                               ?
TV Input #2                               ?
Cable A                                   ?
Cable B                                   ?
DVD Player                                ?
Video Game Console                        ?

Starting out, you can see there are a lot of question marks in our list. Besides the disc, we haven’t tested anything yet, so every part is suspect. By the way, this is a good way to begin troubleshooting any chained system: by not assuming anything!

Now that we understand the setup, we start at the end of the chain (the TV in this case). We power up the TV and the DVD player, select TV Input #1, insert the Point Break Pure Adrenaline Edition DVD, press play, and…nothing. Having already eliminated the DVD disc as a suspect, the remaining candidates for the problem are: the TV display, TV Input #1, Cable A, or the DVD player. The problem must lie somewhere along this path! As I just mentioned, because the connectors, cables and TV inputs are interoperable, we can use the video game console and Cable B to test various theories about the source of the failure.

First, let’s quickly verify that the video game console is working (that’s the pathway of: Video Game Console → Cable B → TV Input #2 → TV). We want to be swapping around parts that have been verified to function, so it’s imperative we determine their status at the outset. We turn on the video game console, select TV Input #2, and everything works just fine. Great, now we’ve got a line of known working components that we can use to find the failed component in the DVD player chain. We’ll also update our list of working parts to chart our progress as we narrow down the list of suspects:

Part                                      Status
Point Break Pure Adrenaline Edition DVD   OK
TV Display                                OK
TV Input #1                               ?
TV Input #2                               OK
Cable A                                   ?
Cable B                                   OK
DVD Player                                ?
Video Game Console                        OK

Ultimately, we want to work through our entire list of question marks. All things being equal, it doesn’t matter where you start. Of course, things usually aren’t equal and so a good place to begin is with any “low-hanging fruit”: the easiest parts to test. It’s trivial to take Cable A from Input #1 and connect it to Input #2, so we’ll start there:

Follow The Chain, Audio-Video System #2
Diagram: testing the DVD player using Cable A and TV Input #2.
(image: The Art of Troubleshooting, license: © Jason Maxham)

When we hook up the DVD player using Cable A and Input #2, it still doesn’t work. You may think this is a setback, but this is good information. Remember from our earlier test that the TV display and Input #2 are known to work. That means we have isolated the problem to either the DVD player or Cable A: the problem must be in one of these two components.

Another consequence of this test is to de-prioritize the testing of Input #1. Since we haven’t tested it, we can’t say for sure that it works, and so it must remain a “?” on our list. However, remember the statistics involving multiple failure scenarios. We know for sure there is a problem somewhere within the combination of the DVD player and Cable A. It’s possible, but highly unlikely, that Input #1 is also failing at the same time.

Let’s specifically test the hypothesis that the DVD player is malfunctioning. To do this, we’ll hook up the DVD player using known working parts: Cable B and Input #2. The only question mark in this particular path is the DVD player:

Follow The Chain, Audio-Video System #3
Diagram: testing the DVD player using Cable B and TV Input #2.
(image: The Art of Troubleshooting, license: © Jason Maxham)

Keanu flickers to life! Because this configuration works, we can update our table and mark one more component as “working”: the DVD player.

Part                                      Status
Point Break Pure Adrenaline Edition DVD   OK
TV Display                                OK
TV Input #1                               ?
TV Input #2                               OK
Cable A                                   ?
Cable B                                   OK
DVD Player                                OK
Video Game Console                        OK

Now we’re down to just two suspects: Input #1 and Cable A. Since it means switching just one cable, let’s test Input #1 like this:

Follow The Chain, Audio-Video System #4
Diagram: testing Input #1 using the DVD player and Cable B.
(image: The Art of Troubleshooting, license: © Jason Maxham)

This configuration works too, and so we now know that Input #1 is okay (this was our hunch, but it’s good to know definitively). Let’s update our table one last time:

Part                                      Status
Point Break Pure Adrenaline Edition DVD   OK
TV Display                                OK
TV Input #1                               OK
TV Input #2                               OK
Cable A                                   ?
Cable B                                   OK
DVD Player                                OK
Video Game Console                        OK

We’ve done it: there’s no longer a mystery as to what’s preventing us from having an awesome Saturday night with our favorite movie. All the components in our system have been verified to work, with the exception of one. The culprit must be: Cable A.
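
The bookkeeping we just did by hand can be sketched in Python. This is only an illustration under a couple of assumptions: parts either work or they don’t (no intermittent faults), and a path that works clears every part on it. The names PARTS, TESTS and deduce are invented for this example, and “DVD” is shorthand for the Point Break Pure Adrenaline Edition DVD.

```python
from typing import Dict, FrozenSet, List, Tuple

PARTS = [
    "DVD", "TV Display", "TV Input #1", "TV Input #2",
    "Cable A", "Cable B", "DVD Player", "Video Game Console",
]

# Each test is (parts on the path, did it work?), in the order we ran them.
TESTS: List[Tuple[FrozenSet[str], bool]] = [
    (frozenset({"DVD"}), True),  # the disc played at a friend's house
    (frozenset({"DVD", "DVD Player", "Cable A", "TV Input #1", "TV Display"}), False),
    (frozenset({"Video Game Console", "Cable B", "TV Input #2", "TV Display"}), True),
    (frozenset({"DVD", "DVD Player", "Cable A", "TV Input #2", "TV Display"}), False),
    (frozenset({"DVD", "DVD Player", "Cable B", "TV Input #2", "TV Display"}), True),
    (frozenset({"DVD", "DVD Player", "Cable B", "TV Input #1", "TV Display"}), True),
]


def deduce(parts: List[str], tests: List[Tuple[FrozenSet[str], bool]]) -> Dict[str, str]:
    """Turn a list of pass/fail path tests into a status for every part."""
    status = {part: "?" for part in parts}
    # A path that works clears every part on it.
    for path, worked in tests:
        if worked:
            for part in path:
                status[part] = "OK"
    # A failing path with exactly one uncleared part pins the blame on that part.
    for path, worked in tests:
        if not worked:
            uncleared = [part for part in path if status[part] != "OK"]
            if len(uncleared) == 1:
                status[uncleared[0]] = "FAILED"
    return status


status = deduce(PARTS, TESTS)
for part in PARTS:
    print(f"{part:20} {status[part]}")
```

Under those assumptions, the sketch lands on the same answer we did: every part is cleared except Cable A, which sits alone on a failing path.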

Measurement And Probes

The other strategy for “follow the chain” troubleshooting is to put probes at key places in the transformational chain. Let’s imagine a hot water system in an apartment building. Water comes from the cold water supply, enters the water heater, is heated and then flows to the apartments. Pretty pedestrian stuff:

Follow The Chain - Water Heater
Diagram: water heater system.
(image: The Art of Troubleshooting, license: © Jason Maxham)

There are many ways this system can fail, but let’s examine two common scenarios:

  1. The heater malfunctions, and the cold water is simply passed through unchanged (a “dumb passthrough” scenario).
  2. A pipe bursts or leaks, preventing water from getting to an endpoint like a shower or kitchen sink (the “no output” situation).

Putting pressure and temperature monitors in place, we can better understand what is happening as water flows through the system:

Follow The Chain - Water Heater With Probes
Diagram: water heater system with temperature and pressure sensors.
(image: The Art of Troubleshooting, license: © Jason Maxham)

Now, we have insight into a whole variety of failure scenarios and can make very quick inferences as to their cause. Let’s look at a few examples of some possible readings from our gauges and the likely explanations:

Scenario   Input Temp. Sensor   Output Temp. Sensor   Input Pressure Sensor   Output Pressure Sensor
#1         60° F                60° F                 50 psi                  50 psi
#2         68° F                68° F                 0 psi                   0 psi
#3         60° F                60° F                 50 psi                  30 psi
#4         60° F                120° F                50 psi                  50 psi

Explanation / Troubleshooting Ideas

#1: The water heater is just passing the water through without doing anything: the input temperature equals the output temperature. Is the heater turned on? What’s the thermostat set at?

#2: There’s no water pressure and the temperature is equal to room temperature. Has the water supply been shut off? Did you forget to pay your water bill?

#3: The loss of pressure means there’s a broken pipe or the tank is leaking. There may be flooding! Our gauges allow us to pinpoint a leak somewhere between the input pressure sensor and the output pressure sensor. That narrows things down considerably.

#4: Normal operation. The input pressure is equal to the output pressure. The water temperature goes from cold (60° F) on the input to hot (120° F) on the output. This is what a water heater is supposed to do!
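
If the gauges were wired to a computer, the reasoning in the table above could be sketched roughly like this in Python. The function name diagnose, the order of the checks, and the tolerance values are all assumptions made for the example; a real system would need thresholds suited to its own sensors.

```python
def diagnose(in_temp_f: float, out_temp_f: float,
             in_psi: float, out_psi: float,
             temp_tol: float = 2.0, psi_tol: float = 2.0) -> str:
    """Map four gauge readings to a likely explanation (tolerances are illustrative)."""
    # No pressure on either side: the supply itself is probably off.
    if in_psi <= psi_tol and out_psi <= psi_tol:
        return "no supply pressure"
    # Pressure is lost between the two pressure sensors: suspect a leak.
    if in_psi - out_psi > psi_tol:
        return "pressure drop: possible leak between the sensors"
    # Water comes out no warmer than it went in: the heater is a dumb passthrough.
    if out_temp_f - in_temp_f <= temp_tol:
        return "dumb passthrough: heater is not heating"
    return "normal operation"


scenarios = {
    "#1": (60, 60, 50, 50),
    "#2": (68, 68, 0, 0),
    "#3": (60, 60, 50, 30),
    "#4": (60, 120, 50, 50),
}
results = {name: diagnose(*gauges) for name, gauges in scenarios.items()}
for name, verdict in results.items():
    print(name, verdict)
```

The pressure checks run first so that a dead supply or a leak is reported even when the temperatures also happen to match, as they do in scenarios #2 and #3.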

Do you see how adding these 4 gauges has given us a much better understanding of how this system is functioning? They can show us exactly where and what to look for when investigating a problem. Four critical parts of our water flow chain are now being monitored and we’ll reap the benefits when we need to troubleshoot.

Use Your Eyes

Sensors and gauges are great, but there are many cases where the process chain can be visually inspected (remember the lessons of “Listen Up”: the importance of being engaged with the world around you and tuned in to your senses). Back to the Krispy Kreme production line: you wouldn’t need a “glazing sensor” to tell you that the doughnuts aren’t being glazed. You can see and taste it.

