Your Five Step Debugging Guide: Part 2

This week, we're working through the 5-step debugging script I shared yesterday, detailing one step each day:

  1. Ensure you can clearly articulate the symptoms of the defect and reliably reproduce it.
  2. Define the boundaries in your code within which the defect could exist. (You are here)
  3. Form a hypothesis about the cause
  4. Test the hypothesis
  5. Begin again at Step 1 until you can demonstrate unambiguously that the defect has been resolved

Today, we're on step 2.

2. Define the boundaries in your code within which the defect could exist

This is an iterative process, where you define a starting search area, then progressively narrow it down. There are basically two starting points for this step, depending on the scope of what you're debugging.

In the first scenario, you're following a rapid write-run-debug cycle with your code. You write a little bit of code, you run it to make sure it works, and you debug it if it doesn't. In that case, you can probably just look back at whatever you changed between right now and the last time the code worked, and that's the starting area within which the defect probably exists.

In the second scenario, you're approaching a discovered defect from scratch. In this case, you want to use the description you generated in Step 1 to trace the path of execution through your code. To reuse an example from last week's lessons: consider a defect where you submit a form using an asynchronous request and the form fails to update as expected. Your starting area would run from the onClick or onSubmit handler, through the HTTP request to the server, the request handler on the server side, and the HTTP response, back to whatever code processes that response on the client and renders the result.

Regardless of which scenario you start with, there are boundaries within that area that we can use to narrow it down.

This notion of boundaries is critical not just to debugging, but to building resilient software generally. A boundary defines the edges of a system. Across those boundaries a system ideally does not know or care how it is being utilized. A good example is an HTTP request. Your server doesn't know whether it's fulfilling a request from a browser, a mobile app, or a Wi-Fi-enabled toaster. It just receives a request and issues an appropriate response. Likewise, your browser code doesn't care if the server it accesses is written in Golang, Node, or COBOL. It just expects that it can make a request and receive the appropriate response.

A boundary can be an entire server, a single UI component, or even a single function. A good boundary is one where data crosses deliberately and in a known form. HTTP requests and pure functions are great examples of boundaries.

Good boundaries make good software.

For our purposes, boundaries serve as points of truth in our fact-finding operation. We're looking for the point in our execution where reality diverges from expectation. By inspecting the values going across those boundaries--HTTP requests, arguments into a function, return values, whatever--you can iteratively narrow down the possible area in which that divergence occurs. The defect exists somewhere between the last boundary that matches expectation and the first boundary that does not.
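
Here's one way to picture that narrowing, as a rough sketch rather than a tool you'd actually ship: write down what you expected and what you actually saw at each boundary, in execution order, and look for the first mismatch. The Checkpoint type, the locate_divergence function, and every value below are made up for illustration, loosely following the form-submission example.

```python
from typing import Any, NamedTuple, Optional


class Checkpoint(NamedTuple):
    """One boundary crossing: what we expected to see and what we saw."""
    name: str
    expected: Any
    actual: Any


def locate_divergence(
    checkpoints: list[Checkpoint],
) -> Optional[tuple[Optional[str], str]]:
    """Return (last_good, first_bad) boundary names, or None if all match.

    Checkpoints must be listed in execution order. last_good is None when
    the very first boundary already diverges.
    """
    last_good = None
    for cp in checkpoints:
        if cp.actual != cp.expected:
            return (last_good, cp.name)
        last_good = cp.name
    return None


# Hypothetical observations, in the order the data crosses each boundary.
observed = [
    Checkpoint("HTTP request", {"email": "a@example.com"}, {"email": "a@example.com"}),
    Checkpoint("handler parameters", "a@example.com", "a@example.com"),
    Checkpoint("database query result", {"id": 7}, None),
    Checkpoint("HTTP response", 200, 500),
]

# The search area is whatever code runs between these two boundaries.
search_area = locate_divergence(observed)
print(search_area)
```

In practice you rarely write this down as code. You do it in your head, or in a debugger. The point is that the search area is always bracketed by two observations, one good and one bad.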

In the form-submission example above, let's say I pop open my browser debugger and inspect the HTTP request. The request looks good, but the response does not. My defect must lie between these two points, i.e., in the server-side request handler.

So I go look at the request handler. The parameters it's receiving look correct. Within the request handler is a function that queries the database, so I inspect the value returned from that function. That function is returning nothing. Aha! My first divergence from expectation. Now I suspect that the offending code lies somewhere between the beginning of my request handler and the end of my database function. So I inspect the arguments going into the database function. They look correct, and so I've narrowed my search area to the function itself. I continue refining this way as far as I can.
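
If you don't have a debugger handy, logging what crosses each function boundary gets you the same information. Below is a minimal sketch using a decorator. The names trace_boundary, find_user, handle_request, and the USERS dictionary are all hypothetical stand-ins for the request handler and database function in the story above, not code from a real application.

```python
import functools
import logging

logging.basicConfig(level=logging.DEBUG, format="%(message)s")
log = logging.getLogger("boundary")


def trace_boundary(func):
    """Log the arguments going into func and the value coming out."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        log.debug("-> %s args=%r kwargs=%r", func.__name__, args, kwargs)
        result = func(*args, **kwargs)
        log.debug("<- %s returned %r", func.__name__, result)
        return result
    return wrapper


# Hypothetical stand-in for a database table, keyed by email address.
USERS = {"ada@example.com": {"id": 1, "name": "Ada"}}


@trace_boundary
def find_user(email):
    # Stand-in for the function that queries the database.
    return USERS.get(email)


@trace_boundary
def handle_request(params):
    # Stand-in for the server-side request handler.
    user = find_user(params["email"])
    return {"status": 200 if user else 404, "user": user}


# The log shows each boundary crossing in order, so you can compare what
# went into find_user with what came back out of it.
response = handle_request({"email": "Ada@example.com"})
```

Reading the log top to bottom gives you the same walk as the paragraph above: the handler's parameters, the arguments into the database function, and what that function returned. Note that the log only tells you where the divergence is, not why.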

Notice we still are not forming any theories about the cause of the defect. The reason is that if you can narrow down the possible area for the defect, it becomes a lot easier to form good theories. We'll talk about it more tomorrow, but theories are time-consuming to prove or disprove, and that's exactly what you need to do when you're debugging. The longer you can put off your theorizing, the fewer theories you'll have to go through.

Tomorrow, we finally talk theories.

