Getting Beyond DRY

One of the first rules that new software developers are taught is the ubiquitous "Don't Repeat Yourself" or DRY. At its most basic, it's usually applied as "If you find yourself copying and pasting a block of code, you should make that block of code into its own function".

The usual benefit that's cited for this rule is "Code Reuse". For years as I was learning to write software, I thought that the benefit of "code reuse" was convenience. "I need to perform a task in my code, et voila, I have a snippet of code already written that can do that!"

Unfortunately, the benefits described above are already provided by the copy/paste feature of your editor. It's not any faster to type out a one-line invocation of a single function than it is to copy and paste a 20 line block of code.

So why is code reuse good? I'm not denying that it is, but the reasoning is more nuanced that the phrase "code reuse" would imply. When I'm confronted with something that I've always taken for granted, or that I only have a superficial understanding of, I like to employ the "recursive why" method to try to get to a root cause or a deeper understanding. Just keep asking why, like a three-year-old, until you get to a satisfactory answer.

Q: Why is it good to extract repeated blocks of code to a reusable function?

A: So I don't have to copy and paste code around

Q: Ok, that's our superficial answer. Why is it bad to copy and paste code around?

A: Because then I have duplicated blocks of code throughout my app that are all doing the same thing

Q: Fine, but why are duplicated blocks of code bad?

A: Because if I want to make a change to one, I have to remember to change them all

Q: Now we're getting somewhere. Why is that bad?

A: Because anyone encountering this code would need to understand all of the low-level details of how it works. Because keeping a mental or documented list of all the places that this implementation would need to be changed is tedious and error-prone, especially since I might not be the one to make the changes in the future.

Q: And with that in mind, why is it a good idea to extract repeated blocks of code?

A: Because it hides the implementation details of how this particular block of code works, decreasing the mental burden on the developer who can simply use it and trust that it does work

DRY is only superficially about code reuse. DRY is really most developers' first encounter with designing their own abstractions. Creating reusable functions isn't about saving keystrokes for the developer. It's about hiding away the details that aren't necessary for the developer to understand, and exposing those details that are.

DRY is About Abstraction

When you see a block of repeated code and your DRY Alarm starts going off, you need to think not just about how to lift lines 47 through 64 into their own function and invoke it, but more importantly you need to think criticaly about what purpose that function serves. You're not just avoiding repeated code, you're packaging up a complex process into one or more smaller named units of work. To do that well, consider the following:

What is the responsibility of this function? Can you describe what it does in a single sentence without using the word "and"?
What details of its implementation can be completely hidden from a developer invoking this function?
What does it need to accept as arguments in order to do its job?
What should it return?

A few weeks ago a released a video showing the process of extracting and abstracting business logic out of the view layer of a small application. For the sake of time, I skipped over creating an abstraction for the display formatting of time data, but today I'm going to DRY up that bit of code with an eye towards abstraction, instead of just code reuse.

An Initial Approach

In the original code, there are two event handlers that have to deal with updating time data in the UI. First is the start handler, which creates an interval that increments a counter each second, then writes that count in a MM:SS format to the screen. Then the reset handler, which simply resets that counter to zero and explicitly writes 00:00 to the screen

// within the start event handler:
interval = setInterval(() => {
  timeElapsed++;

  // create 0-padded minute string
  const m = Math.floor(timeElapsed / 60);
  const mm = m < 10 ? `0${m}` : m;

  // create 0-padded second string
  const s = timeElapsed % 60;
  const ss = s < 10 ? `0${s}` : s;

  // combine them to `MM:SS` format
  const displayTime = `${mm}:${ss}`;

  // write that string to the UI
  timeDisplay.innerHTML = displayTime;
}, 1000);

// in the reset handler
timeElapsed = 0;
timeDisplay.innerHTML = '00:00';

At first glance, this might not look like repeated code, but conceptually we are doing the same progression of tasks twice: We are updating the timeElapsed variable as part of the internal state of our application, and we are writing a formatted representation of that value to the screen in the form of a MM:SS display. The fact that we can just write the value in the reset handler as 00:00 initially seems like a convenience, but it is only convenient because we are neglecting to account for the conceptual abstraction in play. If we decided to change the display format to include tenths of seconds, or hours, we would have to update that format in both places.

In other words, a developer has to read through all of our code and understand all of the implementation decisions in order to confidently use it.

So let's take a first stab at extracting this into something reusable. You might be tempted initially to do something like this:


// Extract the formatting and writing of time data into its own function
function updateTimeDisplay(timeDisplay, timeElapsed) {
    // create 0-padded minute string
  const m = Math.floor(timeElapsed / 60);
  const mm = m < 10 ? `0${m}` : m;

  // create 0-padded second string
  const s = timeElapsed % 60;
  const ss = s < 10 ? `0${s}` : s;

  // combine them to `MM:SS` format
  const displayTime = `${mm}:${ss}`;

  // write that string to the UI
  timeDisplay.innerHTML = displayTime;
}

//==========================
// within start handler 
interval = setInterval(() => {
  timeElapsed++;
  updateTimeDisplay(timeDisplay, timeElapsed)
}, 1000);

//==========================
// within reset handler 
timeElapsed = 0 
updateTimeDisplay(timeDisplay, timeElapsed)

This is a bit better because now the concern of how our time gets formatted is represented on one canonical location, but there are quite a few ways we could improve on this.

So What's Wrong With It?

Let's go back to the four questions I mentioned earlier in the article:

What is the responsibility of this function? Can you describe what it does in a single sentence without using the word "and"?

As it's currently written, I think the most succinct, yet complete, way that we could describe what this function does is

Converts a number of seconds timeElapsed into a MM:SS formatted string and writes that string to the provided HTML element timeDisplay.

The "and" in that sentence is critical. This function has two responsibilities, which is a code smell. We're mixing two concerns here: the formatting of the string and writing it to the UI. This is the kind of tightly-coupled implementation that makes this hard to extend and hard to test. It's only coincidental that in both of the places we format our timeElapsed value we immediately write it to the UI. There's no guarantee that will always be the case. What happens when you want to insert a formatted time string into a longer string and display that? Can't do it with the current implementation.

You can't possibly foresee all of the ways that your code will be used, but you can do your best to make its individual components as composable as possible. This is how you make your future life easier without having to predict the future.

The coupling of concerns also makes testing needlessly difficult. As it's currently designed, if you want to write a test for the logic of creating properly formatted time strings from a number, you need to run that test in some kind of rendered environment and assert on the contents of the HTML element.

What details of its implementation can be completely hidden from a developer invoking this function?

As it's currently written, we're not hiding much complexity from our teammates or our future selves. I would expect a new developer to have to read through the implementation of this function in order to know what's going on. Nothing in the name of the function indicates that we're formatting the input, so it's not clear to me as a new dev on the project whether I should pass the raw seconds or an already formatted string. Also, what format should or will the time display take? Gotta go read the implementation to find out.

What does it need to accept as arguments in order to do its job?

There's not much to argue with here. It takes the timeDisplay element and the timeElapsed number. These are the two pieces of information that the function needs to do its job, but when we consider this question in light of the previous one and the next one, we can do better.

What should it return?

As designed, I would say there's not anything that is worthwhile to return from this function. We could return the formatted string, but that makes the return value sort of an afterthought to the result of the function. By the time we've updated the UI with the formatted string, how likely is it that the calling code would want that string for some other use? Not very. And if it does, you're just muddying the responsibilities further by using this value that is returned incidentally from a larger operation.

An Improved Version

The answers to all of the questions above point to the fact that we've tried to put too much into our function. The real abstraction to be extracted here is only the formatting of the string:


// Extract ONLY the format operation into its own function
function formatSecondsToMMSS(timeElapsed) {
    // create 0-padded minute string
  const m = Math.floor(timeElapsed / 60);
  const mm = m < 10 ? `0${m}` : m;

  // create 0-padded second string
  const s = timeElapsed % 60;
  const ss = s < 10 ? `0${s}` : s;

  // combine them to `MM:SS` format
  return `${mm}:${ss}`;
}

//==========================
// within start handler 
interval = setInterval(() => {
  timeElapsed++;
  timeDisplay.innerHTML = formatSecondsToMMSS(timeElapsed)
}, 1000);

//==========================
// within reset handler 
timeElapsed = 0 
timeDisplay.innerHTML = formatSecondsToMMSS(timeElapsed)

By extracting only the formatting logic to its own function, we're able to isolate and easily compose individual concerns. This change gives satisfactory answers to all of our questions:

What is the responsibility of this function? Can you describe what it does in a single sentence without using the word "and"?

This function converts a number of seconds timeElapsed into a MM:SS formatted string.

Simple. To the point. No ands.

What details of its implementation can be completely hidden from a developer invoking this function?

Pretty much all of them. The new name of the function says exactly what it does: formatSecondsToMMSS. The implementation details of how we get from input to return value are not of any importance to the developer invoking the function.

What does it need to accept as arguments in order to do its job?

Just the seconds, which is hinted at by the name of the function.

What should it return?

The formatted string, which is again hinted at by the name of the function. This return value is the obvious culmination of the work performed by the function. This is now a pure function (one without side effects, and which returns deterministic results based only on its arguments), which means it can be easily invoked anywhere in the code that you have seconds and want a formatted string.

Where To Go From Here

This example is somewhat trivial, and like all simple illustrations glosses over some details that you'd have to address in a more realistic setting.

Context and Naming Conventions

For one, formatSecondsToMMSS is still probably not a great function name. If you were writing a real timekeeping application (eg. a billable hours tracker), you'd probably have at least a few different ways to format time which would be dependent on context. So in that case, assuming you were using this same sort of imperative vanilla JS architecture, you might have a function called formatTimeForTrackerDisplay, which would potentially call a more flexible formatTime function.


function formatTime(seconds, stringFormat) {
  // returns different formats based on what 
  // is passed as stringFormat
}

function formatTimeForTrackerDisplay(seconds) {
  return formatTime(seconds, "MM:SS")
}

This gives you a flexible, low-level method for manipulating time formats. But it also defines a canonical format for the timerDisplay, which may be written to from multiple places in the application.

Functions, Methods, Classes, Modules, Components,...

I'm using bare functions in this example because they're simple and illustrative. But all of these concerns are relevant when thinking about the abstractions you design for higher-level constructs as well. How can you design to be composable, require as little implicit context as possible, and make their implementation intuitive and obvious to the consuming developer?

Higher-level UI Libraries

In the solution above, you probably noticed that this line was repeated in both handlers:

timeDisplay.innerHTML = formatSecondsToMMSS(timeElapsed)

In our little toy application, there's no need to abstract this UI update further, because it's already a simple one-line invocation. In a real application, this is exposing an implementation detail that has the potential to become outdated. What if we were to change the markup of timeDisplay? We'd likely have to update all of the lines where we updated its value by setting its innerHTML.

Higher-level frameworks like React and Vue solve this problem for you by giving you a declarative syntax to write a component for the time display, which would likely receive the timeElapsed variable and update the display automatically when that value changes. This obviates the need for imperatively updating the display text in multiple places around your application. However, all the same concerns about writing good abstractions still apply, they now just apply to your component.

Final Thought

With all my talk of eliminating implicit context and making function names descriptive, you might think that I'm making the argument for "self-documenting" code. Far from it. There's no such thing. Software development is complex and difficult, and some level of documentation is always necessary in a team environment. By making thoughtful decisions about the design of your abstractions, you can make code more intuitive to use, more loosely coupled, and easier to test. All of these things lead to happier, healther codebases and more importantly, happier, healthier teammates.

Next Up:
Abstraction and Limiting Information Flow

Previously:
What is My Test Runner Actually Doing?

Useful articles delivered to your inbox. Learn to to think about software development like a professional.