Premature Design Is Not Design
Resist the temptation to guess, and let the code tell you what it needs.
In the 1974 paper with the somewhat click-baity title Structured Programming With go to Statements, Donald Knuth wrote:
premature optimization is the root of all evil (or at least most of it) in programming
Knuth’s point was that optimizing code normally makes that code trickier to work with; when you’re still in the exploring phase, trying to work out what your code should do, that’s the last thing you need. So, when you optimize (and you probably should at least think about optimization), you should do it only when (a) you know what actually needs optimizing, and (b) you’re unlikely to make major structural changes to the code base.
I’m coming to believe that what’s true for optimization is true pretty much universally: premature anything is likely to be a bad thing.
And you won’t find a better example of this than the way folks design code.
Premature Design is the Devil’s Work
I know this because I’m guilty of doing just that: bringing my vast arsenal of design principles to bear simply because I believed the Devil when they said “if you don’t make it perfect now, you’ll never come back and fix it later.”
That was old, sinful, Dave. Today’s Dave views the world differently. Today’s Dave believes that design is something that is iterated with the code, and is only introduced when needed.
Let’s look at a somewhat simplified example.
RBAC Rabbit Hole
A while back I needed a way to share information among all the people who work on writing and producing books at the Bookshelf. Everyone was already using our giant Rails app in other ways, so I decided to bolt on a kind of wiki-like thing.
Then a little red figure with a pitchfork, horns, and a tail popped into existence on my right shoulder. “How are you going to protect key information from being altered by novices,” asked the miniature Prince of Darkness. “You’re going to need hierarchies of user access rights.”
Old, sinful Dave listened. Old Scratch was right. We had different levels of user: systems admins, managers, editors, book authors, and third parties such as copy editors and layout folks. Managers should be allowed to alter pages they created, as well as pages created by any of the editors who reported to them, and the authors who worked under those editors. Editors could update their pages, along with those of their authors. Poor old authors could only edit their own work, while third parties had no ability to edit anything.
That was my high-level design for the authorization, but before I opened up an editor, I’d need to do some more work on the lower levels.
Clearly I’d need a database table mapping users to roles. But, thinking ahead, I’d likely need to have different kinds of authorization for different resources, so I shouldn’t hard wire in my Wiki. That called for another database table containing the resource types (it would start off with just one row). I’d need to create the code to manage the entries in those tables.
Then I’d need to be able to determine if user A could edit page X. There was clearly a recursive step in there: a manager could change pages owned by their editors, and the pages owned by that editor’s authors. So I’d need to design a recursive SQL query, which sent me off on a day’s worth of exploring CTEs and how I could express them in ActiveRecord.
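To make the recursive step concrete, here is a minimal in-memory sketch of the rule (not the SQL version I was chasing). The names Staff, OwnedPage, reports_to, and can_edit? are hypothetical, invented for this illustration, and the sketch leaves out systems admins because the rule above doesn’t say what they may do.

```ruby
# Hypothetical structures for illustration only.
Staff     = Struct.new(:name, :role, :reports_to)
OwnedPage = Struct.new(:title, :owner)

# True if `user` is `other`, or sits somewhere above `other`
# in the reporting chain (author -> editor -> manager).
def self_or_above?(user, other)
  current = other
  while current
    return true if current.equal?(user)
    current = current.reports_to
  end
  false
end

# Third parties can edit nothing; everyone else can edit their own
# pages and the pages of anyone below them in the chain.
def can_edit?(user, page)
  return false if user.role == :third_party
  self_or_above?(user, page.owner)
end
```

Even in this toy form, the check needs a reporting chain on every user before a single page can be edited, which hints at how much machinery the full design implied.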
I started off needing a place to store notes, but with the Devil’s help I was going to do it properly, and implement a fully-fledged Role-Based Access Control system.
That felt righteous, so off I went down the rabbit hole.
Today’s Dave Would Do It Differently
Today, when that little red fellow makes an appearance, I try to ignore what he says.
Instead, I wait until I need something before designing it.
I’d start my wiki with an in-memory hash that mapped page names to page content, accessed via a trivial Page model. I’d get the wiki stuff working using this.
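As a rough idea of how little that takes, here is a minimal sketch of such a model. The STORE constant and the find and save methods are assumptions made for this example; the real model may well have looked different.

```ruby
class Page
  # page name => page content; lives only as long as the process does
  STORE = {}

  attr_reader :name
  attr_accessor :content

  # Returns a Page for a stored name, or nil if there is no such page.
  def self.find(name)
    new(name, STORE[name]) if STORE.key?(name)
  end

  def initialize(name, content = "")
    @name = name
    @content = content
  end

  # Writes the current content back to the in-memory hash.
  def save
    STORE[name] = content
    self
  end
end

Page.new("Style guide", "Use the serial comma.").save
Page.find("Style guide").content  # => "Use the serial comma."
```

Swapping the hash for a database later only touches find and save, which is part of why it is safe to defer the persistence decision.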
At that point, I’d make a decision. Clearly I need to store the data somewhere more permanent than RAM. And I clearly need to implement some kind of access control. Which should I do first?
On the basis of not doing things until I need to, I’d most likely choose the access control next, simply because it is likely to impact the data I need to store, and that data is easier to mess with while I’m just storing it in an in-memory hash.
I’d write the access control code as a placeholder method in my User model:
class User
  def can_edit_wiki_page?(page)
    true
  end
end
Then I could update the editing code to call this method.
At this point, I wouldn’t be able to write any meaningful access-control tests, so I’d need a little more logic.
class User
  def can_edit_wiki_page?(page)
    page.owner == self
  end
end
You’re right! This isn’t the recursive SQL query. It doesn’t use a role table or a resource table. But it’s enough design to see me through to the next phase, when I add some persistence.
And, you know what? I deployed it with just one minor change:
class User
  def can_edit_wiki_page?(page)
    self.admin? || page.owner == self
  end
end
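A rule this small is also easy to exercise. The following is a sketch only: the admin: keyword, the admin? implementation, and the WikiPage struct are stand-ins I’ve assumed so the example runs on its own, not the code from the Rails app.

```ruby
class User
  def initialize(admin: false)
    @admin = admin
  end

  def admin?
    @admin
  end

  def can_edit_wiki_page?(page)
    admin? || page.owner == self
  end
end

# Stand-in for the real page model: all the rule needs is an owner.
WikiPage = Struct.new(:owner)

author   = User.new
stranger = User.new
admin    = User.new(admin: true)
page     = WikiPage.new(author)

author.can_edit_wiki_page?(page)    # => true
stranger.can_edit_wiki_page?(page)  # => false
admin.can_edit_wiki_page?(page)     # => true
```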
Turns out I didn’t need any of that fancy design. Everyone was happy with it being simple.
The Devil Likes DRY
Recently I’ve come across folks who don’t like the Don’t Repeat Yourself principle, which first appeared 25 years ago in The Pragmatic Programmer. These folks say that slavishly removing duplicated code from your apps can make the designs more complex, and can lead to architectures where unrelated things are somehow conflated.
As a response, I updated the DRY section of the 20th anniversary edition extensively. I tried to explain that DRY is not about code duplication; it’s about the representation of knowledge. If two pieces of code that represent different pieces of knowledge happen to be identical, they are not a DRY violation, and it would be a mistake to factor them into a single place. (Assuming, of course, that the two pieces of knowledge are not just two statements of a single fact).
Viva Las Vegas!
It can’t be a coincidence that I most recently saw the Devil at work when I went to the Sin City Ruby conference. Fito von Zastrow and Alan Ridlehoover gave a great talk with a fantastic example of the way premature design, combined with an incomplete understanding of DRY, can lead to some crappy code.
Their example was a machine that dispenses cups of hot beverages. Initially, it just knew how to make a cup of coffee. (The code that follows is just a sketch of theirs.)
def dispense
  heat_water
  drop_cup
  grind_coffee
  force_water_through_filter
  present_cup_to_customer
end
So far, so good. But then the requirement changes: the machine also has to dispense tea.
def dispense(options)
  case options.drink
  when :coffee
    heat_water
    drop_cup
    grind_coffee
    force_water_through_filter
    present_cup_to_customer
  when :tea
    heat_water
    drop_cup
    measure_tea_on_to_filter
    force_water_through_filter
    present_cup_to_customer
  else
    ...
  end
end
Here’s where premature DRYing kicks in. We look at that code, and think “the two parts of the case statement are identical apart from the one line in the middle of each that adds either coffee or tea to the filter.” So the DRY principle says we should refactor.
def dispense(options)
  heat_water
  drop_cup
  case options.drink
  when :coffee
    grind_coffee
  when :tea
    measure_tea_on_to_filter
  else
    ...
  end
  force_water_through_filter
  present_cup_to_customer
end
Then a new requirement comes in: we need to be able to add creamer to the coffee and milk to the tea. Obviously, the milk has to go in before the hot water, and the cream goes in only after the coffee has poured. We also need to add sweetener on request. Oh, and there’s the option for decaf coffee.
def dispense(options)
  heat_water
  drop_cup
  case options.drink
  when :coffee
    if options.decaf?
      grind_decaf_coffee
    else
      grind_coffee
    end
  when :tea
    dispense_milk if options.white?
    measure_tea_on_to_filter
  else
    ...
  end
  force_water_through_filter
  dispense_cream if options.drink == :coffee && options.white?
  present_cup_to_customer
end
Then along comes the request to add hot chocolate to the machine, with the option for whipped cream. Then we get a request for low-fat milk…
We started out with the best of intentions, but our initial refactoring, done to satisfy DRY, is leading us down a decidedly ugly path. We’re left trying to salvage the mess by adding more and more code as we try to split up the dispense method.
DRY is About Knowledge, Not Code
Our process broke down when we looked at this code:
def dispense(options)
  case options.drink
  when :coffee
    heat_water
    drop_cup
    grind_coffee
    force_water_through_filter
    present_cup_to_customer
  when :tea
    heat_water
    drop_cup
    measure_tea_on_to_filter
    force_water_through_filter
    present_cup_to_customer
  else
    ...
  end
end
Our DRY instinct kicked in, and we immediately made the design decision to fix it. But we were rushing things, updating the design prematurely. I’ve come to realize that being hasty like this almost always leads to worse code in the long run. Today’s Dave would probably look at the code I’d written and add a single line:
# TODO: dry?
Sure, there might be a DRY violation here, but it is currently doing no harm. Remember, all design comes down to “how easy is it to change?” Even though our first thought might be that the duplication in this code will lead to problems, we have no current evidence that this is the case. In fact, if we stop to think about it, there isn’t actually a duplication here at all, because DRY is about knowledge, not code.
The body of the first when clause is the recipe for making a cup of coffee; the body of the second is the recipe for tea. Those are separate items of knowledge that just happen to share some steps. Thinking that there’s duplication here is like thinking that making an omelette and baking a cake are the same thing because they both involve cracking some eggs.
The Lazy Path
We feel justified in leaving the code untouched, but then along comes the dairy requirement. Again, we don’t try to “do design” until we have a problem to solve, so we make the obvious changes:
def dispense(options)
  case options.drink
  when :coffee
    heat_water
    drop_cup
    grind_coffee
    force_water_through_filter
    dispense_cream if options.white?
    present_cup_to_customer
  when :tea
    heat_water
    drop_cup
    measure_tea_on_to_filter
    dispense_milk if options.white?
    force_water_through_filter
    present_cup_to_customer
  else
    ...
  end
end
At this point, I’m starting to feel that we do have a design problem: this method now has four different paths through it. In its initial form it was possible to take it all in at a glance, but it now requires active reading. So let’s split it up.
def dispense_coffee(options)
  heat_water
  drop_cup
  grind_coffee
  force_water_through_filter
  dispense_cream if options.white?
  present_cup_to_customer
end

def dispense_tea(options)
  heat_water
  drop_cup
  measure_tea_on_to_filter
  dispense_milk if options.white?
  force_water_through_filter
  present_cup_to_customer
end

def dispense(options)
  case options.drink
  when :coffee then dispense_coffee(options)
  when :tea then dispense_tea(options)
  else
    ...
  end
end
I’m liking this: each method is now a representation of a recipe. It’s easy to see what is going on, and how we’d add new recipes in the future. To me, this is DRY; each piece of knowledge is represented just once.
Could we do more? Of course. Right now each recipe starts with the same two actions and ends with the cup presentation. We could definitely extract that out:
def dispense_tea(options)
  serve_hot_drink_in_cup do
    dispense_milk if options.white?
    measure_tea_on_to_filter
    force_water_through_filter
  end
end
But, again, I think that’s premature. We only have two recipes in our code, and two of anything is not a pattern.
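For the curious, the helper itself isn’t shown anywhere in this post. One way it might look, assuming it simply wraps the shared opening and closing steps around a block, is sketched below. The Machine class, the step log, and the Options struct are all hypothetical scaffolding so the ordering can be seen when the code runs.

```ruby
# Hypothetical machine that records each step instead of driving hardware.
class Machine
  attr_reader :log

  def initialize
    @log = []
  end

  # Shared opening and closing steps, with the recipe supplied as a block.
  def serve_hot_drink_in_cup
    heat_water
    drop_cup
    yield
    present_cup_to_customer
  end

  def dispense_tea(options)
    serve_hot_drink_in_cup do
      measure_tea_on_to_filter
      dispense_milk if options.white?
      force_water_through_filter
    end
  end

  # Stand-ins for the real hardware steps: each one just logs its name.
  %i[heat_water drop_cup measure_tea_on_to_filter dispense_milk
     force_water_through_filter present_cup_to_customer].each do |step|
    define_method(step) { @log << step }
  end
end

# Hypothetical options object; only white? matters for tea here.
Options = Struct.new(:white) do
  def white?
    white
  end
end

machine = Machine.new
machine.dispense_tea(Options.new(true))
machine.log
# => [:heat_water, :drop_cup, :measure_tea_on_to_filter, :dispense_milk,
#     :force_water_through_filter, :present_cup_to_customer]
```

Note that every recipe-specific step, including forcing the water through the filter, has to sit inside the block, or the cup would be presented before the drink is made. That kind of hidden ordering constraint is one more reason not to rush the extraction.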
Similarly, I look at the dispense function. Whenever I see a case statement used to dispatch to different behaviors, I want to replace it with a hash/dictionary lookup. But that’s also premature. Let’s make a # TODO note in the code, just to keep an eye on it as things change in the future, and move on.
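If that TODO ever does come due, the lookup version might look something like the sketch below. The DrinkMachine class, the RECIPES constant, the Order struct, and the stubbed recipe methods are assumptions for illustration. So is raising ArgumentError for an unknown drink: the original else branch is left unspecified.

```ruby
class DrinkMachine
  # drink name => name of the method holding that recipe
  RECIPES = {
    coffee: :dispense_coffee,
    tea:    :dispense_tea
  }.freeze

  def dispense(options)
    # Assumed behavior for unknown drinks; the original leaves this open.
    recipe = RECIPES.fetch(options.drink) do |drink|
      raise ArgumentError, "no recipe for #{drink.inspect}"
    end
    public_send(recipe, options)
  end

  # Stand-ins for the real recipe methods.
  def dispense_coffee(_options)
    "coffee dispensed"
  end

  def dispense_tea(_options)
    "tea dispensed"
  end
end

# Hypothetical options object carrying just the drink name.
Order = Struct.new(:drink)

DrinkMachine.new.dispense(Order.new(:tea))  # => "tea dispensed"
```

With only two recipes, the hash buys very little over the case statement, which is the argument for leaving it as a note for now.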
Evidence Based Design
One of the keys to keeping things simple is to avoid doing stuff until it actually needs to be done. There are no rules that say you must stop what you’re doing and refactor just because two lines of code are the same.
Instead, good design comes from a discovered need; from evidence that your code is not as easy to change as it could be.
Sometimes we can anticipate that need, but the world doesn’t end if instead we discover it as we go along.
Design is not rules that you follow. It isn’t in charge of what you do.
Instead, design is a tool you use to make your code easier to change.
My latest book, simplicity, is now available at The Pragmatic Bookshelf.
Thanks for this post, Dave! Couldn’t agree more!
My latest post was pretty much about a similar experience. And that led me to a reflection:
If you are writing a test and need to write a mock, you probably missed an opportunity to do incremental development. Because that mock should already exist as your first implementation.
I’m still refining that idea, but right now I think it makes a lot of sense.
Dave -- My take-away from this great article is that "...there isn’t actually a duplication here at all, because DRY is about knowledge, not code." In my own learning to refactor, I've (too frequently) boxed myself in by taking the obvious application of DRY: "Hey, here are some repeated lines of code." It's only when we start understanding the value of having a single source of truth for each decision point (etc.) in the code that we can proceed to refactor... the right things (which might actually be a few lines of duplicated code). You have elaborated and explained this eloquently.
I advise, when asked, that code in high-level/abstracted programming languages -- like Ruby -- is really primarily written for communication between people (for example, my teammates, or "my future self"), not for the computer. Indeed, in compiled languages, the compiler may optimize code by "unrolling loops" and other object-code-level transformations which actually re-introduce repetitions (of sequences of machine instructions), things which can really speed up execution. So there's a whole lot'a difference between crafting well-designed code in Ruby (or even Pascal or C) vs. what goes on at machine execution time. That's a lot of what his eminence Prof. Knuth was writing about in the days of "Structured Programming with go to Statements"... and those were the heady days when we all were just getting a glimmering of what compiler optimization algorithms could actually do for runtimes. "Optimization" was another (academic) word for "tight code", which we all aspired to especially in assembler language. It took a while -- a few years, maybe a decade -- to really sort this all out and understand the consequences as we do today.
Repetition often helps clarify things in a natural/people language like English -- the mere occurrence of lines of repeated code should not, by itself, set off DRY-alarm-bells -- that's much more of a code smell thing. If the repetition, in the form of a few lines of repeated code in a clear context, serves the understanding (by a person), and especially if it clarifies a carefully thought-out design, then the urge to refactor on autopilot needs to be tamped down a bit. And yes, these higher-level design considerations and decisions do lead to code which is easier to explain, evaluate, debate and understand, and then later to change, modify, correct and evolve if/when appropriate. Thanks!