Error Monads The Hard Way
I can't decide how best to express the equivalent of Haskell's Error monad and `do` block in Ruby...
I’m currently writing code to import royalty data from our distributor. It comes in the form of 28 or so spreadsheets (because of course it does). For each one, the first step is to upload and ingest it. And because I’m paranoid about getting royalties correct, I try to do a bunch of validation and reconciliation as I read each sheet in.
The typical flow is:
1. Extract the spreadsheet file from the incoming HTTP request.
2. Extract metadata from the file and use it to start populating an `Upload` record, where I keep track of each upload, regardless of its content.
3. Parse the spreadsheet, extracting the data I need. Sometimes this is easy; it’s just a table. Other times the data is more like a statement or invoice, with a variable format, so I have an actual parser to understand it.
4. Associate the data for each title with the corresponding title in our database.
5. Finally, save all the data, typically across three or four tables, in a single transaction.
These steps are basically a pipeline, where each step feeds data to the next. However, errors can happen at each step, and I need to record them in the `Upload` record. When an error occurs, I stop processing that spreadsheet. This means I can’t just chain the steps together in a regular pipeline.
Each step returns a hash, either `{ status: :ok, data: data_to_pass_to_next_step }` or `{ status: :error, message: error_reason }`.
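As a concrete illustration of that contract, here is a minimal sketch with two hypothetical helpers, `ok` and `error`, plus a stand-in version of the first step. In this sketch `file` is just a filename string and the extension check is an assumption; the real step inspects the file attached to the request.

```ruby
# Hypothetical helpers that build the two result shapes.
def ok(data)
  { status: :ok, data: data }
end

def error(message)
  { status: :error, message: message }
end

# Stand-in for the first step. Here `file` is just a filename string and only
# its extension is checked; a real implementation would inspect the upload.
def excel_file_attached?(file)
  return error("no file attached") if file.nil?
  return error("#{file} is not an Excel file") unless file.end_with?(".xlsx", ".xls")

  ok(file)
end
```

Every step has this same shape, which is what lets the designs below treat them uniformly.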
Here are the three alternative designs I played with.
Design 1: Linear Brute Force
Why overthink it? Just write the code:
```ruby
result = excel_file_attached?(file)
if result[:status] == :error
  record_error(upload, result[:message])
else
  result = add_details_to_upload(result[:data], upload)
  if result[:status] == :error
    record_error(upload, result[:message])
  else
    result = parse_statement(result[:data])
    if result[:status] == :error
      record_error(upload, result[:message])
    else
      result = map_isbns_to_skus(result[:data])
      if result[:status] == :error
        record_error(upload, result[:message])
      else
        result = save_rows(result[:data], upload)
        record_error(upload, result[:message]) if result[:status] == :error
      end
    end
  end
end
```
Code like this hurts my soul, so at the very least I’d want to flatten the nesting.
```ruby
result = excel_file_attached?(file)
unless result[:status] == :error
  result = add_details_to_upload(result[:data], upload)
end
unless result[:status] == :error
  result = parse_statement(result[:data])
end
unless result[:status] == :error
  result = map_isbns_to_skus(result[:data])
end
unless result[:status] == :error
  result = save_rows(result[:data], upload)
end
if result[:status] == :error
  record_error(upload, result[:message])
end
```
Better, but the code that does the actual work is buried inside all those `unless` statements.
Design 2: The Mini State Machine
When I have a sequence of steps like this, I often use a trivial state machine:
```ruby
step = 0
result = nil # declared outside the loop so it persists between iterations
loop do
  result = case step
           when 0 then excel_file_attached?(file)
           when 1 then add_details_to_upload(result[:data], upload)
           when 2 then parse_statement(result[:data])
           when 3 then map_isbns_to_skus(result[:data])
           when 4 then save_rows(result[:data], upload)
           when 5 then break # every step succeeded
           when :error
             record_error(upload, result[:message])
             break
           end
  # Advance on success; on failure, jump to the :error state.
  if result[:status] == :ok
    step += 1
  else
    step = :error
  end
end
```
In general I like this code, but it seems a little over-the-top.
Design 3: Exceptions
The third thing I considered was having each of the processing functions raise an exception rather than return an error status. That cleans things up considerably:
```ruby
begin
  file
  |> excel_file_attached?
  |> add_details_to_upload(upload)
  |> parse_statement
  |> map_isbns_to_skus
  |> save_rows(upload)
rescue UploadError => e
  record_error(upload, e.message)
end
```
Clearly, this is the most direct of the three.
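The `|>` above is Elixir-style notation; released versions of Ruby have no pipeline operator. A minimal plain-Ruby sketch of the same design chains the steps with `Object#then` (Ruby 2.6+) and lets any step raise `UploadError`. The step bodies, the data, and a `record_error` that just stores the message in a hash are all stand-ins invented for illustration.

```ruby
# Hypothetical error class; each step raises it instead of returning an error hash.
class UploadError < StandardError; end

# Stand-in steps, invented for illustration. Each takes the previous step's
# result as its first argument and either returns a value or raises UploadError.
def excel_file_attached?(file)
  raise UploadError, "no file attached" if file.nil?

  file
end

def add_details_to_upload(file, upload)
  upload[:filename] = file
  file
end

def parse_statement(file)
  raise UploadError, "cannot parse #{file}" unless file.end_with?(".xlsx")

  [{ isbn: "ISBN-1", units: 3 }]
end

def map_isbns_to_skus(rows)
  rows.map { |row| row.merge(sku: "SKU-#{row[:isbn]}") }
end

def save_rows(rows, upload)
  upload[:saved] = rows.size
  rows
end

def record_error(upload, message)
  upload[:error] = message
end

# Object#then yields the receiver to the block and returns the block's value,
# so each call receives the previous step's result, much as |> would pass it.
def import_upload(file, upload)
  file
    .then { |f| excel_file_attached?(f) }
    .then { |f| add_details_to_upload(f, upload) }
    .then { |f| parse_statement(f) }
    .then { |rows| map_isbns_to_skus(rows) }
    .then { |rows| save_rows(rows, upload) }
rescue UploadError => e
  record_error(upload, e.message)
end
```

This keeps the exceptions-for-errors shape of design 3, and also its coupling: each block has to adapt whatever the previous step returned.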
Design x: Metaprogram
I briefly considered reimplementing the pipeline operator `|>` to exit early on seeing an error, but my days of that kind of tomfoolery are long past.
Design y: A Monad Library
I’m aware of the dry-monads library and the Railroad article. I considered using it, but adding an extra dependency for just a couple of lines of code was a nonstarter. Perhaps if I’d written the whole codebase from scratch using dry-monads, it would be an improvement, but this particular chunk of upload code is the first time I’ve felt the need for monadic behavior in this app.
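For a sense of scale, the core of an Error monad over the hashes described earlier really is only a few lines. This is a hand-rolled sketch, not the dry-monads API: a hypothetical `and_then` helper acts as bind, handing `data` to the next step on success and passing an error hash through untouched, and a hypothetical `run_steps` folds it over a list of lambdas. The steps are toy stand-ins; real steps that need extra arguments such as `upload` could close over them.

```ruby
# Bind for the status hashes: run the next step only if the previous one
# succeeded; otherwise return the error hash unchanged.
def and_then(result)
  return result unless result[:status] == :ok

  yield result[:data]
end

# Thread a value through a list of steps. Once a step fails, and_then skips
# every remaining step and the error hash falls out the end.
def run_steps(input, steps)
  steps.reduce({ status: :ok, data: input }) do |result, step|
    and_then(result) { |data| step.call(data) }
  end
end

# Toy stand-in steps that follow the same contract.
double  = ->(n) { { status: :ok, data: n * 2 } }
check   = ->(n) { n < 10 ? { status: :ok, data: n } : { status: :error, message: "#{n} is too big" } }
add_one = ->(n) { { status: :ok, data: n + 1 } }

run_steps(3, [double, check, add_one]) # returns { status: :ok, data: 7 }
run_steps(6, [double, check, add_one]) # returns { status: :error, message: "12 is too big" }
```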
What I Ended Up Doing
In the end it came down to design 2 or design 3. Of the two, I much prefer the exception-based approach: on the surface it seems a lot more direct and easier to comprehend.
You might be surprised, therefore, to learn that I opted for design 2, the state machine.
I admit it was a difficult decision, and as I write this I still have doubts. But I had two nagging thoughts about the exception/pipeline approach that swung it for me.
First, and weakest, is that I have an instinctive aversion to using exceptions as a flow-control mechanism. My rule of thumb is to use exceptions only for exceptional things: things that should never happen. A validation failure when reading external data doesn’t seem to count.
The second reason is the decisive one. The pipeline is a great construct, but it ends up coupling functions together. When you write `a() |> b()`, the return value of `a` must be acceptable as the first parameter of `b`.
For established code, this seems like an acceptable tradeoff: the types are unlikely to change, and the pipeline is just plain more readable.
However, when I’m writing new code for a new application, I just know that I’ll be changing things around as I learn about the domain.
In the state machine approach, I’m currently emulating a pipeline by passing the result from one function to the next. But that isn’t set in stone. I can add new code before a particular call, or massage a return value. The design is more open.
The pipeline approach would make that trickier, because the fact that I’m using a language feature to chain functions together constrains my choices down the road. And as I just know I’ll end up having to change stuff, I chose to forgo the pipeline’s elegance.
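A minimal sketch of that flexibility, using stand-in steps and made-up rows: the design 2 loop with extra code slotted into one branch, so rows without an ISBN are dropped before `map_isbns_to_skus` sees them and neither function's signature changes. In a pipeline, the same tweak would have to be packaged as a function whose first parameter is the previous step's output. The name `import_statement` and storing the error in a hash are assumptions for illustration.

```ruby
# Stand-in steps using the { status:, data: } / { status:, message: } contract.
def parse_statement(file)
  return { status: :error, message: "cannot parse #{file}" } unless file.end_with?(".xlsx")

  { status: :ok, data: [{ isbn: "ISBN-1" }, { isbn: nil }, { isbn: "ISBN-2" }] }
end

def map_isbns_to_skus(rows)
  missing = rows.count { |row| row[:isbn].nil? }
  return { status: :error, message: "#{missing} row(s) without an ISBN" } if missing.positive?

  { status: :ok, data: rows.map { |row| row.merge(sku: "SKU-#{row[:isbn]}") } }
end

def import_statement(file, upload)
  step = 0
  result = nil # declared outside the loop so it persists between iterations
  loop do
    result = case step
             when 0 then parse_statement(file)
             when 1
               # Extra work slotted in before the next call: drop rows with no
               # ISBN, without changing either step's signature.
               rows = result[:data].reject { |row| row[:isbn].nil? }
               map_isbns_to_skus(rows)
             when 2 then break
             when :error
               upload[:error] = result[:message]
               break
             end
    step = result[:status] == :ok ? step + 1 : :error
  end
  result
end
```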
But That’s Just Me
My reason for posting this is that I’m still not sure. I would really like to start some discussions in the comments. What would you do, and why? Are there other, better choices? Am I overthinking this?
Chat with you there.
If your procedure does not run in a very tight loop, exceptions for flow control in Ruby are fine, because they replace the Either. If you rescue exceptions according to their class (using matching of some description), you get a very good reproduction of an Either, much better than the approach with tuples of `[:ok, :result]` and `[:error, :message]`. One of the unpleasant aspects of those tuples is that nothing prevents a `[:ok, :error_message]` or an `[:error, :result]` from creeping in (say hello to Go error handling). So I would do it with exceptions and a reduce, if your pipeline needs to be composable. If it isn't, maybe a straight-ahead `input = do_thing(input)` per line would be even simpler.
```ruby
calls = [
-> (input) { do_thing(input) },
-> (input) { do_another_thing(input) },
-> (input) { do_yet_another_thing(input) },
]
result = calls.inject("hello!") do |input_from_previous, callable|
callable.(input_from_previous)
end
```