Error Monads The Hard Way

I can't decide how to best express the equivalent of Haskell's Error monad and DO block in Ruby...

Jun 03, 2025

I’m currently writing code to import royalty data from our distributor. It comes in the form of 28 or so spreadsheets (because of course it does). For each, the first step is to upload and ingest it. And, because I’m paranoid about getting royalties correct, I try to do a bunch of validation and reconciliation as I read each sheet in.

The typical flow is:

1. Extract the spreadsheet file from the incoming HTTP request.
2. Extract metadata from the file and use it to start populating an Upload record, where I keep track of each upload, regardless of its content.
3. Parse the spreadsheet, extracting the data I need. Sometimes this is easy; it’s just a table. Other times the data is more like a statement or invoice, with a variable format, so I have an actual parser to understand it.
4. Match the data for each title in the spreadsheet to the corresponding title in our database.
5. Finally, save all the data, typically across three or four tables, in a single transaction.

These steps are basically a pipeline, where each step feeds data to the next. However, errors can happen at each step, and I need to record them in the Upload record. When an error occurs, I stop processing that spreadsheet. This means I can’t use regular Ruby pipelines.

Each step returns a hash, either

{ status: :ok, data: data_to_pass_to_next_step }

or

{ status: :error, message: error_reason }
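A minimal sketch of that convention, assuming two small helpers, ok and error, which are hypothetical and not part of the original code. The body of excel_file_attached? is also invented for illustration, since the real check isn't shown here:

```ruby
# Hypothetical helpers (not in the original code) that build the two result shapes.
def ok(data)
  { status: :ok, data: data }
end

def error(message)
  { status: :error, message: message }
end

# Illustrative stand-in for the first step. The real check is not shown in the
# post, so this one only looks at the file name.
def excel_file_attached?(file)
  return error("no file attached") if file.nil?
  return error("not an Excel file: #{file}") unless file.to_s.end_with?(".xlsx")

  ok(file)
end
```

Every step that follows can then test result[:status] without caring which step produced the hash.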

Here are the three alternative designs I played with.

Design 1: Linear Brute Force

Why overthink it? Just write the code:

result = excel_file_attached?(file)
if result[:status] == :error
    record_error(upload, result[:message])
else
    result = add_details_to_upload(result[:data], upload)
    if result[:status] == :error
        record_error(upload, result[:message])
    else
        result = parse_statement(result[:data])
        if result[:status] == :error
            record_error(upload, result[:message])
        else
            result = map_isbns_to_skus(result[:data])
            if result[:status] == :error
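                record_error(upload, result[:message])
            else
                result = save_rows(result[:data], upload)
                # A failure while saving has to be recorded as well.
                record_error(upload, result[:message]) if result[:status] == :error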
            end
        end
    end
end

Code like this hurts my soul, so at the very least I’d want to flatten the nesting.

result = excel_file_attached?(file)
unless result[:status] == :error
    result = add_details_to_upload(result[:data], upload)
end
    
unless result[:status] == :error
    result = parse_statement(result[:data])
end

unless result[:status] == :error
    result = map_isbns_to_skus(result[:data])
end

unless result[:status] == :error
    result = save_rows(result[:data], upload)
end

if result[:status] == :error
    record_error(upload, result[:message])
end

Better, but the code that does the actual work is buried inside all those unless statements.

Design 2: The Mini State Machine

When I have a sequence of steps like this, I often use a trivial state machine:

    step = 0
    result = nil     # for scoping

    loop do
      result = case step
               when 0 then  excel_file_attached?(file)
               when 1 then  add_details_to_upload(result[:data], upload)
               when 2 then  parse_statement(result[:data])
               when 3 then  map_isbns_to_skus(result[:data])
               when 4 then  save_rows(result[:data], upload)
               when 5 then  break

               when :error
                 record_error(upload, result[:message])
                 break
               end

      if result[:status] == :ok
        step += 1
      else
        step = :error
      end
    end

In general I like this code, but it seems a little over-the-top.

Design 3: Exceptions

The third thing I considered was having each of the processing functions raise an exception rather than return an error status. That cleans things up considerably:

begin
  file
  |> excel_file_attached?
  |> add_details_to_upload(upload)
  |> parse_statement
  |> map_isbns_to_skus
  |> save_rows(upload)
rescue UploadError => e
  record_error(upload, e.message)
end

Clearly, this is the most direct of the three.
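The |> syntax above is best read as notation: as far as I know, no released version of Ruby ships a pipeline operator. A runnable approximation can be sketched with Object#then, which has been in Ruby since 2.6. The UploadError definition and the step bodies below are assumptions made up for the example, and the chain is shortened to three steps:

```ruby
# Assumed error class. The post names UploadError but does not define it.
class UploadError < StandardError; end

# Hypothetical stand-ins for the real steps. Each returns its data or raises.
def excel_file_attached?(file)
  raise UploadError, "no file attached" if file.nil?

  file
end

def parse_statement(file)
  raise UploadError, "cannot parse #{file}" unless file.end_with?(".xlsx")

  [{ isbn: "isbn-1", amount: 100 }] # made-up row
end

def save_rows(rows, upload)
  upload[:rows_saved] = rows.size
  upload
end

def record_error(upload, message)
  upload[:error] = message
  upload
end

# Each `then` block receives the previous step's return value.
# The first step to raise UploadError skips the rest of the chain.
def process(file, upload)
  file
    .then { |f| excel_file_attached?(f) }
    .then { |f| parse_statement(f) }
    .then { |rows| save_rows(rows, upload) }
rescue UploadError => e
  record_error(upload, e.message)
end
```

Here the upload is a plain hash so the example stays self-contained; in the real code it would be the Upload record.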

Design x: Metaprogram

I briefly considered reimplementing the pipeline operator |> to exit early on seeing an error, but my days of that kind of tomfoolery are long past.
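For comparison, the same early-exit behavior can be sketched without any metaprogramming, using Enumerable#reduce over a list of callables. The run_steps helper and the three toy steps are hypothetical names invented for this example, not code from the app:

```ruby
# Hypothetical runner: feeds each step the previous step's data and stops
# calling steps as soon as one of them reports an error.
def run_steps(initial, steps)
  steps.reduce({ status: :ok, data: initial }) do |result, step|
    break result if result[:status] == :error

    step.call(result[:data])
  end
end

# Made-up steps, for illustration only.
def double(n)
  { status: :ok, data: n * 2 }
end

def check_size(n)
  if n > 10
    { status: :error, message: "#{n} is too big" }
  else
    { status: :ok, data: n }
  end
end

def add_one(n)
  { status: :ok, data: n + 1 }
end
```

Calling run_steps(3, [method(:double), method(:check_size), method(:add_one)]) runs all three steps, while starting from 6 stops at check_size and never calls add_one.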

Design y: A Monad Library

I’m aware of the dry-monads library and the Railroad article. I considered using dry-monads, but adding an extra dependency for just a couple of lines of code was a nonstarter. Perhaps if I’d written the whole codebase from scratch using dry-monads, it would be an improvement, but this particular chunk of upload code is the first time I’ve felt the need for monadic behavior in this app.

What I Ended Up Doing

In the end it came down to design 2 or design 3. Of the two, I much prefer the exception-based approach: it seems on the surface to be a lot more direct and easy to comprehend.

You might be surprised, therefore, that I opted to use design 2, the state machine.

I admit it was a difficult decision, and as I write this I still have doubts. But I had two nagging thoughts about the exception/pipeline approach that swung it for me.

First, and weakest, is that I have an instinctive aversion to using exceptions as a flow-control mechanism. My rule of thumb is to use exceptions only for exceptional things: things that should never happen. A validation failure when reading external data doesn’t seem to count.

The second reason is the decisive one. The pipeline is a great construct, but it ends up coupling functions together. When you write a() |> b(), the return value of a must be acceptable as the first parameter of b.

For established code this seems like an acceptable tradeoff: the types are unlikely to change, and the pipeline is just plain more readable.

However, when I’m writing new code for a new application, I just know that I’ll be changing things around as I learn about the domain.

In the state machine approach, I’m currently emulating a pipeline by passing the result from one function to the next. But that isn’t set in stone. I can add new code before a particular call, or massage a return value. The design is more open.
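As a sketch of what that openness looks like, here is a cut-down version of the state machine in which one step massages the previous result before passing it on. The two step bodies and the deduplication are made up for the example; only the loop shape comes from Design 2:

```ruby
# Made-up steps for illustration. Each returns the result hash used throughout.
def parse_statement(text)
  rows = text.split(",").map(&:strip)
  if rows.empty?
    { status: :error, message: "empty statement" }
  else
    { status: :ok, data: rows }
  end
end

def map_isbns_to_skus(isbns)
  { status: :ok, data: isbns.map { |isbn| "SKU-#{isbn}" } }
end

def run(text)
  step = 0
  result = nil # for scoping
  errors = []

  loop do
    result = case step
             when 0 then parse_statement(text)
             when 1
               # Extra logic slotted in between two steps: drop duplicates
               # before mapping, without touching either function.
               map_isbns_to_skus(result[:data].uniq)
             when 2 then break
             when :error
               errors << result[:message]
               break
             end

    step = result[:status] == :ok ? step + 1 : :error
  end

  [result, errors]
end
```

With a pipeline, the same change would mean either wrapping map_isbns_to_skus or inserting a new adapter function into the chain.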

The pipeline approach would make that trickier, because the fact that I’m using a language feature to chain functions together constrains my choices down the road. And as I just know I’ll end up having to change stuff, I chose to forgo the pipeline’s elegance.

But That’s Just Me

My reason for posting this is that I’m still not sure. I would really like to start some discussions in the comments. What would you do, and why? Are there other, better choices? Am I overthinking this?

Chat with you there.