Building MockOpenAI: a weekend MVP story

Last weekend I built and published a Ruby gem. From idea to published thing in about four days. Here’s how it went, including the part where I had to reconsider whether I’d built something useful at all.
Friday: 20 ideas, one bet
I’m between jobs right now. Good position to be in if you like building things, terrible position to be in if you like eating. So I’ve been running a little experiment: each weekend, pick one small idea and see if I can ship it.
Friday’s job was to generate and pick an idea. I sat down with an AI and brainstormed 20 candidates. Developer tools. Content products. Micro-SaaS. I narrowed it down to one: a local mock server for testing OpenAI-compatible APIs.
The pitch to myself was simple. I write a lot of Ruby. I write a lot of tests. Testing code that talks to LLMs is a bit annoying because, while the happy path is easy to mock, some of the failure modes and edge cases can be more of a pain. There had to be a better way.
By end of day Friday I had a repo, a gemspec, and a clear plan.
Saturday: build day
The core idea for MockOpenAI is that it’s a real HTTP server running on
localhost, not a mock object or a stub. Your application code talks to it
exactly the way it would talk to the LLM provider in production. You just point
your client at http://localhost:4000 instead of the usual API endpoint.
That distinction makes a difference, I think. With a real HTTP server you can test things that object-level mocking can’t touch: actual TCP timeouts, truncated streaming responses, retry headers. The kind of failure modes that bite you in production but never show up in your test suite because you stubbed them away.
The architecture is deliberately simple. A Rack server reads a shared JSON state file on every request. Your tests write rules to that file. The server is stateless. No client wrapping, no monkey-patching, no magic.
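That loop can be sketched as a tiny Rack-style app. This is a hypothetical shape to illustrate the idea, not the gem's actual internals; the class and key names here are made up:

```ruby
require "json"
require "tmpdir"

# Sketch of the architecture described above (assumed shape, not MockOpenAI's
# real code): a stateless server object that re-reads a shared JSON state file
# on every request and answers with the first rule whose "match" string
# appears in the prompt.
class MockServerSketch
  def initialize(state_path)
    @state_path = state_path
  end

  # Rack interface: takes an env hash, returns [status, headers, body].
  # "mock.prompt" stands in for parsing the real request body.
  def call(env)
    prompt = env["mock.prompt"]
    rules  = JSON.parse(File.read(@state_path)) # fresh read, no in-process state
    rule   = rules.find { |r| prompt.include?(r["match"]) }
    return [500, {}, ["no matching rule"]] unless rule

    [200, { "content-type" => "application/json" },
     [JSON.generate(content: rule["response"])]]
  end
end

# Tests write rules to the file; the server only ever reads it.
state = File.join(Dir.tmpdir, "mock_state.json")
File.write(state, JSON.generate([{ "match" => "Step 1", "response" => "OK" }]))
status, _headers, body = MockServerSketch.new(state).call("mock.prompt" => "Step 1")
```

Because all state lives in the file, the server process never needs restarting between tests, and any number of test processes can drive it.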
Here’s what using it looks like:
it "handles a rate limit", :mock_openai_rate_limit do
  expect { MyService.call_llm("Hello") }.to raise_error(RubyLLM::RateLimitError)
end

The failure modes are:
| Mode | What it does |
|---|---|
| :timeout | Sleeps, then closes the connection without responding |
| :rate_limit | Returns HTTP 429 with an OpenAI-format error body |
| :malformed_json | Returns truncated JSON that causes a parse error in your client |
| :internal_error | Returns HTTP 500 |
| :truncated_stream | Sends partial SSE chunks, then closes the connection |
You can also mix success and failure in a single test:

it "handles mixed outcomes", :mock_openai do
  MockOpenAI.set_responses([
    { match: "Step 1", response: "OK" },
    { match: "Step 2", failure_mode: :timeout },
    { match: "Step 3", response: "Done" }
  ])

  expect(MyService.step1).to eq("OK")
  expect { MyService.step2 }.to raise_error(Timeout::Error)
  expect(MyService.step3).to eq("Done")
end

Saturday was productive. By end of day I had all the core classes written
TDD-style: Config, State, Matcher, ResponseBuilder,
TemplateRenderer, all five failure mode classes. The code was written, the
tests passed, and life was good.
Sunday: documentation and shipping
Sunday was docs day. I set up a Jekyll site and wrote the README. I added an Anthropic endpoint too, because my personal projects use both.
I also migrated the first personal project to use MockOpenAI. That went smoothly. The HTTP-level fidelity made a few tests a little more honest than they’d been with simple stubs at the client level.
Monday: the uncomfortable question
Monday I migrated a second personal project. This one used a helper module I’d written a while back to stub LLM calls. Just a few lines of code. It worked fine for that project.
I stared at that code for a while. Here it is:
module RubyLLMMocks
  def mock_ruby_llm_chat(content: nil, error: nil)
    if error
      allow(RubyLLM).to receive(:chat).and_raise(error)
    else
      mock_response = instance_double(
        RubyLLM::Message,
        content: content,
        inspect: "RubyLLM::Message(content: #{content.inspect})"
      )
      mock_chat = instance_double(RubyLLM::Chat, ask: mock_response)
      allow(RubyLLM).to receive(:chat).and_return(mock_chat)
    end
  end
end

# Example:
it "handles general ruby_llm errors gracefully" do
  error = RubyLLM::Error.new(nil, "Unexpected error")
  mock_ruby_llm_chat(error: error)
  generator = described_class.new(options)

  expect { generator.generate }
    .to output(/Error.*Unexpected error/m).to_stdout
    .and raise_error(SystemExit)
end

That’s it. Fifteen lines, no gem dependency, works perfectly for a project that
uses RubyLLM as a wrapper. The error case is handled with and_raise. Clean.
So the question had to be asked: did I just build a solution in search of a problem?
After sitting with it, I don’t think so. But I did have to sharpen my thinking about when MockOpenAI actually earns its place versus when a helper method is the right call.
The short version: if you’re using a wrapper library like RubyLLM for all your LLM calls, and you only need happy-path responses and exception simulation in unit tests, the 15-line helper is probably the right answer. It’s less to maintain, has no extra dependencies, and does the job.
MockOpenAI is the right call when you need the actual HTTP layer in the
picture. When you’re using the raw OpenAI or Anthropic client directly. When
you’re running integration or system tests that make real HTTP connections.
When you need to test what happens when TCP actually times out, or when a
streaming response gets cut off halfway through, or when your retry logic
processes a Retry-After header.
Those are real problems. They’re just not every project’s problems. I added a “when not to use” page to the docs to make the tradeoffs explicit.
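The Retry-After case is a good example of what only a real HTTP layer lets you exercise. Here's a hypothetical client-side sketch of that kind of retry logic (not part of the gem, and the helper name is made up), the sort of code you'd point at a mock server returning a real 429:

```ruby
# Hypothetical retry helper: calls the block, and on a 429-style response
# waits for the advertised Retry-After interval before trying again.
# `sleeper` is injectable so tests don't actually sleep.
def with_retry(max_attempts: 3, sleeper: ->(s) { sleep(s) })
  attempts = 0
  loop do
    attempts += 1
    status, headers, body = yield
    return [status, body, attempts] unless status == 429 && attempts < max_attempts
    sleeper.call(Integer(headers.fetch("retry-after", "1")))
  end
end

# Simulate a server that rate-limits once, then succeeds.
responses = [
  [429, { "retry-after" => "0" }, "slow down"],
  [200, {}, "ok"]
]
status, body, attempts = with_retry { responses.shift }
```

With object-level stubs you'd be asserting that your code calls a method named something like retry; against a real server you're asserting that it actually waits and re-sends the request.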
What I’d do differently
One thing I’d change: I’d research the problem space more up front, to make sure I understood both the scope of the problem and the existing solutions. (Especially the ones I wrote myself!) The tool is solid, but I made some assumptions about the breadth of problems it would solve. That’s a classic weekend MVP trap, I suppose: you’re so focused on building that you skip the due diligence you think you don’t need.
The gem is published, the docs are live, and it works. The scope is narrower than I originally thought, but the use case is real. That feels like an honest result for a long weekend.
MockOpenAI is a Ruby gem for testing OpenAI-compatible and Anthropic APIs locally. References:
- Source: github.com/grymoire7/mockopenai.
- Docs: grymoire7.github.io/mockopenai.
- Landing page: tracyatteberry.com/mockopenai.