When “just return JSON” isn’t enough
If you’re building a tool that calls an AI API and needs structured data back, at some point you’ll write a prompt that ends with something like: “Return a JSON array. No prose, no explanation, just the JSON.”
And most of the time, that works. But “most of the time” isn’t good enough when a parse error crashes your command mid-run.
I ran into this while building Jojo, a CLI tool that generates job application materials (tailored resumes, cover letters, and a custom landing page) using AI. Several of its commands need structured data back from the model: an array of annotations, an array of FAQ entries, an array of integer indices. Plain text won’t do; the output feeds directly into the next stage of the pipeline.
The problem with prompt instructions
Language models are trained to be helpful. Part of being helpful, from the model’s perspective, is adding context: explaining what it did, noting any assumptions, formatting output to be readable. A prompt instruction to “return only JSON” fights against that training.
The failure modes are predictable:
Here is the JSON array you requested:
```json
[{"question": "What is your experience with Ruby?", "answer": "..."}]
```
I've formatted this as valid JSON with the required fields.

The model followed the spirit of the instruction (it returned valid JSON) but wrapped it in markdown fences and a helpful introductory sentence. JSON.parse on that string raises a JSON::ParserError.
The original code defended against this with a quick fence-strip:
```ruby
cleaned_json = json_string.strip
  .gsub(/\A```(?:json)?\n?/, "")
  .gsub(/\n?```\z/, "")
JSON.parse(cleaned_json, symbolize_names: true)
```

That handles the fence case. But what about the introductory sentence? What about a model that adds a trailing note after the JSON? The regex grows, cases multiply, and you end up with the same defensive code copy-pasted across three different generators.
JsonExtractor
Instead, we pull this logic into a single place and make it more robust by trying multiple strategies to just find the JSON in whatever the model returned.
```ruby
module JsonExtractor
  def self.call(content, symbolize_names: false)
    return content if content.is_a?(Hash) || content.is_a?(Array)

    stripped = content.gsub(/\A```\w*\s*|\s*```\z/m, "").strip

    try_parse(content, symbolize_names: symbolize_names) ||
      try_parse(stripped, symbolize_names: symbolize_names) ||
      extract_first_structure(content, symbolize_names: symbolize_names) ||
      raise(JSON::ParserError, "No JSON object found in response")
  end
end
```

It tries three strategies in order:
- Parse the raw response directly. If the model actually followed instructions, done.
- Strip markdown fences and try again. Handles the most common failure mode.
- Find the first { or [ in the response and parse the complete structure from there, using a depth-tracking parser to handle nested objects correctly. This handles the “Here is the JSON: […]” case regardless of what prose surrounds it.
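The third strategy is the interesting one. Here is a rough sketch of how a depth-tracking extraction can work (my illustration, not Jojo's actual implementation): find the first opening bracket, walk forward counting nesting depth while skipping over string contents, and parse the slice once the depth returns to zero.

```ruby
require "json"

# Sketch of depth-tracking extraction (illustrative, not Jojo's code).
def extract_first_structure(content, symbolize_names: false)
  start = content.index(/[\[{]/)
  return nil unless start

  opener = content[start]
  closer = opener == "{" ? "}" : "]"
  depth = 0
  in_string = false
  escaped = false

  content[start..].each_char.with_index do |char, i|
    if in_string
      # Inside a JSON string: brackets don't count toward depth.
      if escaped         then escaped = false
      elsif char == "\\" then escaped = true
      elsif char == '"'  then in_string = false
      end
      next
    end

    case char
    when '"'    then in_string = true
    when opener then depth += 1
    when closer
      depth -= 1
      if depth.zero?
        # Complete structure found: parse just this slice.
        begin
          return JSON.parse(content[start, i + 1], symbolize_names: symbolize_names)
        rescue JSON::ParserError
          return nil
        end
      end
    end
  end
  nil
end
```

This handles the “Here is the JSON: […]” case directly: the prose before the first bracket is skipped, and the prose after it is never reached.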
Each generator now calls JsonExtractor.call(response) instead of rolling its own
fence-stripping. One test suite covers all the edge cases.
The cleaner alternative
If you’re using RubyLLM, a Ruby library that wraps multiple AI providers behind a common interface, there’s a more architecturally elegant solution: structured output.
Instead of asking the model to “return JSON” and then parsing whatever it sends back, you define a schema and ask the model to conform to it:
```ruby
class FaqSchema < RubyLLM::Schema
  array :faqs do
    string :question
    string :answer
  end
end

response = chat.with_schema(FaqSchema).ask(prompt)
# response.faqs is an array of objects with .question and .answer
```

The model is constrained at the API level, not the prompt level. You get back a parsed
object that already matches your schema. No JSON.parse, no fence-stripping, no
JsonExtractor. The schema also serves as documentation for what the command expects.
This is genuinely cleaner. So why not use it?
The provider support problem
Structured output is a feature that providers implement, and not all models support it. The major providers (Anthropic, OpenAI, Gemini) support it for their recent models. But older models, like GPT-3.5, Gemini 1.0, and Codex, don’t. Some providers, like Hugging Face, don’t support it at all.
Jojo is designed to work with whatever model and provider the user configures. It dynamically picks up provider credentials from environment variables and routes requests through RubyLLM. A user might be running Claude Haiku, GPT-4o, or a locally-hosted model through an OpenAI-compatible endpoint.
RubyLLM’s structured output interface is uniform across providers: chat.with_schema
works the same way regardless of which model you’re talking to. But when the underlying
model doesn’t support structured output, you don’t get graceful degradation. You get a
provider error.
That’s the thing:
- JsonExtractor works on any string. If the model returns prose with JSON buried in it, that’s handled. If the model returns clean JSON, that’s handled. It never throws a provider error because it never makes a structured output request.
- Structured output fails loudly on unsupported models. There’s no fallback. You have to catch the error and decide what to do with it.
For a tool where wide model support is a goal, JsonExtractor is more defensive. The
cost is that you’re still at the mercy of whatever the model decides to return, but in
practice, every model returns something that contains valid JSON, even if it’s surrounded
by other content. The extraction strategy covers the real-world failure modes.
You also give up strict JSON schema guarantees where they would have been available, but that’s a tradeoff I’m willing to make for broader compatibility.
When structured output is the right call
None of this is an argument against structured output in general. If you have good control over exactly which model and provider your code will run against, structured output is the better choice. The schema can be explicit, the contract is enforced at the API level, and you’re unlikely to hit a parse error at runtime.
You might wonder whether a hybrid approach could get the best of both worlds: check
whether the model supports structured output, use it if so, and fall back to
JsonExtractor if not. RubyLLM does expose this information. Model objects carry a
capabilities array that includes 'structured_output' when the feature is supported.
So the check is technically feasible.
The problem is that it doesn’t simplify anything. JsonExtractor still has to exist
for the fallback branch, so you haven’t reduced maintenance burden. And now you have
two code paths returning different shapes: structured output returns a typed schema
object, while JsonExtractor returns a Hash. Every caller has to handle both, or you
need a normalization layer on top. You’ve added complexity without removing any.
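To make the shape problem concrete, here is a toy sketch of the hybrid dispatch (entirely hypothetical: ModelInfo stands in for RubyLLM's model object with its capabilities array, and both branch bodies are placeholders):

```ruby
require "json"

# Toy sketch of the hybrid approach (hypothetical, not Jojo's code).
ModelInfo = Struct.new(:capabilities)

def fetch_structured(model_info, raw_response)
  if model_info.capabilities.include?("structured_output")
    # Placeholder for chat.with_schema(FaqSchema).ask(prompt):
    # returns a typed schema object.
    :typed_schema_object
  else
    # Placeholder for JsonExtractor.call(raw_response):
    # returns a plain Hash or Array.
    JSON.parse(raw_response)
  end
end

modern = ModelInfo.new(["streaming", "structured_output"])
legacy = ModelInfo.new(["streaming"])

fetch_structured(modern, "{}")        # => :typed_schema_object
fetch_structured(legacy, '{"a": 1}')  # => {"a"=>1}
```

Both branches still have to exist, and every caller receives one of two shapes, which is exactly the normalization burden that makes the hybrid unattractive.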
The question is essentially: how much portability are you willing to trade for guarantees?
JsonExtractor is portable and handles the real-world messiness of model output.
Structured output gives you stronger guarantees but requires a supported model.
For a tool like Jojo, where users configure their own API credentials and model choices,
JsonExtractor keeps things working across the widest range of setups.
The “clean” architecture turned out to be less pragmatic once you considered who was
actually going to run it. That person is me, btw, and I want it to work with
whatever model I’m testing with at the moment, without having to worry about
structured output support.