Structured Output Generation
The biggest reliability challenge in production LLM applications isn’t getting the model to understand — it’s getting it to respond in a format your downstream code can actually parse. A JSON field with an extra comma. A schema with different key names than expected. A response wrapped in markdown that breaks your parser.
Structured output generation is about solving this at the infrastructure level, not hoping the model behaves.
The Problem with Unstructured Outputs
LLMs are probabilistic. Every token is sampled from a probability distribution. Without constraints, the model might:
```
Expected:
{"name": "John", "age": 30, "city": "NYC"}

Actual might be:
{"name": "John", "age": "30", "city": "NYC"}                  ← age as string, not int
{"name": "John", "age": 30, city: "NYC"}                      ← missing quotes on key
{"Name": "John", "Age": 30, "City": "NYC"}                    ← wrong capitalization
Here's the extracted data: {"name": "John"...}                ← preamble text
{"name": "John", "age": 30, "city": "NYC", "extra": "field"}  ← hallucinated field
```

For one-off queries this might be fine. For a production pipeline processing 10,000 requests/day, you need consistency: even a 1% failure rate means 100 broken records every day.
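It helps to see which of these failures a JSON parser would even notice. The sketch below runs each example through json.loads and then a minimal key and type check. The names check and EXPECTED are illustrative, not from any library, and the preamble example keeps its original elision.

```python
import json

# Expected flat schema: key -> required Python type
EXPECTED = {"name": str, "age": int, "city": str}


def check(raw: str) -> str:
    """Classify a raw model response against EXPECTED."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return "syntax error"
    if not isinstance(data, dict):
        return "not an object"
    if set(data) != set(EXPECTED):
        return "wrong keys"
    for key, typ in EXPECTED.items():
        # Exact type match, so True is not accepted as an int
        if type(data[key]) is not typ:
            return f"wrong type for {key}"
    return "ok"


outputs = [
    '{"name": "John", "age": 30, "city": "NYC"}',
    '{"name": "John", "age": "30", "city": "NYC"}',
    '{"name": "John", "age": 30, city: "NYC"}',
    '{"Name": "John", "Age": 30, "City": "NYC"}',
    'Here\'s the extracted data: {"name": "John"...}',
    '{"name": "John", "age": 30, "city": "NYC", "extra": "field"}',
]
for raw in outputs:
    print(check(raw), "<-", raw)
# ok, wrong type for age, syntax error, wrong keys, syntax error, wrong keys
```

Only two of the five failures break parsing. The other three produce perfectly valid JSON that is still wrong for your code, which is why syntactic guarantees alone are not enough.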
Approach 1: JSON Mode (API-Level)
The simplest solution for JSON output. Most major providers offer a dedicated mode:
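```python
from openai import OpenAI
import json

client = OpenAI()

# OpenAI JSON mode
response = client.chat.completions.create(
    model="gpt-4o",
    response_format={"type": "json_object"},
    messages=[
        # JSON mode requires the word "JSON" to appear somewhere in the messages
        {"role": "system", "content": "Extract order details as JSON."},
        {"role": "user", "content": "Customer John Smith ordered 3 blue t-shirts size L, order total $89.97, shipping to Chicago."},
    ],
)

choice = response.choices[0]
# Output is syntactically valid JSON unless generation hit the token limit
if choice.finish_reason == "length":
    raise ValueError("Response was truncated; the JSON may be incomplete")
data = json.loads(choice.message.content)
```

Limitation: JSON mode guarantees syntactic validity but not adherence to your specific schema. You may still get extra fields, missing fields, or wrong types.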
Approach 2: Structured Outputs with Schema (Strongest Guarantee)
OpenAI’s Structured Outputs (2024) and similar features in Anthropic’s API let you provide a JSON Schema. The model’s output is constrained to match it exactly.
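```python
from openai import OpenAI
import json

client = OpenAI()

# Strict mode requires "additionalProperties": False on every object and every
# property listed in "required"; optional fields are expressed as nullable.
schema = {
    "type": "object",
    "properties": {
        "customer_name": {"type": "string"},
        "items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "product": {"type": "string"},
                    "quantity": {"type": "integer"},
                    "size": {
                        "anyOf": [
                            {"type": "string", "enum": ["XS", "S", "M", "L", "XL"]},
                            {"type": "null"},
                        ]
                    },
                },
                "required": ["product", "quantity", "size"],
                "additionalProperties": False,
            },
        },
        "total": {"type": "number"},
        "shipping_city": {"type": ["string", "null"]},
    },
    "required": ["customer_name", "items", "total", "shipping_city"],
    "additionalProperties": False,
}

response = client.chat.completions.create(
    model="gpt-4o",
    response_format={
        "type": "json_schema",
        "json_schema": {"name": "order", "schema": schema, "strict": True},
    },
    messages=[{"role": "user", "content": "Customer John Smith ordered 3 blue t-shirts size L, total $89.97, shipping Chicago."}],
)

message = response.choices[0].message
if message.refusal:  # the model may decline instead of returning data
    raise RuntimeError(message.refusal)
order = json.loads(message.content)
```

With strict: True, OpenAI uses constrained decoding: at each generation step, the model’s token probabilities are masked so that only tokens that can still form a valid completion of the schema are allowed. Schema conformance is therefore guaranteed by the decoding procedure itself rather than on a best-effort basis. The guarantee covers the shape of the data, not its correctness: the model can still extract the wrong value, the output can be cut off at the token limit, and the model may return a refusal instead of data, which is why the example checks message.refusal.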
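Strict mode accepts only a subset of JSON Schema: every object must set additionalProperties to false and list all of its properties in required, and an optional field is emulated as a union with null. Rather than editing each schema by hand, a small helper can apply those rules. This is a sketch that assumes your schemas only nest through objects and arrays; make_strict is a hypothetical name, not part of the OpenAI SDK.

```python
import copy
from typing import Any


def make_strict(schema: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of a JSON Schema adjusted for strict mode (sketch).

    Only object and array nesting is handled; other keywords pass through.
    """
    strict = copy.deepcopy(schema)  # never mutate the caller's schema
    _strictify(strict)
    return strict


def _strictify(node: Any) -> None:
    if not isinstance(node, dict):
        return
    if node.get("type") == "object" and "properties" in node:
        required = set(node.get("required", []))
        props = node["properties"]
        for name in list(props):
            _strictify(props[name])
            if name not in required:
                # Optional field becomes required-but-nullable
                props[name] = {"anyOf": [props[name], {"type": "null"}]}
        node["required"] = list(props)
        node["additionalProperties"] = False
    elif node.get("type") == "array":
        _strictify(node.get("items"))
```

If you define schemas as Pydantic models instead, recent versions of the OpenAI Python SDK can perform this kind of conversion for you through their parse helper; check the SDK documentation for your version.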
Approach 3: Function / Tool Calling
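Originally designed for tool use (letting models call APIs), function calling also works well for structured extraction. You describe a function whose parameters are your target schema and ask the model to “call” it; the arguments it produces are your structured data. Unless the provider enforces the schema (OpenAI, for example, accepts "strict": true on a function definition), those arguments follow the schema on a best-effort basis, so validate them like any other model output.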
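```python
import json

tools = [{
    "type": "function",
    "function": {
        "name": "extract_order",
        "description": "Extract order details from customer message",
        "parameters": {
            "type": "object",
            "properties": {
                "customer_name": {"type": "string"},
                "total": {"type": "number"},
                "items": {"type": "array", "items": {"type": "string"}},
            },
            "required": ["customer_name", "total", "items"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o",
    tools=tools,
    # Force a call to extract_order instead of letting the model reply in prose
    tool_choice={"type": "function", "function": {"name": "extract_order"}},
    messages=[{"role": "user", "content": "..."}],
)

# Extract the structured data (arguments arrive as a JSON string)
args = json.loads(response.choices[0].message.tool_calls[0].function.arguments)
```

This approach also works particularly well with Anthropic’s Claude, which has strong tool-calling support. Its Messages API follows the same idea with a slightly different shape: the schema goes under input_schema, tool_choice={"type": "tool", "name": "extract_order"} forces the call, and the arguments come back already parsed, as the input field of a tool_use content block.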
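Because the two APIs hand back tool arguments in different shapes (a JSON string on OpenAI, a parsed object on Anthropic), code that supports both benefits from normalizing them to plain dicts. This sketch assumes the SDK response objects have already been converted to dicts; both Python SDKs return Pydantic models, so .model_dump() does that. The helper names are hypothetical.

```python
import json
from typing import Any


def tool_args_openai(response: dict[str, Any], tool_name: str) -> dict[str, Any]:
    """Arguments of the named tool call in an OpenAI Chat Completions response.

    OpenAI returns the arguments as a JSON string, so they still need parsing.
    """
    for call in response["choices"][0]["message"].get("tool_calls") or []:
        if call["function"]["name"] == tool_name:
            return json.loads(call["function"]["arguments"])
    raise LookupError(f"no call to {tool_name}")


def tool_args_anthropic(response: dict[str, Any], tool_name: str) -> dict[str, Any]:
    """Arguments of the named tool call in an Anthropic Messages response.

    Anthropic returns the input already parsed, inside a "tool_use" content block.
    """
    for block in response["content"]:
        if block["type"] == "tool_use" and block["name"] == tool_name:
            return block["input"]
    raise LookupError(f"no call to {tool_name}")
```

Either helper raises LookupError when the model did not call the tool, which gives the caller a single place to trigger a retry.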
Approach 4: Constrained Decoding Libraries
For local models (LLaMA, Mistral, etc.), grammar-based constrained decoding lets you specify exact output grammars at the inference level.
```python
# Using the Outlines library (0.x API; newer releases reorganized these entry points)
import outlines
from pydantic import BaseModel
from typing import List

class OrderItem(BaseModel):
    product: str
    quantity: int

class Order(BaseModel):
    customer_name: str
    items: List[OrderItem]
    total: float

model = outlines.models.transformers("mistralai/Mistral-7B-Instruct-v0.2")
generator = outlines.generate.json(model, Order)

order = generator(
    "John Smith ordered 3 blue t-shirts and 1 black jacket. Total: $134.97"
)
# Returns an Order instance: decoding is constrained so the JSON always fits the schema
```

Libraries in this space:
- Outlines: Grammar-constrained decoding, Pydantic integration
- Guidance: Microsoft’s template-based structured generation
- LMQL: Query language for constrained LLM output
- llama.cpp grammar: GBNF grammar files for C++ inference
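All of these tools rest on the same idea as OpenAI’s strict mode: at each decoding step, any token that cannot extend the output toward something the grammar accepts is masked out before the next token is chosen. The toy below shows that idea with a made-up vocabulary, fixed made-up scores standing in for model logits, and a "grammar" that is just the size enum written as JSON strings. Real implementations work on token IDs and compile the grammar into a finite-state machine or parser so the mask is cheap to compute, rather than string-checking every vocabulary entry as this sketch does.

```python
import math

# Toy setup (hypothetical): tiny vocabulary and fixed scores standing in for
# model logits. A real model would produce fresh logits at every step.
EOS = "<eos>"
VOCAB = ['"', "XS", "S", "M", "L", "XL", "large", EOS]
SCORES = {'"': 3.0, "L": 2.0, "large": 5.0, EOS: 1.0}  # unlisted tokens score 0.0

# The "grammar": the output must be exactly one of these JSON strings.
ALLOWED = ['"XS"', '"S"', '"M"', '"L"', '"XL"']


def is_viable(text: str) -> bool:
    """True if text can still be extended into an allowed value."""
    return any(value.startswith(text) for value in ALLOWED)


def constrained_greedy(max_steps: int = 10) -> str:
    output = ""
    for _ in range(max_steps):
        masked = {}
        for tok in VOCAB:
            if tok == EOS:
                ok = output in ALLOWED  # may only stop on a complete value
            else:
                ok = is_viable(output + tok)
            # Masking: a disallowed token gets -inf, i.e. probability zero
            masked[tok] = SCORES.get(tok, 0.0) if ok else -math.inf
        best = max(masked, key=masked.get)
        if masked[best] == -math.inf:
            raise RuntimeError("grammar allows no continuation")
        if best == EOS:
            return output
        output += best
    raise RuntimeError("no complete value within max_steps")


unconstrained_first = max(VOCAB, key=lambda t: SCORES.get(t, 0.0))
print(unconstrained_first)   # large  (off-schema)
print(constrained_greedy())  # "L"
```

Left alone, the toy model's favorite token is the off-schema word large; with the mask applied, greedy decoding can only walk through ", then L, then " and stop, so the result is always a member of the enum.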
Generating SQL Reliably
SQL generation is a common use case with specific challenges: table and column names must match your schema, JOINs must be valid, and syntax varies by database.
Best practice pattern:
```
System prompt:
You generate valid PostgreSQL queries for our database.

Schema:
  orders(id, customer_id, total, created_at, status)
  customers(id, name, email, city, created_at)
  order_items(id, order_id, product_name, quantity, price)

Rules:
- Use only columns that exist in the schema above
- Always include a LIMIT unless explicitly asked for all records
- Use parameterized style: $1, $2 for user-provided values
- Return ONLY the SQL query, nothing else

User: Show me all orders over $100 from customers in Chicago this month
```

For higher reliability, pair this with a query validator that checks the generated SQL against your actual schema before execution.
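A robust validator would use a real SQL parser, or run EXPLAIN against a read-only copy of the database, which resolves table and column names without executing the query. As a lighter, dependency-free starting point, the sketch below checks the prompt's rules with regular expressions: SELECT only, a single statement, a LIMIT, known tables, and known columns in qualified references such as o.total. SCHEMA and validate_sql are hypothetical names, and since regexes cannot parse SQL in general, expect false positives (EXTRACT(... FROM ...), CTE names, comma-separated FROM lists) and misses (unqualified column names are not checked).

```python
import re

# Hypothetical schema map mirroring the tables in the system prompt
SCHEMA: dict[str, set[str]] = {
    "orders": {"id", "customer_id", "total", "created_at", "status"},
    "customers": {"id", "name", "email", "city", "created_at"},
    "order_items": {"id", "order_id", "product_name", "quantity", "price"},
}

# Words that can follow a table name but are not aliases
_KEYWORDS = {"where", "join", "inner", "left", "right", "full", "cross", "on",
             "group", "order", "limit", "having", "using", "natural", "union"}


def validate_sql(sql: str, schema: dict[str, set[str]] = SCHEMA) -> list[str]:
    """Return a list of problems; an empty list means the crude checks passed."""
    problems: list[str] = []
    # Blank out single-quoted literals so their contents are not parsed
    text = re.sub(r"'(?:[^']|'')*'", "''", sql).strip().rstrip(";")

    if not re.match(r"(?i)(select|with)\b", text):
        problems.append("only SELECT queries are allowed")
    if ";" in text:
        problems.append("multiple statements are not allowed")
    if not re.search(r"(?i)\blimit\b", text):
        problems.append("missing LIMIT")

    # Map each table name and alias seen after FROM/JOIN to its table
    aliases: dict[str, str] = {}
    for table, alias in re.findall(
        r"(?i)\b(?:from|join)\s+([a-z_]\w*)(?:\s+(?:as\s+)?([a-z_]\w*))?", text
    ):
        table = table.lower()
        if table not in schema:
            problems.append(f"unknown table: {table}")
            continue
        aliases[table] = table
        if alias and alias.lower() not in _KEYWORDS:
            aliases[alias.lower()] = table

    # Check qualified references such as o.total or customers.city
    for qualifier, column in re.findall(r"\b([a-z_]\w*)\.([a-z_]\w*)\b", text, re.I):
        table = aliases.get(qualifier.lower())
        if table is None:
            problems.append(f"unknown table or alias: {qualifier}")
        elif column.lower() not in schema[table]:
            problems.append(f"unknown column: {table}.{column}")
    return problems
```

A non-empty result is a signal to reject the query or feed the problems back to the model for another attempt, rather than to execute it.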
Output Parsing and Validation
Even with structured output APIs, defensive parsing is good practice:
```python
import json
import logging

from pydantic import BaseModel, ValidationError

logger = logging.getLogger(__name__)

class ExtractedOrder(BaseModel):
    customer_name: str
    total: float
    items: list[str]

def parse_order_response(raw_json: str) -> ExtractedOrder | None:
    try:
        data = json.loads(raw_json)
        # model_validate (Pydantic v2) also rejects non-object JSON such as a bare list
        return ExtractedOrder.model_validate(data)
    except json.JSONDecodeError:
        # Log and retry with stricter prompt
        logger.warning("Response was not valid JSON")
        return None
    except ValidationError as e:
        # Log schema mismatch details for prompt debugging
        logger.warning("Schema mismatch: %s", e)
        return None
```

For high-volume production systems, track parse failure rates by prompt version. A rising failure rate signals that the model’s output distribution has shifted, which can happen after a provider updates the model behind an API endpoint.
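Tracking failures per prompt version needs very little machinery. The sketch below keeps a rolling window of recent outcomes in memory; ParseFailureTracker and its methods are hypothetical names, and a production system would more likely emit the same counts to its existing metrics stack.

```python
from collections import defaultdict, deque


class ParseFailureTracker:
    """In-memory sketch: rolling parse-failure rate per prompt version."""

    def __init__(self, window: int = 1000) -> None:
        # Keep only the most recent `window` outcomes for each prompt version
        self._outcomes = defaultdict(lambda: deque(maxlen=window))

    def record(self, prompt_version: str, parsed_ok: bool) -> None:
        self._outcomes[prompt_version].append(parsed_ok)

    def failure_rate(self, prompt_version: str) -> float:
        outcomes = self._outcomes.get(prompt_version)
        if not outcomes:
            return 0.0
        return sum(not ok for ok in outcomes) / len(outcomes)
```

After each request you would record whether parsing succeeded, for example tracker.record("extract-v3", parse_order_response(raw) is not None) with a hypothetical version label, and alert when the failure rate for the live prompt version climbs well above its usual level.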
Choosing the Right Approach
| Scenario | Best Approach |
|---|---|
| OpenAI / GPT-4 production API | Structured Outputs with JSON Schema |
| Anthropic Claude API | Tool calling with schema |
| Local open-source model | Outlines constrained decoding |
| Quick prototype | JSON mode + Pydantic validation |
| SQL generation | Schema-in-prompt + query validator |
| Complex multi-field extraction | Function calling with detailed schema |
The general principle: move validation as close to generation as possible. Post-hoc parsing is fragile; constrained decoding at the token level is robust.