Generative AI 26 guides · updated 2026

From transformer foundations to production RAG, tool-using agents, and the Model Context Protocol — the GenAI stack as it's actually being built in 2026.

Structured Output Generation

The biggest reliability challenge in production LLM applications isn’t getting the model to understand — it’s getting it to respond in a format your downstream code can actually parse. A JSON field with an extra comma. A schema with different key names than expected. A response wrapped in markdown that breaks your parser.

Structured output generation is about solving this at the infrastructure level, not hoping the model behaves.


The Problem with Unstructured Outputs

LLMs are probabilistic: every token is sampled from a probability distribution. Without constraints, a request for one JSON object can come back in any of these forms:

Expected: {"name": "John", "age": 30, "city": "NYC"}
Actual might be:
{"name": "John", "age": "30", "city": "NYC"} ← age as string, not int
{"name": "John", "age": 30, city: "NYC"} ← missing quotes on key
{"Name": "John", "Age": 30, "City": "NYC"} ← wrong capitalization
Here's the extracted data: {"name": "John"...} ← preamble text
{"name": "John", "age": 30, "city": "NYC", "extra": "field"} ← hallucinated field

For one-off queries this might be fine. For a production pipeline processing 10,000 requests/day, you need consistency.
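
To make these failure modes concrete, the sketch below runs each variant through json.loads and a hand-written shape check. The EXPECTED mapping and the diagnose helper are illustrative names introduced for this example, not part of any library.

```python
import json

# Expected keys and Python types for the example record (illustrative).
EXPECTED = {"name": str, "age": int, "city": str}

def diagnose(raw: str) -> str:
    """Classify a raw model response as ok, a parse error, or a shape mismatch."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return "parse error"
    if not isinstance(data, dict):
        return "shape mismatch: not an object"
    missing = sorted(set(EXPECTED) - set(data))
    extra = sorted(set(data) - set(EXPECTED))
    if missing or extra:
        return f"shape mismatch: missing={missing} extra={extra}"
    wrong = sorted(k for k, t in EXPECTED.items() if not isinstance(data[k], t))
    if wrong:
        return f"shape mismatch: wrong types for {wrong}"
    return "ok"

samples = [
    '{"name": "John", "age": 30, "city": "NYC"}',
    '{"name": "John", "age": "30", "city": "NYC"}',
    '{"name": "John", "age": 30, city: "NYC"}',
    '{"Name": "John", "Age": 30, "City": "NYC"}',
    'Here\'s the extracted data: {"name": "John", "age": 30, "city": "NYC"}',
    '{"name": "John", "age": 30, "city": "NYC", "extra": "field"}',
]
results = [diagnose(s) for s in samples]
for sample, result in zip(samples, results):
    print(f"{result}  <-  {sample}")
```

Only the first sample passes. Two variants fail at the parser, and the other three parse cleanly but would break downstream code that trusts the shape.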


Approach 1: JSON Mode (API-Level)

The simplest solution for JSON output. Most major providers offer a dedicated mode:

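# OpenAI JSON mode
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    response_format={"type": "json_object"},
    messages=[
        # JSON mode requires the word "JSON" to appear somewhere in the messages
        {"role": "system", "content": "Extract order details as JSON."},
        {"role": "user", "content": "Customer John Smith ordered 3 blue t-shirts size L, order total $89.97, shipping to Chicago."}
    ]
)
# The content parses as JSON as long as generation finished normally;
# a response cut off by the token limit (finish_reason == "length") can still be incomplete.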

Limitation: JSON mode guarantees syntactic validity but not adherence to your specific schema. You still might get extra fields or wrong types.


Approach 2: Structured Outputs with Schema (Strongest Guarantee)

OpenAI’s Structured Outputs (2024) and similar features in Anthropic’s API let you provide a JSON Schema. The model’s output is constrained to match it exactly.

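from openai import OpenAI
import json

client = OpenAI()

# Strict mode requires every object to set additionalProperties to false and to
# list all of its properties in "required"; optional fields become nullable instead.
schema = {
    "type": "object",
    "properties": {
        "customer_name": {"type": "string"},
        "items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "product": {"type": "string"},
                    "quantity": {"type": "integer"},
                    "size": {"anyOf": [
                        {"type": "string", "enum": ["XS", "S", "M", "L", "XL"]},
                        {"type": "null"}
                    ]}
                },
                "required": ["product", "quantity", "size"],
                "additionalProperties": False
            }
        },
        "total": {"type": "number"},
        "shipping_city": {"type": ["string", "null"]}
    },
    "required": ["customer_name", "items", "total", "shipping_city"],
    "additionalProperties": False
}

response = client.chat.completions.create(
    model="gpt-4o",
    response_format={"type": "json_schema", "json_schema": {"name": "order", "schema": schema, "strict": True}},
    messages=[{"role": "user", "content": "Customer John Smith ordered 3 blue t-shirts size L, total $89.97, shipping Chicago."}]
)
# The message content is a JSON string that conforms to the schema
order = json.loads(response.choices[0].message.content)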

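With strict: True, OpenAI uses constrained decoding: at each generation step, tokens that could not continue a valid instance of the schema are masked out before sampling. Schema conformance is therefore enforced by construction rather than on a best-effort basis, provided the response completes. A refusal or a response truncated by the token limit can still fall outside the schema, and conforming output can still contain wrong values.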
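
The masking step can be illustrated with a toy decoder. This is a sketch of the general idea, not OpenAI's implementation: the six-token vocabulary, the hand-written allowed_next state machine, and the fake_model scores are all assumptions made for the example, and it picks the highest-scoring token where a real decoder would usually sample.

```python
import json
import math

# Toy vocabulary: one structural prefix, two digits, a closer, and two "bad" tokens.
VOCAB = ['{"age": ', "3", "0", "}", "thirty", "Sure!"]

def allowed_next(text: str) -> set[str]:
    """Toy grammar for {"age": <digits>}: which tokens may legally come next."""
    if text == "":
        return {'{"age": '}
    if text.endswith("}"):
        return set()                    # the object is complete
    if text[-1].isdigit():
        return {"3", "0", "}"}          # another digit, or close the object
    return {"3", "0"}                   # right after the key, a digit is required

def mask_logits(logits: list[float], allowed: set[str]) -> list[float]:
    """Set every disallowed token's logit to -inf so it can never be chosen."""
    return [l if tok in allowed else -math.inf for tok, l in zip(VOCAB, logits)]

def fake_model(text: str) -> list[float]:
    """Stand-in for a real model; left alone, it always prefers 'Sure!'."""
    scores = {'{"age": ': 1.0, "3": 0.5, "0": 0.4, "}": 0.3, "thirty": 2.0, "Sure!": 3.0}
    if text.endswith("3"):
        scores["0"] = 0.9
    if text.endswith("0"):
        scores["}"] = 0.95
    return [scores[tok] for tok in VOCAB]

def decode(constrained: bool, max_steps: int = 10) -> str:
    text = ""
    for _ in range(max_steps):
        logits = fake_model(text)
        if constrained:
            allowed = allowed_next(text)
            if not allowed:
                break
            logits = mask_logits(logits, allowed)
        best = max(range(len(VOCAB)), key=lambda i: logits[i])
        text += VOCAB[best]
    return text

constrained_out = decode(constrained=True)
unconstrained_out = decode(constrained=False, max_steps=3)
print(constrained_out)      # {"age": 30}
print(unconstrained_out)    # Sure!Sure!Sure!
```

The model's preferences are identical in both runs. Only the mask differs, which is why the constrained output parses regardless of what the model would have liked to say.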


Approach 3: Function / Tool Calling

Originally designed for tool use (letting models call APIs), function calling also works excellently for structured extraction. The model is asked to “call a function” and must provide arguments matching the function signature.

# Reuses `client` and `json` from the previous example
tools = [{
    "type": "function",
    "function": {
        "name": "extract_order",
        "description": "Extract order details from customer message",
        "parameters": {
            "type": "object",
            "properties": {
                "customer_name": {"type": "string"},
                "total": {"type": "number"},
                "items": {
                    "type": "array",
                    "items": {"type": "string"}
                }
            },
            "required": ["customer_name", "total", "items"]
        }
    }
}]

response = client.chat.completions.create(
    model="gpt-4o",
    tools=tools,
    # Force the model to call extract_order rather than reply in prose
    tool_choice={"type": "function", "function": {"name": "extract_order"}},
    messages=[{"role": "user", "content": "..."}]
)

# Extract the structured data
args = json.loads(response.choices[0].message.tool_calls[0].function.arguments)

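The same pattern carries over to Anthropic’s Claude, which has strong tool calling support. The request shape differs from the OpenAI example above: each tool declares its schema under input_schema, and the arguments come back as an already-parsed object in a tool_use content block.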


Approach 4: Constrained Decoding Libraries

For local models (LLaMA, Mistral, etc.), grammar-based constrained decoding lets you specify exact output grammars at the inference level.

# Using the Outlines library (0.x API; later releases changed these entry points,
# so check the version you have installed)
import outlines
from pydantic import BaseModel
from typing import List

class OrderItem(BaseModel):
    product: str
    quantity: int

class Order(BaseModel):
    customer_name: str
    items: List[OrderItem]
    total: float

model = outlines.models.transformers("mistralai/Mistral-7B-Instruct-v0.2")
generator = outlines.generate.json(model, Order)
order = generator(
    "John Smith ordered 3 blue t-shirts and 1 black jacket. Total: $134.97"
)
# Returns an Order Pydantic object; decoding is restricted to JSON that fits the model

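Libraries in this space include Outlines (shown above), Guidance, LM Format Enforcer, and the GBNF grammar support built into llama.cpp.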


Generating SQL Reliably

SQL generation is a common use case with specific challenges: table and column names must match your schema, JOINs must be valid, and syntax varies by database.

Best practice pattern:

System prompt:
You generate valid PostgreSQL queries for our database.
Schema:
orders(id, customer_id, total, created_at, status)
customers(id, name, email, city, created_at)
order_items(id, order_id, product_name, quantity, price)
Rules:
- Use only columns that exist in the schema above
- Always include a LIMIT unless explicitly asked for all records
- Use parameterized style: $1, $2 for user-provided values
- Return ONLY the SQL query, nothing else
User: Show me all orders over $100 from customers in Chicago this month

For higher reliability, pair this with a query validator that checks the generated SQL against your actual schema before execution.
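
One way such a validator could look is sketched below, using only the standard library. It is an approximation under stated assumptions: an in-memory SQLite database stands in for PostgreSQL, the column types in SCHEMA_DDL are guesses (the prompt lists only column names), and validate_sql is a hypothetical helper name. SQLite may reject PostgreSQL-specific functions and syntax, so a production check would more likely run EXPLAIN against a real PostgreSQL instance.

```python
import sqlite3

# Hypothetical DDL mirroring the schema in the prompt; column types are assumed.
SCHEMA_DDL = """
CREATE TABLE orders(id INTEGER, customer_id INTEGER, total REAL, created_at TEXT, status TEXT);
CREATE TABLE customers(id INTEGER, name TEXT, email TEXT, city TEXT, created_at TEXT);
CREATE TABLE order_items(id INTEGER, order_id INTEGER, product_name TEXT, quantity INTEGER, price REAL);
"""

def validate_sql(query: str) -> tuple[bool, str]:
    """Return (ok, message) after compiling the query against an empty copy of the schema."""
    # Conservative gate: anything that is not a plain SELECT is refused outright.
    if not query.lstrip().lower().startswith("select"):
        return False, "only SELECT statements are allowed"
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SCHEMA_DDL)
        # EXPLAIN compiles the statement without running it, so unknown tables
        # or columns surface as errors at this point.
        conn.execute("EXPLAIN " + query)
        return True, "ok"
    except (sqlite3.Error, sqlite3.Warning) as exc:
        return False, str(exc)
    finally:
        conn.close()

good_query = (
    "SELECT o.id, o.total FROM orders o "
    "JOIN customers c ON c.id = o.customer_id "
    "WHERE o.total > 100 AND c.city = 'Chicago' LIMIT 50"
)
bad_query = "SELECT o.amount FROM orders o LIMIT 10"  # 'amount' is not in the schema

good_result = validate_sql(good_query)
bad_result = validate_sql(bad_query)
print(good_result)
print(bad_result)
```

The error message from a failed check can be fed back to the model as context for a corrected query, which tends to be more useful than a bare retry.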


Output Parsing and Validation

Even with structured output APIs, defensive parsing is good practice:

from pydantic import BaseModel, ValidationError
import json

class ExtractedOrder(BaseModel):
    customer_name: str
    total: float
    items: list[str]

def parse_order_response(raw_json: str) -> ExtractedOrder | None:
    try:
        data = json.loads(raw_json)
        # model_validate (Pydantic v2) also rejects non-object JSON such as a bare list
        return ExtractedOrder.model_validate(data)
    except json.JSONDecodeError:
        # Log and retry with stricter prompt
        return None
    except ValidationError as e:
        # Log schema mismatch details for prompt debugging
        print(f"Schema mismatch: {e}")
        return None
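
The "log and retry" step mentioned in the comments could be wired up roughly as follows. This is a minimal sketch: call_model is a hypothetical callable standing in for whatever client call returns the raw text, parse can be any function that returns None on failure (parse_order_response above fits that shape), and the wording of the stricter instruction is only an example.

```python
import json
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

def generate_with_retries(
    call_model: Callable[[str], str],
    parse: Callable[[str], Optional[T]],
    prompt: str,
    max_attempts: int = 3,
) -> Optional[T]:
    """Call the model until `parse` accepts the output or the attempts run out."""
    current_prompt = prompt
    for _ in range(max_attempts):
        raw = call_model(current_prompt)
        parsed = parse(raw)
        if parsed is not None:
            return parsed
        # Tighten the instruction for the next attempt (illustrative wording).
        current_prompt = prompt + "\nReturn ONLY a JSON object matching the schema, with no extra text."
    return None

# Usage with a fake model that answers badly once, then correctly.
replies = iter([
    'Here you go: {"total": 1}',
    '{"customer_name": "John", "total": 89.97, "items": ["t-shirt"]}',
])

def fake_call(prompt: str) -> str:
    return next(replies)

def parse_json_object(raw: str) -> Optional[dict]:
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    return data if isinstance(data, dict) else None

result = generate_with_retries(fake_call, parse_json_object, "Extract the order.")
print(result)
```

Capping the attempts matters: an unbounded retry loop turns a persistent prompt problem into a cost problem.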

For high-volume production systems, track parse failure rates by prompt version. A rising parse failure rate is a signal that the model’s output distribution has shifted (possible after provider model updates).
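
A minimal in-process version of that tracking might look like the sketch below. ParseFailureTracker and its methods are hypothetical names; a real deployment would more likely emit these counts to an existing metrics system than hold them in memory.

```python
from collections import defaultdict

class ParseFailureTracker:
    """Counts parse attempts and failures per prompt version."""

    def __init__(self) -> None:
        self._attempts: dict[str, int] = defaultdict(int)
        self._failures: dict[str, int] = defaultdict(int)

    def record(self, prompt_version: str, parsed_ok: bool) -> None:
        self._attempts[prompt_version] += 1
        if not parsed_ok:
            self._failures[prompt_version] += 1

    def failure_rate(self, prompt_version: str) -> float:
        attempts = self._attempts.get(prompt_version, 0)
        if attempts == 0:
            return 0.0
        return self._failures.get(prompt_version, 0) / attempts

    def versions_over(self, threshold: float) -> list[str]:
        """Prompt versions whose failure rate exceeds the threshold."""
        return sorted(v for v in self._attempts if self.failure_rate(v) > threshold)

tracker = ParseFailureTracker()
for ok in [True, True, True, False]:
    tracker.record("v1", ok)
for ok in [True, False, False, False]:
    tracker.record("v2", ok)

print(tracker.failure_rate("v1"))   # 0.25
print(tracker.failure_rate("v2"))   # 0.75
print(tracker.versions_over(0.5))   # ['v2']
```

Keying the counts by prompt version makes it possible to tell a regression caused by a prompt change apart from one caused by a provider-side model update.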


Choosing the Right Approach

| Scenario | Best Approach |
| --- | --- |
| OpenAI / GPT-4 production API | Structured Outputs with JSON Schema |
| Anthropic Claude API | Tool calling with schema |
| Local open-source model | Outlines constrained decoding |
| Quick prototype | JSON mode + Pydantic validation |
| SQL generation | Schema-in-prompt + query validator |
| Complex multi-field extraction | Function calling with detailed schema |

The general principle: move validation as close to generation as possible. Post-hoc parsing is fragile; constrained decoding at the token level is robust.