Reasoning models are a class of AI models that perform extended internal thinking before producing a response. Instead of answering immediately, they work through the problem step by step — evaluating possibilities, checking their logic, and refining their answer before presenting it.

How Reasoning Works

When you send a request to a reasoning model, it goes through two phases:

  1. Reasoning phase — The model thinks internally. This produces reasoning content that may or may not be visible in the response.
  2. Response phase — The model presents its final answer based on its reasoning.

In the streaming response, reasoning is surfaced through special events:

event: response.reasoning.started
event: response.reasoning.delta (incremental reasoning text)
event: response.reasoning.completed
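
The event sequence above can be tracked with a small state machine. The following is a minimal sketch, assuming the stream has already been parsed into hypothetical (event_name, text) pairs; the function name `collect_reasoning` and that pair shape are illustrative and not part of the API.

```python
from typing import Iterable, Tuple

# Event names taken from the list above.
REASONING_STARTED = "response.reasoning.started"
REASONING_DELTA = "response.reasoning.delta"
REASONING_COMPLETED = "response.reasoning.completed"


def collect_reasoning(events: Iterable[Tuple[str, str]]) -> Tuple[str, bool]:
    """Return (reasoning_text, completed) from (event_name, text) pairs.

    The pair shape is an assumption made for this sketch.
    """
    chunks = []
    in_reasoning = False
    completed = False
    for name, text in events:
        if name == REASONING_STARTED:
            in_reasoning = True
        elif name == REASONING_DELTA and in_reasoning:
            # Deltas are incremental, so they are concatenated in order.
            chunks.append(text)
        elif name == REASONING_COMPLETED:
            in_reasoning = False
            completed = True
    return "".join(chunks), completed


# Hand-written sample events, not real API output.
sample = [
    (REASONING_STARTED, ""),
    (REASONING_DELTA, "17 * 23 = 17 * 20 "),
    (REASONING_DELTA, "+ 17 * 3 = 391"),
    (REASONING_COMPLETED, ""),
]
reasoning, done = collect_reasoning(sample)
print(reasoning, done)
```

Tracking the completed flag lets a client tell a finished reasoning phase apart from a stream that was cut off partway through.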

Supported Reasoning Models

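| Model          | Provider  | Reasoning Style            |
|----------------|-----------|----------------------------|
| o3             | OpenAI    | Extended deep reasoning    |
| o4-mini        | OpenAI    | Fast, efficient reasoning  |
| claude-opus-4  | Anthropic | Extended thinking mode     |
| gemini-2.5-pro | Google    | Built-in reasoning         |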

See the Available Models page for the current list.

Making a Request

Use a reasoning model the same way as any other model — just specify the model ID:

curl https://api.aitronos.com/v1/model/response \
  -H "X-API-Key: $FREDDY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "organization_id": "org_your_org_id",
    "model": "o3",
    "inputs": [
      {"role": "user", "content": "Prove that there are infinitely many prime numbers."}
    ]
  }'
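
For Python callers, the same request can be built with the standard library. This is a sketch under stated assumptions: the endpoint, headers, and body fields mirror the curl example, while the helper name `build_request` is hypothetical and the response is assumed to be JSON, which this page does not specify.

```python
import json
import os
import urllib.request

API_URL = "https://api.aitronos.com/v1/model/response"


def build_request(prompt: str, model: str = "o3") -> urllib.request.Request:
    """Build (but do not send) a POST request matching the curl example."""
    payload = {
        "organization_id": "org_your_org_id",
        "model": model,
        "inputs": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # Falls back to an empty key so the sketch runs without credentials.
            "X-API-Key": os.environ.get("FREDDY_API_KEY", ""),
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_request("Prove that there are infinitely many prime numbers.")

# Sending requires a valid key and network access; a JSON body is assumed:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read()))
```

Switching to another reasoning model only changes the `model` argument, for example `build_request(prompt, model="o4-mini")`.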

Accessing Reasoning Content

When streaming, reasoning content is included in response.reasoning.delta events:

import json
import os

import requests

response = requests.post(
    "https://api.aitronos.com/v1/model/response/stream",
    headers={"X-API-Key": os.environ["FREDDY_API_KEY"]},
    json={
        "organization_id": "org_your_org_id",
        "model": "o3",
        "inputs": [{"role": "user", "content": "What is 17 * 23?"}],
    },
    stream=True,
)
response.raise_for_status()  # fail early on HTTP errors

for line in response.iter_lines():
    # Event payloads arrive on lines prefixed with "data: ".
    if line.startswith(b"data: "):
        event = json.loads(line[6:])
        if event.get("event") == "response.reasoning.delta":
            print("Thinking:", event["delta"])
        elif event.get("event") == "response.output_text.delta":
            print("Answer:", event["delta"])
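
Printing each delta as it arrives scatters the text across many lines. One alternative is to accumulate the two streams into separate strings. The sketch below assumes the same payload shape as the loop above (a JSON object with "event" and "delta" fields on each "data: " line); the helper name `collect_stream` is hypothetical, and the sample lines are hand-written rather than real API output.

```python
import json
from typing import Iterable, Tuple


def collect_stream(lines: Iterable[bytes]) -> Tuple[str, str]:
    """Return (reasoning_text, answer_text) accumulated from stream lines."""
    reasoning, answer = [], []
    for line in lines:
        if not line.startswith(b"data: "):
            continue  # blank separators and other fields are ignored
        try:
            event = json.loads(line[len(b"data: "):])
        except json.JSONDecodeError:
            continue  # skip payloads that are not valid JSON
        if not isinstance(event, dict):
            continue
        kind = event.get("event")
        if kind == "response.reasoning.delta":
            reasoning.append(event.get("delta", ""))
        elif kind == "response.output_text.delta":
            answer.append(event.get("delta", ""))
    return "".join(reasoning), "".join(answer)


# Hand-written sample lines in the assumed format.
sample_lines = [
    b'data: {"event": "response.reasoning.delta", "delta": "17 * 23 = 340 "}',
    b'data: {"event": "response.reasoning.delta", "delta": "+ 51"}',
    b"",
    b"data: not-json",
    b'data: {"event": "response.output_text.delta", "delta": "391"}',
]
thinking, final_answer = collect_stream(sample_lines)
print("Thinking:", thinking)
print("Answer:", final_answer)
```

Because the function accepts any iterable of byte lines, it can be passed `response.iter_lines()` directly or exercised offline with recorded lines.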

Cost Considerations

Reasoning models consume additional synapses for the thinking process, even when reasoning content is not shown in the response. Longer, more complex problems require more reasoning steps and thus more synapses.

For cost-sensitive use cases, use reasoning models only when the problem genuinely benefits from extended thinking — complex math, multi-step logic, or ambiguous problems.

When to Use Reasoning Models

Good candidates:

  • Mathematical proofs and complex calculations
  • Multi-step logic puzzles
  • Code with subtle bugs requiring careful analysis
  • Tasks where accuracy is critical and cost is secondary

Not necessary for:

  • Simple Q&A, factual lookups
  • Creative writing and brainstorming
  • Summarization and reformatting
  • High-volume, low-complexity tasks
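
The guidance above can be expressed as a simple routing rule. This is an illustrative sketch only: the task labels, the `pick_model` helper, and the placeholder ID for a non-reasoning model are all hypothetical and should be replaced with values from your own application and the available models list.

```python
REASONING_MODEL = "o3"
# Hypothetical placeholder; substitute a real non-reasoning model ID.
DEFAULT_MODEL = "your-standard-model-id"

# Hypothetical task labels mirroring the "good candidates" list above.
REASONING_TASKS = {
    "math_proof",
    "logic_puzzle",
    "bug_analysis",
    "accuracy_critical",
}


def pick_model(task_type: str) -> str:
    """Route only tasks that benefit from extended thinking to a reasoning model."""
    return REASONING_MODEL if task_type in REASONING_TASKS else DEFAULT_MODEL


print(pick_model("math_proof"))
print(pick_model("summarization"))
```

Defaulting to the cheaper model and opting in per task type keeps high-volume, low-complexity traffic away from the more expensive reasoning path.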