Reasoning models are a class of AI models that perform extended internal thinking before producing a response. Instead of answering immediately, they work through the problem step by step — evaluating possibilities, checking their logic, and refining their answer before presenting it.
When you send a request to a reasoning model, it goes through two phases:
- Reasoning phase — The model thinks internally. This produces reasoning content that may or may not be visible in the response.
- Response phase — The model presents its final answer based on its reasoning.
In the streaming response, reasoning is surfaced through special events:
```
event: response.reasoning.started      (reasoning has begun)
event: response.reasoning.delta        (incremental reasoning text)
event: response.reasoning.completed    (reasoning has finished)
```

| Model | Provider | Reasoning Style |
|---|---|---|
| o3 | OpenAI | Extended deep reasoning |
| o4-mini | OpenAI | Fast, efficient reasoning |
| claude-opus-4 | Anthropic | Extended thinking mode |
| gemini-2.5-pro | Google | Built-in reasoning |
Check Available Models for the current list.
Use a reasoning model the same way as any other model — just specify the model ID:
```shell
curl https://api.aitronos.com/v1/model/response \
  -H "X-API-Key: $FREDDY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "organization_id": "org_your_org_id",
    "model": "o3",
    "inputs": [
      {"role": "user", "content": "Prove that there are infinitely many prime numbers."}
    ]
  }'
```

When streaming, reasoning content is included in `response.reasoning.delta` events:
```python
import json
import os

import requests

response = requests.post(
    "https://api.aitronos.com/v1/model/response/stream",
    headers={"X-API-Key": os.environ["FREDDY_API_KEY"]},
    json={
        "organization_id": "org_your_org_id",
        "model": "o3",
        "inputs": [{"role": "user", "content": "What is 17 * 23?"}],
    },
    stream=True,
)

for line in response.iter_lines():
    if line.startswith(b"data: "):
        event = json.loads(line[6:])
        if event.get("event") == "response.reasoning.delta":
            print("Thinking:", event["delta"])
        elif event.get("event") == "response.output_text.delta":
            print("Answer:", event["delta"])
```

Reasoning models consume additional synapses for the thinking process, even when the reasoning content is not shown in the response. Longer, more complex problems require more reasoning steps and therefore more synapses.
For cost-sensitive use cases, use reasoning models only when the problem genuinely benefits from extended thinking — complex math, multi-step logic, or ambiguous problems.
Good candidates:
- Mathematical proofs and complex calculations
- Multi-step logic puzzles
- Code with subtle bugs requiring careful analysis
- Tasks where accuracy is critical and cost is secondary
Not necessary for:
- Simple Q&A, factual lookups
- Creative writing and brainstorming
- Summarization and reformatting
- High-volume, low-complexity tasks
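The guidance above can be encoded as a simple routing heuristic. The function name and task categories below are hypothetical illustrations, not part of the API; only the model IDs come from the table above:

```python
# Hypothetical routing heuristic based on the guidance above.
# The task-category names are illustrative assumptions.
REASONING_TASKS = {"math_proof", "logic_puzzle", "subtle_bug_analysis"}


def choose_model(task_type: str) -> str:
    """Pick the heavier reasoning model only when the task benefits from it."""
    if task_type in REASONING_TASKS:
        return "o3"       # extended deep reasoning
    return "o4-mini"      # fast, efficient reasoning for everything else


print(choose_model("math_proof"))     # → o3
print(choose_model("summarization"))  # → o4-mini
```

For truly high-volume, low-complexity traffic, a non-reasoning model would be cheaper still; substitute one from the Available Models list as the fallback.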
- Reasoning Best Practices — How to get the best results
- Available Models — Full model list
- Streaming — Streaming reasoning events
- Pricing — Cost implications