Text generation is the core capability of the Freddy API. Send a prompt, get a response. This guide covers the fundamentals: input structure, output format, streaming, and key parameters.

Basic Request

curl https://api.aitronos.com/v1/model/response \
 -H "X-API-Key: $FREDDY_API_KEY" \
 -H "Content-Type: application/json" \
 -d '{
 "organization_id": "org_your_org_id",
 "model": "gpt-4o",
 "inputs": [{"role": "user", "content": "Explain photosynthesis in one paragraph."}]
 }'

Response:

{
  "id": "resp_abc123",
  "model": "gpt-4o",
  "output": [
    {
      "role": "assistant",
      "content": [
        {
          "type": "output_text",
          "text": "Photosynthesis is the process by which plants..."
        }
      ]
    }
  ],
  "usage": {
    "input_tokens": 12,
    "output_tokens": 45,
    "total_tokens": 57
  }
}
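
The same call can be sketched in Python with only the standard library. This assumes the request and response shapes shown above; the helper names (build_request, extract_text, send) are illustrative and not part of any Freddy SDK.

```python
import json
import urllib.request

API_URL = "https://api.aitronos.com/v1/model/response"  # endpoint from the curl example


def build_request(prompt, organization_id, model="gpt-4o"):
    """Build the JSON body used in the Basic Request example."""
    return {
        "organization_id": organization_id,
        "model": model,
        "inputs": [{"role": "user", "content": prompt}],
    }


def extract_text(response):
    """Join every output_text part found in the response's output list."""
    parts = []
    for message in response.get("output", []):
        for item in message.get("content", []):
            if item.get("type") == "output_text":
                parts.append(item.get("text", ""))
    return "".join(parts)


def send(body, api_key):
    """POST the body and return the decoded JSON response."""
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as reply:
        return json.load(reply)
```

With a key in the FREDDY_API_KEY environment variable, extract_text(send(build_request("Explain photosynthesis in one paragraph.", "org_your_org_id"), os.environ["FREDDY_API_KEY"])) would return the generated text as a single string (after import os).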

Input Roles

Role | Purpose
user | The human's message or question
assistant | A previous model response (for conversation context)
system | Instructions that shape the model's behavior

{
  "inputs": [
    {"role": "system", "content": "You are a concise technical writer."},
    {"role": "user", "content": "What is an API?"},
    {"role": "assistant", "content": "An API is an interface that lets software communicate."},
    {"role": "user", "content": "Give me an example."}
  ]
}
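
When the history is assembled in application code, a small helper keeps the role names consistent. The following is a minimal sketch; message and add_turn are hypothetical helpers, and rejecting any role outside the three listed here is this sketch's choice, not documented API behavior.

```python
VALID_ROLES = {"system", "user", "assistant"}


def message(role, content):
    """Return one inputs entry, rejecting roles not listed in this guide."""
    if role not in VALID_ROLES:
        raise ValueError(f"unknown role: {role!r}")
    return {"role": role, "content": content}


def add_turn(history, user_text, assistant_text):
    """Return a new history with one user/assistant exchange appended."""
    return history + [message("user", user_text), message("assistant", assistant_text)]


# Rebuild the conversation from the JSON example.
history = [message("system", "You are a concise technical writer.")]
history = add_turn(
    history,
    "What is an API?",
    "An API is an interface that lets software communicate.",
)
inputs = history + [message("user", "Give me an example.")]
```

Here inputs matches the JSON example: one system message, one completed exchange, and the new user question last.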

Controlling Output Length

Use max_output_synapses to cap the response length:

{
 "model": "gpt-4o",
 "max_output_synapses": 256,
 "inputs": [{"role": "user", "content": "Summarize the French Revolution."}]
}

See Synapses and Neurons for how output length maps to cost.
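
A minimal sketch of applying the cap to an existing request body. The helper name with_output_cap is hypothetical, and the requirement that the value be a positive integer is an assumption; this guide does not state the accepted range.

```python
def with_output_cap(body, max_output_synapses):
    """Return a copy of body with max_output_synapses set.

    Assumption: the API expects a positive integer here.
    """
    if isinstance(max_output_synapses, bool) or not isinstance(max_output_synapses, int):
        raise TypeError("max_output_synapses must be an integer")
    if max_output_synapses <= 0:
        raise ValueError("max_output_synapses must be positive")
    capped = dict(body)  # shallow copy so the caller's dict is left unchanged
    capped["max_output_synapses"] = max_output_synapses
    return capped


base = {
    "model": "gpt-4o",
    "inputs": [{"role": "user", "content": "Summarize the French Revolution."}],
}
capped = with_output_cap(base, 256)
```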

Streaming

For real-time output as the model generates it, use the streaming endpoint:

curl https://api.aitronos.com/v1/model/response/stream \
 -H "X-API-Key: $FREDDY_API_KEY" \
 -H "Content-Type: application/json" \
 -d '{
 "organization_id": "org_your_org_id",
 "model": "gpt-4o",
 "inputs": [{"role": "user", "content": "Write a haiku."}]
 }'

The response is a Server-Sent Events (SSE) stream. Each chunk contains a delta field with the incremental text. See Streaming for full details.
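
Reading the stream on the client side might look like the sketch below. It assumes each SSE data line carries a JSON object whose delta field is a string; the exact event schema is not shown in this guide, so treat the field handling as an assumption and check the Streaming reference. The name iter_deltas is illustrative.

```python
import json


def iter_deltas(lines):
    """Yield the delta text from each SSE data line.

    Assumes every data payload is a JSON object with a string "delta"
    field; payloads that do not fit that shape are skipped.
    """
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.rstrip("\r\n")
        if not line.startswith("data:"):
            continue  # blank separators, comments, event: and id: fields
        payload = line[len("data:"):].strip()
        try:
            event = json.loads(payload)
        except json.JSONDecodeError:
            continue  # non-JSON payloads such as end-of-stream markers
        if isinstance(event, dict) and isinstance(event.get("delta"), str):
            yield event["delta"]
```

An HTTP response object returned by urllib.request.urlopen iterates line by line as bytes, so it can be passed to iter_deltas directly and each piece printed as it arrives.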

Conversation History with Threads

To maintain conversation context across multiple requests, pass a thread ID:

{
 "organization_id": "org_your_org_id",
 "model": "gpt-4o",
 "thread": "thrd_session_001",
 "inputs": [{"role": "user", "content": "What did I ask you before?"}]
}

If the thread doesn't exist, it's created automatically. See Threads for full documentation.
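
A minimal sketch of a two-turn exchange on one thread. It assumes, based on the description above, that earlier turns are kept with the thread, so each request carries only the new user message; threaded_request is a hypothetical helper.

```python
def threaded_request(thread_id, prompt, organization_id, model="gpt-4o"):
    """Build a request body that attaches a single new message to a thread."""
    return {
        "organization_id": organization_id,
        "model": model,
        "thread": thread_id,
        "inputs": [{"role": "user", "content": prompt}],
    }


THREAD_ID = "thrd_session_001"  # ID taken from the example above

first = threaded_request(THREAD_ID, "What is an API?", "org_your_org_id")
second = threaded_request(THREAD_ID, "What did I ask you before?", "org_your_org_id")
```

Both bodies share the same thread value, and neither repeats the other's message in inputs.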

Assistants

For production deployments, configure behavior via an Assistant instead of passing a system prompt in every request:

{
 "organization_id": "org_your_org_id",
 "assistant_id": "asst_abc123",
 "inputs": [{"role": "user", "content": "Hello!"}]
}