Text generation is the core capability of the Freddy API. Send a prompt, get a response. This guide covers the fundamentals: input structure, output format, streaming, and key parameters.

curl https://api.aitronos.com/v1/model/response \
  -H "X-API-Key: $FREDDY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "organization_id": "org_your_org_id",
    "model": "gpt-4o",
    "inputs": [{"role": "user", "content": "Explain photosynthesis in one paragraph."}]
  }'

Response:

{
  "id": "resp_abc123",
  "model": "gpt-4o",
  "output": [
    {
      "role": "assistant",
      "content": [
        {
          "type": "output_text",
          "text": "Photosynthesis is the process by which plants..."
        }
      ]
    }
  ],
  "usage": {
    "input_tokens": 12,
    "output_tokens": 45,
    "total_tokens": 57
  }
}

| Role | Purpose |
|---|---|
| user | The human's message or question |
| assistant | A previous model response (for conversation context) |
| system | Instructions that shape the model's behavior |

{
  "inputs": [
    {"role": "system", "content": "You are a concise technical writer."},
    {"role": "user", "content": "What is an API?"},
    {"role": "assistant", "content": "An API is an interface that lets software communicate."},
    {"role": "user", "content": "Give me an example."}
  ]
}

Use max_output_synapses to cap the response length:

{
  "model": "gpt-4o",
  "max_output_synapses": 256,
  "inputs": [{"role": "user", "content": "Summarize the French Revolution."}]
}

See Synapses and Neurons for how output length maps to cost.

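
For readers calling the API from code rather than curl, the request and response shapes shown so far can be sketched in Python. This is a minimal sketch, not an official client: the helper names (`build_payload`, `extract_text`, `send`) are hypothetical, only the standard library is used, and the response layout is assumed to match the sample response earlier in this guide.

```python
import json
import urllib.request

# Endpoint taken from the curl example in this guide.
API_URL = "https://api.aitronos.com/v1/model/response"


def build_payload(org_id, model, prompt, max_output_synapses=None):
    """Build a request body (hypothetical helper, not part of any SDK)."""
    payload = {
        "organization_id": org_id,
        "model": model,
        "inputs": [{"role": "user", "content": prompt}],
    }
    # Only include the cap when the caller asks for one.
    if max_output_synapses is not None:
        payload["max_output_synapses"] = max_output_synapses
    return payload


def extract_text(response):
    """Join every output_text part found in the response's output list."""
    parts = []
    for message in response.get("output", []):
        for part in message.get("content", []):
            if part.get("type") == "output_text":
                parts.append(part.get("text", ""))
    return "".join(parts)


def send(payload, api_key):
    """POST the payload and return the decoded JSON body (no retries or error handling)."""
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as reply:
        return json.load(reply)
```

A typical call would be `extract_text(send(build_payload("org_your_org_id", "gpt-4o", "Summarize the French Revolution.", 256), api_key))`. Keeping payload construction and text extraction as pure functions makes them easy to test without network access.
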
For real-time output as the model generates it, use the streaming endpoint:

curl https://api.aitronos.com/v1/model/response/stream \
  -H "X-API-Key: $FREDDY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "organization_id": "org_your_org_id",
    "model": "gpt-4o",
    "inputs": [{"role": "user", "content": "Write a haiku."}]
  }'

The response is a Server-Sent Events (SSE) stream. Each chunk contains a delta field with the incremental text. See Streaming for full details.

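
Consuming the stream comes down to reading `data:` lines and collecting each delta. The sketch below assumes that every event's data is a single-line JSON object whose `delta` field holds a string; the exact event schema is not specified in this guide, so treat that layout and the `iter_deltas` name as assumptions and check the Streaming page before relying on them.

```python
import json


def iter_deltas(lines):
    """Yield the delta text from each SSE data line.

    Assumption: each event is one `data:` line containing a JSON object
    with a string "delta" field. Multi-line data events are not handled.
    """
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.rstrip("\r\n")
        if not line.startswith("data:"):
            continue  # skip comments, event names, and blank separators
        data = line[len("data:"):].strip()
        try:
            event = json.loads(data)
        except json.JSONDecodeError:
            continue  # ignore payloads that are not JSON
        if isinstance(event, dict):
            delta = event.get("delta")
            if isinstance(delta, str):
                yield delta
```

Because the function accepts any iterable of lines, it works both on an open HTTP response object and on a plain list of strings, for example `"".join(iter_deltas(stream))` to rebuild the full text.
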
To maintain conversation context across multiple requests, pass a thread ID:
{
  "organization_id": "org_your_org_id",
  "model": "gpt-4o",
  "thread": "thrd_session_001",
  "inputs": [{"role": "user", "content": "What did I ask you before?"}]
}

If the thread doesn't exist, it's created automatically. See Threads for full documentation.

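
With a thread, each request can carry just the new message instead of the whole history. A minimal sketch of that pattern follows; `thread_turn` is a hypothetical helper, and the idea that earlier turns need not be resent is inferred from the description above rather than confirmed here.

```python
def thread_turn(org_id, model, thread_id, message):
    """Build a request body holding only the new user message plus the thread ID."""
    return {
        "organization_id": org_id,
        "model": model,
        "thread": thread_id,
        "inputs": [{"role": "user", "content": message}],
    }


# Two consecutive turns that share one thread ID.
turns = [
    thread_turn("org_your_org_id", "gpt-4o", "thrd_session_001", text)
    for text in ("What is an API?", "Give me an example.")
]
```

Each payload in `turns` would be sent as its own request, in order.
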
For production deployments, configure behavior via an Assistant instead of passing a system prompt in every request:
{
  "organization_id": "org_your_org_id",
  "assistant_id": "asst_abc123",
  "inputs": [{"role": "user", "content": "Hello!"}]
}

- Inputs and Outputs: Full input format reference
- Streaming: Real-time output
- Threads: Stateful conversations
- Assistants: Reusable model configurations
- Synapses and Neurons: Usage measurement
- Create a Response: Full API reference