# Text Generation

Text generation is the core capability of the Freddy API: send a prompt, get a response. This guide covers the fundamentals: request and response structure, input roles, output length limits, streaming, threads, and assistants.

## Basic Request


```bash
curl https://api.aitronos.com/v1/model/response \
 -H "X-API-Key: $FREDDY_API_KEY" \
 -H "Content-Type: application/json" \
 -d '{
 "organization_id": "org_your_org_id",
 "model": "gpt-4o",
 "inputs": [{"role": "user", "content": "Explain photosynthesis in one paragraph."}]
 }'
```

**Response:**


```json
{
  "id": "resp_abc123",
  "model": "gpt-4o",
  "output": [
    {
      "role": "assistant",
      "content": [
        {
          "type": "output_text",
          "text": "Photosynthesis is the process by which plants..."
        }
      ]
    }
  ],
  "usage": {
    "input_tokens": 12,
    "output_tokens": 45,
    "total_tokens": 57
  }
}
```
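The request and response above can be handled from Python roughly as follows. This is a minimal sketch using only the standard library, not an official client: the helper names `build_request`, `extract_text`, and `generate` are hypothetical, and the parsing assumes the response has exactly the shape shown in the sample.

```python
import json
import urllib.request

# Endpoint and header names are taken from the curl example above.
API_URL = "https://api.aitronos.com/v1/model/response"


def build_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request mirroring the curl example."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def extract_text(response: dict) -> str:
    """Concatenate every `output_text` part found in the `output` items.

    Assumes the response layout from the sample; missing keys yield "".
    """
    parts = []
    for item in response.get("output", []):
        for part in item.get("content", []):
            if part.get("type") == "output_text":
                parts.append(part.get("text", ""))
    return "".join(parts)


def generate(api_key: str, payload: dict) -> str:
    """Send the request and return the generated text (makes a network call)."""
    with urllib.request.urlopen(build_request(api_key, payload)) as resp:
        return extract_text(json.load(resp))
```

Calling `generate(os.environ["FREDDY_API_KEY"], payload)` with the same JSON body as the curl example would return the text of the first response. Error handling (non-2xx status codes, timeouts) is left out to keep the sketch short.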

## Input Roles

| Role | Purpose |
|  --- | --- |
| `user` | The human's message or question |
| `assistant` | A previous model response (for conversation context) |
| `system` | Instructions that shape the model's behavior |


```json
{
  "inputs": [
    {"role": "system", "content": "You are a concise technical writer."},
    {"role": "user", "content": "What is an API?"},
    {"role": "assistant", "content": "An API is an interface that lets software communicate."},
    {"role": "user", "content": "Give me an example."}
  ]
}
```
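When the conversation is managed client-side, the `inputs` list grows by one entry per turn. The sketch below shows one way to build it in Python. The helper `add_message` and the `VALID_ROLES` check are illustrative assumptions, not part of the API.

```python
from typing import Dict, List

Message = Dict[str, str]

# The three roles listed in the table above.
VALID_ROLES = {"system", "user", "assistant"}


def add_message(inputs: List[Message], role: str, content: str) -> List[Message]:
    """Return a new inputs list with one message appended (input is not mutated)."""
    if role not in VALID_ROLES:
        raise ValueError(f"unknown role: {role!r}")
    return inputs + [{"role": role, "content": content}]


# Rebuild the example conversation turn by turn.
inputs: List[Message] = []
inputs = add_message(inputs, "system", "You are a concise technical writer.")
inputs = add_message(inputs, "user", "What is an API?")
inputs = add_message(inputs, "assistant", "An API is an interface that lets software communicate.")
inputs = add_message(inputs, "user", "Give me an example.")

payload = {"inputs": inputs}
```

Returning a new list rather than mutating in place makes it easy to keep earlier snapshots of the conversation, for example to retry a turn.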

## Controlling Output Length

Use `max_output_synapses` to cap the response length:


```json
{
 "model": "gpt-4o",
 "max_output_synapses": 256,
 "inputs": [{"role": "user", "content": "Summarize the French Revolution."}]
}
```

See [Synapses and Neurons](/docs/documentation/core-concepts/synapses-and-neurons) for how output length maps to cost.

## Streaming

For real-time output as the model generates it, use the streaming endpoint:


```bash
curl https://api.aitronos.com/v1/model/response/stream \
 -H "X-API-Key: $FREDDY_API_KEY" \
 -H "Content-Type: application/json" \
 -d '{
 "organization_id": "org_your_org_id",
 "model": "gpt-4o",
 "inputs": [{"role": "user", "content": "Write a haiku."}]
 }'
```

The response is a Server-Sent Events (SSE) stream in which each event carries a `delta` field containing the next increment of text. See [Streaming](/docs/documentation/running-methods/streaming-mode) for full details.
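A client consuming the stream has to split it into events and pull out each `delta`. The following is a minimal sketch of that step in Python. It assumes each event's `data:` payload is a JSON object with a top-level string `delta`, which this guide does not spell out, so check the Streaming page for the exact event schema. The function name `iter_deltas` is hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield `delta` strings from the lines of an SSE stream.

    SSE events are separated by blank lines; an event left unterminated
    at the end of the stream is discarded, as the SSE format specifies.
    """
    data_lines = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            value = line[5:]
            # The SSE format drops one optional space after the colon.
            if value.startswith(" "):
                value = value[1:]
            data_lines.append(value)
        elif line == "" and data_lines:
            payload = "\n".join(data_lines)
            data_lines = []
            try:
                event = json.loads(payload)
            except json.JSONDecodeError:
                continue  # ignore payloads that are not JSON
            # Assumed schema: {"delta": "<text>"}; other events are skipped.
            if isinstance(event, dict) and isinstance(event.get("delta"), str):
                yield event["delta"]


# Offline demonstration with hand-written events (not real API output).
demo = ['data: {"delta": "An old"}\n', "\n", 'data: {"delta": " pond"}\n', "\n"]
text = "".join(iter_deltas(demo))
```

Because `iter_deltas` accepts any iterable of text lines, it can be fed from a decoded HTTP response body or from a recorded stream in a test.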

## Conversation History with Threads

To maintain conversation context across multiple requests, pass a `thread` ID:


```json
{
 "organization_id": "org_your_org_id",
 "model": "gpt-4o",
 "thread": "thrd_session_001",
 "inputs": [{"role": "user", "content": "What did I ask you before?"}]
}
```

If the thread doesn't exist, it's created automatically. See [Threads](/docs/documentation/core-concepts/threads) for full documentation.
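With a thread, each request only needs to carry the new message and the same `thread` value. A small sketch of two consecutive turns follows. The helper `thread_turn` is hypothetical, and the IDs are the placeholder values from the example above.

```python
def thread_turn(organization_id: str, model: str, thread_id: str, user_text: str) -> dict:
    """Build a request body for one user turn in an existing or new thread."""
    return {
        "organization_id": organization_id,
        "model": model,
        "thread": thread_id,
        # Only the new message is included; earlier turns are expected
        # to come from the thread rather than from this payload.
        "inputs": [{"role": "user", "content": user_text}],
    }


first = thread_turn("org_your_org_id", "gpt-4o", "thrd_session_001", "What is an API?")
second = thread_turn("org_your_org_id", "gpt-4o", "thrd_session_001", "What did I ask you before?")
```

Both bodies share the same `thread` ID, so the second request can refer back to the first without resending it.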

## Assistants

For production deployments, configure behavior via an [Assistant](/docs/documentation/core-concepts/assistants) instead of passing a system prompt in every request:


```json
{
 "organization_id": "org_your_org_id",
 "assistant_id": "asst_abc123",
 "inputs": [{"role": "user", "content": "Hello!"}]
}
```

## Related Resources

- [Inputs and Outputs](/docs/documentation/core-concepts/inputs-and-outputs) — Full input format reference
- [Streaming](/docs/documentation/running-methods/streaming-mode) — Real-time output
- [Threads](/docs/documentation/core-concepts/threads) — Stateful conversations
- [Assistants](/docs/documentation/core-concepts/assistants) — Reusable model configurations
- [Synapses and Neurons](/docs/documentation/core-concepts/synapses-and-neurons) — Usage measurement
- [Create a Response](/docs/api-reference/responses/create) — Full API reference