# Streaming Mode

> 🔨 **In Development** — This section is still being developed and may change.

Streaming mode delivers model responses incrementally as they're generated, enabling real-time user experiences without waiting for complete responses.

## What is Streaming?

Instead of waiting for the entire response to complete, streaming sends partial outputs as server-sent events (SSE) while the model generates them:

```
Non-Streaming: [Wait...] → "The capital of France is Paris."
Streaming:     "The" → " capital" → " of" → " France" → " is" → " Paris" → "."
```

## Enabling Streaming

Set `stream: true` in your request:

```bash
curl https://api.freddy.aitronos.com/v1/model/response \
  -H "Authorization: Bearer $FREDDY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4.1",
    "stream": true,
    "inputs": [
      {
        "role": "user",
        "texts": [{"text": "Write a short story"}]
      }
    ]
  }'
```

## Response Format

Streaming responses are delivered as a sequence of server-sent events:

```
event: response.created
data: {"id":"resp_abc123","status":"in_progress"}

event: response.output_item.added
data: {"index":0,"item":{"type":"message","role":"assistant"}}

event: response.output_text.delta
data: {"index":0,"delta":"Once"}

event: response.output_text.delta
data: {"index":0,"delta":" upon"}

event: response.output_text.delta
data: {"index":0,"delta":" a"}

event: response.output_text.done
data: {"index":0,"text":"Once upon a time..."}

event: response.completed
data: {"status":"completed","usage":{"outputSynapses":156}}
```

## Implementation Examples

### Python

```python
import json

import requests

response = requests.post(
    "https://api.freddy.aitronos.com/v1/model/response",
    headers={
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    },
    json={
        "model": "gpt-4.1",
        "stream": True,
        "inputs": [
            {"role": "user", "texts": [{"text": "Tell me a story"}]}
        ]
    },
    stream=True
)

for line in response.iter_lines():
    if line:
        # Parse the SSE format: payload lines start with "data: "
        if line.startswith(b'data: '):
            data = json.loads(line[6:])
            if 'delta' in data:
                print(data['delta'], end='', flush=True)
```

### JavaScript

```javascript
const response = await fetch('https://api.freddy.aitronos.com/v1/model/response', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${apiKey}`,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: 'gpt-4.1',
    stream: true,
    inputs: [
      { role: 'user', texts: [{ text: 'Tell me a story' }] }
    ]
  })
});

const reader = response.body.getReader();
const decoder = new TextDecoder();

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  // { stream: true } keeps multi-byte characters intact across chunks
  const chunk = decoder.decode(value, { stream: true });
  const lines = chunk.split('\n');

  for (const line of lines) {
    if (line.startsWith('data: ')) {
      const data = JSON.parse(line.slice(6));
      if (data.delta) {
        process.stdout.write(data.delta);
      }
    }
  }
}
```
### React Component

```jsx
import { useState } from 'react';

function StreamingChat() {
  const [output, setOutput] = useState('');
  const [isStreaming, setIsStreaming] = useState(false);

  const handleSubmit = async (message) => {
    setIsStreaming(true);
    setOutput('');

    const response = await fetch('/api/chat', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        model: 'gpt-4.1',
        stream: true,
        inputs: [{ role: 'user', texts: [{ text: message }] }]
      })
    });

    const reader = response.body.getReader();
    const decoder = new TextDecoder();

    while (true) {
      const { done, value } = await reader.read();
      if (done) break;

      const chunk = decoder.decode(value, { stream: true });
      const lines = chunk.split('\n');

      for (const line of lines) {
        if (line.startsWith('data: ')) {
          const data = JSON.parse(line.slice(6));
          if (data.delta) {
            setOutput(prev => prev + data.delta);
          }
        }
      }
    }

    setIsStreaming(false);
  };

  return (
    <div>
      <div className="output">{output}</div>
      {isStreaming && <div className="status">Generating...</div>}
    </div>
  );
}
```
## Stream Obfuscation

Stream obfuscation adds random padding to normalize payload sizes, mitigating timing-based side-channel attacks:

```json
{
  "stream": true,
  "streamOptions": {
    "includeObfuscation": false  // Disable for bandwidth optimization
  }
}
```

**When to disable obfuscation:**

- Trusted network environment
- Bandwidth-constrained connections
- High-volume streaming applications

**Keep enabled (default) when:**

- Handling sensitive data
- Untrusted network paths
- Security is prioritized over bandwidth
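To disable it, pass `streamOptions` alongside `stream` in the request body, as in the JSON above. A minimal Python sketch, reusing the endpoint and payload shape from the examples earlier (`api_key` is assumed to be defined):

```python
import requests

# Minimal sketch: same request as the Python example above, with
# obfuscation disabled via streamOptions (only for trusted networks).
response = requests.post(
    "https://api.freddy.aitronos.com/v1/model/response",
    headers={
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    },
    json={
        "model": "gpt-4.1",
        "stream": True,
        "streamOptions": {"includeObfuscation": False},
        "inputs": [{"role": "user", "texts": [{"text": "Summarize this report"}]}]
    },
    stream=True
)
```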
## Event Types

### `response.created`

Response has been created and processing has started.

```json
{"id":"resp_abc123","status":"in_progress"}
```

### `response.output_item.added`

A new output item (message, tool call) has been added.

```json
{"index":0,"item":{"type":"message","role":"assistant"}}
```

### `response.output_text.delta`

Incremental text content has been generated.

```json
{"index":0,"delta":"Hello"}
```

### `response.output_text.done`

Text output for an item is complete.

```json
{"index":0,"text":"Hello, how can I help you?"}
```

### `response.tool_call.delta`

Tool call arguments are being generated.

```json
{"index":1,"delta":"{\"query\":\""}
```

### `response.completed`

Response generation has finished.

```json
{"status":"completed","usage":{"outputSynapses":245}}
```

### `response.failed`

Response generation encountered an error.

```json
{"status":"failed","error":{"code":"rate_limit_exceeded"}}
```
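A typical consumer pairs each `event:` line with the `data:` line that follows it and branches on the event name. The sketch below shows one way to do this in Python; the handler logic is illustrative, not part of the API:

```python
import json

def handle_sse_lines(lines):
    """Pair each 'event:' line with its 'data:' payload and branch on the event name.

    `lines` is an iterable of decoded SSE lines, e.g. from response.iter_lines().
    """
    event_name = None
    for line in lines:
        if line.startswith("event: "):
            event_name = line[len("event: "):]
        elif line.startswith("data: "):
            data = json.loads(line[len("data: "):])
            if event_name == "response.output_text.delta":
                print(data["delta"], end="", flush=True)  # incremental text
            elif event_name == "response.completed":
                print(f"\n[done, usage: {data['usage']}]")
            elif event_name == "response.failed":
                raise RuntimeError(f"stream failed: {data['error']['code']}")
            event_name = None  # reset until the next event line
```

With the Python example above, you could feed it decoded lines: `handle_sse_lines(line.decode() for line in response.iter_lines() if line)`.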
## Use Cases

### Interactive Chat

Show responses character-by-character for a natural conversation feel:

```python
def stream_chat(user_message):
    response = requests.post(
        api_url,
        json={"model": "gpt-4.1", "stream": True, "inputs": [...]},
        stream=True
    )

    for line in response.iter_lines():
        if line.startswith(b'data: '):
            data = json.loads(line[6:])
            if 'delta' in data:
                yield data['delta']
```

### Content Generation

Display articles, stories, or documentation as they're written:

```javascript
async function generateBlogPost(topic) {
  const stream = await fetch('/api/generate', {
    method: 'POST',
    body: JSON.stringify({
      stream: true,
      inputs: [{ role: 'user', texts: [{ text: `Write about ${topic}` }] }]
    })
  });

  // Update the UI in real time
  for await (const chunk of streamResponse(stream)) {
    document.getElementById('preview').textContent += chunk;
  }
}
```

### Code Generation

Show code being written line by line:

```python
def stream_code_generation(prompt):
    for chunk in stream_response(prompt):
        syntax_highlight_and_display(chunk)
        time.sleep(0.01)  # Smooth animation
```
## Best Practices

### ✅ DO

- **Buffer incomplete events** - SSE chunks may split across packets
- **Handle reconnection** - Implement retry logic for network issues
- **Parse incrementally** - Process deltas as they arrive
- **Show loading indicators** - Indicate streaming is in progress
- **Implement timeouts** - Don't wait indefinitely (see the sketch after these lists)

### ❌ DON'T

- **Assume complete JSON** - Chunks may contain partial data
- **Block the UI thread** - Process streams asynchronously
- **Ignore error events** - Handle `response.failed` appropriately
- **Forget to close streams** - Clean up connections when done
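The following sketch combines three of these practices in one reader: buffering incomplete lines, a hard timeout, and explicit stream cleanup. The timeout values are illustrative assumptions, not documented defaults:

```python
import json

import requests

def stream_deltas(url, payload, headers, connect_timeout=5, read_timeout=30):
    """Yield text deltas, buffering partial chunks and timing out instead of hanging."""
    response = requests.post(
        url, json=payload, headers=headers, stream=True,
        timeout=(connect_timeout, read_timeout)  # illustrative values
    )
    buffer = ""
    try:
        for chunk in response.iter_content(chunk_size=None):
            buffer += chunk.decode("utf-8")
            # SSE chunks may split mid-line: only parse complete lines.
            while "\n" in buffer:
                line, buffer = buffer.split("\n", 1)
                if line.startswith("data: "):
                    data = json.loads(line[len("data: "):])
                    if "delta" in data:
                        yield data["delta"]
    finally:
        response.close()  # always release the connection
```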
## Performance Considerations

**Latency:**

- **First token**: ~200-500ms
- **Subsequent tokens**: ~20-50ms each
- **Total time to first output**: faster than non-streaming

**Bandwidth:**

- Streaming uses roughly 20% more bandwidth due to SSE overhead
- Disable `includeObfuscation` if bandwidth is critical

**User Experience:**

- Users perceive streaming as 50-70% faster
- Engagement increases with real-time feedback
## Troubleshooting

### Chunks Not Arriving

Check the connection headers (`-N` disables curl's output buffering):

```bash
curl -N -H "Accept: text/event-stream" ...
```

### Incomplete JSON

Buffer until a newline arrives before parsing:

```python
buffer = ""
for chunk in response.iter_content():
    buffer += chunk.decode()
    while '\n' in buffer:
        line, buffer = buffer.split('\n', 1)
        process_line(line)
```

### Connection Drops

Implement exponential backoff:

```javascript
async function streamWithRetry(url, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await streamRequest(url);
    } catch (error) {
      if (i === maxRetries - 1) throw error;
      await sleep(2 ** i * 1000);
    }
  }
}
```

**Related:**

- [Create Model Response](/docs/api-reference/responses/create) - API reference
- [Streaming Events Object](/docs/api-reference/objects/streaming-events) - Event schema
- [Best Practices](/docs/documentation/best-practices) - Optimization tips