# Streaming Mode
> **🔨 In Development** — This section is still being developed and may change.
Streaming mode delivers model responses incrementally as they're generated, enabling real-time user experiences without waiting for complete responses.
## What is Streaming?
Instead of waiting for the entire response to complete, streaming sends partial outputs as server-sent events (SSE) as the model generates them:
```
Non-Streaming:
[Wait...] → "The capital of France is Paris."
Streaming:
"The" → " capital" → " of" → " France" → " is" → " Paris" → "."
```
## Enabling Streaming
Set `stream: true` in your request:
```bash
curl https://api.freddy.aitronos.com/v1/model/response \
  -H "Authorization: Bearer $FREDDY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4.1",
    "stream": true,
    "inputs": [
      {
        "role": "user",
        "texts": [{"text": "Write a short story"}]
      }
    ]
  }'
```
## Response Format
Each server-sent event consists of an `event` name line and a `data` line carrying a JSON payload:
```
event: response.created
data: {"id":"resp_abc123","status":"in_progress"}

event: response.output_item.added
data: {"index":0,"item":{"type":"message","role":"assistant"}}

event: response.output_text.delta
data: {"index":0,"delta":"Once"}

event: response.output_text.delta
data: {"index":0,"delta":" upon"}

event: response.output_text.delta
data: {"index":0,"delta":" a"}

event: response.output_text.done
data: {"index":0,"text":"Once upon a time..."}

event: response.completed
data: {"status":"completed","usage":{"outputSynapses":156}}
```
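Since every event is just an `event:` line, a `data:` line, and a blank separator, the stream can be decoded with a few lines of code. A minimal parsing sketch (the function name is ours, not part of any SDK):

```python
import json

def parse_sse(lines):
    """Yield (event_name, payload) pairs from an iterable of decoded SSE lines."""
    event_name = None
    for line in lines:
        if line.startswith("event: "):
            event_name = line[len("event: "):]
        elif line.startswith("data: "):
            yield event_name, json.loads(line[len("data: "):])
        elif line == "":
            event_name = None  # blank line marks the end of an event
```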
## Implementation Examples
### Python
```python
import json
import requests

response = requests.post(
    "https://api.freddy.aitronos.com/v1/model/response",
    headers={
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    },
    json={
        "model": "gpt-4.1",
        "stream": True,
        "inputs": [
            {"role": "user", "texts": [{"text": "Tell me a story"}]}
        ]
    },
    stream=True  # keep the connection open and read the body incrementally
)

for line in response.iter_lines():
    if line.startswith(b'data: '):
        data = json.loads(line[6:])
        if 'delta' in data:
            print(data['delta'], end='', flush=True)
```
### JavaScript
```javascript
const response = await fetch('https://api.freddy.aitronos.com/v1/model/response', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${apiKey}`,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: 'gpt-4.1',
    stream: true,
    inputs: [
      { role: 'user', texts: [{ text: 'Tell me a story' }] }
    ]
  })
});

const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = '';

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  // Buffer the chunk: SSE lines may be split across network packets
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split('\n');
  buffer = lines.pop(); // keep the trailing partial line for the next read

  for (const line of lines) {
    if (line.startsWith('data: ')) {
      const data = JSON.parse(line.slice(6));
      if (data.delta) {
        process.stdout.write(data.delta);
      }
    }
  }
}
```
### React Component
```jsx
import { useState } from 'react';

function StreamingChat() {
  const [output, setOutput] = useState('');
  const [isStreaming, setIsStreaming] = useState(false);

  const handleSubmit = async (message) => {
    setIsStreaming(true);
    setOutput('');

    const response = await fetch('/api/chat', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        model: 'gpt-4.1',
        stream: true,
        inputs: [{ role: 'user', texts: [{ text: message }] }]
      })
    });

    const reader = response.body.getReader();
    const decoder = new TextDecoder();

    while (true) {
      const { done, value } = await reader.read();
      if (done) break;

      const chunk = decoder.decode(value, { stream: true });
      for (const line of chunk.split('\n')) {
        if (line.startsWith('data: ')) {
          const data = JSON.parse(line.slice(6));
          if (data.delta) {
            setOutput(prev => prev + data.delta);
          }
        }
      }
    }

    setIsStreaming(false);
  };

  return (
    <div>
      <p>{output}</p>
      {isStreaming && <span>Generating...</span>}
    </div>
  );
}
```
## Stream Obfuscation
Stream obfuscation adds random padding to normalize payload sizes, mitigating timing-based side-channel attacks. It is enabled by default; set `streamOptions.includeObfuscation` to `false` to save bandwidth (a request sketch follows the checklists below):

```json
{
  "stream": true,
  "streamOptions": {
    "includeObfuscation": false
  }
}
```
**When to disable obfuscation:**
- Trusted network environment
- Bandwidth-constrained connections
- High-volume streaming applications
**Keep enabled (default) when:**
- Handling sensitive data
- Untrusted network paths
- Security is prioritized over bandwidth
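As referenced above, disabling obfuscation is a per-request option. A minimal Python sketch, assuming `streamOptions` sits at the top level of the request body as in the JSON example:

```python
import requests

response = requests.post(
    "https://api.freddy.aitronos.com/v1/model/response",
    headers={"Authorization": f"Bearer {api_key}"},
    json={
        "model": "gpt-4.1",
        "stream": True,
        "streamOptions": {"includeObfuscation": False},  # trusted network: skip padding
        "inputs": [{"role": "user", "texts": [{"text": "Hello"}]}]
    },
    stream=True
)
```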
## Event Types
### `response.created`
Response has been created and processing started.
```json
{"id":"resp_abc123","status":"in_progress"}
```
### `response.output_item.added`
New output item (message, tool call) has been added.
```json
{"index":0,"item":{"type":"message","role":"assistant"}}
```
### `response.output_text.delta`
Incremental text content generated.
```json
{"index":0,"delta":"Hello"}
```
### `response.output_text.done`
Text output for an item is complete.
```json
{"index":0,"text":"Hello, how can I help you?"}
```
### `response.tool_call.delta`
Tool call arguments are being generated.
```json
{"index":1,"delta":"{\"query\":\""}
```
### `response.completed`
Response generation has finished.
```json
{"status":"completed","usage":{"outputSynapses":245}}
```
### `response.failed`
Response generation encountered an error.
```json
{"status":"failed","error":{"code":"rate_limit_exceeded"}}
```
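A client typically dispatches on the event name rather than inspecting every payload. One possible sketch (the handlers are illustrative placeholders, not part of the API):

```python
def handle_event(event_name, payload):
    """Route a parsed SSE event to an app-defined handler."""
    handlers = {
        "response.output_text.delta": lambda p: print(p["delta"], end="", flush=True),
        "response.completed": lambda p: print("\n[usage]", p["usage"]),
        "response.failed": lambda p: print("\n[error]", p["error"]["code"]),
    }
    handler = handlers.get(event_name)
    if handler:
        handler(payload)
    # Unknown event types are ignored so new ones don't break existing clients.
```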
## Use Cases
### Interactive Chat
Show responses token by token for a natural conversation feel:
```python
def stream_chat(user_message):
    response = requests.post(
        api_url,
        json={"model": "gpt-4.1", "stream": True, "inputs": [...]},
        stream=True
    )
    for line in response.iter_lines():
        if line.startswith(b'data: '):
            data = json.loads(line[6:])
            if 'delta' in data:
                yield data['delta']
```
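Because `stream_chat` is a generator, the caller renders tokens as they arrive:

```python
for token in stream_chat("Tell me a story"):
    print(token, end="", flush=True)
```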
### Content Generation
Display articles, stories, or documentation as they're written:
```javascript
async function generateBlogPost(topic) {
  const stream = await fetch('/api/generate', {
    method: 'POST',
    body: JSON.stringify({
      stream: true,
      inputs: [{ role: 'user', texts: [{ text: `Write about ${topic}` }] }]
    })
  });

  // streamResponse is an app-defined helper that yields parsed text deltas;
  // update the UI in real time as each one arrives
  for await (const chunk of streamResponse(stream)) {
    document.getElementById('preview').textContent += chunk;
  }
}
```
### Code Generation
Show code being written line-by-line:
```python
import time

def stream_code_generation(prompt):
    # stream_response is an app-defined helper that yields text deltas
    for chunk in stream_response(prompt):
        syntax_highlight_and_display(chunk)
        time.sleep(0.01)  # Smooth animation
```
## Best Practices
### ✅ DO
- **Buffer incomplete events** - SSE chunks may split across packets
- **Handle reconnection** - Implement retry logic for network issues
- **Parse incrementally** - Process deltas as they arrive
- **Show loading indicators** - Indicate streaming is in progress
- **Implement timeouts** - Don't wait indefinitely (see the sketch after these lists)
### ❌ DON'T
- **Assume complete JSON** - Chunks may contain partial data
- **Block UI thread** - Process streams asynchronously
- **Ignore error events** - Handle `response.failed` appropriately
- **Forget to close streams** - Clean up connections when done
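A sketch combining the timeout and cleanup points above, using the `requests` client from the earlier examples (`api_url`, `payload`, and `handle_line` are placeholders):

```python
import requests

# (connect timeout, read timeout): give up if the stream stalls between chunks
response = requests.post(api_url, json=payload, stream=True, timeout=(5, 30))
try:
    for line in response.iter_lines():
        if line.startswith(b'data: '):
            handle_line(line)  # app-defined: parse the JSON payload and render it
except requests.exceptions.Timeout:
    print("Stream stalled; retry with backoff")
finally:
    response.close()  # always release the connection
```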
## Performance Considerations
**Latency:**
- **First token**: ~200-500ms
- **Subsequent tokens**: ~20-50ms each
- **Time to first output**: far lower than waiting for a complete non-streaming response
**Bandwidth:**
- Streaming uses ~20% more bandwidth due to SSE overhead
- Disable `includeObfuscation` if bandwidth is critical
**User Experience:**
- Users perceive streaming as 50-70% faster
- Engagement increases with real-time feedback
## Troubleshooting
### Chunks Not Arriving
Check connection handling: pass `-N` so curl doesn't buffer output, and request the event-stream content type:
```bash
curl -N -H "Accept: text/event-stream" ...
```
### Incomplete JSON
Buffer until newlines:
```python
buffer = ""
for chunk in response.iter_content():
buffer += chunk.decode()
while '\n' in buffer:
line, buffer = buffer.split('\n', 1)
process_line(line)
```
### Connection Drops
Implement exponential backoff:
```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function streamWithRetry(url, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await streamRequest(url); // app-defined: opens and consumes the stream
    } catch (error) {
      if (i === maxRetries - 1) throw error;
      await sleep(2 ** i * 1000); // 1s, 2s, 4s, ...
    }
  }
}
```
**Related:**
- [Create Model Response](/docs/api-reference/responses/create) - API reference
- [Streaming Events Object](/docs/api-reference/objects/streaming-events) - Event schema
- [Best Practices](/docs/documentation/best-practices) - Optimization tips