# Streaming Mode

Streaming mode delivers model responses incrementally as they're generated, enabling real-time user experiences without waiting for complete responses.

## What is Streaming?

Instead of waiting for the entire response to complete, streaming sends partial output as server-sent events (SSE) while the model generates it:


```
Non-Streaming:
[Wait...] → "The capital of France is Paris."

Streaming:
"The" → " capital" → " of" → " France" → " is" → " Paris" → "."
```

## Enabling Streaming

Set `stream: true` in your request:


```bash
curl https://api.aitronos.com/v1/model/response \
  -H "X-API-Key: $FREDDY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "stream": true,
    "inputs": [
      {
        "role": "user",
        "content": "Write a short story"
      }
    ]
  }'
```

## Response Format

Streaming responses use server-sent events (SSE):


```
event: response.created
data: {"id":"resp_abc123","status":"in_progress"}

event: response.output_item.added
data: {"index":0,"item":{"type":"message","role":"assistant"}}

event: response.output_text.delta
data: {"index":0,"delta":"Once"}

event: response.output_text.delta
data: {"index":0,"delta":" upon"}

event: response.output_text.delta
data: {"index":0,"delta":" a"}

event: response.output_text.done
data: {"index":0,"text":"Once upon a time..."}

event: response.completed
data: {"status":"completed","usage":{"outputSynapses":156}}
```
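
The event framing above can be parsed by grouping `event:` and `data:` lines until a blank line ends the event. The following is a minimal sketch, not an official client: the function name `parse_sse_events` is hypothetical, and it assumes every `data:` payload is a JSON object, as in the sample stream.

```python
import json
from typing import Iterable, Iterator, List, Tuple


def _field_value(line: str, prefix: str) -> str:
    """Return the text after `prefix`, dropping one optional leading space."""
    value = line[len(prefix):]
    return value[1:] if value.startswith(" ") else value


def parse_sse_events(lines: Iterable[str]) -> Iterator[Tuple[str, dict]]:
    """Group decoded SSE lines into (event_name, payload) pairs."""
    event_name = "message"  # SSE default when no `event:` field is sent
    data_lines: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data_lines:
                yield event_name, json.loads("\n".join(data_lines))
            event_name, data_lines = "message", []
        elif line.startswith("event:"):
            event_name = _field_value(line, "event:")
        elif line.startswith("data:"):
            data_lines.append(_field_value(line, "data:"))
    # Flush a trailing event if the stream ended without a blank line.
    if data_lines:
        yield event_name, json.loads("\n".join(data_lines))


sample = [
    "event: response.output_text.delta",
    'data: {"index":0,"delta":"Once"}',
    "",
    "event: response.completed",
    'data: {"status":"completed","usage":{"outputSynapses":156}}',
]
for name, payload in parse_sse_events(sample):
    print(name, payload)
```

Keeping the event name alongside the payload lets a client react to `response.completed` or `response.failed` instead of only checking for a `delta` key.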

## Implementation Examples

### Python


```python
import json
import os

import requests

# Assumes the key is stored in the FREDDY_API_KEY environment variable
api_key = os.environ["FREDDY_API_KEY"]

response = requests.post(
    "https://api.aitronos.com/v1/model/response",
    headers={
        "X-API-Key": api_key,
        "Content-Type": "application/json"
    },
    json={
        "model": "gpt-4o",
        "stream": True,
        "inputs": [
            {"role": "user", "content": "Tell me a story"}
        ]
    },
    stream=True
)
response.raise_for_status()

for line in response.iter_lines():
    # Parse SSE format: payload lines start with "data: "
    if line.startswith(b"data: "):
        data = json.loads(line[6:])
        if "delta" in data:
            print(data["delta"], end="", flush=True)
```

### JavaScript


```javascript
const response = await fetch('https://api.aitronos.com/v1/model/response', {
  method: 'POST',
  headers: {
    'X-API-Key': apiKey,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: 'gpt-4o',
    stream: true,
    inputs: [
      { role: 'user', texts: [{ text: 'Tell me a story' }] }
    ]
  })
});

const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = '';

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  // stream: true keeps multi-byte characters intact across chunk boundaries
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split('\n');
  buffer = lines.pop(); // keep the trailing partial line for the next read

  for (const line of lines) {
    if (line.startsWith('data: ')) {
      const data = JSON.parse(line.slice(6));
      if (data.delta) {
        process.stdout.write(data.delta);
      }
    }
  }
}
```

### React Component


```jsx
import { useState } from 'react';

function StreamingChat() {
  const [output, setOutput] = useState('');
  const [isStreaming, setIsStreaming] = useState(false);

  const handleSubmit = async (message) => {
    setIsStreaming(true);
    setOutput('');

    try {
      const response = await fetch('/api/chat', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({
          model: 'gpt-4o',
          stream: true,
          inputs: [{ role: 'user', texts: [{ text: message }] }]
        })
      });

      const reader = response.body.getReader();
      const decoder = new TextDecoder();
      let buffer = '';

      while (true) {
        const { done, value } = await reader.read();
        if (done) break;

        // Buffer partial lines so JSON.parse only sees complete events
        buffer += decoder.decode(value, { stream: true });
        const lines = buffer.split('\n');
        buffer = lines.pop();

        for (const line of lines) {
          if (line.startsWith('data: ')) {
            const data = JSON.parse(line.slice(6));
            if (data.delta) {
              setOutput(prev => prev + data.delta);
            }
          }
        }
      }
    } finally {
      // Reset the indicator even if the request or parsing fails
      setIsStreaming(false);
    }
  };

  return (
    <div>
      <div className="output">{output}</div>
      {isStreaming && <div>Generating...</div>}
    </div>
  );
}
```

## Stream Obfuscation

Stream obfuscation adds random padding to normalize payload sizes, mitigating timing-based side-channel attacks. It is enabled by default. To disable it for bandwidth optimization, set `includeObfuscation` to `false`:


```json
{
  "stream": true,
  "streamOptions": {
    "includeObfuscation": false
  }
}
```

**When to disable obfuscation:**

- Trusted network environment
- Bandwidth-constrained connections
- High-volume streaming applications


**Keep enabled (default) when:**

- Handling sensitive data
- Untrusted network paths
- Security is prioritized over bandwidth
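
The choice between the two lists above can be captured in a small request builder. This is a sketch under assumptions: the helper name `build_stream_payload` is hypothetical, the `inputs` shape follows the curl example, and omitting `streamOptions` is assumed to leave obfuscation at its enabled default.

```python
import json


def build_stream_payload(prompt, model="gpt-4o", include_obfuscation=True):
    """Build a streaming request body; only send streamOptions when opting out."""
    payload = {
        "model": model,
        "stream": True,
        "inputs": [{"role": "user", "content": prompt}],
    }
    if not include_obfuscation:
        # Explicit opt-out, e.g. for a trusted, bandwidth-constrained network.
        payload["streamOptions"] = {"includeObfuscation": False}
    return payload


print(json.dumps(build_stream_payload("Write a short story", include_obfuscation=False), indent=2))
```

Making the opt-out an explicit argument keeps the secure default in place unless a caller deliberately changes it.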


## Event Types

### `response.created`

Response has been created and processing started.


```json
{"id":"resp_abc123","status":"in_progress"}
```

### `response.output_item.added`

New output item (message, tool call) has been added.


```json
{"index":0,"item":{"type":"message","role":"assistant"}}
```

### `response.output_text.delta`

Incremental text content generated.


```json
{"index":0,"delta":"Hello"}
```

### `response.output_text.done`

Text output for an item is complete.


```json
{"index":0,"text":"Hello, how can I help you?"}
```

### `response.tool_call.delta`

Tool call arguments are being generated.


```json
{"index":1,"delta":"{\"query\":\""}
```

### `response.completed`

Response generation has finished.


```json
{"status":"completed","usage":{"outputSynapses":245}}
```

### `response.failed`

Response generation encountered an error.


```json
{"status":"failed","error":{"code":"rate_limit_exceeded"}}
```
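
One way to consume these events is a small dispatcher that accumulates text per output `index`, stops on `response.completed`, and raises on `response.failed`. The sketch below assumes events have already been parsed into `(event_name, payload)` pairs; `collect_stream` and `StreamFailed` are hypothetical names, not part of the API.

```python
from typing import Dict, Iterable, Tuple


class StreamFailed(Exception):
    """Raised when a `response.failed` event arrives."""


def collect_stream(events: Iterable[Tuple[str, dict]]) -> Tuple[Dict[int, str], dict]:
    """Accumulate text per output index and return (texts, usage)."""
    texts: Dict[int, str] = {}
    usage: dict = {}
    for name, data in events:
        if name == "response.output_text.delta":
            idx = data["index"]
            texts[idx] = texts.get(idx, "") + data["delta"]
        elif name == "response.output_text.done":
            # The done event carries the full text; prefer it over the deltas.
            texts[data["index"]] = data["text"]
        elif name == "response.completed":
            usage = data.get("usage", {})
            break
        elif name == "response.failed":
            raise StreamFailed(data.get("error", {}).get("code", "unknown"))
        # Other events (response.created, response.output_item.added,
        # response.tool_call.delta) are ignored in this sketch.
    return texts, usage


texts, usage = collect_stream([
    ("response.created", {"id": "resp_abc123", "status": "in_progress"}),
    ("response.output_text.delta", {"index": 0, "delta": "Hello"}),
    ("response.output_text.done", {"index": 0, "text": "Hello, how can I help you?"}),
    ("response.completed", {"status": "completed", "usage": {"outputSynapses": 245}}),
])
print(texts[0], usage)
```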

## Use Cases

### Interactive Chat

Show responses as each delta arrives for a natural conversation feel:


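```python
import json

import requests

def stream_chat(user_message):
    # api_url is assumed to hold the endpoint URL; add auth headers as needed
    response = requests.post(
        api_url,
        json={
            "model": "gpt-4o",
            "stream": True,
            "inputs": [{"role": "user", "content": user_message}],
        },
        stream=True
    )

    for line in response.iter_lines():
        if line.startswith(b"data: "):
            data = json.loads(line[6:])
            if "delta" in data:
                yield data["delta"]
```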

### Content Generation

Display articles, stories, or documentation as they're written:


```javascript
async function generateBlogPost(topic) {
  const stream = await fetch('/api/generate', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      stream: true,
      inputs: [{ role: 'user', texts: [{ text: `Write about ${topic}` }] }]
    })
  });

  // Update UI in real-time.
  // streamResponse is a placeholder for your own async generator that yields text deltas.
  for await (const chunk of streamResponse(stream)) {
    document.getElementById('preview').textContent += chunk;
  }
}
```

### Code Generation

Show code being written line-by-line:


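```python
import time

def stream_code_generation(prompt):
    # stream_response and syntax_highlight_and_display are placeholders
    # for your own streaming and rendering helpers
    for chunk in stream_response(prompt):
        syntax_highlight_and_display(chunk)
        time.sleep(0.01)  # Smooth animation
```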

## Best Practices

### DO

- **Buffer incomplete events** - SSE chunks may split across packets
- **Handle reconnection** - Implement retry logic for network issues
- **Parse incrementally** - Process deltas as they arrive
- **Show loading indicators** - Indicate streaming is in progress
- **Implement timeouts** - Don't wait indefinitely


### DON'T

- **Assume complete JSON** - Chunks may contain partial data
- **Block UI thread** - Process streams asynchronously
- **Ignore error events** - Handle `response.failed` appropriately
- **Forget to close streams** - Clean up connections when done
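
The buffering advice above can be sketched as a small accumulator that accepts raw byte chunks of any size and only emits events once they are complete. `SSEBuffer` is a hypothetical helper, and it assumes UTF-8 text with JSON `data:` payloads.

```python
import codecs
import json
from typing import List, Tuple


class SSEBuffer:
    """Collect raw chunks and return only fully received SSE events."""

    def __init__(self) -> None:
        # The incremental decoder holds back bytes of a split multi-byte character.
        self._decoder = codecs.getincrementaldecoder("utf-8")()
        self._pending = ""

    def feed(self, chunk: bytes) -> List[Tuple[str, dict]]:
        text = self._pending + self._decoder.decode(chunk)
        self._pending = text.replace("\r\n", "\n")
        events = []
        # A blank line (two consecutive newlines) marks the end of an event.
        while "\n\n" in self._pending:
            block, self._pending = self._pending.split("\n\n", 1)
            name, data_lines = "message", []
            for line in block.split("\n"):
                if line.startswith("event:"):
                    name = line[len("event:"):].strip()
                elif line.startswith("data:"):
                    data_lines.append(line[len("data:"):].strip())
            if data_lines:
                events.append((name, json.loads("\n".join(data_lines))))
        return events


buf = SSEBuffer()
print(buf.feed(b'event: response.output_text.delta\ndata: {"index":0,'))  # incomplete event
print(buf.feed(b'"delta":"Once"}\n\n'))  # event completed by the second chunk
```

Because partial data stays in the buffer, `json.loads` is never called on an incomplete payload.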


## Performance Considerations

**Latency:**

- **First token**: ~200-500ms
- **Subsequent tokens**: ~20-50ms each
- **Total time to first output**: Faster than non-streaming
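
To check these latency figures against your own workload, you can time the gaps between chunks. The helper below is an illustrative sketch: `measure_stream` is a hypothetical name, and it works on any iterable of chunks, such as a generator that yields deltas.

```python
import time
from typing import Callable, Iterable, List, Optional, Tuple


def measure_stream(
    chunks: Iterable[str],
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[Optional[float], List[float]]:
    """Return (time to first chunk, gaps between later chunks) in clock units."""
    start = clock()
    last = start
    first: Optional[float] = None
    gaps: List[float] = []
    for _ in chunks:
        now = clock()
        if first is None:
            first = now - start
        else:
            gaps.append(now - last)
        last = now
    return first, gaps


def slow_chunks():
    # Stand-in for a real delta stream; sleeps simulate generation time.
    for piece in ["Once", " upon", " a"]:
        time.sleep(0.01)
        yield piece


first, gaps = measure_stream(slow_chunks())
print(f"first chunk after {first:.3f}s, {len(gaps)} later gaps")
```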


**Bandwidth:**

- Streaming uses ~20% more bandwidth due to SSE overhead
- Disable `includeObfuscation` if bandwidth is critical


**User Experience:**

- Users perceive streaming as 50-70% faster
- Engagement increases with real-time feedback


## Troubleshooting

### Chunks Not Arriving

Check connection headers:


```bash
curl -N -H "Accept: text/event-stream" ...
```

### Incomplete JSON

Buffer until newlines:


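```python
buffer = b""
# chunk_size=None yields data as it arrives instead of one byte at a time
for chunk in response.iter_content(chunk_size=None):
    buffer += chunk
    while b"\n" in buffer:
        line, buffer = buffer.split(b"\n", 1)
        # Decode only complete lines so multi-byte characters are never split
        process_line(line.decode("utf-8"))
```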

### Connection Drops

Implement exponential backoff:


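```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// streamRequest is a placeholder for your own function that opens and consumes the stream
async function streamWithRetry(url, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await streamRequest(url);
    } catch (error) {
      if (i === maxRetries - 1) throw error;
      await sleep(2 ** i * 1000); // wait 1s, 2s, 4s, ...
    }
  }
}
```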

**Related:**

- [Create Model Response](/docs/api-reference/responses/create) - API reference
- [Streaming Events Object](/assets/streaming-events.82eb81b073d37490f8e78912b580b9a4dc70b6d1e87478a9df24a71c5b057264.a2623633.md) - Event schema
- [Best Practices](/docs/documentation/best-practices) - Optimization tips