Instead of waiting for the entire response to complete, streaming sends partial outputs as server-sent events (SSE) as the model generates them:
Non-Streaming:
[Wait...] → "The capital of France is Paris."
Streaming:
"The" → " capital" → " of" → " France" → " is" → " Paris" → "."

Set stream: true in your request:
curl https://api.freddy.aitronos.com/v1/model/response \
  -H "Authorization: Bearer $FREDDY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4.1",
    "stream": true,
    "inputs": [
      {
        "role": "user",
        "texts": [{"text": "Write a short story"}]
      }
    ]
  }'

Streaming responses use server-sent events (SSE):
event: response.created
data: {"id":"resp_abc123","status":"in_progress"}
event: response.output_item.added
data: {"index":0,"item":{"type":"message","role":"assistant"}}
event: response.output_text.delta
data: {"index":0,"delta":"Once"}
event: response.output_text.delta
data: {"index":0,"delta":" upon"}
event: response.output_text.delta
data: {"index":0,"delta":" a"}
event: response.output_text.done
data: {"index":0,"text":"Once upon a time..."}
event: response.completed
data: {"status":"completed","usage":{"outputSynapses":156}}

Python:

import json
import requests

response = requests.post(
    "https://api.freddy.aitronos.com/v1/model/response",
    headers={
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    },
    json={
        "model": "gpt-4.1",
        "stream": True,
        "inputs": [
            {"role": "user", "texts": [{"text": "Tell me a story"}]}
        ]
    },
    stream=True
)

for line in response.iter_lines():
    if line:
        # Parse SSE format
        if line.startswith(b'data: '):
            data = json.loads(line[6:])
            if 'delta' in data:
                print(data['delta'], end='', flush=True)

JavaScript:

const response = await fetch('https://api.freddy.aitronos.com/v1/model/response', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${apiKey}`,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: 'gpt-4.1',
    stream: true,
    inputs: [
      { role: 'user', texts: [{ text: 'Tell me a story' }] }
    ]
  })
});

const reader = response.body.getReader();
const decoder = new TextDecoder();

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  // Decode with stream: true so multi-byte characters split across chunks are handled
  const chunk = decoder.decode(value, { stream: true });
  const lines = chunk.split('\n');

  for (const line of lines) {
    if (line.startsWith('data: ')) {
      const data = JSON.parse(line.slice(6));
      if (data.delta) {
        process.stdout.write(data.delta);
      }
    }
  }
}

React:

import { useState } from 'react';
function StreamingChat() {
  const [output, setOutput] = useState('');
  const [isStreaming, setIsStreaming] = useState(false);

  const handleSubmit = async (message) => {
    setIsStreaming(true);
    setOutput('');

    const response = await fetch('/api/chat', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        model: 'gpt-4.1',
        stream: true,
        inputs: [{ role: 'user', texts: [{ text: message }] }]
      })
    });

    const reader = response.body.getReader();
    const decoder = new TextDecoder();

    while (true) {
      const { done, value } = await reader.read();
      if (done) break;

      const chunk = decoder.decode(value, { stream: true });
      const lines = chunk.split('\n');

      for (const line of lines) {
        if (line.startsWith('data: ')) {
          const data = JSON.parse(line.slice(6));
          if (data.delta) {
            setOutput(prev => prev + data.delta);
          }
        }
      }
    }

    setIsStreaming(false);
  };

  return (
    <div>
      <div className="output">{output}</div>
      {isStreaming && <div>Generating...</div>}
    </div>
  );
}

Stream obfuscation adds random padding to normalize payload sizes, mitigating timing-based side-channel attacks:
{
  "stream": true,
  "streamOptions": {
    "includeObfuscation": false  // Disable for bandwidth optimization
  }
}

When to disable obfuscation:
- Trusted network environment
- Bandwidth-constrained connections
- High-volume streaming applications
Keep enabled (default) when:
- Handling sensitive data
- Untrusted network paths
- Security is prioritized over bandwidth
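
For reference, here is one way the option could be sent in a full request. This is a minimal Python sketch that reuses the request shape from the examples above; the api_key variable is assumed to hold your key, and the only assumption beyond the fragment above is that streamOptions sits at the top level of the request body:

import requests

# Assumption: streamOptions is passed alongside stream at the top level,
# exactly as shown in the JSON fragment above.
response = requests.post(
    "https://api.freddy.aitronos.com/v1/model/response",
    headers={
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    },
    json={
        "model": "gpt-4.1",
        "stream": True,
        "streamOptions": {"includeObfuscation": False},
        "inputs": [{"role": "user", "texts": [{"text": "Hello"}]}]
    },
    stream=True
)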
Streaming event reference:

- response.created - Response has been created and processing started.
  {"id":"resp_abc123","status":"in_progress"}
- response.output_item.added - New output item (message, tool call) has been added.
  {"index":0,"item":{"type":"message","role":"assistant"}}
- response.output_text.delta - Incremental text content generated.
  {"index":0,"delta":"Hello"}
- response.output_text.done - Text output for an item is complete.
  {"index":0,"text":"Hello, how can I help you?"}
- Tool call arguments are being generated (streamed as incremental argument deltas).
  {"index":1,"delta":"{\"query\":\""}
- response.completed - Response generation has finished.
  {"status":"completed","usage":{"outputSynapses":245}}
- response.failed - Response generation encountered an error.
  {"status":"failed","error":{"code":"rate_limit_exceeded"}}
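
As a rough illustration, a client can key off the event: line to decide how to treat each data: payload. A minimal Python sketch under that assumption; the event names follow the SSE example above, and the handler behavior here is illustrative rather than part of the API:

import json

def consume_sse(lines):
    """Route SSE events by type. lines: an iterable of lines, e.g. response.iter_lines()."""
    event_type = None
    for raw in lines:
        line = raw.decode() if isinstance(raw, bytes) else raw
        if line.startswith("event: "):
            event_type = line[len("event: "):].strip()
        elif line.startswith("data: ") and event_type:
            data = json.loads(line[len("data: "):])
            if event_type == "response.output_text.delta":
                print(data.get("delta", ""), end="", flush=True)
            elif event_type == "response.completed":
                print()  # generation finished; end the output line
            elif event_type == "response.failed":
                raise RuntimeError(f"stream failed: {data.get('error')}")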
Show responses character-by-character for a natural conversation feel:
def stream_chat(user_message):
    response = requests.post(
        api_url,
        json={"model": "gpt-4.1", "stream": True, "inputs": [...]},
        stream=True
    )
    for line in response.iter_lines():
        if line.startswith(b'data: '):
            data = json.loads(line[6:])
            if 'delta' in data:
                yield data['delta']

Display articles, stories, or documentation as they're written:
async function generateBlogPost(topic) {
  const stream = await fetch('/api/generate', {
    method: 'POST',
    body: JSON.stringify({
      stream: true,
      inputs: [{ role: 'user', texts: [{ text: `Write about ${topic}` }] }]
    })
  });

  // Update UI in real-time
  for await (const chunk of streamResponse(stream)) {
    document.getElementById('preview').textContent += chunk;
  }
}

Show code being written line-by-line:
def stream_code_generation(prompt):
    for chunk in stream_response(prompt):
        syntax_highlight_and_display(chunk)
        time.sleep(0.01)  # Smooth animation

Do:
- Buffer incomplete events - SSE chunks may split across packets
- Handle reconnection - Implement retry logic for network issues
- Parse incrementally - Process deltas as they arrive
- Show loading indicators - Indicate streaming is in progress
- Implement timeouts - Don't wait indefinitely (see the sketch after these lists)
Don't:
- Assume complete JSON - Chunks may contain partial data
- Block UI thread - Process streams asynchronously
- Ignore error events - Handle response.failed appropriately
- Forget to close streams - Clean up connections when done
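
For the timeout point above, one option with the requests library is a connect/read timeout on the streaming call. A minimal sketch, reusing the endpoint, headers, and payload assumptions from the earlier examples; the with block also closes the stream when done:

import requests

# timeout=(connect, read): the read timeout applies to gaps between chunks,
# so a stalled stream does not hang forever.
with requests.post(
    "https://api.freddy.aitronos.com/v1/model/response",
    headers={"Authorization": f"Bearer {api_key}"},
    json={"model": "gpt-4.1", "stream": True,
          "inputs": [{"role": "user", "texts": [{"text": "Hello"}]}]},
    stream=True,
    timeout=(5, 30),
) as response:
    for line in response.iter_lines():
        ...  # parse SSE lines as in the earlier examples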
Latency:
- First token: ~200-500ms
- Subsequent tokens: ~20-50ms each
- Total time to first output: Faster than non-streaming (see the measurement sketch after this section)
Bandwidth:
- Streaming uses ~20% more bandwidth due to SSE overhead
- Disable includeObfuscation if bandwidth is critical
User Experience:
- Users perceive streaming as 50-70% faster
- Engagement increases with real-time feedback
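
To check the latency numbers above against your own deployment, a minimal sketch that measures time to first token; api_key is assumed, and the endpoint and payload follow the earlier examples:

import time
import requests

start = time.monotonic()
first_token_at = None

response = requests.post(
    "https://api.freddy.aitronos.com/v1/model/response",
    headers={"Authorization": f"Bearer {api_key}"},
    json={"model": "gpt-4.1", "stream": True,
          "inputs": [{"role": "user", "texts": [{"text": "Say hello"}]}]},
    stream=True,
)

for line in response.iter_lines():
    # The first text delta marks the first visible output
    if first_token_at is None and line.startswith(b'data: ') and b'"delta"' in line:
        first_token_at = time.monotonic()

if first_token_at is not None:
    print(f"time to first token: {first_token_at - start:.2f}s")
print(f"total time: {time.monotonic() - start:.2f}s")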
Troubleshooting:

Check connection headers:
curl -N -H "Accept: text/event-stream" ...

Buffer until newlines:
buffer = ""
for chunk in response.iter_content():
    buffer += chunk.decode()
    while '\n' in buffer:
        line, buffer = buffer.split('\n', 1)
        process_line(line)

Implement exponential backoff:
async function streamWithRetry(url, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await streamRequest(url);
    } catch (error) {
      if (i === maxRetries - 1) throw error;
      await sleep(2 ** i * 1000);
    }
  }
}

Related:
- Create Model Response - API reference
- Streaming Events Object - Event schema
- Best Practices - Optimization tips