# Streaming Mode

> 🔨 **In Development** — This section is still being developed and may change.

Streaming mode delivers model responses incrementally as they're generated, enabling real-time user experiences without waiting for complete responses.

## What is Streaming?

Instead of waiting for the entire response to complete, streaming sends partial outputs as server-sent events (SSE) while the model generates them:

```
Non-Streaming: [Wait...] → "The capital of France is Paris."
Streaming:     "The" → " capital" → " of" → " France" → " is" → " Paris" → "."
```

## Enabling Streaming

Set `stream: true` in your request:

```bash
curl https://api.freddy.aitronos.com/v1/model/response \
  -H "Authorization: Bearer $FREDDY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4.1",
    "stream": true,
    "inputs": [
      {
        "role": "user",
        "texts": [{"text": "Write a short story"}]
      }
    ]
  }'
```

## Response Format

Streaming responses are delivered as a sequence of server-sent events:

```
event: response.created
data: {"id":"resp_abc123","status":"in_progress"}

event: response.output_item.added
data: {"index":0,"item":{"type":"message","role":"assistant"}}

event: response.output_text.delta
data: {"index":0,"delta":"Once"}

event: response.output_text.delta
data: {"index":0,"delta":" upon"}

event: response.output_text.delta
data: {"index":0,"delta":" a"}

event: response.output_text.done
data: {"index":0,"text":"Once upon a time..."}

event: response.completed
data: {"status":"completed","usage":{"outputSynapses":156}}
```

## Implementation Examples

### Python

```python
import json

import requests

response = requests.post(
    "https://api.freddy.aitronos.com/v1/model/response",
    headers={
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    },
    json={
        "model": "gpt-4.1",
        "stream": True,
        "inputs": [
            {"role": "user", "texts": [{"text": "Tell me a story"}]}
        ]
    },
    stream=True
)

for line in response.iter_lines():
    if line:
        # Parse the SSE format: payload lines start with "data: "
        if line.startswith(b'data: '):
            data = json.loads(line[6:])
            if 'delta' in data:
                print(data['delta'], end='', flush=True)
```

### JavaScript

```javascript
const response = await fetch('https://api.freddy.aitronos.com/v1/model/response', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${apiKey}`,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: 'gpt-4.1',
    stream: true,
    inputs: [
      { role: 'user', texts: [{ text: 'Tell me a story' }] }
    ]
  })
});

const reader = response.body.getReader();
const decoder = new TextDecoder();

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  // { stream: true } keeps multi-byte characters intact across chunks
  const chunk = decoder.decode(value, { stream: true });
  const lines = chunk.split('\n');

  for (const line of lines) {
    if (line.startsWith('data: ')) {
      const data = JSON.parse(line.slice(6));
      if (data.delta) {
        process.stdout.write(data.delta);
      }
    }
  }
}
```
### React Component

```jsx
import { useState } from 'react';

function StreamingChat() {
  const [output, setOutput] = useState('');
  const [isStreaming, setIsStreaming] = useState(false);

  const handleSubmit = async (message) => {
    setIsStreaming(true);
    setOutput('');

    const response = await fetch('/api/chat', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        model: 'gpt-4.1',
        stream: true,
        inputs: [{ role: 'user', texts: [{ text: message }] }]
      })
    });

    const reader = response.body.getReader();
    const decoder = new TextDecoder();

    while (true) {
      const { done, value } = await reader.read();
      if (done) break;

      const chunk = decoder.decode(value, { stream: true });
      const lines = chunk.split('\n');

      for (const line of lines) {
        if (line.startsWith('data: ')) {
          const data = JSON.parse(line.slice(6));
          if (data.delta) {
            setOutput(prev => prev + data.delta);
          }
        }
      }
    }

    setIsStreaming(false);
  };

  return (
    <div>
      <div className="output">{output}</div>
      {isStreaming && <div className="status">Generating...</div>}
    </div>
  );
}
```
## Stream Obfuscation

Stream obfuscation adds random padding to normalize payload sizes, mitigating timing-based side-channel attacks:

```json
{
  "stream": true,
  "streamOptions": {
    "includeObfuscation": false  // Disable for bandwidth optimization
  }
}
```

**When to disable obfuscation:**

- Trusted network environment
- Bandwidth-constrained connections
- High-volume streaming applications

**Keep enabled (default) when:**

- Handling sensitive data
- Untrusted network paths
- Security is prioritized over bandwidth
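To disable it, pass `streamOptions` alongside `stream` in the request body, as in the JSON above. A minimal Python sketch, reusing the endpoint and payload shape from the examples earlier (`api_key` is assumed to be defined):

```python
import requests

# Minimal sketch: same request as the Python example above, with
# obfuscation disabled via streamOptions (only for trusted networks).
response = requests.post(
    "https://api.freddy.aitronos.com/v1/model/response",
    headers={
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    },
    json={
        "model": "gpt-4.1",
        "stream": True,
        "streamOptions": {"includeObfuscation": False},
        "inputs": [{"role": "user", "texts": [{"text": "Summarize this report"}]}]
    },
    stream=True
)
```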
## Event Types

### `response.created`

Response has been created and processing has started.

```json
{"id":"resp_abc123","status":"in_progress"}
```

### `response.output_item.added`

A new output item (message, tool call) has been added.

```json
{"index":0,"item":{"type":"message","role":"assistant"}}
```

### `response.output_text.delta`

Incremental text content has been generated.

```json
{"index":0,"delta":"Hello"}
```

### `response.output_text.done`

Text output for an item is complete.

```json
{"index":0,"text":"Hello, how can I help you?"}
```

### `response.tool_call.delta`

Tool call arguments are being generated.

```json
{"index":1,"delta":"{\"query\":\""}
```

### `response.completed`

Response generation has finished.

```json
{"status":"completed","usage":{"outputSynapses":245}}
```

### `response.failed`

Response generation encountered an error.

```json
{"status":"failed","error":{"code":"rate_limit_exceeded"}}
```
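A typical consumer pairs each `event:` line with the `data:` line that follows it and branches on the event name. The sketch below shows one way to do this in Python; the handler logic is illustrative, not part of the API:

```python
import json

def handle_sse_lines(lines):
    """Pair each 'event:' line with its 'data:' payload and branch on the event name.

    `lines` is an iterable of decoded SSE lines, e.g. from response.iter_lines().
    """
    event_name = None
    for line in lines:
        if line.startswith("event: "):
            event_name = line[len("event: "):]
        elif line.startswith("data: "):
            data = json.loads(line[len("data: "):])
            if event_name == "response.output_text.delta":
                print(data["delta"], end="", flush=True)  # incremental text
            elif event_name == "response.completed":
                print(f"\n[done, usage: {data['usage']}]")
            elif event_name == "response.failed":
                raise RuntimeError(f"stream failed: {data['error']['code']}")
            event_name = None  # reset until the next event line
```

With the Python example above, you could feed it decoded lines: `handle_sse_lines(line.decode() for line in response.iter_lines() if line)`.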
## Use Cases

### Interactive Chat

Show responses character-by-character for a natural conversation feel:

```python
def stream_chat(user_message):
    response = requests.post(
        api_url,
        json={"model": "gpt-4.1", "stream": True, "inputs": [...]},
        stream=True
    )

    for line in response.iter_lines():
        if line.startswith(b'data: '):
            data = json.loads(line[6:])
            if 'delta' in data:
                yield data['delta']
```

### Content Generation

Display articles, stories, or documentation as they're written:

```javascript
async function generateBlogPost(topic) {
  const stream = await fetch('/api/generate', {
    method: 'POST',
    body: JSON.stringify({
      stream: true,
      inputs: [{ role: 'user', texts: [{ text: `Write about ${topic}` }] }]
    })
  });

  // Update the UI in real time
  for await (const chunk of streamResponse(stream)) {
    document.getElementById('preview').textContent += chunk;
  }
}
```

### Code Generation

Show code being written line by line:

```python
def stream_code_generation(prompt):
    for chunk in stream_response(prompt):
        syntax_highlight_and_display(chunk)
        time.sleep(0.01)  # Smooth animation
```
## Best Practices

### ✅ DO

- **Buffer incomplete events** - SSE chunks may split across packets
- **Handle reconnection** - Implement retry logic for network issues
- **Parse incrementally** - Process deltas as they arrive
- **Show loading indicators** - Indicate streaming is in progress
- **Implement timeouts** - Don't wait indefinitely (see the sketch after these lists)

### ❌ DON'T

- **Assume complete JSON** - Chunks may contain partial data
- **Block the UI thread** - Process streams asynchronously
- **Ignore error events** - Handle `response.failed` appropriately
- **Forget to close streams** - Clean up connections when done
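The following sketch combines three of these practices in one reader: buffering incomplete lines, a hard timeout, and explicit stream cleanup. The timeout values are illustrative assumptions, not documented defaults:

```python
import json

import requests

def stream_deltas(url, payload, headers, connect_timeout=5, read_timeout=30):
    """Yield text deltas, buffering partial chunks and timing out instead of hanging."""
    response = requests.post(
        url, json=payload, headers=headers, stream=True,
        timeout=(connect_timeout, read_timeout)  # illustrative values
    )
    buffer = ""
    try:
        for chunk in response.iter_content(chunk_size=None):
            buffer += chunk.decode("utf-8")
            # SSE chunks may split mid-line: only parse complete lines.
            while "\n" in buffer:
                line, buffer = buffer.split("\n", 1)
                if line.startswith("data: "):
                    data = json.loads(line[len("data: "):])
                    if "delta" in data:
                        yield data["delta"]
    finally:
        response.close()  # always release the connection
```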
## Performance Considerations

**Latency:**

- **First token**: ~200-500ms
- **Subsequent tokens**: ~20-50ms each
- **Total time to first output**: faster than non-streaming

**Bandwidth:**

- Streaming uses roughly 20% more bandwidth due to SSE overhead
- Disable `includeObfuscation` if bandwidth is critical

**User Experience:**

- Users perceive streaming as 50-70% faster
- Engagement increases with real-time feedback
## Troubleshooting

### Chunks Not Arriving

Check the connection headers (`-N` disables curl's output buffering):

```bash
curl -N -H "Accept: text/event-stream" ...
```

### Incomplete JSON

Buffer until a newline arrives before parsing:

```python
buffer = ""
for chunk in response.iter_content():
    buffer += chunk.decode()
    while '\n' in buffer:
        line, buffer = buffer.split('\n', 1)
        process_line(line)
```

### Connection Drops

Implement exponential backoff:

```javascript
async function streamWithRetry(url, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await streamRequest(url);
    } catch (error) {
      if (i === maxRetries - 1) throw error;
      await sleep(2 ** i * 1000);
    }
  }
}
```

**Related:**

- [Create Model Response](/docs/api-reference/responses/create) - API reference
- [Streaming Events Object](/docs/api-reference/objects/streaming-events) - Event schema
- [Best Practices](/docs/documentation/best-practices) - Optimization tips