🔨 In Development — This section is still being developed and may change.
Streaming mode delivers model responses incrementally as they're generated, enabling real-time user experiences without waiting for complete responses.

What is Streaming?

Instead of waiting for the entire response to complete, streaming sends partial output as server-sent events (SSE) while the model generates it:

Non-Streaming:
[Wait...] → "The capital of France is Paris."

Streaming:
"The" → " capital" → " of" → " France" → " is" → " Paris" → "."

Enabling Streaming

Set stream: true in your request:

curl https://api.freddy.aitronos.com/v1/model/response \
  -H "Authorization: Bearer $FREDDY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4.1",
    "stream": true,
    "inputs": [
      {
        "role": "user",
        "texts": [{"text": "Write a short story"}]
      }
    ]
  }'

Response Format

Streaming responses use server-sent events (SSE):

event: response.created
data: {"id":"resp_abc123","status":"in_progress"}

event: response.output_item.added
data: {"index":0,"item":{"type":"message","role":"assistant"}}

event: response.output_text.delta
data: {"index":0,"delta":"Once"}

event: response.output_text.delta
data: {"index":0,"delta":" upon"}

event: response.output_text.delta
data: {"index":0,"delta":" a"}

event: response.output_text.done
data: {"index":0,"text":"Once upon a time..."}

event: response.completed
data: {"status":"completed","usage":{"outputSynapses":156}}

Implementation Examples

Python

import json
import requests

response = requests.post(
    "https://api.freddy.aitronos.com/v1/model/response",
    headers={
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    },
    json={
        "model": "gpt-4.1",
        "stream": True,
        "inputs": [
            {"role": "user", "texts": [{"text": "Tell me a story"}]}
        ]
    },
    stream=True
)

for line in response.iter_lines():
    if line:
        # iter_lines() buffers partial chunks for us; for brevity this parses
        # only the data lines and ignores the preceding "event:" lines
        if line.startswith(b'data: '):
            data = json.loads(line[6:])
            if 'delta' in data:
                print(data['delta'], end='', flush=True)

JavaScript

const response = await fetch('https://api.freddy.aitronos.com/v1/model/response', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${apiKey}`,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: 'gpt-4.1',
    stream: true,
    inputs: [
      { role: 'user', texts: [{ text: 'Tell me a story' }] }
    ]
  })
});

const reader = response.body.getReader();
const decoder = new TextDecoder();

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  
  // NOTE: a network chunk may end mid-line; production code should carry the
  // trailing partial line over to the next chunk (see Best Practices below).
  // { stream: true } keeps multi-byte characters intact across chunks.
  const chunk = decoder.decode(value, { stream: true });
  const lines = chunk.split('\n');
  
  for (const line of lines) {
    if (line.startsWith('data: ')) {
      const data = JSON.parse(line.slice(6));
      if (data.delta) {
        process.stdout.write(data.delta);
      }
    }
  }
}

React Component

import { useState } from 'react';

function StreamingChat() {
  const [output, setOutput] = useState('');
  const [isStreaming, setIsStreaming] = useState(false);

  const handleSubmit = async (message) => {
    setIsStreaming(true);
    setOutput('');

    const response = await fetch('/api/chat', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        model: 'gpt-4.1',
        stream: true,
        inputs: [{ role: 'user', texts: [{ text: message }] }]
      })
    });

    const reader = response.body.getReader();
    const decoder = new TextDecoder();

    while (true) {
      const { done, value } = await reader.read();
      if (done) break;

      // { stream: true } handles multi-byte characters split across chunks;
      // partial-line buffering is omitted here for brevity
      const chunk = decoder.decode(value, { stream: true });
      const lines = chunk.split('\n');

      for (const line of lines) {
        if (line.startsWith('data: ')) {
          const data = JSON.parse(line.slice(6));
          if (data.delta) {
            setOutput(prev => prev + data.delta);
          }
        }
      }
    }

    setIsStreaming(false);
  };

  // Input form wiring omitted for brevity; call handleSubmit(message) on submit
  return (
    <div>
      <div className="output">{output}</div>
      {isStreaming && <div>Generating...</div>}
    </div>
  );
}

Stream Obfuscation

Stream obfuscation adds random padding to normalize payload sizes, mitigating timing-based side-channel attacks:

{
  "stream": true,
  "streamOptions": {
    "includeObfuscation": false  // Disable for bandwidth optimization
  }
}

When to disable obfuscation:

  • Trusted network environment
  • Bandwidth-constrained connections
  • High-volume streaming applications

Keep enabled (default) when:

  • Handling sensitive data
  • Untrusted network paths
  • Security is prioritized over bandwidth
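
For example, here is a sketch of a request with obfuscation disabled, assuming the endpoint, api_key, and payload shape from the examples above:

import requests

# Sketch: trade the obfuscation padding for lower bandwidth
response = requests.post(
    "https://api.freddy.aitronos.com/v1/model/response",
    headers={"Authorization": f"Bearer {api_key}"},
    json={
        "model": "gpt-4.1",
        "stream": True,
        "streamOptions": {"includeObfuscation": False},
        "inputs": [{"role": "user", "texts": [{"text": "Hello"}]}],
    },
    stream=True,
)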

Event Types

response.created

Response has been created and processing started.

{"id":"resp_abc123","status":"in_progress"}

response.output_item.added

A new output item (a message or a tool call) has been added.

{"index":0,"item":{"type":"message","role":"assistant"}}

response.output_text.delta

Incremental text content generated.

{"index":0,"delta":"Hello"}

response.output_text.done

Text output for an item is complete.

{"index":0,"text":"Hello, how can I help you?"}

response.tool_call.delta

Tool call arguments are being generated.

{"index":1,"delta":"{\"query\":\""}

response.completed

Response generation has finished.

{"status":"completed","usage":{"outputSynapses":245}}

response.failed

Response generation encountered an error.

{"status":"failed","error":{"code":"rate_limit_exceeded"}}

Use Cases

Interactive Chat

Show responses token by token for a natural conversation feel:

def stream_chat(user_message):
    # assumes requests/json are imported and api_url is defined as above
    response = requests.post(
        api_url,
        json={"model": "gpt-4.1", "stream": True, "inputs": [...]},
        stream=True
    )

    for line in response.iter_lines():
        if line.startswith(b'data: '):
            data = json.loads(line[6:])
            if 'delta' in data:
                yield data['delta']
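
Consuming the generator is then a single loop:

for token in stream_chat("Tell me a joke"):
    print(token, end="", flush=True)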

Content Generation

Display articles, stories, or documentation as they're written:

async function generateBlogPost(topic) {
  const stream = await fetch('/api/generate', {
    method: 'POST',
    body: JSON.stringify({
      stream: true,
      inputs: [{ role: 'user', texts: [{ text: `Write about ${topic}` }] }]
    })
  });
  
  // Update the UI in real time; streamResponse is an application helper
  // (not shown) that parses the SSE body and yields text deltas
  for await (const chunk of streamResponse(stream)) {
    document.getElementById('preview').textContent += chunk;
  }
}

Code Generation

Show code being written line-by-line:

import time

def stream_code_generation(prompt):
    # stream_response and syntax_highlight_and_display are application
    # helpers (not shown); the first yields text deltas from the stream
    for chunk in stream_response(prompt):
        syntax_highlight_and_display(chunk)
        time.sleep(0.01)  # brief pause between chunks for a smoother animation

Best Practices

✅ DO

  • Buffer incomplete events - SSE chunks may split across packets
  • Handle reconnection - Implement retry logic for network issues
  • Parse incrementally - Process deltas as they arrive
  • Show loading indicators - Indicate streaming is in progress
  • Implement timeouts - Don't wait indefinitely (see the sketch after these lists)

❌ DON'T

  • Assume complete JSON - Chunks may contain partial data
  • Block UI thread - Process streams asynchronously
  • Ignore error events - Handle response.failed appropriately
  • Forget to close streams - Clean up connections when done
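
A minimal sketch tying several of these points together, assuming the requests-based setup and api_url from the examples above (the timeout values are illustrative):

import json
import requests

def stream_with_timeouts(payload):
    # timeout=(connect, read) fails fast instead of waiting indefinitely;
    # the read timeout applies between received chunks, so it suits streaming
    with requests.post(api_url, json=payload, stream=True,
                       timeout=(5, 30)) as response:
        response.raise_for_status()  # surface HTTP errors before streaming
        buffer = b""
        for chunk in response.iter_content(chunk_size=None):
            buffer += chunk  # events may split across packets; buffer them
            while b"\n" in buffer:
                line, buffer = buffer.split(b"\n", 1)
                if line.startswith(b"data: "):
                    data = json.loads(line[6:])
                    if "delta" in data:
                        yield data["delta"]
    # the with-block closes the connection whether the stream ends or errors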

Performance Considerations

Latency:

  • First token: ~200-500ms
  • Subsequent tokens: ~20-50ms each
  • Time to first visible output: far lower than waiting for the complete response
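
As a rough worked example with those figures: a 200-token response at ~300ms to the first token and ~30ms per token thereafter finishes in about 0.3 + 199 × 0.03 ≈ 6.3 seconds, but the user sees output after ~300ms rather than after the full six seconds.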

Bandwidth:

  • Streaming uses ~20% more bandwidth due to SSE overhead
  • Disable includeObfuscation if bandwidth is critical

User Experience:

  • Users perceive streaming as 50-70% faster
  • Engagement increases with real-time feedback

Troubleshooting

Chunks Not Arriving

Verify that output buffering is disabled and the SSE Accept header is set (-N disables curl's buffering):

curl -N -H "Accept: text/event-stream" ...

Incomplete JSON

Buffer until newlines:

buffer = ""
for chunk in response.iter_content():
    buffer += chunk.decode()
    while '\n' in buffer:
        line, buffer = buffer.split('\n', 1)
        process_line(line)

Connection Drops

Implement exponential backoff:

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function streamWithRetry(url, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      // streamRequest is an application helper (not shown)
      return await streamRequest(url);
    } catch (error) {
      if (i === maxRetries - 1) throw error;
      await sleep(2 ** i * 1000);  // exponential backoff: 1s, 2s, 4s
    }
  }
}
