Streaming mode delivers model responses incrementally as they're generated, enabling real-time user experiences without waiting for complete responses.

What is Streaming?

Instead of waiting for the entire response to complete, streaming delivers partial outputs as server-sent events (SSE) while the model generates them:

Non-Streaming:
[Wait...] → "The capital of France is Paris."

Streaming:
"The" → " capital" → " of" → " France" → " is" → " Paris" → "."

Enabling Streaming

Set stream: true in your request:

curl https://api.aitronos.com/v1/model/response \
  -H "X-API-Key: $FREDDY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "stream": true,
    "inputs": [
      {
        "role": "user",
        "content": "Write a short story"
      }
    ]
  }'

Response Format

Streaming responses use server-sent events (SSE):

event: response.created
data: {"id":"resp_abc123","status":"in_progress"}

event: response.output_item.added
data: {"index":0,"item":{"type":"message","role":"assistant"}}

event: response.output_text.delta
data: {"index":0,"delta":"Once"}

event: response.output_text.delta
data: {"index":0,"delta":" upon"}

event: response.output_text.delta
data: {"index":0,"delta":" a"}

event: response.output_text.done
data: {"index":0,"text":"Once upon a time..."}

event: response.completed
data: {"status":"completed","usage":{"outputSynapses":156}}
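
Each event is delimited by a blank line, so a parser should accumulate `event:` and `data:` fields until one arrives. A minimal, framework-independent sketch (`parse_sse` is an illustrative helper following the SSE format above, not part of any SDK):

```python
import json

def parse_sse(lines):
    """Yield (event, data) tuples from an iterable of decoded SSE lines."""
    event, data = None, []
    for line in lines:
        if line.startswith("event: "):
            event = line[len("event: "):]
        elif line.startswith("data: "):
            data.append(line[len("data: "):])
        elif line == "":  # a blank line terminates the current event
            if data:
                yield event, json.loads("\n".join(data))
            event, data = None, []
```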

Implementation Examples

Python

import json

import requests

response = requests.post(
    "https://api.aitronos.com/v1/model/response",
    headers={
        "X-API-Key": api_key,
        "Content-Type": "application/json"
    },
    json={
        "model": "gpt-4o",
        "stream": True,
        "inputs": [
            {"role": "user", "content": "Tell me a story"}
        ]
    },
    stream=True
)

for line in response.iter_lines():
    if line:
        # Parse SSE format
        if line.startswith(b'data: '):
            data = json.loads(line[6:])
            if 'delta' in data:
                print(data['delta'], end='', flush=True)

JavaScript

const response = await fetch('https://api.aitronos.com/v1/model/response', {
  method: 'POST',
  headers: {
    'X-API-Key': apiKey,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: 'gpt-4o',
    stream: true,
    inputs: [
      { role: 'user', texts: [{ text: 'Tell me a story' }] }
    ]
  })
});

const reader = response.body.getReader();
const decoder = new TextDecoder();

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  const chunk = decoder.decode(value);
  const lines = chunk.split('\n');

  for (const line of lines) {
    if (line.startsWith('data: ')) {
      const data = JSON.parse(line.slice(6));
      if (data.delta) {
        process.stdout.write(data.delta);
      }
    }
  }
}

React Component

import { useState } from 'react';

function StreamingChat() {
  const [output, setOutput] = useState('');
  const [isStreaming, setIsStreaming] = useState(false);

  const handleSubmit = async (message) => {
    setIsStreaming(true);
    setOutput('');

    const response = await fetch('/api/chat', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        model: 'gpt-4o',
        stream: true,
        inputs: [{ role: 'user', texts: [{ text: message }] }]
      })
    });

    const reader = response.body.getReader();
    const decoder = new TextDecoder();

    while (true) {
      const { done, value } = await reader.read();
      if (done) break;

      const chunk = decoder.decode(value);
      const lines = chunk.split('\n');

      for (const line of lines) {
        if (line.startsWith('data: ')) {
          const data = JSON.parse(line.slice(6));
          if (data.delta) {
            setOutput(prev => prev + data.delta);
          }
        }
      }
    }

    setIsStreaming(false);
  };

  return (
    <div>
      <div className="output">{output}</div>
      {isStreaming && <div>Generating...</div>}
    </div>
  );
}

Stream Obfuscation

Stream obfuscation adds random padding to normalize payload sizes, mitigating timing-based side-channel attacks:

{
  "stream": true,
  "streamOptions": {
    "includeObfuscation": false  // Disable for bandwidth optimization
  }
}

When to disable obfuscation:

  • Trusted network environment
  • Bandwidth-constrained connections
  • High-volume streaming applications

Keep enabled (default) when:

  • Handling sensitive data
  • Untrusted network paths
  • Security is prioritized over bandwidth
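
A request that disables obfuscation differs only in the `streamOptions` field. The helper below is an illustrative sketch (not SDK code) that builds the JSON body using the field names shown above:

```python
def build_streaming_request(prompt, obfuscation=True):
    """Build a streaming request body; set obfuscation=False to save bandwidth."""
    return {
        "model": "gpt-4o",
        "stream": True,
        "streamOptions": {"includeObfuscation": obfuscation},
        "inputs": [{"role": "user", "content": prompt}],
    }
```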

Event Types

response.created

Response has been created and processing started.

{"id":"resp_abc123","status":"in_progress"}

response.output_item.added

A new output item (a message or a tool call) has been added.

{"index":0,"item":{"type":"message","role":"assistant"}}

response.output_text.delta

Incremental text content generated.

{"index":0,"delta":"Hello"}

response.output_text.done

Text output for an item is complete.

{"index":0,"text":"Hello, how can I help you?"}

response.tool_call.delta

Tool call arguments are being generated.

{"index":1,"delta":"{\"query\":\""}

response.completed

Response generation has finished.

{"status":"completed","usage":{"outputSynapses":245}}

response.failed

Response generation encountered an error.

{"status":"failed","error":{"code":"rate_limit_exceeded"}}
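
These events can be routed through a simple dispatch function. The sketch below is an illustrative pattern, not SDK code: it accumulates text deltas, treats the `done` text as authoritative, raises on failure, and signals completion by returning False:

```python
def handle_event(event, data, state):
    """Route one parsed SSE event; returns False once the response completes."""
    if event == "response.output_text.delta":
        state["text"] += data["delta"]
    elif event == "response.output_text.done":
        state["text"] = data["text"]  # final text replaces the accumulated deltas
    elif event == "response.failed":
        raise RuntimeError(f"stream failed: {data['error']['code']}")
    elif event == "response.completed":
        return False
    return True
```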

Use Cases

Interactive Chat

Show responses token by token for a natural conversational feel:

def stream_chat(user_message):
    response = requests.post(
        api_url,
        json={"model": "gpt-4o", "stream": True, "inputs": [...]},
        stream=True
    )

    for line in response.iter_lines():
        if line.startswith(b'data: '):
            data = json.loads(line[6:])
            if 'delta' in data:
                yield data['delta']

Content Generation

Display articles, stories, or documentation as they're written:

async function generateBlogPost(topic) {
  const stream = await fetch('/api/generate', {
    method: 'POST',
    body: JSON.stringify({
      stream: true,
      inputs: [{ role: 'user', texts: [{ text: `Write about ${topic}` }] }]
    })
  });

  // Update UI in real-time
  for await (const chunk of streamResponse(stream)) {
    document.getElementById('preview').textContent += chunk;
  }
}

Code Generation

Show code being written line-by-line:

import time

def stream_code_generation(prompt):
    for chunk in stream_response(prompt):        # your streaming helper
        syntax_highlight_and_display(chunk)      # your rendering function
        time.sleep(0.01)  # brief pause for a smooth typing animation

Best Practices

DO

  • Buffer incomplete events - SSE chunks may split across packets
  • Handle reconnection - Implement retry logic for network issues
  • Parse incrementally - Process deltas as they arrive
  • Show loading indicators - Indicate streaming is in progress
  • Implement timeouts - Don't wait indefinitely

DON'T

  • Assume complete JSON - Chunks may contain partial data
  • Block UI thread - Process streams asynchronously
  • Ignore error events - Handle response.failed appropriately
  • Forget to close streams - Clean up connections when done
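
For the timeout rule in particular, a per-stream deadline can be wrapped around any chunk iterator. A minimal sketch (`with_deadline` is a hypothetical helper, independent of any HTTP client; note it aborts between chunks, so a hard read timeout still belongs on the client itself):

```python
import time

def with_deadline(chunks, max_seconds):
    """Yield chunks from a stream, aborting once total time exceeds max_seconds."""
    deadline = time.monotonic() + max_seconds
    for chunk in chunks:
        if time.monotonic() > deadline:
            raise TimeoutError("stream exceeded deadline")
        yield chunk
```

Wrap it around `response.iter_lines()` (or any generator of deltas) so a slow stream fails fast instead of hanging the UI indefinitely.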

Performance Considerations

Latency:

  • First token: ~200-500ms
  • Subsequent tokens: ~20-50ms each
  • Time to first visible output: dramatically lower than waiting for the full response

Bandwidth:

  • Streaming uses ~20% more bandwidth due to SSE overhead
  • Disable includeObfuscation if bandwidth is critical

User Experience:

  • Users perceive streaming as 50-70% faster
  • Engagement increases with real-time feedback

Troubleshooting

Chunks Not Arriving

Disable curl's output buffering with -N and set the SSE Accept header:

curl -N -H "Accept: text/event-stream" ...

Incomplete JSON

Buffer until newlines:

buffer = ""
for chunk in response.iter_content():
    buffer += chunk.decode()
    while '\n' in buffer:
        line, buffer = buffer.split('\n', 1)
        process_line(line)

Connection Drops

Implement exponential backoff:

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function streamWithRetry(url, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await streamRequest(url);
    } catch (error) {
      if (i === maxRetries - 1) throw error;
      await sleep(2 ** i * 1000);  // 1s, 2s, 4s...
    }
  }
}
