
Run model responses asynchronously in the background, allowing your application to continue without waiting for completion.

Overview

Background mode enables you to submit a model response request and receive an immediate response with a tracking ID, while the actual processing happens asynchronously. This is ideal for:

  • Long-running generations
  • Batch processing workflows
  • Non-blocking UI experiences
  • Queue-based architectures

How It Works

  1. Submit Request: Send a POST request with background: true
  2. Receive ID: Get an immediate response with a unique response ID
  3. Check Status: Poll the status endpoint to monitor progress
  4. Retrieve Result: Fetch the completed response when ready
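The four steps above can be sketched in Python. To keep the flow visible without committing to an HTTP client, the network calls are passed in as plain callables; in practice `submit` would POST to `/v1/model/response` and `get_status` would GET `/v1/model/response/{id}`, as in the curl examples below.

```python
import time

def run_background_request(submit, get_status, poll_seconds=5, sleep=time.sleep):
    """Drive a background request through the four steps above.

    submit()        -> dict with the immediate response, e.g. {"id": ..., "status": "queued"}
    get_status(rid) -> dict with the current status body for that response ID
    """
    response_id = submit()["id"]              # steps 1-2: submit, receive ID

    while True:                               # step 3: poll the status endpoint
        body = get_status(response_id)
        if body["status"] == "completed":     # step 4: result is ready
            return body
        if body["status"] in ("failed", "cancelled"):
            raise RuntimeError(f"response {response_id} ended as {body['status']}")
        sleep(poll_seconds)
```

Injecting `sleep` as well makes the loop trivial to exercise in tests without real waiting.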

Basic Example

# Submit background request
curl https://api.aitronos.com/v1/model/response \
  -H "X-API-Key: $FREDDY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4.1",
    "background": true,
    "inputs": [
      {
        "role": "user",
        "texts": [{"text": "Write a detailed article about AI"}]
      }
    ]
  }'

Response:

{
  "id": "resp_67ccd2bed1ec8190",
  "status": "queued",
  "created_at": 1741476542
}

Checking Status

Poll the status endpoint to monitor progress:

curl https://api.aitronos.com/v1/model/response/resp_67ccd2bed1ec8190 \
  -H "X-API-Key: $FREDDY_API_KEY"

Response (In Progress):

{
  "id": "resp_67ccd2bed1ec8190",
  "status": "in_progress",
  "created_at": 1741476542
}

Response (Completed):

{
  "id": "resp_67ccd2bed1ec8190",
  "status": "completed",
  "created_at": 1741476542,
  "completed_at": 1741476550,
  "output": [
    {
      "type": "message",
      "role": "assistant",
      "content": [
        {
          "type": "output_text",
          "text": "Here is a detailed article about AI..."
        }
      ]
    }
  ],
  "usage": {
    "input_tokens": 12,
    "output_tokens": 450,
    "total_tokens": 462
  }
}

Status Values

  Status        Description
  queued        Request received and queued for processing
  in_progress   Currently being processed
  completed     Successfully completed
  failed        Processing failed (see error details)
  cancelled     Request was cancelled
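Of these, completed, failed, and cancelled are terminal: once a response reaches one of them its status no longer changes, so a poller can stop. A small helper (illustrative only, not part of any SDK) makes that check explicit:

```python
# Terminal states from the table above: polling can stop once one is reached.
TERMINAL_STATUSES = frozenset({"completed", "failed", "cancelled"})

def is_terminal(status: str) -> bool:
    """Return True once a response's status will no longer change."""
    return status in TERMINAL_STATUSES
```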

Best Practices

Polling Interval

Don't poll too frequently. Recommended intervals:

  • Short tasks (<30s expected): Poll every 2-3 seconds
  • Medium tasks (30s-5min expected): Poll every 5-10 seconds
  • Long tasks (>5min expected): Poll every 30-60 seconds
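As a hypothetical helper, the recommended intervals above can be encoded as a simple lookup keyed on how long you expect the task to run (the values chosen here are the upper ends of each recommended range):

```python
def polling_interval(expected_seconds: float) -> float:
    """Pick a polling interval (in seconds) from the expected task duration."""
    if expected_seconds < 30:        # short tasks: poll every 2-3 s
        return 3.0
    if expected_seconds <= 300:      # medium tasks (30 s - 5 min): every 5-10 s
        return 10.0
    return 60.0                      # long tasks (> 5 min): every 30-60 s
```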

Timeout Handling

Set reasonable timeouts based on your use case:

async function waitForCompletion(responseId, maxWaitTime = 300000) {
  const startTime = Date.now();

  while (Date.now() - startTime < maxWaitTime) {
    const status = await checkStatus(responseId);

    if (status.status === 'completed') {
      return status;
    }

    // failed and cancelled are both terminal; stop polling for either
    if (status.status === 'failed' || status.status === 'cancelled') {
      throw new Error(`Response ${responseId} ended with status: ${status.status}`);
    }

    await new Promise(resolve => setTimeout(resolve, 5000)); // Wait 5s between polls
  }

  throw new Error(`Timed out after ${maxWaitTime}ms waiting for response ${responseId}`);
}

Webhooks (Coming Soon)

Instead of polling, you can register a webhook URL to receive notifications when processing completes.

Use Cases

Batch Processing

Process multiple requests in parallel:

const requests = ['prompt1', 'prompt2', 'prompt3'];
const responseIds = [];

// Submit all requests
for (const prompt of requests) {
  const response = await fetch('/v1/model/response', {
    method: 'POST',
    headers: { 'X-API-Key': apiKey, 'Content-Type': 'application/json' },
    body: JSON.stringify({
      model: 'gpt-4.1',
      background: true,
      inputs: [{ role: 'user', texts: [{ text: prompt }] }]
    })
  });
  const data = await response.json();
  responseIds.push(data.id);
}

// Wait for all to complete
const results = await Promise.all(
  responseIds.map(id => waitForCompletion(id))
);

Non-Blocking UI

Keep your UI responsive while processing:

async function generateResponse(prompt) {
  // Submit background request
  const response = await submitBackgroundRequest(prompt);
  
  // Show loading state with ID
  showLoadingMessage(`Processing... (ID: ${response.id})`);
  
  // Poll for completion
  const result = await waitForCompletion(response.id);
  
  // Update UI with result
  displayResult(result);
}

Queue-Based Architecture

Integrate with your job queue:

from celery import Celery
import requests

app = Celery('tasks')

@app.task
def process_ai_response(prompt):
    # Submit background request
    response = requests.post(
        'https://api.aitronos.com/v1/model/response',
        headers={'X-API-Key': api_key},
        json={
            'model': 'gpt-4.1',
            'background': True,
            'inputs': [{'role': 'user', 'texts': [{'text': prompt}]}]
        }
    )
    response_id = response.json()['id']
    
    # Store ID for status checking
    return response_id

Limitations

  • Maximum processing time: 30 minutes
  • Result retention: Results stored for 24 hours
  • Rate limits: Same as synchronous requests
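Because results are only retained for 24 hours, a client holding stored response IDs may want to check whether an ID is still worth fetching before issuing the request. A minimal sketch, assuming retention is measured from the created_at Unix timestamp (the API may measure from completion instead):

```python
RETENTION_SECONDS = 24 * 60 * 60  # results are stored for 24 hours

def result_still_retained(created_at: int, now: int) -> bool:
    """Whether a result should still be retrievable, given 24-hour retention."""
    return (now - created_at) < RETENTION_SECONDS
```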