Background Mode

🔨 In Development — This section is still being developed and may change.
Run model responses asynchronously in the background, allowing your application to continue without waiting for completion.

Overview

Background mode enables you to submit a model response request and receive an immediate response with a tracking ID, while the actual processing happens asynchronously. This is ideal for:

  • Long-running generations
  • Batch processing workflows
  • Non-blocking UI experiences
  • Queue-based architectures

How It Works

  1. Submit Request: Send a POST request with background: true
  2. Receive ID: Get an immediate response with a unique response ID
  3. Check Status: Poll the status endpoint to monitor progress
  4. Retrieve Result: Fetch the completed response when ready

Basic Example

# Submit background request
curl https://api.freddy.aitronos.com/v1/model/response \
  -H "Authorization: Bearer $FREDDY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4.1",
    "background": true,
    "inputs": [
      {
        "role": "user",
        "texts": [{"text": "Write a detailed article about AI"}]
      }
    ]
  }'

Response:

{
  "id": "resp_67ccd2bed1ec8190",
  "status": "queued",
  "created_at": 1741476542
}

Checking Status

Poll the status endpoint to monitor progress:

curl https://api.freddy.aitronos.com/v1/model/response/resp_67ccd2bed1ec8190 \
  -H "Authorization: Bearer $FREDDY_API_KEY"

Response (In Progress):

{
  "id": "resp_67ccd2bed1ec8190",
  "status": "in_progress",
  "created_at": 1741476542
}

Response (Completed):

{
  "id": "resp_67ccd2bed1ec8190",
  "status": "completed",
  "created_at": 1741476542,
  "completed_at": 1741476550,
  "output": [
    {
      "type": "message",
      "role": "assistant",
      "content": [
        {
          "type": "output_text",
          "text": "Here is a detailed article about AI..."
        }
      ]
    }
  ],
  "usage": {
    "input_tokens": 12,
    "output_tokens": 450,
    "total_tokens": 462
  }
}

Status Values

Status        Description
------------  -------------------------------------------
queued        Request received and queued for processing
in_progress   Currently being processed
completed     Successfully completed
failed        Processing failed (see error details)
cancelled     Request was cancelled
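
In client code it helps to distinguish terminal statuses (polling can stop) from transient ones. A minimal sketch:

// completed, failed, and cancelled are terminal; queued and in_progress are transient
const TERMINAL_STATUSES = new Set(['completed', 'failed', 'cancelled']);

function isTerminal(status) {
  return TERMINAL_STATUSES.has(status);
}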

Best Practices

Polling Interval

Don't poll too frequently. Recommended intervals (see the sketch after this list):

  • Short tasks (<30s expected): Poll every 2-3 seconds
  • Medium tasks (30s-5min expected): Poll every 5-10 seconds
  • Long tasks (>5min expected): Poll every 30-60 seconds
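
One way to apply these guidelines is to derive the polling interval from the duration you expect the task to take; a minimal sketch (the thresholds mirror the list above):

// Map an expected task duration (ms) to a polling interval (ms)
function pickPollInterval(expectedDurationMs) {
  if (expectedDurationMs < 30_000) return 2_500;   // short tasks: every 2-3s
  if (expectedDurationMs < 300_000) return 7_500;  // medium tasks: every 5-10s
  return 45_000;                                   // long tasks: every 30-60s
}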

Timeout Handling

Set reasonable timeouts based on your use case:

async function waitForCompletion(responseId, maxWaitTime = 300000, pollInterval = 5000) {
  const startTime = Date.now();

  while (Date.now() - startTime < maxWaitTime) {
    const status = await checkStatus(responseId);

    if (status.status === 'completed') {
      return status;
    }

    // failed and cancelled are also terminal; stop polling rather than
    // waiting out the full timeout
    if (status.status === 'failed' || status.status === 'cancelled') {
      throw new Error(`Response ${status.status}`);
    }

    await new Promise(resolve => setTimeout(resolve, pollInterval));
  }

  throw new Error('Timeout waiting for response');
}
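
The checkStatus helper used above is not defined in this guide; a minimal sketch against the status endpoint shown earlier (assumes apiKey is in scope):

// Fetch the current state of a background response by ID.
// Assumes `apiKey` holds your Freddy API key.
async function checkStatus(responseId) {
  const res = await fetch(
    `https://api.freddy.aitronos.com/v1/model/response/${responseId}`,
    { headers: { 'Authorization': `Bearer ${apiKey}` } }
  );
  if (!res.ok) {
    throw new Error(`Status check failed: HTTP ${res.status}`);
  }
  return res.json();
}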

Webhooks (Coming Soon)

Instead of polling, you can register a webhook URL to receive notifications when processing completes.

Use Cases

Batch Processing

Process multiple requests in parallel:

const prompts = ['prompt1', 'prompt2', 'prompt3'];
const responseIds = [];

// Submit all requests without waiting for any of them to finish
for (const prompt of prompts) {
  const response = await fetch('https://api.freddy.aitronos.com/v1/model/response', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${apiKey}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      model: 'gpt-4.1',
      background: true,
      inputs: [{ role: 'user', texts: [{ text: prompt }] }]
    })
  });
  const data = await response.json();
  responseIds.push(data.id);
}

// Wait for all to complete
const results = await Promise.all(
  responseIds.map(id => waitForCompletion(id))
);
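
Note that Promise.all rejects as soon as any single request fails or times out; if you want every result regardless of individual failures, use Promise.allSettled instead.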

Non-Blocking UI

Keep your UI responsive while processing:

async function generateResponse(prompt) {
  // Submit background request
  const response = await submitBackgroundRequest(prompt);

  // Show loading state with ID
  showLoadingMessage(`Processing... (ID: ${response.id})`);

  try {
    // Poll for completion
    const result = await waitForCompletion(response.id);

    // Update UI with result
    displayResult(result);
  } catch (err) {
    // Clear the loading state on failure or timeout instead of hanging
    showErrorMessage(err.message);
  }
}
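
The submitBackgroundRequest helper is not defined above; a minimal sketch (assumes apiKey is in scope):

// Submit a prompt with background: true and return the acknowledgement
// ({ id, status, created_at }). Assumes `apiKey` holds your Freddy API key.
async function submitBackgroundRequest(prompt) {
  const res = await fetch('https://api.freddy.aitronos.com/v1/model/response', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${apiKey}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      model: 'gpt-4.1',
      background: true,
      inputs: [{ role: 'user', texts: [{ text: prompt }] }]
    })
  });
  return res.json();
}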

Queue-Based Architecture

Integrate with your job queue:

import os

from celery import Celery
import requests

app = Celery('tasks')

# Read the API key from the environment rather than hard-coding it
api_key = os.environ['FREDDY_API_KEY']

@app.task
def process_ai_response(prompt):
    # Submit background request
    response = requests.post(
        'https://api.freddy.aitronos.com/v1/model/response',
        headers={'Authorization': f'Bearer {api_key}'},
        json={
            'model': 'gpt-4.1',
            'background': True,
            'inputs': [{'role': 'user', 'texts': [{'text': prompt}]}]
        }
    )
    response_id = response.json()['id']

    # Store the ID so a follow-up task can poll for the result
    return response_id

Limitations

  • Maximum processing time: 30 minutes
  • Result retention: Results stored for 24 hours
  • Rate limits: Same as synchronous requests