
Run model responses asynchronously in the background, allowing your application to continue without waiting for completion.

Overview

Background mode enables you to submit a model response request and receive an immediate response with a tracking ID, while the actual processing happens asynchronously. This is ideal for:

  • Long-running generations
  • Batch processing workflows
  • Non-blocking UI experiences
  • Queue-based architectures

How It Works

  1. Submit Request: Send a POST request with background: true
  2. Receive ID: Get an immediate response with a unique response ID
  3. Check Status: Poll the status endpoint to monitor progress
  4. Retrieve Result: Fetch the completed response when ready
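The four steps above can be sketched in Python. To keep the flow visible without committing to an HTTP client, the network calls are passed in as plain callables; in practice `submit` would POST to `/v1/model/response` and `get_status` would GET `/v1/model/response/{id}`, as in the curl examples below.

```python
import time

def run_background_request(submit, get_status, poll_seconds=5, sleep=time.sleep):
    """Drive a background request through the four steps above.

    submit()        -> dict with the immediate response, e.g. {"id": ..., "status": "queued"}
    get_status(rid) -> dict with the current status body for that response ID
    """
    response_id = submit()["id"]              # steps 1-2: submit, receive ID

    while True:                               # step 3: poll the status endpoint
        body = get_status(response_id)
        if body["status"] == "completed":     # step 4: result is ready
            return body
        if body["status"] in ("failed", "cancelled"):
            raise RuntimeError(f"response {response_id} ended as {body['status']}")
        sleep(poll_seconds)
```

Injecting `sleep` as well makes the loop trivial to exercise in tests without real waiting.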

Basic Example

# Submit background request
curl https://api.aitronos.com/v1/model/response \
  -H "X-API-Key: $FREDDY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4.1",
    "background": true,
    "inputs": [
      {
        "role": "user",
        "texts": [{"text": "Write a detailed article about AI"}]
      }
    ]
  }'

Response:

{
  "id": "resp_67ccd2bed1ec8190",
  "status": "queued",
  "created_at": 1741476542
}

Checking Status

Poll the status endpoint to monitor progress:

curl https://api.aitronos.com/v1/model/response/resp_67ccd2bed1ec8190 \
  -H "X-API-Key: $FREDDY_API_KEY"

Response (In Progress):

{
  "id": "resp_67ccd2bed1ec8190",
  "status": "in_progress",
  "created_at": 1741476542
}

Response (Completed):

{
  "id": "resp_67ccd2bed1ec8190",
  "status": "completed",
  "created_at": 1741476542,
  "completed_at": 1741476550,
  "output": [
    {
      "type": "message",
      "role": "assistant",
      "content": [
        {
          "type": "output_text",
          "text": "Here is a detailed article about AI..."
        }
      ]
    }
  ],
  "usage": {
    "input_tokens": 12,
    "output_tokens": 450,
    "total_tokens": 462
  }
}

Status Values

  Status        Description
  queued        Request received and queued for processing
  in_progress   Currently being processed
  completed     Successfully completed
  failed        Processing failed (see error details)
  cancelled     Request was cancelled
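Of these, completed, failed, and cancelled are terminal: once a response reaches one of them its status no longer changes, so a poller can stop. A small helper (illustrative only, not part of any SDK) makes that check explicit:

```python
# Terminal states from the table above: polling can stop once one is reached.
TERMINAL_STATUSES = frozenset({"completed", "failed", "cancelled"})

def is_terminal(status: str) -> bool:
    """Return True once a response's status will no longer change."""
    return status in TERMINAL_STATUSES
```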

Best Practices

Polling Interval

Don't poll too frequently. Recommended intervals:

  • Short tasks (<30s expected): Poll every 2-3 seconds
  • Medium tasks (30s-5min expected): Poll every 5-10 seconds
  • Long tasks (>5min expected): Poll every 30-60 seconds
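As a hypothetical helper, the recommended intervals above can be encoded as a simple lookup keyed on how long you expect the task to run (the values chosen here are the upper ends of each recommended range):

```python
def polling_interval(expected_seconds: float) -> float:
    """Pick a polling interval (in seconds) from the expected task duration."""
    if expected_seconds < 30:        # short tasks: poll every 2-3 s
        return 3.0
    if expected_seconds <= 300:      # medium tasks (30 s - 5 min): every 5-10 s
        return 10.0
    return 60.0                      # long tasks (> 5 min): every 30-60 s
```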

Timeout Handling

Set reasonable timeouts based on your use case:

async function waitForCompletion(responseId, maxWaitTime = 300000) {
  const startTime = Date.now();

  while (Date.now() - startTime < maxWaitTime) {
    const status = await checkStatus(responseId);

    if (status.status === 'completed') {
      return status;
    }

    // failed and cancelled are both terminal; stop polling for either
    if (status.status === 'failed' || status.status === 'cancelled') {
      throw new Error(`Response ${responseId} ended with status: ${status.status}`);
    }

    await new Promise(resolve => setTimeout(resolve, 5000)); // Wait 5s between polls
  }

  throw new Error(`Timed out after ${maxWaitTime}ms waiting for response ${responseId}`);
}

Webhooks (Coming Soon)

Instead of polling, you can register a webhook URL to receive notifications when processing completes.

Use Cases

Batch Processing

Process multiple requests in parallel:

const requests = ['prompt1', 'prompt2', 'prompt3'];
const responseIds = [];

// Submit all requests
for (const prompt of requests) {
  const response = await fetch('/v1/model/response', {
    method: 'POST',
    headers: { 'X-API-Key': apiKey, 'Content-Type': 'application/json' },
    body: JSON.stringify({
      model: 'gpt-4.1',
      background: true,
      inputs: [{ role: 'user', texts: [{ text: prompt }] }]
    })
  });
  const data = await response.json();
  responseIds.push(data.id);
}

// Wait for all to complete
const results = await Promise.all(
  responseIds.map(id => waitForCompletion(id))
);

Non-Blocking UI

Keep your UI responsive while processing:

async function generateResponse(prompt) {
  // Submit background request
  const response = await submitBackgroundRequest(prompt);
  
  // Show loading state with ID
  showLoadingMessage(`Processing... (ID: ${response.id})`);
  
  // Poll for completion
  const result = await waitForCompletion(response.id);
  
  // Update UI with result
  displayResult(result);
}

Queue-Based Architecture

Integrate with your job queue:

from celery import Celery
import requests

app = Celery('tasks')

@app.task
def process_ai_response(prompt):
    # Submit background request
    response = requests.post(
        'https://api.aitronos.com/v1/model/response',
        headers={'X-API-Key': api_key},
        json={
            'model': 'gpt-4.1',
            'background': True,
            'inputs': [{'role': 'user', 'texts': [{'text': prompt}]}]
        }
    )
    response_id = response.json()['id']
    
    # Store ID for status checking
    return response_id

Limitations

  • Maximum processing time: 30 minutes
  • Result retention: Results stored for 24 hours
  • Rate limits: Same as synchronous requests
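Because results are only retained for 24 hours, a client holding stored response IDs may want to check whether an ID is still worth fetching before issuing the request. A minimal sketch, assuming retention is measured from the created_at Unix timestamp (the API may measure from completion instead):

```python
RETENTION_SECONDS = 24 * 60 * 60  # results are stored for 24 hours

def result_still_retained(created_at: int, now: int) -> bool:
    """Whether a result should still be retrievable, given 24-hour retention."""
    return (now - created_at) < RETENTION_SECONDS
```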