
Run model responses asynchronously in the background, allowing your application to continue without waiting for completion.

Overview

Background mode enables you to submit a model response request and receive an immediate response with a tracking ID, while the actual processing happens asynchronously. This is ideal for:

  • Long-running generations
  • Batch processing workflows
  • Non-blocking UI experiences
  • Queue-based architectures

How It Works

  1. Submit Request: Send a POST request with background: true
  2. Receive ID: Get an immediate response with a unique response ID
  3. Check Status: Poll the status endpoint to monitor progress
  4. Retrieve Result: Fetch the completed response when ready
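The four steps above can be sketched end to end in a few lines. This is a minimal Python sketch using `requests`; the endpoint, header, and field names mirror the examples in this guide, and `FREDDY_API_KEY` is read from the environment:

```python
import os
import time

import requests

API_BASE = "https://api.aitronos.com/v1"
HEADERS = {"X-API-Key": os.environ.get("FREDDY_API_KEY", "")}

def run_in_background(payload, poll_interval=5, timeout=300):
    """Submit a background request, then poll until a terminal status."""
    # Steps 1-2: submit with background: true, receive an immediate ID
    submit = requests.post(f"{API_BASE}/model/response",
                           headers=HEADERS,
                           json={**payload, "background": True})
    response_id = submit.json()["id"]

    # Steps 3-4: poll the status endpoint, return the completed response
    deadline = time.time() + timeout
    while time.time() < deadline:
        status = requests.get(f"{API_BASE}/model/response/{response_id}",
                              headers=HEADERS).json()
        if status["status"] == "completed":
            return status
        if status["status"] in ("failed", "cancelled"):
            raise RuntimeError(f"Response {response_id} ended as {status['status']}")
        time.sleep(poll_interval)
    raise TimeoutError(f"Response {response_id} did not finish in {timeout}s")
```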

Basic Example

# Submit background request
curl https://api.aitronos.com/v1/model/response \
  -H "X-API-Key: $FREDDY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "background": true,
    "inputs": [
      {
        "role": "user",
        "content": "Write a detailed article about AI"
      }
    ]
  }'

Response:

{
  "id": "resp_67ccd2bed1ec8190",
  "status": "queued",
  "created_at": 1741476542
}

Checking Status

Poll the status endpoint to monitor progress:

curl https://api.aitronos.com/v1/model/response/resp_67ccd2bed1ec8190 \
 -H "X-API-Key: $FREDDY_API_KEY"

Response (In Progress):

{
  "id": "resp_67ccd2bed1ec8190",
  "status": "in_progress",
  "created_at": 1741476542
}

Response (Completed):

{
  "id": "resp_67ccd2bed1ec8190",
  "status": "completed",
  "created_at": 1741476542,
  "completed_at": 1741476550,
  "output": [
    {
      "type": "message",
      "role": "assistant",
      "content": [
        {
          "type": "output_text",
          "text": "Here is a detailed article about AI..."
        }
      ]
    }
  ],
  "usage": {
    "input_tokens": 12,
    "output_tokens": 450,
    "total_tokens": 462
  }
}

Status Values

Status         Description
queued         Request received and queued for processing
in_progress    Currently being processed
completed      Successfully completed
failed         Processing failed (see error details)
cancelled      Request was cancelled
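The first two statuses are transient while the last three are terminal. A tiny helper (illustrative, not part of the API) keeps polling loops honest:

```python
# Statuses after which a response will no longer change state
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}

def is_terminal(status: str) -> bool:
    """True once polling can stop."""
    return status in TERMINAL_STATUSES
```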

Best Practices

Polling Interval

Don't poll too frequently. Recommended intervals:

  • Short tasks (<30s expected): Poll every 2-3 seconds
  • Medium tasks (30s-5min expected): Poll every 5-10 seconds
  • Long tasks (>5min expected): Poll every 30-60 seconds
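If you don't know the expected task length in advance, a simple alternative is to start polling fast and back off toward a cap. A small generator sketch (the numbers mirror the ranges above; `factor` is a tuning choice, not an API requirement):

```python
def poll_intervals(first=2.0, factor=1.5, cap=60.0):
    """Yield increasing wait times in seconds: start fast, back off to the cap."""
    delay = first
    while True:
        yield delay
        delay = min(delay * factor, cap)
```

Pull the next delay from the generator before each status check and sleep for that long.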

Timeout Handling

Set reasonable timeouts based on your use case:

async function waitForCompletion(responseId, maxWaitTime = 300000) {
  const startTime = Date.now();

  while (Date.now() - startTime < maxWaitTime) {
    const status = await checkStatus(responseId);

    if (status.status === 'completed') {
      return status;
    }

    // failed and cancelled are both terminal; stop polling on either
    if (status.status === 'failed' || status.status === 'cancelled') {
      throw new Error(`Response ${status.status}`);
    }

    await new Promise(resolve => setTimeout(resolve, 5000)); // Wait 5s between polls
  }

  throw new Error('Timeout waiting for response');
}

Webhooks

As an alternative to polling, you can register a webhook URL to receive notifications when processing completes. See Webhooks for configuration details.
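A minimal receiver might look like the stdlib-only Python sketch below. The payload shape is an assumption here (the same JSON fields as the status endpoint); consult the Webhooks guide for the actual event format and any signature verification it requires:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def handle_result(event):
    # Application-specific: store or display event["output"]
    print(f"response {event.get('id')} finished: {event.get('status')}")

class WebhookHandler(BaseHTTPRequestHandler):
    """Assumes the webhook POSTs JSON with the same fields as the
    status endpoint (id, status, output, ...). Verify this against
    the Webhooks guide before relying on it."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        event = json.loads(self.rfile.read(length) or b"{}")
        if event.get("status") == "completed":
            handle_result(event)
        self.send_response(204)
        self.end_headers()

# To run: HTTPServer(("", 8080), WebhookHandler).serve_forever()
```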

Use Cases

Batch Processing

Process multiple requests in parallel:

const requests = ['prompt1', 'prompt2', 'prompt3'];
const responseIds = [];

// Submit all requests
for (const prompt of requests) {
  const response = await fetch('/v1/model/response', {
    method: 'POST',
    headers: {
      'X-API-Key': apiKey,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      model: 'gpt-4o',
      background: true,
      inputs: [{ role: 'user', texts: [{ text: prompt }] }]
    })
  });
  const data = await response.json();
  responseIds.push(data.id);
}

// Wait for all to complete
const results = await Promise.all(
  responseIds.map(id => waitForCompletion(id))
);

Non-Blocking UI

Keep your UI responsive while processing:

async function generateResponse(prompt) {
  // Submit background request
  const response = await submitBackgroundRequest(prompt);

  // Show loading state with ID
  showLoadingMessage(`Processing... (ID: ${response.id})`);

  // Poll for completion
  const result = await waitForCompletion(response.id);

  // Update UI with result
  displayResult(result);
}

Queue-Based Architecture

Integrate with your job queue:

import os

import requests
from celery import Celery

app = Celery('tasks')
api_key = os.environ['FREDDY_API_KEY']

@app.task
def process_ai_response(prompt):
    # Submit background request
    response = requests.post(
        'https://api.aitronos.com/v1/model/response',
        headers={'X-API-Key': api_key},
        json={
            'model': 'gpt-4o',
            'background': True,
            'inputs': [{'role': 'user', 'texts': [{'text': prompt}]}]
        }
    )
    response_id = response.json()['id']

    # Store the ID so a follow-up task can check status later
    return response_id

Limitations

  • Maximum processing time: 30 minutes
  • Result retention: Results stored for 24 hours
  • Rate limits: Same as synchronous requests