Run model responses asynchronously in the background, allowing your application to continue without waiting for completion.
Background mode enables you to submit a model response request and receive an immediate response with a tracking ID, while the actual processing happens asynchronously. This is ideal for:
- Long-running generations
- Batch processing workflows
- Non-blocking UI experiences
- Queue-based architectures
- Submit Request: Send a POST request with `background: true`
- Receive ID: Get an immediate response with a unique response ID
- Check Status: Poll the status endpoint to monitor progress
- Retrieve Result: Fetch the completed response when ready
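The four steps above can be sketched end to end in Python. This is an illustrative sketch, not an official client: the endpoint, header, and field names are taken from the curl examples in this document, and the helper names (`build_payload`, `run_in_background`) are made up for the example.

```python
import json
import time
import urllib.request

# Endpoint from the curl examples below.
API_BASE = "https://api.aitronos.com/v1/model/response"

def build_payload(prompt, model="gpt-4.1"):
    """Request body for a background submission (field names from the docs)."""
    return {
        "model": model,
        "background": True,
        "inputs": [{"role": "user", "texts": [{"text": prompt}]}],
    }

def _call(url, api_key, body=None):
    """POST when a body is given, otherwise GET; returns parsed JSON."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode() if body else None,
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST" if body else "GET",
    )
    with urllib.request.urlopen(req) as r:
        return json.load(r)

def run_in_background(prompt, api_key, poll_every=5.0):
    """Submit, then poll until the response reaches a terminal status."""
    resp_id = _call(API_BASE, api_key, build_payload(prompt))["id"]
    while True:
        status = _call(f"{API_BASE}/{resp_id}", api_key)
        if status["status"] in ("completed", "failed", "cancelled"):
            return status
        time.sleep(poll_every)
```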
# Submit background request
curl https://api.aitronos.com/v1/model/response \
-H "X-API-Key: $FREDDY_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4.1",
"background": true,
"inputs": [
{
"role": "user",
"texts": [{"text": "Write a detailed article about AI"}]
}
]
}'

Response:
{
"id": "resp_67ccd2bed1ec8190",
"status": "queued",
"created_at": 1741476542
}

Poll the status endpoint to monitor progress:
curl https://api.aitronos.com/v1/model/response/resp_67ccd2bed1ec8190 \
-H "X-API-Key: $FREDDY_API_KEY"

Response (In Progress):
{
"id": "resp_67ccd2bed1ec8190",
"status": "in_progress",
"created_at": 1741476542
}

Response (Completed):
{
"id": "resp_67ccd2bed1ec8190",
"status": "completed",
"created_at": 1741476542,
"completed_at": 1741476550,
"output": [
{
"type": "message",
"role": "assistant",
"content": [
{
"type": "output_text",
"text": "Here is a detailed article about AI..."
}
]
}
],
"usage": {
"input_tokens": 12,
"output_tokens": 450,
"total_tokens": 462
}
}

| Status | Description |
|---|---|
| queued | Request received and queued for processing |
| in_progress | Currently being processed |
| completed | Successfully completed |
| failed | Processing failed (see error details) |
| cancelled | Request was cancelled |
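Polling code typically only needs to distinguish terminal from non-terminal statuses. A small helper (illustrative; the status strings come from the table above) keeps that logic in one place:

```python
# Statuses after which the request can no longer change state.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}

def is_terminal(status: str) -> bool:
    """True once polling can stop for this response."""
    return status in TERMINAL_STATUSES

def should_keep_polling(status: str) -> bool:
    """True while the request is still queued or running."""
    return status in {"queued", "in_progress"}
```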
Don't poll too frequently. Recommended intervals:
- Short tasks (<30s expected): Poll every 2-3 seconds
- Medium tasks (30s-5min expected): Poll every 5-10 seconds
- Long tasks (>5min expected): Poll every 30-60 seconds
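The recommended intervals above can be encoded as a simple lookup keyed on the expected duration. The midpoint values chosen here are one reasonable interpretation of the ranges, not prescribed by the API:

```python
def poll_interval(expected_seconds: float) -> float:
    """Polling interval (seconds) for a task of the given expected duration,
    following the recommendations above."""
    if expected_seconds < 30:
        return 2.5   # short tasks: poll every 2-3 s
    if expected_seconds <= 300:
        return 7.5   # medium tasks: poll every 5-10 s
    return 45.0      # long tasks: poll every 30-60 s
```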
Set reasonable timeouts based on your use case:
async function waitForCompletion(responseId, maxWaitTime = 300000) {
const startTime = Date.now();
while (Date.now() - startTime < maxWaitTime) {
const status = await checkStatus(responseId);
if (status.status === 'completed') {
return status;
}
if (status.status === 'failed') {
throw new Error('Response failed');
}
await new Promise(resolve => setTimeout(resolve, 5000)); // Wait 5s
}
throw new Error('Timeout waiting for response');
}

Instead of polling, you can register a webhook URL to receive notifications when processing completes.
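A webhook receiver only needs to parse the notification body and extract the response ID. The payload shape below (an `id` and `status` mirroring the polling response) is an assumption for illustration; the document does not specify the webhook schema:

```python
import json

def handle_webhook(raw_body: bytes) -> str:
    """Parse a completion notification and return the response ID to fetch.
    ASSUMPTION: the webhook body mirrors the polling response's id/status fields."""
    event = json.loads(raw_body)
    if event.get("status") == "completed":
        return event["id"]  # fetch the full result with this ID
    raise ValueError(f"unhandled status: {event.get('status')}")
```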
Process multiple requests in parallel:
const requests = ['prompt1', 'prompt2', 'prompt3'];
const responseIds = [];
// Submit all requests
for (const prompt of requests) {
const response = await fetch('/v1/model/response', {
method: 'POST',
headers: { 'X-API-Key': apiKey, 'Content-Type': 'application/json' },
body: JSON.stringify({
model: 'gpt-4.1',
background: true,
inputs: [{ role: 'user', texts: [{ text: prompt }] }]
})
});
const data = await response.json();
responseIds.push(data.id);
}
// Wait for all to complete
const results = await Promise.all(
responseIds.map(id => waitForCompletion(id))
);

Keep your UI responsive while processing:
async function generateResponse(prompt) {
// Submit background request
const response = await submitBackgroundRequest(prompt);
// Show loading state with ID
showLoadingMessage(`Processing... (ID: ${response.id})`);
// Poll for completion
const result = await waitForCompletion(response.id);
// Update UI with result
displayResult(result);
}

Integrate with your job queue:
from celery import Celery
import requests
app = Celery('tasks')
@app.task
def process_ai_response(prompt):
# Submit background request
response = requests.post(
'https://api.aitronos.com/v1/model/response',
headers={'X-API-Key': api_key},
json={
'model': 'gpt-4.1',
'background': True,
'inputs': [{'role': 'user', 'texts': [{'text': prompt}]}]
}
)
response_id = response.json()['id']
# Store ID for status checking
return response_id

- Maximum processing time: 30 minutes
- Result retention: Results stored for 24 hours
- Rate limits: Same as synchronous requests
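Because results are retained for 24 hours, you can compute the retrieval deadline from the `created_at` epoch timestamp in the response (the helper name is illustrative):

```python
RETENTION_SECONDS = 24 * 60 * 60  # 24-hour result retention

def retrieval_deadline(created_at: int) -> int:
    """Last epoch second at which a stored result is still retrievable."""
    return created_at + RETENTION_SECONDS
```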