🔨 In Development — This section is still being developed and may change.
Run model responses asynchronously in the background, allowing your application to continue without waiting for completion.
Background mode enables you to submit a model response request and receive an immediate response with a tracking ID, while the actual processing happens asynchronously. This is ideal for:
- Long-running generations
- Batch processing workflows
- Non-blocking UI experiences
- Queue-based architectures
- Submit Request: Send a POST request with `background: true`
- Receive ID: Get an immediate response with a unique response ID
- Check Status: Poll the status endpoint to monitor progress
- Retrieve Result: Fetch the completed response when ready
```bash
# Submit background request
curl https://api.freddy.aitronos.com/v1/model/response \
  -H "Authorization: Bearer $FREDDY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4.1",
    "background": true,
    "inputs": [
      {
        "role": "user",
        "texts": [{"text": "Write a detailed article about AI"}]
      }
    ]
  }'
```

Response:
```json
{
  "id": "resp_67ccd2bed1ec8190",
  "status": "queued",
  "created_at": 1741476542
}
```

Poll the status endpoint to monitor progress:
```bash
curl https://api.freddy.aitronos.com/v1/model/response/resp_67ccd2bed1ec8190 \
  -H "Authorization: Bearer $FREDDY_API_KEY"
```

Response (In Progress):
```json
{
  "id": "resp_67ccd2bed1ec8190",
  "status": "in_progress",
  "created_at": 1741476542
}
```

Response (Completed):
```json
{
  "id": "resp_67ccd2bed1ec8190",
  "status": "completed",
  "created_at": 1741476542,
  "completed_at": 1741476550,
  "output": [
    {
      "type": "message",
      "role": "assistant",
      "content": [
        {
          "type": "output_text",
          "text": "Here is a detailed article about AI..."
        }
      ]
    }
  ],
  "usage": {
    "input_tokens": 12,
    "output_tokens": 450,
    "total_tokens": 462
  }
}
```

| Status | Description |
|---|---|
| `queued` | Request received and queued for processing |
| `in_progress` | Currently being processed |
| `completed` | Successfully completed |
| `failed` | Processing failed (see error details) |
| `cancelled` | Request was cancelled |
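The status lifecycle above maps naturally onto a client-side dispatch. The following sketch is illustrative only: `nextAction` and its return labels are hypothetical names, not part of the API.

```javascript
// Map each API status to an illustrative follow-up action for a client.
function nextAction(status) {
  switch (status) {
    case 'queued':
    case 'in_progress':
      return 'poll_again';    // not done yet: keep polling
    case 'completed':
      return 'fetch_output';  // result is ready to read
    case 'failed':
      return 'inspect_error'; // check the error details
    case 'cancelled':
      return 'stop';          // nothing further will happen
    default:
      throw new Error(`Unknown status: ${status}`);
  }
}
```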
Don't poll too frequently. Recommended intervals:
- Short tasks (<30s expected): Poll every 2-3 seconds
- Medium tasks (30s-5min expected): Poll every 5-10 seconds
- Long tasks (>5min expected): Poll every 30-60 seconds
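One way to apply these intervals is to derive the polling delay from your own estimate of the task's duration. This is a minimal sketch; `pollIntervalMs` is a hypothetical helper, and the chosen values simply fall inside the recommended ranges above.

```javascript
// Pick a polling interval (ms) from the caller's estimate of how long
// the generation should take, following the recommended ranges.
function pollIntervalMs(expectedSeconds) {
  if (expectedSeconds < 30) return 2500;   // short tasks: every 2-3s
  if (expectedSeconds <= 300) return 7500; // medium tasks: every 5-10s
  return 45000;                            // long tasks: every 30-60s
}
```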
Set reasonable timeouts based on your use case:
```javascript
async function waitForCompletion(responseId, maxWaitTime = 300000) {
  const startTime = Date.now();
  while (Date.now() - startTime < maxWaitTime) {
    // checkStatus is a helper that GETs /v1/model/response/{id}
    const status = await checkStatus(responseId);
    if (status.status === 'completed') {
      return status;
    }
    if (status.status === 'failed') {
      throw new Error('Response failed');
    }
    await new Promise(resolve => setTimeout(resolve, 5000)); // Wait 5s between polls
  }
  throw new Error('Timeout waiting for response');
}
```

Instead of polling, you can register a webhook URL to receive a notification when processing completes.
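A webhook receiver might look like the following sketch. The payload shape is an assumption (the same `id` and `status` fields as the status endpoint), and `handleCompletionWebhook` is a hypothetical name; signature verification and retries are omitted.

```javascript
// Decide what to do with a webhook notification. Assumes the body
// carries { id, status } like the status endpoint (an assumption).
function handleCompletionWebhook(rawBody) {
  const event = JSON.parse(rawBody);
  if (event.status === 'completed') {
    return { fetch: true, id: event.id };  // go retrieve the full result
  }
  if (event.status === 'failed') {
    return { fetch: false, id: event.id }; // surface the failure instead
  }
  return null;                             // ignore other notifications
}
```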
Process multiple requests in parallel:
```javascript
const requests = ['prompt1', 'prompt2', 'prompt3'];
const responseIds = [];

// Submit all requests
for (const prompt of requests) {
  const response = await fetch('https://api.freddy.aitronos.com/v1/model/response', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${apiKey}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      model: 'gpt-4.1',
      background: true,
      inputs: [{ role: 'user', texts: [{ text: prompt }] }]
    })
  });
  const data = await response.json();
  responseIds.push(data.id);
}

// Wait for all to complete
const results = await Promise.all(
  responseIds.map(id => waitForCompletion(id))
);
```

Keep your UI responsive while processing:
```javascript
async function generateResponse(prompt) {
  // Submit background request
  const response = await submitBackgroundRequest(prompt);
  // Show loading state with the tracking ID
  showLoadingMessage(`Processing... (ID: ${response.id})`);
  // Poll for completion
  const result = await waitForCompletion(response.id);
  // Update UI with result
  displayResult(result);
}
```

Integrate with your job queue:
```python
from celery import Celery
import requests

app = Celery('tasks')

@app.task
def process_ai_response(prompt):
    # Submit background request (api_key is assumed to be configured elsewhere)
    response = requests.post(
        'https://api.freddy.aitronos.com/v1/model/response',
        headers={'Authorization': f'Bearer {api_key}'},
        json={
            'model': 'gpt-4.1',
            'background': True,
            'inputs': [{'role': 'user', 'texts': [{'text': prompt}]}]
        }
    )
    response_id = response.json()['id']
    # Store the ID for later status checks
    return response_id
```

- Maximum processing time: 30 minutes
- Result retention: Results are stored for 24 hours
- Rate limits: Same as synchronous requests
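Since results are retained for 24 hours, a client can check whether a stored response ID is still worth fetching before making the request. This is a sketch assuming `created_at` is a Unix timestamp in seconds, as in the sample responses above; `resultStillAvailable` is a hypothetical helper.

```javascript
// Results are retained for 24 hours after creation.
const RETENTION_SECONDS = 24 * 60 * 60;

// Given the created_at Unix timestamp (seconds) from the API and the
// current time in seconds, report whether the result should still exist.
function resultStillAvailable(createdAt, nowSeconds) {
  return nowSeconds - createdAt < RETENTION_SECONDS;
}
```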