POST https://api.aitronos.com/v1/scrape/batch

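Submit up to 50 URLs for scraping in a single request. Each URL becomes its own job, and jobs run in parallel (3 at a time by default, configurable through the parallel_jobs option).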

Headers

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| Authorization | string | Yes | Bearer token authentication |
| Content-Type | string | Yes | Must be application/json |

Request Body

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| urls | array | Yes | List of URLs to scrape (max 50) |
| schema | object | Yes | JSON schema for data extraction |
| options | object | No | Batch processing options |
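
The request body rules above can be checked on the client before anything is sent. The following is a minimal sketch, not part of the API: `build_batch_payload` is a hypothetical helper name, and rejecting an empty `urls` list is an assumption (the table only documents the upper limit of 50).

```python
MAX_URLS = 50  # documented upper limit for the `urls` field


def build_batch_payload(urls, schema, options=None):
    """Assemble the JSON body for the batch endpoint (hypothetical helper)."""
    urls = list(urls)
    if not urls:
        # Assumption: the docs state only the maximum, not a minimum.
        raise ValueError("urls must contain at least one URL")
    if len(urls) > MAX_URLS:
        raise ValueError(f"urls accepts at most {MAX_URLS} entries, got {len(urls)}")
    if not isinstance(schema, dict):
        raise TypeError("schema must be a JSON schema object (a dict)")

    payload = {"urls": urls, "schema": schema}
    if options:
        # `options` is optional, so it is left out when not provided.
        payload["options"] = dict(options)
    return payload
```

The returned dictionary can be serialized with `json.dumps` and sent as the request body.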

Batch Options

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| max_items_per_url | integer | 50 | Max items per URL (1-500) |
| max_total_items | integer | 500 | Max total items across all URLs (1-5000) |
| parallel_jobs | integer | 3 | Number of parallel jobs (1-10) |
| timeout_per_url | integer | 30 | Timeout per URL in seconds (1-300) |
| date_filter | object | null | Filter items by date |
| llm_mode | string | "structured" | LLM processing mode |
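
As an illustration of the defaults and ranges listed above, the sketch below merges caller overrides with the documented defaults and rejects out-of-range integers. `resolve_options` is a hypothetical client-side helper. Rejecting unknown option names is an assumption, and `date_filter` and `llm_mode` are passed through unchecked because their accepted values are not described here.

```python
# (default, minimum, maximum) for each integer option, from the Batch Options table
INTEGER_OPTIONS = {
    "max_items_per_url": (50, 1, 500),
    "max_total_items": (500, 1, 5000),
    "parallel_jobs": (3, 1, 10),
    "timeout_per_url": (30, 1, 300),
}


def resolve_options(overrides=None):
    """Return the documented defaults with validated overrides applied."""
    resolved = {name: default for name, (default, _, _) in INTEGER_OPTIONS.items()}
    resolved["date_filter"] = None
    resolved["llm_mode"] = "structured"

    for name, value in (overrides or {}).items():
        if name in INTEGER_OPTIONS:
            _, low, high = INTEGER_OPTIONS[name]
            # bool is a subclass of int in Python, so exclude it explicitly.
            is_int = isinstance(value, int) and not isinstance(value, bool)
            if not is_int or not (low <= value <= high):
                raise ValueError(f"{name} must be an integer in {low}-{high}, got {value!r}")
        elif name not in resolved:
            # Assumption: unknown keys are treated as a client-side error.
            raise KeyError(f"unknown batch option: {name}")
        resolved[name] = value
    return resolved
```

For example, `resolve_options({"parallel_jobs": 5})` keeps every other option at its documented default.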

Response

Status: 200 OK

| Field | Type | Description |
| --- | --- | --- |
| batch_id | string | Unique batch identifier |
| total_urls | integer | Total number of URLs |
| jobs | array | Array of job objects |
| progress | object | Progress information |
| created_at | string | Batch creation timestamp |

Job Object

| Field | Type | Description |
| --- | --- | --- |
| job_id | string | Unique job identifier |
| url | string | Target URL |
| status | string | Job status: "pending", "processing", "completed", "failed" |

Progress Object

| Field | Type | Description |
| --- | --- | --- |
| completed | integer | Number of completed jobs |
| failed | integer | Number of failed jobs |
| pending | integer | Number of pending jobs |
| total_items_extracted | integer | Total items extracted so far |

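
The progress counters can be reduced to a single completion figure. This is a rough sketch rather than documented behavior: `summarize_progress` is a hypothetical helper, and it assumes a batch counts as finished once `completed + failed` reaches `total_urls`.

```python
def summarize_progress(batch):
    """Summarize a batch response dict using its `progress` and `total_urls` fields."""
    progress = batch["progress"]
    total = batch["total_urls"]
    # Jobs that have reached a final state, whether successful or not.
    settled = progress["completed"] + progress["failed"]
    return {
        "settled": settled,
        "fraction_done": settled / total if total else 0.0,
        # Assumption: no job remains once every URL is completed or failed.
        "finished": total > 0 and settled >= total,
        "items": progress["total_items_extracted"],
    }
```

A freshly created batch with three pending jobs gives `fraction_done` of 0.0 and `finished` of False.
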
Bash
curl -X POST https://api.aitronos.com/v1/scrape/batch \
  -H "X-API-Key: $FREDDY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": [
      "https://example.com/page1",
      "https://example.com/page2",
      "https://example.com/page3"
    ],
    "schema": {
      "type": "object",
      "properties": {
        "title": {"type": "string"},
        "content": {"type": "string"}
      }
    },
    "options": {
      "max_items_per_url": 50,
      "parallel_jobs": 3
    }
  }'

Response 200 OK

{
  "batch_id": "batch_xyz789",
  "total_urls": 3,
  "jobs": [
    {
      "job_id": "job_abc123",
      "url": "https://example.com/page1",
      "status": "pending"
    },
    {
      "job_id": "job_def456",
      "url": "https://example.com/page2",
      "status": "pending"
    },
    {
      "job_id": "job_ghi789",
      "url": "https://example.com/page3",
      "status": "pending"
    }
  ],
  "progress": {
    "completed": 0,
    "failed": 0,
    "pending": 3,
    "total_items_extracted": 0
  },
  "created_at": "2024-12-16T10:30:00Z"
}
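
A response like the one above can be grouped by job status to see which URLs still need attention. The sketch below is one possible way to do that. `group_jobs_by_status` is a hypothetical helper, and it operates on the raw JSON text so it makes no assumption about which HTTP client is used.

```python
import json

# Status values listed in the Job Object table
KNOWN_STATUSES = ("pending", "processing", "completed", "failed")


def group_jobs_by_status(response_text):
    """Parse a batch response and return (batch_id, {status: [job_id, ...]})."""
    batch = json.loads(response_text)
    groups = {status: [] for status in KNOWN_STATUSES}
    for job in batch["jobs"]:
        # setdefault keeps any status not listed in the docs instead of dropping it.
        groups.setdefault(job["status"], []).append(job["job_id"])
    return batch["batch_id"], groups
```

Applied to the example response, all three job IDs land under "pending" and the other lists are empty.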