POST https://api.aitronos.com/v1/scrape/batch

Scrapes multiple URLs in a single request. The URLs are processed in parallel (see parallel_jobs under Batch Options) with optimized resource management.

Headers

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| Authorization | string | Yes | Bearer token authentication |
| Content-Type | string | Yes | Must be application/json |

Request Body

urls array required

List of URLs to scrape (max 50).

schema object required

JSON schema for data extraction.

organization_id string optional

Organization ID for billing and access control.

options object optional

Batch processing options.

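| Field | Type | Required | Description |
| --- | --- | --- | --- |
| urls | array | Yes | List of URLs to scrape (max 50) |
| schema | object | Yes | JSON schema for data extraction |
| organization_id | string | No | Organization ID for billing and access control |
| options | object | No | Batch processing options |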
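
Because a single request accepts at most 50 URLs, a longer list has to be split across several batch requests. The following is a minimal Python sketch of that split. The helper name `chunk_urls` and the constant `MAX_URLS_PER_BATCH` are hypothetical and not part of the API.

```python
from typing import Iterator, List

MAX_URLS_PER_BATCH = 50  # documented limit for the "urls" array


def chunk_urls(urls: List[str], size: int = MAX_URLS_PER_BATCH) -> Iterator[List[str]]:
    """Yield consecutive slices of `urls`, each small enough for one batch request."""
    if not 1 <= size <= MAX_URLS_PER_BATCH:
        raise ValueError(f"size must be between 1 and {MAX_URLS_PER_BATCH}")
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


# 120 illustrative URLs become three request-sized groups.
urls = [f"https://example.com/page{i}" for i in range(1, 121)]
batches = list(chunk_urls(urls))
print([len(batch) for batch in batches])  # [50, 50, 20]
```

Each group can then be sent as the `urls` array of its own request.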

Batch Options

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| max_items_per_url | integer | 50 | Max items per URL (1-500) |
| max_total_items | integer | 500 | Max total items across all URLs (1-5000) |
| parallel_jobs | integer | 3 | Number of parallel jobs (1-10) |
| timeout_per_url | integer | 30 | Timeout per URL in seconds (1-300) |
| date_filter | object | null | Filter items by date |
| llm_mode | string | "structured" | LLM processing mode |
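
The integer options each have a documented range, so a client can check them before sending a request. The sketch below is one way to do that in Python. The names `OPTION_RANGES` and `check_options` are hypothetical, and how the server itself reacts to out-of-range values is not documented here.

```python
from typing import Any, Dict, List

# (minimum, maximum) for each integer option, copied from the Batch Options table.
OPTION_RANGES = {
    "max_items_per_url": (1, 500),
    "max_total_items": (1, 5000),
    "parallel_jobs": (1, 10),
    "timeout_per_url": (1, 300),
}


def check_options(options: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in `options`; an empty list means none."""
    problems = []
    for name, (low, high) in OPTION_RANGES.items():
        if name not in options:
            continue  # omitted options fall back to the documented default
        value = options[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            problems.append(f"{name} must be an integer")
        elif not low <= value <= high:
            problems.append(f"{name} must be between {low} and {high}, got {value}")
    return problems


print(check_options({"max_items_per_url": 50, "parallel_jobs": 3}))  # []
print(check_options({"parallel_jobs": 25, "timeout_per_url": 0}))
```

The second call reports both out-of-range values. `date_filter` and `llm_mode` are left unchecked because their accepted values are not listed.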

Response

Status: 200 OK

| Field | Type | Description |
| --- | --- | --- |
| batch_id | string | Unique batch identifier |
| total_urls | integer | Total number of URLs |
| jobs | array | Array of job objects |
| progress | object | Progress information |
| created_at | string | Batch creation timestamp |

Job Object

| Field | Type | Description |
| --- | --- | --- |
| job_id | string | Unique job identifier |
| url | string | Target URL |
| status | string | Job status: "pending", "processing", "completed", or "failed" |

Progress Object

| Field | Type | Description |
| --- | --- | --- |
| completed | integer | Number of completed jobs |
| failed | integer | Number of failed jobs |
| pending | integer | Number of pending jobs |
| total_items_extracted | integer | Total items extracted so far |
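
The job and progress objects together are enough to tell how far a batch has advanced. The Python sketch below tallies job statuses and derives a finished fraction from a response shaped like the tables above. The function name `summarize_batch`, the sample values, and the assumption that "completed" and "failed" are final states are illustrative and not confirmed by the API.

```python
from collections import Counter
from typing import Any, Dict

# Assumption: a job in one of these states will not change again.
TERMINAL_STATUSES = ("completed", "failed")


def summarize_batch(batch: Dict[str, Any]) -> Dict[str, Any]:
    """Condense a batch response into per-status counts and a finished fraction."""
    status_counts = Counter(job["status"] for job in batch["jobs"])
    finished = sum(status_counts[status] for status in TERMINAL_STATUSES)
    total = batch["total_urls"]
    return {
        "status_counts": dict(status_counts),
        "fraction_finished": finished / total if total else 0.0,
        "is_finished": finished == total,
        "items_extracted": batch["progress"]["total_items_extracted"],
    }


# Illustrative response for a four-URL batch that is partly done.
sample_batch = {
    "batch_id": "batch_example",
    "total_urls": 4,
    "jobs": [
        {"job_id": "job_1", "url": "https://example.com/a", "status": "completed"},
        {"job_id": "job_2", "url": "https://example.com/b", "status": "completed"},
        {"job_id": "job_3", "url": "https://example.com/c", "status": "failed"},
        {"job_id": "job_4", "url": "https://example.com/d", "status": "pending"},
    ],
    "progress": {"completed": 2, "failed": 1, "pending": 1, "total_items_extracted": 37},
}

summary = summarize_batch(sample_batch)
print(summary["fraction_finished"], summary["is_finished"])  # 0.75 False
```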

Returns

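On success, returns a JSON object describing the batch: its batch_id, the total number of URLs, an array of job objects, a progress object, and the creation timestamp. If the request fails, the JSON response indicates the failure.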

Bash
curl -X POST https://api.aitronos.com/v1/scrape/batch \
  -H "X-API-Key: $FREDDY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": [
      "https://example.com/page1",
      "https://example.com/page2",
      "https://example.com/page3"
    ],
    "schema": {
      "type": "object",
      "properties": {
        "title": {"type": "string"},
        "content": {"type": "string"}
      }
    },
    "options": {
      "max_items_per_url": 50,
      "parallel_jobs": 3
    }
  }'
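
For callers who prefer Python, the same request can be assembled with the standard library. This is a sketch rather than an official client: the function name `build_batch_request` is hypothetical, and it sends the same `X-API-Key` header as the curl command. If your setup authenticates with the bearer token listed in the header table, send an `Authorization` header instead.

```python
import json
import os
import urllib.request
from typing import Any, Dict, List, Optional

API_URL = "https://api.aitronos.com/v1/scrape/batch"


def build_batch_request(
    api_key: str,
    urls: List[str],
    schema: Dict[str, Any],
    options: Optional[Dict[str, Any]] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST request mirroring the curl example."""
    body: Dict[str, Any] = {"urls": urls, "schema": schema}
    if options:
        body["options"] = options
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


request = build_batch_request(
    api_key=os.environ.get("FREDDY_API_KEY", ""),
    urls=[f"https://example.com/page{i}" for i in (1, 2, 3)],
    schema={
        "type": "object",
        "properties": {"title": {"type": "string"}, "content": {"type": "string"}},
    },
    options={"max_items_per_url": 50, "parallel_jobs": 3},
)

# Sending is left commented out so the sketch runs without a key or network access:
# with urllib.request.urlopen(request, timeout=30) as response:
#     batch = json.load(response)
```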

Response 200 OK

{
  "batch_id": "batch_xyz789",
  "total_urls": 3,
  "jobs": [
    {
      "job_id": "job_abc123",
      "url": "https://example.com/page1",
      "status": "pending"
    },
    {
      "job_id": "job_def456",
      "url": "https://example.com/page2",
      "status": "pending"
    },
    {
      "job_id": "job_ghi789",
      "url": "https://example.com/page3",
      "status": "pending"
    }
  ],
  "progress": {
    "completed": 0,
    "failed": 0,
    "pending": 3,
    "total_items_extracted": 0
  },
  "created_at": "2024-12-16T10:30:00Z"
}