POST https://api.aitronos.com/v1/scrape

Extract structured data from a single URL using a JSON schema.

Headers

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| Authorization | string | Yes | Bearer token authentication |
| Content-Type | string | Yes | Must be application/json |

Request Body

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| url | string | Yes | Target URL to scrape |
| schema | object | Yes | JSON schema defining data structure |
| options | object | No | Scraping configuration options |
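
As a rough illustration of how the headers and body fields above fit together, the following Python sketch assembles the request without sending it. It assumes the endpoint shown at the top of this page and the Bearer scheme from the headers table. The curl example further down uses an `X-API-Key` header and an `/api/v1/scrape` path instead, so adjust to whichever your account accepts. The function name `build_scrape_request` and the `YOUR_TOKEN` placeholder are hypothetical.

```python
import json
import urllib.request

# Endpoint as stated at the top of this page (assumption: this is the path
# your deployment accepts; the curl example uses /api/v1/scrape).
SCRAPE_URL = "https://api.aitronos.com/v1/scrape"


def build_scrape_request(api_token, url, schema, options=None):
    """Assemble, but do not send, a POST request for the scrape endpoint."""
    body = {"url": url, "schema": schema}
    if options is not None:
        body["options"] = options  # "options" is optional per the table
    return urllib.request.Request(
        SCRAPE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_scrape_request(
    "YOUR_TOKEN",  # placeholder, not a real credential
    "https://example.com/products",
    {"type": "object", "properties": {"title": {"type": "string"}}},
    {"sync": True},
)
# To actually send it: urllib.request.urlopen(request, timeout=60)
```

Building the request separately from sending it keeps the payload easy to inspect or log before any network call is made.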

Schema Object

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| type | string | Yes | Must be "object" |
| properties | object | Yes | Field definitions with types |

Supported Field Types: string, number, boolean, array, object, url, date, email
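
A schema can be checked locally against these rules before a request is sent. The sketch below is one possible client-side check, not part of the API: the helper name `check_schema` is hypothetical, and it assumes each property is an object with a `"type"` key, as in the curl example on this page.

```python
# Field types listed as supported in this section.
SUPPORTED_TYPES = {
    "string", "number", "boolean", "array", "object", "url", "date", "email",
}


def check_schema(schema):
    """Return a list of problems found in a schema dict (empty if none)."""
    problems = []
    if schema.get("type") != "object":
        problems.append('top-level "type" must be "object"')
    properties = schema.get("properties")
    if not isinstance(properties, dict):
        problems.append('"properties" must be an object')
        return problems
    for name, spec in properties.items():
        # Assumption: each field definition looks like {"type": "<name>"}.
        field_type = spec.get("type") if isinstance(spec, dict) else None
        if field_type not in SUPPORTED_TYPES:
            problems.append(f'field "{name}" has unsupported type {field_type!r}')
    return problems


article_schema = {
    "type": "object",
    "properties": {
        "headline": {"type": "string"},
        "link": {"type": "url"},
        "published": {"type": "date"},
    },
}
problems = check_schema(article_schema)  # no problems expected here
```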

Options Object

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| max_items | integer | 100 | Maximum items to extract (1-1000) |
| timeout | integer | 30 | Processing timeout in seconds (1-300) |
| sync | boolean | false | Wait for results (true) or return job ID (false) |
| wait_for_content | boolean | true | Wait for dynamic content to load |
| extract_images | boolean | false | Extract image URLs from content |
| follow_pagination | boolean | false | Follow pagination links |
| llm_mode | string | "structured" | LLM processing mode: "structured" or "json" |
| date_filter | object | null | Filter items by date |
| engine_type | string | null | Force engine: "static", "browser", or "hybrid" |
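
The ranges and allowed values above can be enforced on the client before a request goes out. This is a minimal sketch under stated assumptions: the helper `build_options` is hypothetical, and omitting values that equal the documented defaults is a design choice of this sketch rather than something the API requires.

```python
# Defaults, ranges, and allowed values as documented in the options table.
DEFAULT_OPTIONS = {
    "max_items": 100,
    "timeout": 30,
    "sync": False,
    "wait_for_content": True,
    "extract_images": False,
    "follow_pagination": False,
    "llm_mode": "structured",
    "date_filter": None,
    "engine_type": None,
}
LLM_MODES = {"structured", "json"}
ENGINE_TYPES = {"static", "browser", "hybrid"}


def _int_in_range(value, low, high):
    """True for a real integer (not a bool) within [low, high]."""
    return isinstance(value, int) and not isinstance(value, bool) and low <= value <= high


def build_options(**overrides):
    """Validate overrides and return only the non-default options."""
    unknown = set(overrides) - set(DEFAULT_OPTIONS)
    if unknown:
        raise ValueError(f"unknown option(s): {sorted(unknown)}")
    options = {**DEFAULT_OPTIONS, **overrides}
    if not _int_in_range(options["max_items"], 1, 1000):
        raise ValueError("max_items must be an integer between 1 and 1000")
    if not _int_in_range(options["timeout"], 1, 300):
        raise ValueError("timeout must be an integer between 1 and 300")
    if options["llm_mode"] not in LLM_MODES:
        raise ValueError('llm_mode must be "structured" or "json"')
    if options["engine_type"] is not None and options["engine_type"] not in ENGINE_TYPES:
        raise ValueError('engine_type must be "static", "browser", or "hybrid"')
    # Keep the payload small: drop anything equal to the documented default.
    return {k: v for k, v in options.items() if v != DEFAULT_OPTIONS[k]}


options = build_options(max_items=10, timeout=30, sync=True)
```

Here `timeout=30` matches the documented default, so only `max_items` and `sync` end up in the resulting dictionary.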

Date Filter Object

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| field | string | Yes | Date field name in schema |
| after | string | No | Include items after this date (YYYY-MM-DD) |
| before | string | No | Include items before this date (YYYY-MM-DD) |
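
To show how a date filter ties back to the schema, the sketch below builds a `date_filter` object and rejects dates that do not parse as year-month-day. The helper name `build_date_filter` is hypothetical, and checking that `field` exists in the schema's `properties` is a client-side assumption based on the "Date field name in schema" description.

```python
from datetime import datetime


def build_date_filter(schema, field, after=None, before=None):
    """Build a date_filter dict for a field that exists in the schema."""
    if field not in schema.get("properties", {}):
        raise ValueError(f'"{field}" is not a field in the schema')
    date_filter = {"field": field}
    for key, value in (("after", after), ("before", before)):
        if value is None:
            continue  # both bounds are optional
        # Parse, then re-emit in canonical zero-padded YYYY-MM-DD form.
        parsed = datetime.strptime(value, "%Y-%m-%d").date()
        date_filter[key] = parsed.isoformat()
    return date_filter


news_schema = {
    "type": "object",
    "properties": {
        "headline": {"type": "string"},
        "published": {"type": "date"},
    },
}
date_filter = build_date_filter(
    news_schema, "published", after="2024-01-01", before="2024-12-31"
)
```

The resulting dictionary would be passed as the `date_filter` value inside `options`.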

Response (Synchronous Mode)

Status: 200 OK

| Field | Type | Description |
| --- | --- | --- |
| job_id | string | Unique job identifier |
| status | string | Job status: "completed" |
| url | string | The scraped URL |
| extracted_data | array | Array of extracted items |
| metadata | object | Processing metadata |
| created_at | string | Job creation timestamp (ISO 8601) |
| completed_at | string | Job completion timestamp (ISO 8601) |

Metadata Object

| Field | Type | Description |
| --- | --- | --- |
| processing_time | number | Processing time in seconds |
| engine_used | string | Engine used: "static", "browser", or "hybrid" |
| llm_calls | integer | Number of LLM API calls made |
| cache_hit | boolean | Whether result was served from cache |
| field_mappings | object | How schema fields mapped to website fields |
| site_analysis | object | Site complexity and characteristics |
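
One practical use of `field_mappings` is to spot schema fields that were matched with low confidence. The sketch below assumes the shape shown in the example response on this page, where each mapping carries `website_field` and `confidence`. The helper name `low_confidence_fields` and the 0.9 default threshold are hypothetical choices, not values defined by the API.

```python
def low_confidence_fields(metadata, threshold=0.9):
    """Return schema field names whose mapping confidence is below threshold."""
    mappings = metadata.get("field_mappings") or {}
    return sorted(
        name
        for name, mapping in mappings.items()
        # Treat a missing confidence value as 0.0 so it gets flagged.
        if mapping.get("confidence", 0.0) < threshold
    )


sample_metadata = {
    "field_mappings": {
        "title": {"website_field": "product_name", "confidence": 0.95},
        "price": {"website_field": "cost", "confidence": 0.92},
    }
}
flagged = low_confidence_fields(sample_metadata, threshold=0.93)
```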

Response (Asynchronous Mode)

Status: 200 OK

| Field | Type | Description |
| --- | --- | --- |
| job_id | string | Unique job identifier |
| status | string | Job status: "pending" |
| url | string | The scraped URL |
| extracted_data | null | Null until completed |
| metadata | null | Null until completed |
| created_at | string | Job creation timestamp |
| completed_at | null | Null until completed |

Example Request (Bash)
```bash
curl -X POST https://api.aitronos.com/api/v1/scrape \
  -H "X-API-Key: $FREDDY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/products",
    "schema": {
      "type": "object",
      "properties": {
        "title": {"type": "string"},
        "price": {"type": "number"},
        "description": {"type": "string"},
        "availability": {"type": "string"}
      }
    },
    "options": {
      "max_items": 10,
      "timeout": 30,
      "sync": true
    }
  }'
```

Response 200 OK

```json
{
  "job_id": "job_abc123def456",
  "status": "completed",
  "url": "https://example.com/products",
  "extracted_data": [
    {
      "title": "Product Name",
      "price": 29.99,
      "description": "Product description text",
      "availability": "In Stock"
    },
    {
      "title": "Another Product",
      "price": 49.99,
      "description": "Another description",
      "availability": "Out of Stock"
    }
  ],
  "metadata": {
    "processing_time": 2.5,
    "engine_used": "browser",
    "llm_calls": 1,
    "cache_hit": false,
    "field_mappings": {
      "title": {
        "website_field": "product_name",
        "confidence": 0.95
      },
      "price": {
        "website_field": "cost",
        "confidence": 0.92
      }
    },
    "site_analysis": {
      "complexity": "medium",
      "requires_js": true,
      "estimated_items": 12
    }
  },
  "created_at": "2024-12-16T10:30:00Z",
  "completed_at": "2024-12-16T10:30:02Z"
}
```
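
Because the same endpoint returns either a completed job (synchronous mode) or a pending one (asynchronous mode), client code typically branches on `status`. The following is a small sketch of that branch. The helper name `extract_items` is hypothetical, and only the two statuses documented in this section are handled. How to retrieve results for a pending job is not covered here, so the sketch just signals that the caller should keep the `job_id`.

```python
def extract_items(response):
    """Return extracted items for a completed job, or None while pending."""
    status = response.get("status")
    if status == "completed":
        return response.get("extracted_data") or []
    if status == "pending":
        return None  # extracted_data is null until the job completes
    # Any other status is outside what this section documents.
    raise ValueError(f"unexpected job status: {status!r}")


sync_response = {
    "job_id": "job_abc123def456",
    "status": "completed",
    "extracted_data": [{"title": "Product Name", "price": 29.99}],
}
async_response = {
    "job_id": "job_abc123def456",
    "status": "pending",
    "extracted_data": None,
}

items = extract_items(sync_response)      # list of extracted items
pending = extract_items(async_response)   # None: keep job_id for later
```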