POST https://api.aitronos.com/v1/scrape

Extract structured data from a single URL using a JSON schema.

Headers

| Name | Type | Required | Description |
|---|---|---|---|
| Authorization | string | Yes | Bearer token authentication |
| Content-Type | string | Yes | Must be application/json |

Request Body

url string required

Target URL to scrape.

schema object required

JSON schema defining the data structure to extract.

organization_id string optional

Organization ID for billing and access control.

options object optional

Scraping configuration options.

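| Field | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | Target URL to scrape |
| schema | object | Yes | JSON schema defining data structure |
| organization_id | string | No | Organization ID for billing and access control |
| options | object | No | Scraping configuration options |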

Schema Object

| Field | Type | Required | Description |
|---|---|---|---|
| type | string | Yes | Must be "object" |
| properties | object | Yes | Field definitions with types |

Supported Field Types: string, number, boolean, array, object, url, date, email
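
A schema can be checked locally before it is sent. The sketch below is a minimal illustration of the two rules stated above (top-level type "object", and field types drawn from the supported list). The helper name `validate_schema` is hypothetical and not part of the API, and the server may apply further checks that are not documented here.

```python
# Field types copied from the "Supported Field Types" line above.
SUPPORTED_FIELD_TYPES = {
    "string", "number", "boolean", "array", "object", "url", "date", "email",
}


def validate_schema(schema: dict) -> list:
    """Return a list of problems; an empty list means these checks passed.

    Hypothetical client-side helper, not part of the API.
    """
    problems = []
    if schema.get("type") != "object":
        problems.append('top-level "type" must be "object"')
    properties = schema.get("properties")
    if not isinstance(properties, dict):
        problems.append('"properties" must be an object')
        return problems
    for name, definition in properties.items():
        field_type = definition.get("type") if isinstance(definition, dict) else None
        if field_type not in SUPPORTED_FIELD_TYPES:
            problems.append(f'field "{name}" has unsupported type {field_type!r}')
    return problems
```

Calling `validate_schema` on a schema whose fields all use supported types returns an empty list; any other result lists what to fix before sending the request.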

Options Object

| Field | Type | Default | Description |
|---|---|---|---|
| max_items | integer | 100 | Maximum items to extract (1-1000) |
| timeout | integer | 30 | Processing timeout in seconds (1-300) |
| sync | boolean | false | Wait for results (true) or return job ID (false) |
| wait_for_content | boolean | true | Wait for dynamic content to load |
| extract_images | boolean | false | Extract image URLs from content |
| follow_pagination | boolean | false | Follow pagination links |
| llm_mode | string | "structured" | LLM processing mode: "structured" or "json" |
| date_filter | object | null | Filter items by date |
| engine_type | string | null | Force engine: "static", "browser", or "hybrid" |
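
The documented ranges and allowed values can be enforced on the client before a request is made. The following is a minimal sketch: the defaults and limits are taken from the options listed above, while the helper `build_options` and the choice to send only values that differ from the defaults are assumptions made for illustration.

```python
# Defaults copied from the options listed above.
DEFAULT_OPTIONS = {
    "max_items": 100,
    "timeout": 30,
    "sync": False,
    "wait_for_content": True,
    "extract_images": False,
    "follow_pagination": False,
    "llm_mode": "structured",
    "date_filter": None,
    "engine_type": None,
}


def build_options(**overrides) -> dict:
    """Validate overrides and return only the values that differ from defaults.

    Hypothetical helper; assumes the server applies the documented defaults
    to any option that is omitted.
    """
    unknown = set(overrides) - set(DEFAULT_OPTIONS)
    if unknown:
        raise ValueError(f"unknown options: {sorted(unknown)}")
    options = {**DEFAULT_OPTIONS, **overrides}
    if not 1 <= options["max_items"] <= 1000:
        raise ValueError("max_items must be between 1 and 1000")
    if not 1 <= options["timeout"] <= 300:
        raise ValueError("timeout must be between 1 and 300 seconds")
    if options["llm_mode"] not in ("structured", "json"):
        raise ValueError('llm_mode must be "structured" or "json"')
    if options["engine_type"] not in (None, "static", "browser", "hybrid"):
        raise ValueError('engine_type must be "static", "browser", or "hybrid"')
    return {k: v for k, v in options.items() if v != DEFAULT_OPTIONS[k]}
```

For example, `build_options(max_items=10, sync=True)` yields a dictionary with just those two keys, which can be placed under `options` in the request body.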

Date Filter Object

| Field | Type | Required | Description |
|---|---|---|---|
| field | string | Yes | Date field name in schema |
| after | string | No | Include items after this date (YYYY-MM-DD) |
| before | string | No | Include items before this date (YYYY-MM-DD) |
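
Because `after` and `before` must be YYYY-MM-DD strings, building the filter from `datetime.date` values avoids formatting mistakes. This is a sketch only: `build_date_filter` is a hypothetical helper, and the check that `after` does not fall later than `before` is an added sanity check, not a documented API rule.

```python
from datetime import date
from typing import Optional


def build_date_filter(field: str,
                      after: Optional[date] = None,
                      before: Optional[date] = None) -> dict:
    """Build a date_filter object with YYYY-MM-DD strings (hypothetical helper)."""
    if not field:
        raise ValueError("field is required")
    # Assumption: a window where after is later than before is a caller mistake.
    if after is not None and before is not None and after > before:
        raise ValueError("after must not be later than before")
    date_filter = {"field": field}
    if after is not None:
        date_filter["after"] = after.isoformat()  # date.isoformat() gives YYYY-MM-DD
    if before is not None:
        date_filter["before"] = before.isoformat()
    return date_filter
```

The `field` value should name a property declared in the request schema, typically one of type "date".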

Response (Synchronous Mode)

Status: 200 OK

| Field | Type | Description |
|---|---|---|
| job_id | string | Unique job identifier |
| status | string | Job status: "completed" |
| url | string | The scraped URL |
| extracted_data | array | Array of extracted items |
| metadata | object | Processing metadata |
| created_at | string | Job creation timestamp (ISO 8601) |
| completed_at | string | Job completion timestamp (ISO 8601) |

Metadata Object

| Field | Type | Description |
|---|---|---|
| processing_time | number | Processing time in seconds |
| engine_used | string | Engine used: "static", "browser", or "hybrid" |
| llm_calls | integer | Number of LLM API calls made |
| cache_hit | boolean | Whether result was served from cache |
| field_mappings | object | How schema fields mapped to website fields |
| site_analysis | object | Site complexity and characteristics |

Response (Asynchronous Mode)

Status: 200 OK

| Field | Type | Description |
|---|---|---|
| job_id | string | Unique job identifier |
| status | string | Job status: "pending" |
| url | string | The scraped URL |
| extracted_data | null | Null until completed |
| metadata | null | Null until completed |
| created_at | string | Job creation timestamp |
| completed_at | null | Null until completed |
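
Since both modes return the same field names and differ only in `status` and in which fields are null, a client can branch on `status`. The sketch below assumes "completed" and "pending" are the only values to handle, because those are the only ones documented in this section; `extract_items` is a hypothetical helper name.

```python
from typing import Optional


def extract_items(response: dict) -> Optional[list]:
    """Return the extracted items for a completed job, or None while pending.

    Hypothetical helper; other status values are treated as unexpected
    because this section documents only "completed" and "pending".
    """
    status = response.get("status")
    if status == "completed":
        return response.get("extracted_data") or []
    if status == "pending":
        return None
    raise ValueError(f"unexpected job status: {status!r}")
```

A `None` result means the caller should keep the `job_id` and look the job up later; how pending jobs are retrieved is not covered in this section.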

Returns

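Returns a JSON job object. With sync set to true, the response has status "completed" and includes extracted_data and metadata. With sync set to false (the default), it has status "pending", and extracted_data, metadata, and completed_at are null until the job completes.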

Bash
curl -X POST https://api.aitronos.com/v1/scrape \
  -H "X-API-Key: $FREDDY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/products",
    "schema": {
      "type": "object",
      "properties": {
        "title": {"type": "string"},
        "price": {"type": "number"},
        "description": {"type": "string"},
        "availability": {"type": "string"}
      }
    },
    "options": {
      "max_items": 10,
      "timeout": 30,
      "sync": true
    }
  }'
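
The same request can be prepared from Python with only the standard library. This is a sketch under stated assumptions: it mirrors the curl example by sending the key in an `X-API-Key` header read from `FREDDY_API_KEY`, whereas the header list on this page names `Authorization` with a Bearer token, so use whichever scheme your account requires. The function `build_scrape_request` is a hypothetical helper, and it only builds the request without sending it.

```python
import json
import os
import urllib.request
from typing import Optional

SCRAPE_ENDPOINT = "https://api.aitronos.com/v1/scrape"


def build_scrape_request(url: str, schema: dict,
                         options: Optional[dict] = None,
                         api_key: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a POST request for the scrape endpoint."""
    payload = {"url": url, "schema": schema}
    if options:
        payload["options"] = options
    # Assumption: same header and environment variable as the curl example.
    key = api_key if api_key is not None else os.environ.get("FREDDY_API_KEY", "")
    return urllib.request.Request(
        SCRAPE_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-API-Key": key, "Content-Type": "application/json"},
        method="POST",
    )
```

To send it, pass the returned object to `urllib.request.urlopen` with a client timeout somewhat longer than the `timeout` option, then decode the body with `json.loads`.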

Response 200 OK

{
  "job_id": "job_abc123def456",
  "status": "completed",
  "url": "https://example.com/products",
  "extracted_data": [
    {
      "title": "Product Name",
      "price": 29.99,
      "description": "Product description text",
      "availability": "In Stock"
    },
    {
      "title": "Another Product",
      "price": 49.99,
      "description": "Another description",
      "availability": "Out of Stock"
    }
  ],
  "metadata": {
    "processing_time": 2.5,
    "engine_used": "browser",
    "llm_calls": 1,
    "cache_hit": false,
    "field_mappings": {
      "title": {
        "website_field": "product_name",
        "confidence": 0.95
      },
      "price": {
        "website_field": "cost",
        "confidence": 0.92
      }
    },
    "site_analysis": {
      "complexity": "medium",
      "requires_js": true,
      "estimated_items": 12
    }
  },
  "created_at": "2024-12-16T10:30:00Z",
  "completed_at": "2024-12-16T10:30:02Z"
}
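
The `confidence` values under `metadata.field_mappings` can be used to flag fields that may need a manual check. The sketch below is illustrative only: `low_confidence_fields` is a hypothetical helper, the 0.9 default threshold is an arbitrary choice rather than a documented recommendation, and the mapping shape is assumed to match the example response above.

```python
def low_confidence_fields(response: dict, threshold: float = 0.9) -> list:
    """List schema fields whose mapping confidence is below the threshold.

    Hypothetical helper; only fields that appear in field_mappings are
    considered, and a mapping without a confidence value counts as 0.0.
    """
    metadata = response.get("metadata") or {}  # metadata is null for pending jobs
    mappings = metadata.get("field_mappings") or {}
    return sorted(
        name for name, mapping in mappings.items()
        if mapping.get("confidence", 0.0) < threshold
    )
```

With the example response, both mappings score above 0.9, so the default threshold reports nothing; raising the threshold to 0.93 reports `price`.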