POST https://api.aitronos.com/v1/scrape
Extract structured data from a single URL using a JSON schema.

Headers

| Name | Type | Required | Description |
|---|---|---|---|
| Authorization | string | Yes | Bearer token authentication (Bearer &lt;token&gt;) |
| Content-Type | string | Yes | Must be application/json |

Request Body

| Field | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | Target URL to scrape |
| schema | object | Yes | JSON schema defining the structure of each extracted item |
| options | object | No | Scraping configuration options |

Schema Object

| Field | Type | Required | Description |
|---|---|---|---|
| type | string | Yes | Must be "object" |
| properties | object | Yes | Map of field names to field definitions, each with a "type" |

Supported field types: string, number, boolean, array, object, url, date, email
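
The schema rules above can be checked before a request is sent. The sketch below is a hypothetical client-side helper (`schema_errors` is not part of any Aitronos SDK); it only enforces what this table states, namely that `type` is `"object"`, `properties` is an object, and every field's `type` is in the supported list. The service still performs its own validation.

```python
# Hypothetical client-side pre-check mirroring the documented schema rules.
SUPPORTED_FIELD_TYPES = {
    "string", "number", "boolean", "array", "object", "url", "date", "email",
}


def schema_errors(schema):
    """Return a list of problems found in a scrape schema (empty if none)."""
    errors = []
    if schema.get("type") != "object":
        errors.append('schema.type must be "object"')
    properties = schema.get("properties")
    if not isinstance(properties, dict):
        errors.append("schema.properties must be an object")
        return errors
    for name, definition in properties.items():
        # Each property definition is expected to be an object with a "type" key.
        field_type = definition.get("type") if isinstance(definition, dict) else None
        if field_type not in SUPPORTED_FIELD_TYPES:
            errors.append(f"{name}: unsupported type {field_type!r}")
    return errors


schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "price": {"type": "number"},
        "released": {"type": "datetime"},  # not in the supported list
    },
}
print(schema_errors(schema))  # → ["released: unsupported type 'datetime'"]
```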
Options

| Field | Type | Default | Description |
|---|---|---|---|
| max_items | integer | 100 | Maximum number of items to extract (1-1000) |
| timeout | integer | 30 | Processing timeout in seconds (1-300) |
| sync | boolean | false | If true, wait and return the results in the response; if false, return a job ID immediately |
| wait_for_content | boolean | true | Wait for dynamic content to load before extracting |
| extract_images | boolean | false | Extract image URLs from the content |
| follow_pagination | boolean | false | Follow pagination links |
| llm_mode | string | "structured" | LLM processing mode: "structured" or "json" |
| date_filter | object | null | Filter items by date |
| engine_type | string | null | Force an engine: "static", "browser", or "hybrid" |

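The defaults and ranges in this table can be combined into the effective settings for a request. The sketch below is a hypothetical helper (`resolve_options` and `DEFAULT_OPTIONS` are illustrative names) that merges user options over the documented defaults and raises on out-of-range values. Rejecting unknown option names is an assumption of this sketch; the table does not say how the service treats them.

```python
# Documented defaults from the options table.
DEFAULT_OPTIONS = {
    "max_items": 100,
    "timeout": 30,
    "sync": False,
    "wait_for_content": True,
    "extract_images": False,
    "follow_pagination": False,
    "llm_mode": "structured",
    "date_filter": None,
    "engine_type": None,
}


def resolve_options(options=None):
    """Merge user options over the documented defaults and check documented ranges."""
    merged = {**DEFAULT_OPTIONS, **(options or {})}
    # Assumption: unknown keys are treated as mistakes rather than passed through.
    unknown = set(merged) - set(DEFAULT_OPTIONS)
    if unknown:
        raise ValueError(f"unknown option(s): {sorted(unknown)}")
    if not 1 <= merged["max_items"] <= 1000:
        raise ValueError("max_items must be between 1 and 1000")
    if not 1 <= merged["timeout"] <= 300:
        raise ValueError("timeout must be between 1 and 300 seconds")
    if merged["llm_mode"] not in ("structured", "json"):
        raise ValueError('llm_mode must be "structured" or "json"')
    if merged["engine_type"] not in (None, "static", "browser", "hybrid"):
        raise ValueError('engine_type must be "static", "browser", "hybrid", or null')
    return merged


print(resolve_options({"max_items": 10, "sync": True})["timeout"])  # → 30
```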
Date Filter Object

| Field | Type | Required | Description |
|---|---|---|---|
| field | string | Yes | Name of the date field in the schema |
| after | string | No | Include only items dated after this date (YYYY-MM-DD) |
| before | string | No | Include only items dated before this date (YYYY-MM-DD) |

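A `date_filter` object can be assembled and sanity-checked locally. The sketch below uses a hypothetical helper, `make_date_filter`, and a hypothetical schema field named `published`; it enforces the documented YYYY-MM-DD format and rejects impossible calendar dates. Refusing an `after` bound later than `before` is an assumption of this sketch, since the table does not say whether the bounds are inclusive.

```python
import re
from datetime import datetime

_DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")


def make_date_filter(field, after=None, before=None):
    """Build a date_filter object, checking the documented YYYY-MM-DD format."""
    date_filter = {"field": field}
    bounds = {}
    for key, value in (("after", after), ("before", before)):
        if value is None:
            continue  # both bounds are optional
        if not _DATE_PATTERN.fullmatch(value):
            raise ValueError(f"{key} must be YYYY-MM-DD, got {value!r}")
        # strptime also rejects impossible dates such as 2024-02-30.
        bounds[key] = datetime.strptime(value, "%Y-%m-%d").date()
        date_filter[key] = value
    # Assumption: an "after" later than "before" can never match, so fail early.
    if "after" in bounds and "before" in bounds and bounds["after"] > bounds["before"]:
        raise ValueError("after must not be later than before")
    return date_filter


print(make_date_filter("published", after="2024-01-01"))
# → {'field': 'published', 'after': '2024-01-01'}
```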
Response (sync: true)

Status: 200 OK

| Field | Type | Description |
|---|---|---|
| job_id | string | Unique job identifier |
| status | string | Job status: "completed" |
| url | string | The scraped URL |
| extracted_data | array | Array of extracted items, each shaped by the schema |
| metadata | object | Processing metadata |
| created_at | string | Job creation timestamp (ISO 8601) |
| completed_at | string | Job completion timestamp (ISO 8601) |

Metadata Object

| Field | Type | Description |
|---|---|---|
| processing_time | number | Processing time in seconds |
| engine_used | string | Engine used: "static", "browser", or "hybrid" |
| llm_calls | integer | Number of LLM API calls made |
| cache_hit | boolean | Whether the result was served from cache |
| field_mappings | object | How schema fields were mapped to fields on the website |
| site_analysis | object | Site complexity and characteristics |

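A completed response can be summarized from these fields. The sketch below (`summarize_job` is a hypothetical helper) counts the extracted items, computes wall-clock time from `created_at` and `completed_at`, and lists schema fields whose mapping confidence falls below a threshold. The `confidence` key inside `field_mappings` is taken from the example response only (the table just says "object"), and the 0.9 default threshold is an arbitrary choice for illustration.

```python
from datetime import datetime


def _parse_ts(value):
    # fromisoformat() accepts a trailing "Z" only from Python 3.11, so normalize it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def summarize_job(response, min_confidence=0.9):
    """Summarize a completed scrape response using the documented fields."""
    meta = response["metadata"]
    elapsed = _parse_ts(response["completed_at"]) - _parse_ts(response["created_at"])
    # Assumption: each field_mappings entry carries a numeric "confidence", as in the example.
    weak = sorted(
        name
        for name, mapping in meta.get("field_mappings", {}).items()
        if mapping.get("confidence", 0) < min_confidence
    )
    return {
        "items": len(response["extracted_data"]),
        "engine": meta["engine_used"],
        "wall_clock_seconds": elapsed.total_seconds(),
        "cached": meta["cache_hit"],
        "low_confidence_fields": weak,
    }


# Trimmed copy of the example response shown on this page.
sample = {
    "status": "completed",
    "extracted_data": [{"title": "Product Name"}, {"title": "Another Product"}],
    "metadata": {
        "engine_used": "browser",
        "cache_hit": False,
        "field_mappings": {
            "title": {"website_field": "product_name", "confidence": 0.95},
            "price": {"website_field": "cost", "confidence": 0.92},
        },
    },
    "created_at": "2024-12-16T10:30:00Z",
    "completed_at": "2024-12-16T10:30:02Z",
}
summary = summarize_job(sample, min_confidence=0.93)
print(summary["low_confidence_fields"])  # → ['price']
```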
Response (sync: false)

Status: 200 OK

| Field | Type | Description |
|---|---|---|
| job_id | string | Unique job identifier |
| status | string | Job status: "pending" |
| url | string | The scraped URL |
| extracted_data | null | Null until the job completes |
| metadata | null | Null until the job completes |
| created_at | string | Job creation timestamp (ISO 8601) |
| completed_at | null | Null until the job completes |

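With `sync: false`, the caller holds a `job_id` and must check back later. This section does not document the endpoint that retrieves a job, so the sketch below takes a caller-supplied `fetch_job(job_id)` function (hypothetical) and polls it until the status leaves the non-terminal set. Treating only `"pending"` as non-terminal is an assumption based on the two statuses shown here; the interval and timeout values are illustrative.

```python
import time


def wait_for_job(job_id, fetch_job, interval=2.0, timeout=300.0,
                 pending_statuses=("pending",), sleep=time.sleep):
    """Poll fetch_job(job_id) until the job's status leaves pending_statuses."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_job(job_id)
        if job["status"] not in pending_statuses:
            return job  # e.g. "completed", with extracted_data and metadata filled in
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still pending after {timeout} s")
        sleep(interval)


# Usage with a stand-in fetcher that reports "pending" twice, then "completed".
_states = iter(["pending", "pending", "completed"])


def fake_fetch(job_id):
    return {"job_id": job_id, "status": next(_states)}


job = wait_for_job("job_abc123def456", fake_fetch, interval=0.01)
print(job["status"])  # → completed
```

Injecting `sleep` keeps the loop testable without real delays, and `time.monotonic()` keeps the deadline unaffected by wall-clock adjustments.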
```bash
curl -X POST https://api.aitronos.com/api/v1/scrape \
  -H "X-API-Key: $FREDDY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/products",
    "schema": {
      "type": "object",
      "properties": {
        "title": {"type": "string"},
        "price": {"type": "number"},
        "description": {"type": "string"},
        "availability": {"type": "string"}
      }
    },
    "options": {
      "max_items": 10,
      "timeout": 30,
      "sync": true
    }
  }'
```

Response 200 OK
```json
{
  "job_id": "job_abc123def456",
  "status": "completed",
  "url": "https://example.com/products",
  "extracted_data": [
    {
      "title": "Product Name",
      "price": 29.99,
      "description": "Product description text",
      "availability": "In Stock"
    },
    {
      "title": "Another Product",
      "price": 49.99,
      "description": "Another description",
      "availability": "Out of Stock"
    }
  ],
  "metadata": {
    "processing_time": 2.5,
    "engine_used": "browser",
    "llm_calls": 1,
    "cache_hit": false,
    "field_mappings": {
      "title": {
        "website_field": "product_name",
        "confidence": 0.95
      },
      "price": {
        "website_field": "cost",
        "confidence": 0.92
      }
    },
    "site_analysis": {
      "complexity": "medium",
      "requires_js": true,
      "estimated_items": 12
    }
  },
  "created_at": "2024-12-16T10:30:00Z",
  "completed_at": "2024-12-16T10:30:02Z"
}
```
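
The same request can be built from Python's standard library. This sketch mirrors the curl call: it uses the curl example's `https://api.aitronos.com/api/v1/scrape` URL and `X-API-Key` header. The endpoint line and headers table at the top of this page list `/v1/scrape` and `Authorization: Bearer` instead, so treat the URL and auth header as assumptions to verify. `build_scrape_request` is a hypothetical helper; it builds the request without sending it.

```python
import json
import os
import urllib.request

# Assumption: URL and auth header copied from the curl example, not the headers table.
SCRAPE_URL = "https://api.aitronos.com/api/v1/scrape"


def build_scrape_request(api_key, url, schema, options=None):
    """Build (but do not send) a POST request equivalent to the curl example."""
    body = {"url": url, "schema": schema}
    if options is not None:
        body["options"] = options  # options is optional; omit it to use the defaults
    return urllib.request.Request(
        SCRAPE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


request = build_scrape_request(
    os.environ.get("FREDDY_API_KEY", ""),
    "https://example.com/products",
    {"type": "object", "properties": {"title": {"type": "string"}, "price": {"type": "number"}}},
    {"max_items": 10, "timeout": 30, "sync": True},
)
# To send it and decode the JSON response:
# with urllib.request.urlopen(request, timeout=60) as resp:
#     result = json.load(resp)
```

Note that `urllib.request.Request` stores header names via `str.capitalize()`, so `request.get_header("Content-type")` (not `"Content-Type"`) returns the value.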