Extract structured data from a single URL using a JSON schema.

## Headers

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| `X-API-Key` | string | Yes | API key authentication |
| `Content-Type` | string | Yes | Must be `application/json` |

## Request Body

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `url` | string | Yes | Target URL to scrape |
| `schema` | object | Yes | JSON schema defining the structure of each extracted item |
| `options` | object | No | Scraping configuration options |

### Schema Object

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `type` | string | Yes | Must be `"object"` |
| `properties` | object | Yes | Field definitions, each with a `type` |

**Supported Field Types**: `string`, `number`, `boolean`, `array`, `object`, `url`, `date`, `email`

### Options Object

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `max_items` | integer | 100 | Maximum number of items to extract (1-1000) |
| `timeout` | integer | 30 | Processing timeout in seconds (1-300) |
| `sync` | boolean | false | Wait for results (`true`) or return a job ID immediately (`false`) |
| `wait_for_content` | boolean | true | Wait for dynamic content to load |
| `extract_images` | boolean | false | Extract image URLs from the content |
| `follow_pagination` | boolean | false | Follow pagination links |
| `llm_mode` | string | `"structured"` | LLM processing mode: `"structured"` or `"json"` |
| `date_filter` | object | null | Filter items by date |
| `engine_type` | string | null | Force an engine: `"static"`, `"browser"`, or `"hybrid"` |

### Date Filter Object

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `field` | string | Yes | Name of the date field in `schema` |
| `after` | string | No | Include items dated after this date (YYYY-MM-DD) |
| `before` | string | No | Include items dated before this date (YYYY-MM-DD) |

## Response (Synchronous Mode)

**Status**: `200 OK`

| Field | Type | Description |
| --- | --- | --- |
| `job_id` | string | Unique job identifier |
| `status` | string | Job status: `"completed"` |
| `url` | string | The scraped URL |
| `extracted_data` | array | Extracted items, each shaped by `schema` |
| `metadata` | object | Processing metadata |
| `created_at` | string | Job creation timestamp (ISO 8601) |
| `completed_at` | string | Job completion timestamp (ISO 8601) |

### Metadata Object

| Field | Type | Description |
| --- | --- | --- |
| `processing_time` | number | Processing time in seconds |
| `engine_used` | string | Engine used: `"static"`, `"browser"`, or `"hybrid"` |
| `llm_calls` | integer | Number of LLM API calls made |
| `cache_hit` | boolean | Whether the result was served from cache |
| `field_mappings` | object | How schema fields were mapped to fields on the website |
| `site_analysis` | object | Site complexity and characteristics |

## Response (Asynchronous Mode)

**Status**: `200 OK`

| Field | Type | Description |
| --- | --- | --- |
| `job_id` | string | Unique job identifier |
| `status` | string | Job status: `"pending"` |
| `url` | string | The URL to be scraped |
| `extracted_data` | null | Null until the job completes |
| `metadata` | null | Null until the job completes |
| `created_at` | string | Job creation timestamp (ISO 8601) |
| `completed_at` | null | Null until the job completes |

**Synchronous Request**

```bash
curl -X POST https://api.aitronos.com/api/v1/scrape \
  -H "X-API-Key: $FREDDY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/products",
    "schema": {
      "type": "object",
      "properties": {
        "title": {"type": "string"},
        "price": {"type": "number"},
        "description": {"type": "string"},
        "availability": {"type": "string"}
      }
    },
    "options": {
      "max_items": 10,
      "timeout": 30,
      "sync": true
    }
  }'
```

```python
import os
import requests

api_key = os.environ["FREDDY_API_KEY"]

response = requests.post(
    "https://api.aitronos.com/api/v1/scrape",
    headers={
        "X-API-Key": api_key,
        "Content-Type": "application/json"
    },
    json={
        "url": "https://example.com/products",
        "schema": {
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "price": {"type": "number"},
                "description": {"type": "string"},
                "availability": {"type": "string"}
            }
        },
        "options": {
            "max_items": 10,
            "timeout": 30,
            "sync": True
        }
    }
)
response.raise_for_status()  # Raise on 4xx/5xx instead of parsing an error body

data = response.json()
print(f"Extracted {len(data['extracted_data'])} items")
for item in data['extracted_data']:
    print(f"- {item['title']}: ${item['price']}")
```

```javascript
const axios = require('axios');

const apiKey = process.env.FREDDY_API_KEY;

axios.post('https://api.aitronos.com/api/v1/scrape', {
  url: 'https://example.com/products',
  schema: {
    type: 'object',
    properties: {
      title: { type: 'string' },
      price: { type: 'number' },
      description: { type: 'string' },
      availability: { type: 'string' }
    }
  },
  options: {
    max_items: 10,
    timeout: 30,
    sync: true
  }
}, {
  headers: {
    'X-API-Key': apiKey,
    'Content-Type': 'application/json'
  }
})
.then(response => {
  const data = response.data;
  console.log(`Extracted ${data.extracted_data.length} items`);
  data.extracted_data.forEach(item => {
    console.log(`- ${item.title}: $${item.price}`);
  });
})
.catch(error => {
  // error.response is undefined for network failures, so fall back to the message
  console.error('Error:', error.response ? error.response.data : error.message);
});
```

**Response** `200 OK`

```json
{
  "job_id": "job_abc123def456",
  "status": "completed",
  "url": "https://example.com/products",
  "extracted_data": [
    {
      "title": "Product Name",
      "price": 29.99,
      "description": "Product description text",
      "availability": "In Stock"
    },
    {
      "title": "Another Product",
      "price": 49.99,
      "description": "Another description",
      "availability": "Out of Stock"
    }
  ],
  "metadata": {
    "processing_time": 2.5,
    "engine_used": "browser",
    "llm_calls": 1,
    "cache_hit": false,
    "field_mappings": {
      "title": {
        "website_field": "product_name",
        "confidence": 0.95
      },
      "price": {
        "website_field": "cost",
        "confidence": 0.92
      }
    },
    "site_analysis": {
      "complexity": "medium",
      "requires_js": true,
      "estimated_items": 12
    }
  },
  "created_at": "2024-12-16T10:30:00Z",
  "completed_at": "2024-12-16T10:30:02Z"
}
```

**Asynchronous Request**

```bash
curl -X POST https://api.aitronos.com/api/v1/scrape \
  -H "X-API-Key: $FREDDY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/products",
    "schema": {
      "type": "object",
      "properties": {
        "title": {"type": "string"},
        "price": {"type": "number"}
      }
    },
    "options": {
      "sync": false
    }
  }'
```

```python
import os
import requests

api_key = os.environ["FREDDY_API_KEY"]

response = requests.post(
    "https://api.aitronos.com/api/v1/scrape",
    headers={"X-API-Key": api_key},  # json= makes requests send Content-Type: application/json
    json={
        "url": "https://example.com/products",
        "schema": {
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "price": {"type": "number"}
            }
        },
        "options": {"sync": False}
    }
)
response.raise_for_status()

data = response.json()
job_id = data['job_id']
print(f"Job created: {job_id}")
```

```javascript
const axios = require('axios');

const apiKey = process.env.FREDDY_API_KEY;

axios.post('https://api.aitronos.com/api/v1/scrape', {
  url: 'https://example.com/products',
  schema: {
    type: 'object',
    properties: {
      title: { type: 'string' },
      price: { type: 'number' }
    }
  },
  options: {
    sync: false
  }
}, {
  headers: {
    'X-API-Key': apiKey
  }
})
.then(response => {
  const jobId = response.data.job_id;
  console.log(`Job created: ${jobId}`);
})
.catch(error => {
  console.error('Error:', error.response ? error.response.data : error.message);
});
```

**Response** `200 OK`

```json
{
  "job_id": "job_abc123def456",
  "status": "pending",
  "url": "https://example.com/products",
  "extracted_data": null,
  "metadata": null,
  "created_at": "2024-12-16T10:30:00Z",
  "completed_at": null
}
```

**With Date Filter**

```bash
curl -X POST https://api.aitronos.com/api/v1/scrape \
  -H "X-API-Key: $FREDDY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/blog",
    "schema": {
      "type": "object",
      "properties": {
        "title": {"type": "string"},
        "created_date": {"type": "date"},
        "content": {"type": "string"}
      }
    },
    "options": {
      "date_filter": {
        "field": "created_date",
        "after": "2024-12-01",
        "before": "2024-12-31"
      },
      "sync": true
    }
  }'
```

```python
import os
import requests

api_key = os.environ["FREDDY_API_KEY"]

response = requests.post(
    "https://api.aitronos.com/api/v1/scrape",
    headers={"X-API-Key": api_key},
    json={
        "url": "https://example.com/blog",
        "schema": {
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "created_date": {"type": "date"},
                "content": {"type": "string"}
            }
        },
        "options": {
            "date_filter": {
                "field": "created_date",
                "after": "2024-12-01",
                "before": "2024-12-31"
            },
            "sync": True
        }
    }
)
response.raise_for_status()

data = response.json()
for item in data['extracted_data']:
    print(f"{item['created_date']}: {item['title']}")
```
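You can check the limits in the Schema, Options, and Date Filter tables above on the client before sending a request. The sketch below is a minimal example under stated assumptions. `build_scrape_payload` is a hypothetical helper, not part of any Aitronos SDK. It assumes the documented ranges are inclusive and that `date_filter.field` only has to name a property in `schema`. The server remains the authority on validation.

```python
import re

# Allowed values and ranges copied from the tables above; inclusive bounds are an assumption.
SUPPORTED_FIELD_TYPES = {"string", "number", "boolean", "array", "object", "url", "date", "email"}
ENGINE_TYPES = {"static", "browser", "hybrid"}
LLM_MODES = {"structured", "json"}
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")  # YYYY-MM-DD shape only, not calendar validity


def build_scrape_payload(url, properties, **options):
    """Hypothetical helper: assemble a /scrape request body and check it locally."""
    if not url:
        raise ValueError("url is required")
    if not properties:
        raise ValueError("schema.properties must define at least one field")
    for name, spec in properties.items():
        if spec.get("type") not in SUPPORTED_FIELD_TYPES:
            raise ValueError(f"field {name!r} has unsupported type {spec.get('type')!r}")

    if "max_items" in options and not 1 <= options["max_items"] <= 1000:
        raise ValueError("max_items must be between 1 and 1000")
    if "timeout" in options and not 1 <= options["timeout"] <= 300:
        raise ValueError("timeout must be between 1 and 300 seconds")
    if options.get("engine_type") is not None and options["engine_type"] not in ENGINE_TYPES:
        raise ValueError("engine_type must be 'static', 'browser', or 'hybrid'")
    if "llm_mode" in options and options["llm_mode"] not in LLM_MODES:
        raise ValueError("llm_mode must be 'structured' or 'json'")

    date_filter = options.get("date_filter")
    if date_filter is not None:
        if date_filter.get("field") not in properties:
            raise ValueError("date_filter.field must name a field in the schema")
        for bound in ("after", "before"):
            value = date_filter.get(bound)
            if value is not None and not DATE_PATTERN.fullmatch(value):
                raise ValueError(f"date_filter.{bound} must use YYYY-MM-DD")

    payload = {"url": url, "schema": {"type": "object", "properties": properties}}
    if options:  # omit "options" entirely so server defaults apply
        payload["options"] = options
    return payload


# Usage: the same body as the "With Date Filter" request above.
payload = build_scrape_payload(
    "https://example.com/blog",
    {
        "title": {"type": "string"},
        "created_date": {"type": "date"},
        "content": {"type": "string"},
    },
    date_filter={"field": "created_date", "after": "2024-12-01", "before": "2024-12-31"},
    sync=True,
)
```

You can then send the returned dictionary with `requests.post(..., json=payload)`. Checking locally catches typos such as an out-of-range `timeout` without spending a request. The API may still reject inputs that this sketch accepts.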
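The `field_mappings` object in `metadata` shows how each schema field was matched on the page. In the sample synchronous response it carries a `confidence` score. The following sketch flags weak matches for review. `low_confidence_fields` and its 0.9 default threshold are illustrative assumptions, and the `website_field` and `confidence` keys come only from the sample response above.

```python
def low_confidence_fields(result, threshold=0.9):
    """Hypothetical helper: return field mappings whose confidence is below `threshold`.

    A mapping with no `confidence` value is flagged as well. Asynchronous results
    have `metadata: null` until the job completes, so they yield an empty dict.
    """
    metadata = result.get("metadata") or {}
    mappings = metadata.get("field_mappings") or {}
    flagged = {}
    for field, mapping in mappings.items():
        confidence = mapping.get("confidence")
        if confidence is None or confidence < threshold:
            flagged[field] = mapping
    return flagged


# Trimmed copy of the sample synchronous response above.
sample = {
    "status": "completed",
    "metadata": {
        "field_mappings": {
            "title": {"website_field": "product_name", "confidence": 0.95},
            "price": {"website_field": "cost", "confidence": 0.92},
        }
    },
}

print(low_confidence_fields(sample, threshold=0.93))
# {'price': {'website_field': 'cost', 'confidence': 0.92}}
```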
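Synchronous and asynchronous calls return the same response fields with different `status` values, so client code can branch on `status`. The hypothetical helper below returns `extracted_data` for a `"completed"` job and `None` for a `"pending"` one. This page documents only those two statuses, so treating any other value as an error is an assumption.

```python
def completed_items(result):
    """Hypothetical helper: extracted items for a completed job, None while pending."""
    status = result.get("status")
    if status == "completed":
        return result["extracted_data"]
    if status == "pending":
        # Asynchronous mode: extracted_data, metadata, and completed_at stay null until the job finishes.
        return None
    # Statuses other than "completed" and "pending" are not documented here; fail loudly.
    raise ValueError(f"unexpected job status: {status!r}")


# Trimmed copies of the two sample responses above.
sync_result = {
    "job_id": "job_abc123def456",
    "status": "completed",
    "extracted_data": [{"title": "Product Name", "price": 29.99}],
}
async_result = {
    "job_id": "job_abc123def456",
    "status": "pending",
    "extracted_data": None,
}

print(completed_items(sync_result))   # [{'title': 'Product Name', 'price': 29.99}]
print(completed_items(async_result))  # None
```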