# Extract document data Extract structured data from a single document by providing a file and a JSON schema. #### Request Body **`file`** file required Document file to process. Supported formats: PDF, DOCX, XLSX, JPEG, PNG, GIF, BMP, TIFF. Max size: 50 MB. **`schema`** string required JSON schema as string defining the structure of data to extract. **`organization_id`** string required Your organization ID. **`prompt`** string optional Custom extraction instructions to guide the AI. Example: "Focus on extracting line items from the table". **`model`** string optional · Defaults to `ftg-3.0` Model selection: `ftg-3.0`, `gpt-4o`, or `gpt-4o-mini`. **`vision_model`** string optional · Defaults to `gpt-5` Vision analysis model for processing images and PDFs. Used for OCR and visual understanding. **`sync`** boolean optional · Defaults to `false` Process synchronously (true) or asynchronously (false). **`include_raw_text`** boolean optional · Defaults to `false` Include extracted text in response. ## Returns Returns a job object with extraction status and detailed confidence metrics. In synchronous mode, includes extracted data matching your schema. In asynchronous mode, returns job ID for status polling. Python ```python import requests import json API_URL = "https://api.aitronos.com/v1/documents/extract" TOKEN = "your_bearer_token_here" headers = { "Authorization": f"Bearer {TOKEN}" } # Define schema schema = { "properties": { "invoice_number": {"type": "string"}, "date": {"type": "string"}, "total_amount": {"type": "number"}, "vendor_name": {"type": "string"} }, "required": ["invoice_number", "total_amount"] } # Prepare request files = { "file": open("invoice.pdf", "rb") } data = { "schema": json.dumps(schema), "organization_id": "org_abc123", "sync": "true", "model": "gpt-4o-mini" } # Extract data response = requests.post(API_URL, headers=headers, files=files, data=data) result = response.json() if result['success'] and result['status'] == 'completed': print(f"Invoice: {result['extracted_data']['invoice_number']}") print(f"Total: ${result['extracted_data']['total_amount']}") print(f"Confidence: {result['confidence']:.2%}") print(f"Cost: CHF {result['cost_chf']:.4f}") else: print(f"Error: {result.get('error_message', 'Unknown error')}") ``` JavaScript ```javascript const axios = require('axios'); const FormData = require('form-data'); const fs = require('fs'); const API_URL = 'https://api.aitronos.com/v1/documents/extract'; const TOKEN = 'your_bearer_token_here'; const headers = { 'Authorization': `Bearer ${TOKEN}` }; // Define schema const schema = { properties: { invoice_number: { type: 'string' }, date: { type: 'string' }, total_amount: { type: 'number' }, vendor_name: { type: 'string' } }, required: ['invoice_number', 'total_amount'] }; // Prepare form data const form = new FormData(); form.append('file', fs.createReadStream('invoice.pdf')); form.append('schema', JSON.stringify(schema)); form.append('organization_id', 'org_abc123'); form.append('sync', 'true'); form.append('model', 'gpt-4o-mini'); // Extract data axios.post(API_URL, form, { headers: { ...headers, ...form.getHeaders() } }).then(response => { const result = response.data; if (result.success && result.status === 'completed') { console.log(`Invoice: ${result.extracted_data.invoice_number}`); console.log(`Total: $${result.extracted_data.total_amount}`); console.log(`Confidence: ${(result.confidence * 100).toFixed(1)}%`); console.log(`Cost: CHF ${result.cost_chf.toFixed(4)}`); } else { console.error(`Error: ${result.error_message || 'Unknown error'}`); } }).catch(error => { console.error('Request failed:', error.response?.data || error.message); }); ``` Bash ```bash curl -X POST https://api.aitronos.com/v1/documents/extract \ -H "X-API-Key: $FREDDY_API_KEY" \ -F "file=@invoice.pdf" \ -F 'schema={"properties":{"invoice_number":{"type":"string"},"total_amount":{"type":"number"}},"required":["invoice_number","total_amount"]}' \ -F "organization_id=org_abc123" \ -F "sync=true" \ -F "model=gpt-4o-mini" ``` ## Response Examples ### Successful Extraction (Synchronous) ```json { "success": true, "job_id": "job_abc123def456", "status": "completed", "extracted_data": { "invoice_number": "INV-2024-001", "date": "2024-12-16", "total_amount": 1250.00, "vendor_name": "Acme Corporation" }, "confidence": 0.95, "processing_time": 2.3, "cost_chf": 0.015, "model_used": "gpt-4o", "created_at": "2024-12-16T10:30:00Z", "completed_at": "2024-12-16T10:30:02Z" } ``` ### Job Submitted (Asynchronous) ```json { "success": true, "job_id": "job_abc123def456", "status": "pending", "extracted_data": null, "confidence": null, "processing_time": null, "cost_chf": null, "model_used": null, "created_at": "2024-12-16T10:30:00Z", "completed_at": null } ``` ### Error Response ```json { "success": false, "error": { "code": "INVALID_FILE_TYPE", "message": "The uploaded file type is not supported. Please upload a PDF, Word document, Excel file, or image.", "system_message": "Unsupported file type: .txt", "type": "validation_error", "status": 422, "details": { "file_type": "txt", "supported_types": ["pdf", "docx", "xlsx", "jpg", "png", "gif", "bmp", "tiff"] }, "trace_id": "trace_abc123", "timestamp": "2024-12-16T10:30:00Z" } } ``` ## Related Resources - [Batch Document Extraction](/docs/api-reference/documents/extract-batch) - [Get Job Status](/docs/api-reference/documents/get-job-status) - [Document Extraction Overview](/docs/api-reference/documents/introduction)