Create a model response

Creates a model response for AI-powered text generation, structured outputs, and tool-augmented workflows.

POST https://api.aitronos.com/v1/model/response

Provide text or image inputs to generate text or JSON outputs. Have the model call your own custom code or use built-in tools like web search or file search.

Request Body

Core Parameters

organization_id string optional

The unique identifier of the organization to which this request belongs. All API requests must be scoped to an organization for billing, access control, and resource management. Find your organization ID in Freddy → Settings → Organization.

assistant_id string optional

ID of the assistant to use for this response. When provided, automatically applies the assistant's configured rules, instructions, and settings. Enables consistent AI behavior across conversations with priority-based rule application and context management. Learn more about assistants

disable_rules boolean optional · Defaults to false

Disable all rule application for this response. When true, no rules will be applied regardless of other rule settings. This parameter is only available when using API keys and cannot be used with bearer tokens (user authentication). Learn more about rules

skill_ids array optional

Target a specific set of skills for this single request. When provided, ONLY these skills are injected into the system prompt; this overrides both the user's full library and any skills attached to the assistant. Accepts personal skill IDs (skill_...), organization skill IDs (skill_...), and platform skill names (platform-...). Skills the user cannot access are silently skipped. Pass an empty list to inject no skills. Unset/null preserves the default resolution (assistant attachments → full library).

model string optional · Defaults to ftg-3.0

The AI model to use for generating the response. Aitronos supports various models with different capabilities, performance characteristics, and pricing. Choose based on your task requirements: reasoning models for complex problems, fast models for simple queries, or vision models for image understanding. View available models →

auto_generate_name boolean optional

When true, the backend auto-generates a concise title for the thread after the turn completes, but only while the thread is still unnamed (it never overwrites a user-set title).

usage_context object optional

Attribution context for billing. Lets a usage/cost line be traced back to its product origin (source), sub-feature, AI mode, and the entity/field it acted on.

voice_lean boolean optional

Enable low-latency lean mode. Trims the system prompt to its essentials and keeps only a few core tools loaded up front (the rest stay reachable on demand) to reduce time-to-first-token. Off by default.

inputs array required

Array of input message objects. Each message contains a role and content (text string or structured content blocks), with optional file attachments. View full object โ†’

Properties

role string required

The role of the message input. One of user, system, or assistant.

content string | array required

The message content. Accepts either a plain text string or an array of content blocks.

Each content block:

type string required

Content type: text or file.

text string optional

Text content. Required when type is text.

file_id string optional

File ID reference (file_ prefix). Required when type is file.

files array optional

Array of file references for context retrieval. Upload files first via the Files API, then reference them here by file_id. The system retrieves relevant content from documents and injects it into the conversation. Images (PNG, JPEG, WebP) are sent as visual input to vision-capable models. Maximum 10 files per message.

Each file reference:

file_id string required

ID of an uploaded file (e.g., file_abc123def456abc123def456abc12345).

Example: [{ "file_id": "file_abc123def456abc123def456abc12345" }]

id string optional

The unique ID of the input message. Populated when items are returned via API.
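
As a minimal sketch of the message shape described above, the helper below assembles one entry for the inputs array and enforces the documented 10-file limit on the client side. The function name build_input_message is hypothetical and not part of any Aitronos SDK; only the field names (role, content, type, text, files, file_id) come from this page.

```python
from typing import Any, Optional

MAX_FILES_PER_MESSAGE = 10  # limit stated in the files description above


def build_input_message(
    role: str, text: str, file_ids: Optional[list] = None
) -> dict:
    """Assemble one entry for the `inputs` array (hypothetical helper)."""
    if role not in {"user", "system", "assistant"}:
        raise ValueError(f"unsupported role: {role!r}")
    file_ids = file_ids or []
    if len(file_ids) > MAX_FILES_PER_MESSAGE:
        raise ValueError("at most 10 files may be attached to one message")
    message: dict[str, Any] = {
        "role": role,
        # Content as an array of blocks; a plain string is also accepted.
        "content": [{"type": "text", "text": text}],
    }
    if file_ids:
        message["files"] = [{"file_id": fid} for fid in file_ids]
    return message


message = build_input_message(
    "user",
    "What do you see in this image?",
    ["file_abc123def456abc123def456abc12345"],
)
```

The resulting dictionary can be placed directly in the inputs array of a request body.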

Thread and Context

thread_id string optional

The ID of the thread that this response belongs to. Items from this thread are prepended to inputs for this response request. Input items and output items from this response are automatically added to this thread after completion. Learn more

If no thread_id is provided, a new thread is automatically created for this request. The response will include the new thread ID, which can be used in subsequent requests to maintain conversation context and history. This enables seamless multi-turn conversations without manual thread management. For stateless multi-turn conversations, use previous_response_id instead.
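
The thread flow above can be sketched as follows: the first call omits thread_id, and the follow-up reuses the thread_id returned in the response body. The helper name follow_up_payload is hypothetical, and the sample response is trimmed to the fields this sketch needs.

```python
from typing import Any


def follow_up_payload(previous: dict, text: str) -> dict:
    """Build the next turn on the thread that an earlier response created."""
    payload: dict[str, Any] = {
        "thread_id": previous["thread_id"],  # returned by the first call
        "inputs": [{"role": "user", "content": text}],
    }
    return payload


# Sample first-turn response body, trimmed for illustration.
first_response = {
    "success": True,
    "thread_id": "thrd_xyz789",
    "response_id": "resp_abc123",
}
second_request = follow_up_payload(first_response, "And what is its population?")
```

Because earlier thread items are prepended server-side, the follow-up request only needs to carry the new user message.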

store boolean optional · Defaults to true

Controls whether the created thread is visible in the user's thread list. When true (default), the thread appears in the list threads API and is visible in the user interface. When false, the thread is hidden from the list threads API but remains accessible by direct thread ID lookup. This is useful for internal operations, background tasks, or temporary threads that shouldn't clutter the user's conversation history. Learn more

space_id string optional

ID of the space to associate the thread with. When provided, new threads are automatically added to the specified space. The space's dedicated vector store and any additional attached vector stores are merged into the conversation's knowledge sources, enabling shared context across all threads in the space. This is idempotent: passing the same space_id on subsequent messages to the same thread has no effect. Learn more about spaces

instructions string optional

System-level instructions that define the model's behavior and capabilities. These instructions override any previous system messages when continuing from a prior response, allowing you to dynamically adjust the model's context without carrying over old system prompts.

Response Delivery

stream boolean optional · Defaults to false

Enable streaming mode to receive response data incrementally as it's generated, using server-sent events (SSE). When enabled, the model sends partial outputs in real-time rather than waiting for the complete response. Ideal for user-facing applications requiring immediate feedback. Learn more

suppress_assistant boolean optional · Defaults to false

When true, the posted user message is persisted and fanned out to everyone present on the thread (via the message.created live event), but no assistant turn runs: no model call, no response, no SSE stream. Use it for a human-to-human @user aside in a shared/group thread: every participant sees the message, but the assistant does not reply. The flag wins even when stream is true. The endpoint returns a small JSON acknowledgement { id, thread_id, created_by_user_id, mention_user_ids, suppressed: true } instead of a response. When false (default), behaviour is unchanged and a normal assistant turn runs.

mention_user_ids array optional

User IDs this message is addressed to (the @user mention targets). Stored on the message and echoed on the message.created live event and on the fetched message object (see List Messages) so clients can render a "→ to {name}" affordance. An empty or omitted list means no mention (normal addressing). This does not by itself suppress the assistant; pair it with suppress_assistant: true for a human-only aside.
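
A human-only aside combines the two parameters above. The sketch below builds such a request body and recognises the acknowledgement that replaces the normal response. The helper names and the user ID value are hypothetical placeholders; the field names come from the descriptions above.

```python
def human_aside(thread_id: str, text: str, mention_user_ids: list) -> dict:
    """Request body for a message that participants see but the assistant ignores."""
    return {
        "thread_id": thread_id,
        "inputs": [{"role": "user", "content": text}],
        "mention_user_ids": list(mention_user_ids),
        "suppress_assistant": True,  # no model call, no response, no SSE stream
    }


def is_suppressed_ack(body: dict) -> bool:
    """True when the endpoint returned the small acknowledgement object."""
    return body.get("suppressed") is True


# "usr_placeholder" is an illustrative ID; the real ID format is not shown here.
aside = human_aside("thrd_xyz789", "Can you review this before lunch?", ["usr_placeholder"])
```

Checking for the suppressed flag before parsing lets a client handle both reply shapes with the same call path.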

output_mode string optional · Defaults to text

Controls the format of the model's response output. Determines how the response content is structured and returned.

  • text - Rich text with markdown formatting (default). Natural language responses with full markdown support.
  • plain - Plain text with all markdown formatting stripped. Useful when you need raw text without any formatting.
  • json - JSON output. The model returns structured JSON. Useful for data extraction and API integrations.
  • structured - Structured output mode. Returns typed content blocks alongside a response_blocks array for programmatic processing.

include array optional

Specify additional data to include in the response. Each value expands specific parts of the output with extra information. Learn more

Available now:

  • all - Include all currently available additional data. Significantly increases response size.
  • web_search.sources - Include source URLs and metadata from web search results
  • tools.available - Include list of tools that were available to the model during response generation
  • tools.used - Include list of tools that were actually used by the model during response generation
  • usage - Include basic token usage (input_tokens, output_tokens, total_tokens)
  • usage.detailed - Include synapse and neuron usage breakdown by component
  • rules - Include list of rules that were applied to this response (rule IDs, names, metadata)
  • rules.debug - Include detailed rule application metadata (which rules were considered, applied, compressed, or filtered)

Coming soon:

  • code_interpreter.outputs - Python execution outputs and generated files
  • file_search.results - Matched document chunks and relevance scores
  • message.input_images - Full image URLs from user messages
  • message.output_images - Generated image URLs from assistant responses
  • message.logprobs - Token-level probability scores for generated content
  • reasoning.encrypted - Encrypted reasoning data for stateless conversations
  • request.logs - Detailed request processing logs
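
One way to avoid sending a value from the "Coming soon" list by mistake is a client-side check against the values marked "Available now". This is a sketch only: validate_include is a hypothetical helper, and how the server treats unknown values is not stated on this page.

```python
AVAILABLE_INCLUDES = {
    "all",
    "web_search.sources",
    "tools.available",
    "tools.used",
    "usage",
    "usage.detailed",
    "rules",
    "rules.debug",
}


def validate_include(values: list) -> list:
    """Reject include values that are not in the 'Available now' list."""
    unknown = sorted(set(values) - AVAILABLE_INCLUDES)
    if unknown:
        raise ValueError(f"include values not available yet: {unknown}")
    return list(values)


payload = {
    "inputs": [{"role": "user", "content": "Summarise today's AI news."}],
    "include": validate_include(["usage", "web_search.sources"]),
}
```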

Limits and Controls

max_output_synapses integer optional

Maximum number of synapses the model can generate in the response, including both visible output and internal reasoning synapses. Use this to control response length and computational cost. Learn more

metadata object optional

Custom key-value pairs (up to 16) for attaching structured information to the response. Useful for tracking, categorization, or filtering responses in your application. Keys must be ≤64 characters, values ≤512 characters.
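
The metadata limits above are easy to check before sending a request. The sketch below assumes values are strings (the page gives a character limit but no type), and validate_metadata is a hypothetical helper name.

```python
MAX_PAIRS = 16
MAX_KEY_CHARS = 64
MAX_VALUE_CHARS = 512


def validate_metadata(metadata: dict) -> dict:
    """Raise ValueError if metadata breaks any documented limit."""
    if len(metadata) > MAX_PAIRS:
        raise ValueError("metadata accepts at most 16 key-value pairs")
    for key, value in metadata.items():
        if len(key) > MAX_KEY_CHARS:
            raise ValueError(f"metadata key longer than 64 characters: {key[:20]}...")
        if len(str(value)) > MAX_VALUE_CHARS:
            raise ValueError(f"metadata value for {key!r} exceeds 512 characters")
    return metadata


metadata = validate_metadata({"ticket": "SUP-1042", "channel": "email"})
```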

previous_response_id string optional

Reference a previous response to create multi-turn conversations while maintaining context. The model will use the outputs and state from the referenced response as the starting point for this new response. Cannot be used together with thread_id. Learn more
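
The exclusivity rule above can be enforced before a request leaves the client. This sketch assumes the conflicting field is thread_id, the thread parameter documented in this section; check_context_fields is a hypothetical helper name.

```python
def check_context_fields(payload: dict) -> dict:
    """Reject bodies that set both conversation-continuation mechanisms."""
    if payload.get("thread_id") and payload.get("previous_response_id"):
        raise ValueError("set either thread_id or previous_response_id, not both")
    return payload


stateless_turn = check_context_fields(
    {
        "previous_response_id": "resp_abc123",
        "inputs": [{"role": "user", "content": "Shorten that answer."}],
    }
)
```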

vector_store_ids array optional

Vector store IDs for file search/RAG. Enables searching uploaded files to answer queries.

edit_message_id string optional

Message ID to edit. Soft-deletes this message and all subsequent messages, then creates a new branch.

previous_message_id string optional

Message ID to branch from. Soft-deletes all messages after this message (keeping it), then continues from that point.

workflow_id string optional

Workflow/Flow ID from FlowPlate. Used to associate threads with specific workflow automations.

Skills

Skills are reusable, named instruction blocks (summarize, translate, review-code, etc.). All skill execution happens through this endpoint; there is no separate "execute skill" endpoint. There are two ways a skill can run during a response:

  1. Catalog + autonomous invocation (default). The server injects a compact skill catalog (one line per skill) into the system prompt. The model sees the catalog and can call the invoke_skill system tool when the conversation matches a skill's purpose. The tool resolves the skill and returns its instructions as a tool result.
  2. Forced server-side injection. Set force_skill_id to make the server resolve and inject a specific skill's content as a system message for this turn. Strict: invalid IDs return HTTP 422. Use this when the client knows exactly which skill should run (e.g. a user clicked a "Use this skill" button or typed a slash command in your UI).

Skills with disable_model_invocation: true are hidden from path 1 (the catalog); they only run via path 2 (explicit force_skill_id).

skill_ids array optional

Explicit list of skill IDs to scope the catalog for this request. When set, only these skills appear in the catalog and are valid targets for invoke_skill. Accepts personal or org skill IDs (skill_*) and platform skill names (platform-*). Unknown or inaccessible IDs are silently skipped (permissive). An empty list [] opts out of skill injection entirely. Omit (null) to fall back to the assistant's attached skills, then the user's full library.

force_skill_id string optional

When set, the server resolves this skill (with empty arguments) and injects its content as a system message for this turn only, forcing the model to follow the skill's instructions. Accepts a personal or org skill ID (skill_*) or a platform skill name (platform-*). Strict: an invalid or inaccessible ID returns HTTP 422. Works alongside skill_ids: you can force one skill AND scope the catalog to a subset.
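
The difference between omitting skill_ids and passing an empty list is easy to lose in client code. The sketch below keeps the two cases apart and applies a prefix check to force_skill_id, since an invalid value there returns HTTP 422. The helper name and the example IDs (skill_abc123, platform-summarize) are hypothetical.

```python
from typing import Optional

SKILL_PREFIXES = ("skill_", "platform-")


def skills_fields(
    skill_ids: Optional[list] = None, force_skill_id: Optional[str] = None
) -> dict:
    """Return only the skill-related fields that should appear in the body."""
    fields: dict = {}
    if skill_ids is not None:
        # An empty list is meaningful: it opts out of skill injection.
        fields["skill_ids"] = list(skill_ids)
    if force_skill_id is not None:
        if not force_skill_id.startswith(SKILL_PREFIXES):
            raise ValueError("force_skill_id must start with 'skill_' or 'platform-'")
        fields["force_skill_id"] = force_skill_id
    return fields


default_resolution = skills_fields()            # falls back to assistant, then library
no_skills = skills_fields(skill_ids=[])         # inject nothing
forced = skills_fields(["skill_abc123"], "platform-summarize")
```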

Reasoning

reasoning object optional

Unified reasoning configuration that works across all reasoning-capable models (OpenAI GPT-5/O-series, Anthropic Claude). Controls how much computational effort the model dedicates to internal chain-of-thought processing before generating the final output. The API automatically maps your effort level to provider-specific parameters. Learn more

effort string optional · Defaults to medium

Controls the computational effort spent on reasoning/thinking. Higher effort produces more thorough analysis but increases response time and token usage.

Available values: off, low, medium, high, maximum

View provider mapping details →

summary string optional

Request a summary of the model's reasoning process.

Available values:

  • off (disable reasoning summary)
  • auto (model decides)
  • concise (brief overview)
  • detailed (comprehensive explanation)

Note: Some models do not stream reasoning content in real-time, only metadata events (response.reasoning.started, response.reasoning.completed). Other models stream full thinking content as it's generated via response.reasoning.delta.
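
A reasoning object built from the values listed above might look like the sketch below. reasoning_config is a hypothetical helper that only checks the two enumerations client-side; the provider mapping itself happens in the API.

```python
from typing import Optional

EFFORT_LEVELS = ("off", "low", "medium", "high", "maximum")
SUMMARY_MODES = ("off", "auto", "concise", "detailed")


def reasoning_config(effort: str = "medium", summary: Optional[str] = None) -> dict:
    """Return a `reasoning` object after validating both values."""
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"effort must be one of {EFFORT_LEVELS}")
    config = {"effort": effort}
    if summary is not None:
        if summary not in SUMMARY_MODES:
            raise ValueError(f"summary must be one of {SUMMARY_MODES}")
        config["summary"] = summary
    return config


payload = {
    "model": "ftg-3.0",
    "inputs": [{"role": "user", "content": "Plan a three-step migration."}],
    "reasoning": reasoning_config("high", "concise"),
}
```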

Tools

tools array optional

Array of custom tools the model can invoke during response generation. Supports custom function calls, personal connectors, and MCP server tools. Tool usage can be controlled via the tool_choice parameter. For built-in capabilities like web search, file search, and code execution, use the system_tools parameter instead. Learn more

Personal Connectors

Access to external services and APIs through configured personal connectors. Enables the model to interact with services like Gmail, Google Calendar, Dropbox, and more. Learn more about personal connectors →

Personal Connectors object

type string required

The type of tool. Always set to personalConnector.

configuration_ids array of strings optional

Specific personal connector configuration IDs to use. If not provided, all enabled configurations for the authenticated user or API key will be available. Each string should be a valid configuration ID (e.g., pconf_abc123).

Custom MCP

Connect to Model Context Protocol (MCP) servers to access additional tools and capabilities. MCP enables the model to interact with external services, databases, APIs, and custom integrations through standardized protocols. Learn more

MCP Tool object

type string required

The type of tool. Always set to mcp.

server_label string required

A unique label identifying this MCP server connection. Used to distinguish between multiple MCP servers in a single request and appears in tool calls and logs. Example: "database_server", "api_integration".

Server Connection

Choose one of the following methods to connect to an MCP server:

configuration_id string optional

ID of a saved MCP configuration. Use this to reference pre-configured connectors that have been set up in advance. See MCP Configurations API

server_url string optional

Direct URL to a custom MCP server endpoint. Use this for connecting to your own MCP-compatible servers or third-party MCP services. Example: "https://mcp.example.com/v1".

Note: One of server_url or configuration_id must be provided (but not both).

Authentication

authorization string optional

OAuth access token or API key for authenticating with the remote MCP server. Required when using custom MCP servers via server_url or when using service connectors that require additional authentication. Your application handles the OAuth flow and provides the token here.

headers object optional

Custom HTTP headers to send with requests to the MCP server. Used for authentication, API versioning, or other server-specific requirements. Example: {"X-API-Key": "your-key", "Authorization": "Bearer token"}.
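
The connection and authentication fields above can be assembled as in the following sketch, which enforces the rule that exactly one of server_url or configuration_id is set. mcp_tool is a hypothetical helper name, and the token value is a placeholder.

```python
from typing import Optional


def mcp_tool(
    server_label: str,
    *,
    configuration_id: Optional[str] = None,
    server_url: Optional[str] = None,
    authorization: Optional[str] = None,
    headers: Optional[dict] = None,
) -> dict:
    """Build one MCP entry for the `tools` array."""
    if (configuration_id is None) == (server_url is None):
        raise ValueError("provide exactly one of configuration_id or server_url")
    tool: dict = {"type": "mcp", "server_label": server_label}
    if configuration_id is not None:
        tool["configuration_id"] = configuration_id
    else:
        tool["server_url"] = server_url
    if authorization is not None:
        tool["authorization"] = authorization
    if headers:
        tool["headers"] = dict(headers)
    return tool


tool = mcp_tool(
    "database_server",
    server_url="https://mcp.example.com/v1",
    authorization="token-placeholder",
)
```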

Tool Filtering

allowed_tools array or object optional

Restrict which tools from the MCP server the model can invoke. By default, all available tools are accessible. Use this to limit access for security or to focus the model on specific capabilities.

Array format - List specific tool names:

["search_documents", "list_files", "get_file_content"]

Filter object - Use patterns to include/exclude tools:

{
  "include": ["search_*", "get_*"],
  "exclude": ["delete_*", "update_*"]
}

Custom Function Calls

Custom functions defined by you that the model can call with strongly-typed arguments. Enables the model to interact with your application code, APIs, or external services. Learn more

Function object

type string required

The type of tool. Always set to function.

name string required

Unique identifier for the function. Used by the model to reference and invoke the function. Must be alphanumeric with underscores (e.g., get_weather, calculate_total).

description string optional

Human-readable explanation of what the function does. The model uses this to determine when and how to call the function. Be specific and clear.

parameters object required

JSON Schema object defining the function's input parameters. Specifies parameter names, types, descriptions, and whether they're required.

strict boolean optional · Defaults to true

Enforce strict parameter validation. When true, the model guarantees parameters match the schema exactly. When false, allows best-effort parameter generation.
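
Putting the function properties together, a tool definition could be built as in this sketch. The helper function_tool and the get_weather schema are illustrative; the flat layout (name and parameters at the top level of the tool object) follows the property list above.

```python
import re

NAME_PATTERN = re.compile(r"[A-Za-z0-9_]+")  # alphanumeric with underscores


def function_tool(name: str, description: str, parameters: dict, strict: bool = True) -> dict:
    """Build one custom function entry for the `tools` array."""
    if not NAME_PATTERN.fullmatch(name):
        raise ValueError("name must contain only letters, digits, and underscores")
    return {
        "type": "function",
        "name": name,
        "description": description,
        "parameters": parameters,  # a JSON Schema object
        "strict": strict,
    }


get_weather = function_tool(
    "get_weather",
    "Return the current weather for a city.",
    {
        "type": "object",
        "properties": {"city": {"type": "string", "description": "City name"}},
        "required": ["city"],
    },
)
```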

System Tools

system_tools object optional

Configuration for built-in system tools provided by Aitronos. These extend model capabilities beyond the custom tools in the tools array. System tools are enabled/disabled via mode settings rather than being defined inline.

image_operations object optional

Enable AI image generation and editing capabilities. When enabled, the model can generate and edit images; the underlying image engine is selected automatically by Aitronos. Learn more

mode string required

Controls when image operations are available. Values: on (always enable), off (disable), auto (model decides based on context).

When enabled, the model can call the generate_image tool with these parameters:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| prompt | string | required | Text description of the desired image |
| n | integer | 1 | Number of images (1-10) |
| size | string | 1024x1024 | Image dimensions: 1024x1024, 1024x1536, 1536x1024, auto |
| quality | string | hd | Quality: standard, hd, low, medium, high, auto |
| style | string | vivid | Style: vivid, natural |
| output_format | string | png | Format: png, webp, jpeg |
| background | string | auto | Background: transparent, opaque, auto |
| input_image | string | - | URL or base64 for editing mode |
| input_mask | string | - | Mask for inpainting |
| input_fidelity | string | low | Fidelity: low, high |
| user | string | - | User ID for tracking |

The underlying image model and engine are selected automatically by Aitronos based on the requested operation (generation vs. editing), image quality, and your organization's available capabilities; you do not pick a raw provider or model. Quality and format hints above are honored where the selected engine supports them.

web_search object optional

Enable web search capabilities.

mode string required

Values: on, off, auto.

deep_search object optional

Opt in to reading full page content from web search results (instead of only snippets). Slower and more expensive, so off by default. When enabled, the model is allowed to request fetch_content=true per call and the backend fetches the top N result URLs using a scraping engine that also handles JavaScript-rendered pages.

enabled boolean · Defaults to false

Master opt-in. When false, web_search returns snippet-only results (current behavior). When true, the model sees extra fetch_content / max_pages parameters on the web_search tool and decides per call whether to fetch.

max_pages integer · Defaults to 3 · Range 1–5

Maximum number of result URLs to fetch per web_search call. The LLM-supplied max_pages argument is always clamped to this value.

render_js string · Defaults to auto

JS-rendering policy: auto (fast static fetch first, browser fallback for JS-heavy pages), always (always use the browser engine), or never (static-only).

per_page_timeout_ms integer · Defaults to 8000 · Range 1000–30000

Hard cap per URL fetch in milliseconds.

max_chars_per_page integer · Defaults to 6000 · Range 500–50000

Each fetched page is truncated to this many characters before being injected into the LLM context.
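
The deep_search defaults and ranges above can be captured in a small builder, sketched below. deep_search_config is a hypothetical helper; it only produces the deep_search object itself and makes no assumption about where that object is nested in the request.

```python
DEEP_SEARCH_RANGES = {
    "max_pages": (1, 5),
    "per_page_timeout_ms": (1000, 30000),
    "max_chars_per_page": (500, 50000),
}
RENDER_JS_POLICIES = ("auto", "always", "never")


def deep_search_config(
    enabled: bool = True,
    max_pages: int = 3,
    render_js: str = "auto",
    per_page_timeout_ms: int = 8000,
    max_chars_per_page: int = 6000,
) -> dict:
    """Build a deep_search object, rejecting out-of-range values."""
    if render_js not in RENDER_JS_POLICIES:
        raise ValueError(f"render_js must be one of {RENDER_JS_POLICIES}")
    config = {
        "enabled": enabled,
        "max_pages": max_pages,
        "render_js": render_js,
        "per_page_timeout_ms": per_page_timeout_ms,
        "max_chars_per_page": max_chars_per_page,
    }
    for field, (low, high) in DEEP_SEARCH_RANGES.items():
        if not low <= config[field] <= high:
            raise ValueError(f"{field} must be between {low} and {high}")
    return config


deep_search = deep_search_config(max_pages=5, render_js="never")
```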

code_interpreter object optional

Enable code execution capabilities.

mode string required

Values: on, off, auto.

file_search object optional

Enable file search capabilities.

mode string required

Values: on, off, auto.

Example:

{
  "inputs": [{"role": "user", "content": "Generate an image of a sunset over mountains"}],
  "model": "ftg-3.0",
  "system_tools": {
    "image_operations": {"mode": "on"}
  }
}

Output Format

Response Parameters

temperature number optional

Controls output randomness and creativity. Values range from 0.0 to 1.0. Lower values (e.g., 0.2) produce focused, deterministic outputs ideal for factual tasks. Higher values (e.g., 0.8) generate more varied, creative responses suited for brainstorming. Adjust either temperature or top_p, not both.

top_p number optional · Defaults to 1.0

Alternative to temperature for controlling randomness via nucleus sampling. Only the top cumulative probability mass up to top_p is considered. For example, 0.9 means only the top 90% probability mass is used. Values range from 0.0 to 1.0. Adjust either top_p or temperature, not both.
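
A sketch of how a client might apply the guidance for these two parameters follows. sampling_fields is a hypothetical helper, and treating "adjust one, not both" as a hard error is a choice made by this sketch rather than documented server behavior.

```python
from typing import Optional


def sampling_fields(
    temperature: Optional[float] = None, top_p: Optional[float] = None
) -> dict:
    """Return sampling fields, allowing only one of temperature or top_p."""
    if temperature is not None and top_p is not None:
        raise ValueError("adjust either temperature or top_p, not both")
    fields = {}
    for name, value in (("temperature", temperature), ("top_p", top_p)):
        if value is None:
            continue
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0")
        fields[name] = value
    return fields


factual = sampling_fields(temperature=0.2)   # focused output
creative = sampling_fields(temperature=0.8)  # more varied output
```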


Body parameters

assistant_name string optional

The assistant name.

frequency_penalty number optional

The frequency penalty.

deep_search_config object optional

The deep search config.

partial_context object optional

Partial state from an interrupted stream, sent by the frontend to enable recovery.

parallel_tool_calls boolean optional

The parallel tool calls.

service_tier string optional

The service tier.

thread_context_mode string optional

The thread context mode.

assistant_instructions string optional

The assistant instructions.

assistant_description string optional

The assistant description.

max_completion_tokens integer optional

The max completion tokens.

slice_ids array[string] optional

The slice ids.

prefer_deep_search boolean optional

The prefer deep search.

max_tool_calls integer optional

The max tool calls.

is_incognito boolean optional

Enable incognito mode. Thread will not be saved, personal memories will not be extracted or used. Ignored if thread_id references an existing thread.

presence_penalty number optional

The presence penalty.

Returns

Non-streaming (stream: false): Returns a complete Response object after generation finishes.

Streaming (stream: true): Returns Streaming event objects as Server-Sent Events (SSE). Each event contains a specific event type and data payload showing the model's progress, tool calls, and generated content in real-time.
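
For the streaming case, a minimal sketch of a parser for the standard SSE wire format is shown below. It assumes the stream uses ordinary event: and data: fields and that data payloads are usually JSON; the payload shapes are not specified on this page, so the sample payloads are placeholders. Only the event name response.reasoning.started is taken from this document.

```python
import json
from typing import Any, Iterable, Iterator, Tuple


def _field_value(line: str, name: str) -> str:
    """Strip the `name:` prefix and, per the SSE format, one leading space."""
    value = line[len(name) + 1:]
    return value[1:] if value.startswith(" ") else value


def _decode(data: str) -> Any:
    """Return parsed JSON when possible, otherwise the raw string."""
    try:
        return json.loads(data)
    except ValueError:
        return data


def iter_sse_events(lines: Iterable[str]) -> Iterator[Tuple[str, Any]]:
    """Yield (event_type, data) pairs from the lines of an SSE stream."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":  # a blank line terminates the current event
            if data:
                yield event, _decode("\n".join(data))
            event, data = "message", []
        elif line.startswith(":"):  # comment line, often used as keep-alive
            continue
        elif line.startswith("event:"):
            event = _field_value(line, "event")
        elif line.startswith("data:"):
            data.append(_field_value(line, "data"))


sample_stream = [
    "event: response.reasoning.started\n",
    'data: {"step": 1}\n',  # placeholder payload
    "\n",
    ": keep-alive\n",
    "data: plain text\n",
    "\n",
]
events = list(iter_sse_events(sample_stream))
```

In a real client, the lines would come from the decoded HTTP response body instead of a list.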

Request (Bash):
curl https://api.aitronos.com/v1/model/response \
  -H "X-API-Key: $FREDDY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "organization_id": "org_abc123",
    "model": "ftg-3.0",
    "inputs": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'

Response:

Complete Response object returned after generation finishes:

{
  "success": true,
  "thread_id": "thrd_xyz789",
  "response": [
    {
      "type": "text",
      "text": "The capital of France is Paris. It is located in the north-central part of the country and is known for its art, culture, and iconic landmarks like the Eiffel Tower."
    }
  ],
  "response_id": "resp_abc123",
  "is_summarized": false,
  "stop_reason": "end_turn"
}
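
The same request can be sketched in Python with only the standard library. The helpers below are hypothetical names, not an official client: build_request mirrors the curl call above, extract_text joins the text blocks of a response shaped like the example, and create_response performs the network call (it is defined but not invoked here).

```python
import json
import urllib.request

API_URL = "https://api.aitronos.com/v1/model/response"


def build_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Prepare the POST request without sending it."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def extract_text(body: dict) -> str:
    """Concatenate the text of every `text` block in the response array."""
    return "".join(
        block["text"]
        for block in body.get("response", [])
        if block.get("type") == "text"
    )


def create_response(api_key: str, payload: dict) -> dict:
    """Send the request and decode the JSON body (requires network access)."""
    with urllib.request.urlopen(build_request(api_key, payload)) as resp:
        return json.load(resp)


request = build_request(
    "your-api-key",
    {
        "organization_id": "org_abc123",
        "model": "ftg-3.0",
        "inputs": [{"role": "user", "content": "What is the capital of France?"}],
    },
)
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any network traffic occurs.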

Working with Files and Images

The Freddy API supports attaching files and images to conversations. Upload files first, then reference them by file_id.

File References (All Models)

Upload files via the Files API, then attach them to messages using the files array. The system automatically handles each file type:

  • Documents (PDF, DOCX, XLSX, TXT, MD, etc.): Text is extracted and relevant chunks are injected into the conversation context.
  • Images (PNG, JPEG, WebP): Sent as visual input to vision-capable models (GPT-4o, Claude, etc.). The model can see and describe the image.
  • Code files (PY, JS, TS, HTML, CSS, etc.): Content is extracted and provided as context.

Workflow:

  1. Upload file via Files API → receive file_id
  2. Reference file_id in the files array on the input message
  3. Model receives the content as context (text) or visual input (images)

Example:

{
  "inputs": [{
    "role": "user",
    "content": "What do you see in this image?",
    "files": [{ "file_id": "file_abc123def456abc123def456abc12345" }]
  }]
}

Limits:

  • Max 10 files per message
  • Max file size: 100MB (direct upload), unlimited (resumable upload)

View file upload documentation →

Vision API for Structured Data

Extract structured data from images with JSON schema validation using a dedicated endpoint.

Use cases:

  • Invoice processing
  • Receipt scanning
  • ID card extraction
  • Form data capture

View Vision API documentation →