Threads provide stateful, persistent conversations that automatically maintain context across multiple model responses.

Overview

A thread is a container for a multi-turn conversation between users and AI assistants. Instead of manually managing conversation history, threads automatically:

  • Store messages: All inputs and outputs are saved
  • Maintain context: Previous messages are automatically included in new requests
  • Persist state: Conversations survive across sessions
  • Enable continuation: Resume conversations anytime with the thread ID

How Threads Work

When you create a model response with a thread:

  1. Thread lookup: Existing messages in the thread are retrieved
  2. Context injection: Thread messages are prepended to your new inputs
  3. Processing: The model sees full conversation history
  4. Auto-save: Your new inputs and outputs are added to the thread

Thread: thread_abc123
├─ Message 1: "Hello" (user)
├─ Message 2: "Hi! How can I help?" (assistant)
├─ Message 3: "What's the weather?" (user) ← New request
└─ Message 4: "I'll check that for you..." (assistant) ← Auto-saved
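
From the client's perspective, only the new turn is sent; the messages already stored in the thread are prepended server-side before the model runs. A minimal sketch of the request behind Message 3 above (the exact wire format of the injected history is internal and not shown here):

// Only the new user message is sent; earlier thread messages are injected automatically
const response = await fetch('https://api.aitronos.com/v1/model/response', {
  method: 'POST',
  headers: { 'X-API-Key': apiKey, 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: 'gpt-4.1',
    thread: 'thread_abc123',
    inputs: [{ role: 'user', texts: [{ text: "What's the weather?" }] }]
  })
});
// The model sees Messages 1-3, and its reply is auto-saved as Message 4.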

Creating a Thread

Option 1: Auto-Create with String ID

Pass a thread ID string. If it doesn't exist, it will be created:

curl https://api.aitronos.com/v1/model/response \
  -H "X-API-Key: $FREDDY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4.1",
    "thread": "thread_user123_session1",
    "inputs": [
      {
        "role": "user",
        "texts": [{"text": "Hello, I need help with Python"}]
      }
    ]
  }'

Option 2: Create with Metadata

Pass an object to create a thread with metadata:

{
  "model": "gpt-4.1",
  "thread": {
    "id": "thread_user123_session1",
    "metadata": {
      "userId": "user_123",
      "sessionId": "sess_456",
      "topic": "python_help"
    }
  },
  "inputs": [
    {
      "role": "user",
      "texts": [{"text": "Hello, I need help with Python"}]
    }
  ]
}

Option 3: Thread-First Creation

Create an empty thread first, then use it:

# Create thread
curl https://api.aitronos.com/v1/threads \
  -H "X-API-Key: $FREDDY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "metadata": {
      "userId": "user_123",
      "purpose": "support_chat"
    }
  }'

# Use the returned thread ID in responses
curl https://api.aitronos.com/v1/model/response \
  -H "X-API-Key: $FREDDY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4.1",
    "thread": "thread_67cc8901",
    "inputs": [...]
  }'
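
The same flow in JavaScript, as a sketch: it assumes the POST /v1/threads response body includes the new thread's id (the thread_67cc8901 placeholder above), which you then pass to subsequent responses.

const headers = { 'X-API-Key': apiKey, 'Content-Type': 'application/json' };

// Create the thread and read its ID (assumed to be returned as `id`)
const thread = await fetch('https://api.aitronos.com/v1/threads', {
  method: 'POST',
  headers,
  body: JSON.stringify({ metadata: { userId: 'user_123', purpose: 'support_chat' } })
}).then(r => r.json());

// Use the returned thread ID for model responses
const response = await fetch('https://api.aitronos.com/v1/model/response', {
  method: 'POST',
  headers,
  body: JSON.stringify({
    model: 'gpt-4.1',
    thread: thread.id, // e.g. "thread_67cc8901"
    inputs: [{ role: 'user', texts: [{ text: 'Hello, I need help with Python' }] }]
  })
});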

Continuing a Conversation

Simply reference the same thread ID:

// First message
const response1 = await fetch('/v1/model/response', {
  method: 'POST',
  headers: { 'X-API-Key': apiKey },
  body: JSON.stringify({
    model: 'gpt-4.1',
    thread: 'thread_chat_001',
    inputs: [{ role: 'user', texts: [{ text: 'What is Python?' }] }]
  })
});

// Continue conversation (history is automatic)
const response2 = await fetch('/v1/model/response', {
  method: 'POST',
  headers: { 'X-API-Key': apiKey },
  body: JSON.stringify({
    model: 'gpt-4.1',
    thread: 'thread_chat_001', // Same thread ID
    inputs: [{ role: 'user', texts: [{ text: 'Can you show an example?' }] }]
  })
});
// Model automatically sees the previous Q&A about Python

Thread States

Threads have a lifecycle state that controls how they can be used:

Available States

open (default)

  • Accepts new messages and responses
  • Thread can be modified and extended
  • Use for active conversations

locked

  • Read-only mode - no new messages allowed
  • Existing messages remain accessible
  • Useful for preserving completed conversations
  • Can be unlocked back to open

archived

  • Preserved but inactive
  • Cannot add messages (permanent)
  • Still accessible for reading
  • Use for long-term storage of resolved conversations

Setting Thread State

# Lock a thread to prevent modifications
curl https://api.aitronos.com/v1/model/response \
  -H "X-API-Key: $FREDDY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4.1",
    "thread": "thread_abc123",
    "threadState": "locked",
    "inputs": [...]
  }'

# Archive a completed conversation
curl https://api.aitronos.com/v1/threads/thread_abc123 \
  -X PATCH \
  -H "X-API-Key: $FREDDY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"state": "archived"}'

State Transitions

open ──────────────────> locked
 ↑                          ↓
 └──────── unlock ──────────┘

open ──────────────────> archived (permanent)
locked ────────────────> archived (permanent)

Use Cases

Lock threads when:

  • Conversation is complete but may need reference
  • Compliance requires immutable records
  • Support ticket is resolved but not ready for archival
  • Quality review is in progress

Archive threads when:

  • Conversation will never be continued
  • Long-term storage is needed
  • Freeing up active thread capacity
  • Compliance retention period begins

Thread Visibility

Control whether threads appear in the user's thread list using the store parameter when creating model responses.

Visible Threads (Default)

By default, all threads are visible in the list threads API (store=true):

curl https://api.aitronos.com/v1/model/response \
  -H "X-API-Key: $FREDDY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "organization_id": "org_abc123",
    "model": "ftg-3.0",
    "store": true,
    "inputs": [
      {
        "role": "user",
        "texts": [{"text": "Hello"}]
      }
    ]
  }'

Hidden Threads

Create threads that don't appear in the list threads API by setting store=false:

curl https://api.aitronos.com/v1/model/response \
  -H "X-API-Key: $FREDDY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "organization_id": "org_abc123",
    "model": "ftg-3.0-speed",
    "store": false,
    "inputs": [
      {
        "role": "user",
        "texts": [{"text": "Generate a title for this conversation"}]
      }
    ]
  }'

Hidden threads are:

  • ✅ Created and stored in the database
  • ✅ Accessible by direct thread ID lookup
  • ✅ Fully functional for all thread operations
  • ❌ Not visible in the list threads API (by default)
  • ❌ Not shown in the user interface

Use Cases for Hidden Threads

Internal Operations

# Generate thread name without creating visible thread
response = requests.post(
    'https://api.aitronos.com/v1/model/response',
    headers={'Authorization': f'Bearer {api_key}'},
    json={
        'organization_id': 'org_abc123',
        'model': 'ftg-3.0-speed',
        'store': False,  # Hidden thread
        'inputs': [
            {'role': 'user', 'texts': [{'text': 'Generate a title'}]}
        ]
    }
)

Background Tasks

// Process data without cluttering thread list
const response = await fetch('/v1/model/response', {
  method: 'POST',
  body: JSON.stringify({
    organization_id: 'org_abc123',
    model: 'ftg-3.0',
    store: false, // Hidden thread
    inputs: [
      { role: 'user', texts: [{ text: 'Analyze this dataset' }] }
    ]
  })
});

Temporary Threads

# Create ephemeral thread for one-time operation
def generate_summary(content):
    response = requests.post(
        'https://api.aitronos.com/v1/model/response',
        json={
            'organization_id': 'org_abc123',
            'model': 'ftg-3.0',
            'store': False,  # Won't appear in user's thread list
            'inputs': [
                {'role': 'user', 'texts': [{'text': f'Summarize: {content}'}]}
            ]
        }
    )
    return response.json()

Accessing Hidden Threads

Hidden threads can still be accessed directly by ID:

# Get hidden thread by ID
curl https://api.aitronos.com/v1/threads/thread_abc123 \
  -H "X-API-Key: $FREDDY_API_KEY"

# List hidden threads explicitly
curl https://api.aitronos.com/v1/threads?visible_in_ui=false \
  -H "X-API-Key: $FREDDY_API_KEY"

Best Practices

Use store=true (default) for:

  • User-facing conversations
  • Customer support chats
  • Multi-turn dialogues
  • Conversations that need to be resumed

Use store=false for:

  • Internal operations (name generation, summaries)
  • Background processing
  • Temporary or ephemeral threads
  • System-level tasks
  • Automated workflows

Stateless Conversations

For scenarios where you don't want persistent threads, use previousResponseId to chain responses:

// First message
const response1 = await fetch('/v1/model/response', {
  method: 'POST',
  body: JSON.stringify({
    model: 'gpt-4.1',
    inputs: [{ role: 'user', texts: [{ text: 'Hello' }] }]
  })
});

const data1 = await response1.json();

// Continue without thread - just reference previous response
const response2 = await fetch('/v1/model/response', {
  method: 'POST',
  body: JSON.stringify({
    model: 'gpt-4.1',
    previousResponseId: data1.id, // Link to previous response
    inputs: [{ role: 'user', texts: [{ text: 'Tell me more' }] }]
  })
});
// Model sees context from response1 without creating a persistent thread

When to use previousResponseId:

  • Short, ephemeral conversations
  • Zero data retention requirements
  • Single-session workflows
  • Testing without polluting thread storage

Limitations:

  • Cannot reference responses older than 24 hours
  • No persistent history (only previous response)
  • Cannot be combined with thread parameter
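
To keep a short stateless exchange going, carry the latest response ID forward into the next call. A minimal sketch, subject to the limitations above (each call links only to the immediately preceding response, which must be less than 24 hours old):

let lastResponseId = null;

async function askStateless(text) {
  const body = {
    model: 'gpt-4.1',
    inputs: [{ role: 'user', texts: [{ text }] }]
  };
  if (lastResponseId) body.previousResponseId = lastResponseId; // chain to the previous turn

  const response = await fetch('https://api.aitronos.com/v1/model/response', {
    method: 'POST',
    headers: { 'X-API-Key': apiKey, 'Content-Type': 'application/json' },
    body: JSON.stringify(body)
  });
  const data = await response.json();
  lastResponseId = data.id; // the next call will reference this response
  return data;
}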

Thread Management

Retrieve Thread Messages

curl https://api.aitronos.com/v1/threads/thread_67cc8901/messages \
  -H "X-API-Key: $FREDDY_API_KEY"

Update Thread Metadata

curl https://api.aitronos.com/v1/threads/thread_67cc8901 \
  -X PATCH \
  -H "X-API-Key: $FREDDY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "metadata": {
      "status": "resolved",
      "rating": 5
    }
  }'

Delete Thread

curl https://api.aitronos.com/v1/threads/thread_67cc8901 \
  -X DELETE \
  -H "X-API-Key: $FREDDY_API_KEY"

Use Cases

Customer Support Chat

def handle_support_message(user_id, message):
    thread_id = f"support_{user_id}"
    
    response = requests.post(
        'https://api.aitronos.com/v1/model/response',
        headers={'Authorization': f'Bearer {api_key}'},
        json={
            'model': 'gpt-4.1',
            'thread': {
                'id': thread_id,
                'metadata': {
                    'userId': user_id,
                    'department': 'support',
                    'priority': 'normal'
                }
            },
            'inputs': [
                {'role': 'user', 'texts': [{'text': message}]}
            ]
        }
    )
    return response.json()

Multi-Session Conversations

class ConversationManager {
  constructor(userId) {
    this.threadId = `user_${userId}_${Date.now()}`;
  }
  
  async sendMessage(text) {
    const response = await fetch('/v1/model/response', {
      method: 'POST',
      body: JSON.stringify({
        model: 'gpt-4.1',
        thread: this.threadId,
        inputs: [{ role: 'user', texts: [{ text }] }]
      })
    });
    return await response.json();
  }
  
  getThreadId() {
    return this.threadId;
  }
}

// Usage
const chat = new ConversationManager('user_123');
await chat.sendMessage('Hello');
await chat.sendMessage('Tell me more'); // Remembers "Hello" context

Guided Workflows

def onboarding_workflow(user_id, step, user_input):
    thread_id = f"onboarding_{user_id}"
    
    # System prompt sets the context for the thread
    system_message = {
        'role': 'system',
        'texts': [{'text': 'You are helping a user through an onboarding process. Current step: ' + step}]
    }
    
    response = requests.post(
        'https://api.aitronos.com/v1/model/response',
        json={
            'model': 'gpt-4.1',
            'thread': thread_id,
            'inputs': [
                system_message,
                {'role': 'user', 'texts': [{'text': user_input}]}
            ]
        }
    )
    
    return response.json()

Thread Limits

  • Maximum messages per thread: 10,000
  • Message retention: 90 days after last activity
  • Max thread age: 1 year
  • Metadata size: 16KB per thread
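
When setting or updating metadata, it can help to guard against the 16KB limit up front. A sketch of a client-side check, assuming the limit applies to the serialized JSON size (the exact accounting is not specified here):

// Hypothetical helper: reject metadata that would exceed the 16KB thread limit
function assertMetadataWithinLimit(metadata, limitBytes = 16 * 1024) {
  const size = new TextEncoder().encode(JSON.stringify(metadata)).length;
  if (size > limitBytes) {
    throw new Error(`Thread metadata is ${size} bytes; the limit is ${limitBytes} bytes`);
  }
}

assertMetadataWithinLimit({ status: 'resolved', rating: 5 });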

Best Practices

Use Descriptive Thread IDs

// ✅ Good - descriptive and unique
const supportThreadId = `support_${userId}_${ticketId}`;
const chatThreadId = `chat_${sessionId}_${timestamp}`;

// ❌ Bad - too generic or not unique
const badThreadId = 'thread1';
const alsoBad = userId; // May conflict across contexts

Add Meaningful Metadata

{
  "thread": {
    "id": "support_user123_ticket789",
    "metadata": {
      "userId": "user_123",
      "ticketId": "ticket_789",
      "category": "billing",
      "priority": "high",
      "assignedTo": "agent_45",
      "created": "2025-01-06T10:30:00Z"
    }
  }
}

Implement Thread Cleanup

Periodically delete old or resolved threads:

def cleanup_old_threads():
    # Get threads older than 90 days (get_threads is your own helper,
    # e.g. built on the list threads API)
    old_threads = get_threads(older_than=90)

    for thread_id in old_threads:
        requests.delete(
            f'https://api.aitronos.com/v1/threads/{thread_id}',
            headers={'Authorization': f'Bearer {api_key}'}
        )

Handle Thread Not Found

async function sendMessage(threadId, message) {
  try {
    const response = await fetch('/v1/model/response', {
      method: 'POST',
      body: JSON.stringify({
        model: 'gpt-4.1',
        thread: threadId,
        inputs: [{ role: 'user', texts: [{ text: message }] }]
      })
    });
    
    if (response.status === 404) {
      // Thread was deleted or expired - create new one
      return await sendMessage(createNewThreadId(), message);
    }
    
    return await response.json();
  } catch (error) {
    console.error('Thread error:', error);
  }
}

Threads vs Manual History

With Threads (Automatic)

// Simple - thread handles history
await sendMessage(threadId, 'Hello');
await sendMessage(threadId, 'How are you?'); // Context automatic

Without Threads (Manual)

// Complex - must track history yourself
let history = [];

async function sendMessage(text) {
  const userMsg = { role: 'user', texts: [{ text }] };
  history.push(userMsg);
  
  const response = await fetch('/v1/model/response', {
    method: 'POST',
    body: JSON.stringify({
      model: 'gpt-4.1',
      inputs: history // Must send full history each time
    })
  });
  
  const result = await response.json();
  history.push(result.output[0]); // Must save assistant response
  return result;
}