# Thread Context Modes

> 🔨 **In Development** — This section is still being developed and may change.

Freddy's intelligent context management automatically handles conversation history when threads exceed neuron capacity limits. Each mode uses a different strategy to preserve the most important parts of a conversation while staying within model constraints.

## Available Context Modes

### 🔄 `recent` Mode (Default)

**What it does:** Keeps the most recent messages that fit within the thread context limit, discarding the oldest messages first.

```
Original: [msg1, msg2, msg3, msg4, msg5, msg6, msg7, msg8]  (10,000 neurons)
Limit:    6,000 neurons
Result:   [msg5, msg6, msg7, msg8]  (5,800 neurons)
```

**Best for:**
- General conversations
- Customer support
- Performance-critical applications
- Cost optimization

**Pros:** Fast, predictable, low cost
**Cons:** Loses the conversation beginning

### 🧠 `smart` Mode (Balanced)

**What it does:** Preserves the conversation beginning (the first 3 messages) plus as many recent messages as fit within the remaining limit (up to the last 7), creating a balanced view of the thread.

```
Original: [msg1, msg2, msg3, msg4, msg5, msg6, msg7, msg8, msg9, msg10]  (12,000 neurons)
Limit:    8,000 neurons
Result:   [msg1, msg2, msg3, msg8, msg9, msg10]  (7,500 neurons)
           ↑ Start preserved   ↑ Recent context
```

**Best for:**
- Code assistance
- Long conversations
- Complex problem solving
- Educational content

**Pros:** Maintains both the initial setup and the recent state
**Cons:** Middle messages are discarded; slightly slower

### 📚 `full` Mode (Complete Context)

**What it does:** Keeps all messages regardless of length (until API limits are reached).

```
Original: [msg1, msg2, msg3, msg4, msg5]  (5,000 neurons)
Limit:    8,000 neurons
Result:   [msg1, msg2, msg3, msg4, msg5]  (5,000 neurons)  ← Everything kept
```

**Best for:**
- Short conversations only
- Critical context requirements
- Research & analysis

⚠️ **Warning:** Can exceed API neuron capacity limits!

**Pros:** Complete conversation history
**Cons:** Expensive, slow, can hit limits
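The trimming described above happens server-side, but a minimal sketch of the three strategies can make them concrete. This is illustrative only, assuming each message is a dict carrying a precomputed `neurons` count; `select_messages` and its parameters are hypothetical, not part of the Freddy SDK:

```python
def select_messages(messages, limit, mode='recent', keep_start=3, keep_recent=7):
    """Illustrative sketch of the three strategies; actual trimming is server-side.

    Each message is assumed to be a dict with a precomputed 'neurons' count.
    """
    def total(msgs):
        return sum(m['neurons'] for m in msgs)

    if mode == 'full':
        return list(messages)  # keep everything; may exceed API limits

    if mode == 'recent':
        kept = []
        for msg in reversed(messages):  # walk newest to oldest
            if total(kept) + msg['neurons'] > limit:
                break
            kept.insert(0, msg)
        return kept

    if mode == 'smart':
        start = messages[:keep_start]  # preserve the conversation setup
        budget = limit - total(start)
        recent = []
        for msg in reversed(messages[keep_start:]):  # newest first, up to keep_recent
            if len(recent) == keep_recent or total(recent) + msg['neurons'] > budget:
                break
            recent.insert(0, msg)
        return start + recent

    raise ValueError(f'unknown mode: {mode}')
```

With uniform message sizes, the `smart` branch reproduces the diagram above: the first 3 messages survive, and recent messages are added from the end until the budget runs out.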
## Quick Comparison

| Mode | Speed | Cost Efficiency | Context | Best For |
| --- | --- | --- | --- | --- |
| **🔄 `recent`** | ⭐⭐⭐ | ⭐⭐⭐ | Recent only | General use, performance |
| **🧠 `smart`** | ⭐⭐ | ⭐⭐ | Start + Recent | Complex conversations |
| **📚 `full`** | ⭐ | ⭐ | Complete | Critical context, short threads |

## Usage Examples

### Setting the Context Mode in an API Request

```bash
curl https://api.freddy.aitronos.com/v1/model/response \
  -H "Authorization: Bearer $FREDDY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4.1",
    "thread": "thread_conversation_123",
    "threadContextMode": "smart",
    "inputs": [
      {
        "role": "user",
        "texts": [{"text": "Continue where we left off"}]
      }
    ]
  }'
```

### JavaScript Example

```javascript
async function sendMessageWithContextMode(threadId, message, mode = 'recent') {
  const apiKey = process.env.FREDDY_API_KEY; // or however your app stores credentials

  const response = await fetch('https://api.freddy.aitronos.com/v1/model/response', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${apiKey}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      model: 'gpt-4.1',
      thread: threadId,
      threadContextMode: mode,
      inputs: [
        { role: 'user', texts: [{ text: message }] }
      ]
    })
  });

  return await response.json();
}

// Use different modes based on the scenario
await sendMessageWithContextMode('thread_123', 'Quick question', 'recent');
await sendMessageWithContextMode('thread_456', 'Complex problem', 'smart');
await sendMessageWithContextMode('thread_789', 'Critical analysis', 'full');
```

### Python Example

```python
import os

import requests

api_key = os.environ['FREDDY_API_KEY']  # exported as in the curl example above

def send_with_context_mode(thread_id, message, mode='recent'):
    response = requests.post(
        'https://api.freddy.aitronos.com/v1/model/response',
        headers={'Authorization': f'Bearer {api_key}'},
        json={
            'model': 'gpt-4.1',
            'thread': thread_id,
            'threadContextMode': mode,
            'inputs': [
                {'role': 'user', 'texts': [{'text': message}]}
            ]
        }
    )
    return response.json()

# Adaptive mode selection
def adaptive_context_mode(thread_message_count):
    if thread_message_count < 10:
        return 'full'    # Short thread, keep everything
    elif thread_message_count < 50:
        return 'smart'   # Medium thread, balanced approach
    else:
        return 'recent'  # Long thread, optimize for performance
```

## When to Use Each Mode

### Use `recent` When:

- Building a chatbot with many short interactions
- Neuron and synapse costs are a primary concern
- Performance/speed is critical
- Context from the beginning isn't necessary
- Handling high-volume customer support

**Example Scenarios:**
- Customer service chatbots
- Quick Q&A applications
- Status update bots
- Transactional conversations

### Use `smart` When:

- Having multi-step conversations
- The initial context setup matters
- You need a balance between cost and context
- Users reference earlier points in the conversation
- Building educational or coaching tools

**Example Scenarios:**
- Code pair-programming assistants
- Technical troubleshooting
- Educational tutoring
- Complex customer issues
- Creative writing assistance

### Use `full` When:

- The thread is short (< 20 messages)
- Every detail matters for accuracy
- Cost is not a concern
- Building research or analysis tools
- Legal or compliance requirements apply

**Example Scenarios:**
- Document analysis
- Legal consultation
- Medical consultation
- Research assistance
- Critical decision support

## Performance Impact

### Neuron Usage

```
Scenario: 100-message thread, each message ~200 neurons

recent mode:
- Keeps last ~15 messages = ~3,000 neurons
- Cost: Low

smart mode:
- Keeps first 3 + last 7 = ~2,000 neurons
- Cost: Medium

full mode:
- Keeps all 100 messages = ~20,000 neurons
- Cost: High
```

### Response Time

```
recent mode: ~1-2 seconds
smart mode:  ~1.5-3 seconds
full mode:   ~3-10 seconds (depends on total length)
```
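For rough capacity planning, you can approximate what each mode would keep before sending. The sketch below reproduces the scenario above; the per-message average and the `context_limit` default are assumptions taken from that scenario, not API constants:

```python
AVG_NEURONS_PER_MESSAGE = 200  # assumption from the scenario above

def estimate_kept_neurons(message_count, mode, context_limit=3000,
                          keep_start=3, keep_recent=7):
    """Back-of-envelope estimate only; real neuron accounting is server-side."""
    total = message_count * AVG_NEURONS_PER_MESSAGE
    if mode == 'full':
        return total  # everything is kept, which may hit API limits
    if mode == 'recent':
        return min(total, context_limit)  # newest messages up to the limit
    if mode == 'smart':
        kept_messages = min(message_count, keep_start + keep_recent)
        return min(kept_messages * AVG_NEURONS_PER_MESSAGE, context_limit)
    raise ValueError(f'unknown mode: {mode}')

for mode in ('recent', 'smart', 'full'):
    print(mode, estimate_kept_neurons(100, mode))
# recent 3000 / smart 2000 / full 20000, matching the scenario above
```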
## Best Practices

### Start Conservative

```javascript
// Start with recent mode
let contextMode = 'recent';

// Upgrade based on user needs
// (these flags are application-specific signals, not API values)
if (userAsksAboutEarlierTopic) {
  contextMode = 'smart';
}
if (criticalAccuracyRequired) {
  contextMode = 'full';
}
```

### Monitor Thread Length

```python
def get_optimal_mode(thread_id):
    # get_thread is an application-side helper returning thread metadata
    thread_info = get_thread(thread_id)
    message_count = len(thread_info['messages'])
    total_neurons = thread_info['total_neurons']

    if total_neurons < 4000:
        return 'full'    # Small enough to keep everything
    elif message_count < 30:
        return 'smart'   # Medium size, balanced approach
    else:
        return 'recent'  # Large thread, optimize
```

### Provide User Control

```javascript
// Let users choose their preference
const userSettings = {
  contextMode: 'smart',  // User preference
  autoAdjust: true       // Allow automatic adjustment
};

function getUserContextMode(threadLength) {
  if (!userSettings.autoAdjust) {
    return userSettings.contextMode;
  }

  // Auto-adjust based on thread length
  if (threadLength > 100) return 'recent';
  if (threadLength > 20) return 'smart';
  return 'full';
}
```

### Handle Edge Cases

```python
def send_message_safe(thread_id, message, preferred_mode='smart'):
    # NeuronLimitError stands in for whatever error your client raises
    # when the API rejects a request for exceeding neuron limits
    try:
        return send_with_context_mode(thread_id, message, preferred_mode)
    except NeuronLimitError:
        # Fall back to a more aggressive mode
        if preferred_mode == 'full':
            return send_with_context_mode(thread_id, message, 'smart')
        elif preferred_mode == 'smart':
            return send_with_context_mode(thread_id, message, 'recent')
        else:
            raise  # Already using the most aggressive mode
```

## Recommendations

### Default Strategy

**For most applications, use this progression:**

1. **Start with `recent`** - Fast, cost-effective, works for 80% of use cases
2. **Upgrade to `smart`** - When users reference earlier context or conversations become complex
3. **Reserve `full`** - Only for critical scenarios where every detail matters

### Cost Optimization

```javascript
// Cost-aware mode selection
function selectMode(threadLength, budget) {
  if (budget === 'low') return 'recent';
  if (budget === 'medium') return threadLength < 50 ? 'smart' : 'recent';
  if (budget === 'high') return threadLength < 20 ? 'full' : 'smart';
  return 'recent'; // Default to cheapest
}
```

### Quality Optimization

```javascript
// Quality-aware mode selection
function selectModeForQuality(conversationType) {
  const qualityMap = {
    'customer_service': 'recent',
    'technical_support': 'smart',
    'code_assistance': 'smart',
    'legal_consultation': 'full',
    'medical_consultation': 'full',
    'research': 'full'
  };
  return qualityMap[conversationType] || 'recent';
}
```

## Limitations

- **Neuron limits**: Even `full` mode respects API neuron capacity limits
- **Message count**: Threads with 1,000+ messages may still be truncated
- **Performance**: `full` mode can be significantly slower for long threads
- **Cost**: `full` mode can be 5-10x more expensive for long threads

## Related Resources

- [Threads](/docs/documentation/core-concepts/threads)
- [Create Model Response](/docs/api-reference/responses/create)
- [Best Practices](/docs/documentation/best-practices)
- [Rate Limiting](/docs/documentation/rate-limiting)