
🔨 In Development — This section is still being developed and may change.
Freddy's intelligent context management automatically trims conversation history when a thread exceeds the model's neuron capacity. Each mode uses a different strategy to preserve the most important parts of a conversation while staying within model limits.

Available Context Modes

🔄 recent Mode (Default)

What it does: Keeps the most recent messages that fit within the thread context limit, discarding older messages first.

Original: [msg1, msg2, msg3, msg4, msg5, msg6, msg7, msg8] (10,000 neurons)
Limit: 6,000 neurons
Result: [msg5, msg6, msg7, msg8] (5,800 neurons)
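
A minimal sketch of this strategy (an illustration only, not Freddy's actual implementation; estimate_neurons is an assumed helper that returns a message's neuron count):

def trim_recent(messages, neuron_limit, estimate_neurons):
    """Keep the newest messages that fit within neuron_limit."""
    kept, used = [], 0
    for msg in reversed(messages):
        cost = estimate_neurons(msg)
        if used + cost > neuron_limit:
            break  # older messages are discarded first
        kept.append(msg)
        used += cost
    kept.reverse()  # restore chronological order
    return kept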

Best for:

  • General conversations
  • Customer support
  • Performance-critical applications
  • Cost optimization

Pros: Fast, predictable, low cost
Cons: Loses conversation beginning


🧠 smart Mode (Balanced)

What it does: Preserves the conversation beginning (the first 3 messages) plus as many of the most recent messages as still fit within the limit, creating a balanced view of the thread.

Original: [msg1, msg2, msg3, msg4, msg5, msg6, msg7, msg8, msg9, msg10] (12,000 neurons)
Limit: 8,000 neurons
Result: [msg1, msg2, msg3, msg8, msg9, msg10] (7,500 neurons)
        ↑ Start preserved  ↑ Recent context
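
A minimal sketch of this strategy (an illustration only; the real service may split the budget differently, and estimate_neurons is an assumed helper):

def trim_smart(messages, neuron_limit, estimate_neurons, head=3):
    """Keep the first `head` messages, then fill the remaining
    budget with the newest messages."""
    start = messages[:head]
    budget = neuron_limit - sum(estimate_neurons(m) for m in start)
    recent, used = [], 0
    for msg in reversed(messages[head:]):
        cost = estimate_neurons(msg)
        if used + cost > budget:
            break  # middle messages are the first to go
        recent.append(msg)
        used += cost
    recent.reverse()
    return start + recent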

Best for:

  • Code assistance
  • Long conversations
  • Complex problem solving
  • Educational content

Pros: Maintains context setup and recent state
Cons: Middle messages discarded, slightly slower


📚 full Mode (Complete Context)

What it does: Keeps all messages regardless of length (until API limits are reached).

Original: [msg1, msg2, msg3, msg4, msg5] (5,000 neurons)
Limit: 8,000 neurons
Result: [msg1, msg2, msg3, msg4, msg5] (5,000 neurons) ← Everything kept

Best for:

  • Short conversations only
  • Critical context requirements
  • Research & analysis

⚠️ Warning: Can exceed API neuron capacity limits!

Pros: Complete conversation history
Cons: Expensive, slow, can hit limits


Quick Comparison

Mode      | Speed  | Cost   | Context        | Best For
🔄 recent | Fast   | Low    | Recent only    | General use, performance
🧠 smart  | Medium | Medium | Start + Recent | Complex conversations
📚 full   | Slow   | High   | Complete       | Critical context, short threads

Usage Examples

Setting Context Mode in API Request

curl https://api.freddy.aitronos.com/v1/model/response \
  -H "Authorization: Bearer $FREDDY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4.1",
    "thread": "thread_conversation_123",
    "threadContextMode": "smart",
    "inputs": [
      {
        "role": "user",
        "texts": [{"text": "Continue where we left off"}]
      }
    ]
  }'

JavaScript Example

// Assumes FREDDY_API_KEY is available in the environment
const apiKey = process.env.FREDDY_API_KEY;

async function sendMessageWithContextMode(threadId, message, mode = 'recent') {
  const response = await fetch('https://api.freddy.aitronos.com/v1/model/response', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${apiKey}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      model: 'gpt-4.1',
      thread: threadId,
      threadContextMode: mode,
      inputs: [
        { role: 'user', texts: [{ text: message }] }
      ]
    })
  });
  
  return await response.json();
}

// Use different modes based on scenario
await sendMessageWithContextMode('thread_123', 'Quick question', 'recent');
await sendMessageWithContextMode('thread_456', 'Complex problem', 'smart');
await sendMessageWithContextMode('thread_789', 'Critical analysis', 'full');

Python Example

import os

import requests

api_key = os.environ['FREDDY_API_KEY']  # assumes the key is set in the environment

def send_with_context_mode(thread_id, message, mode='recent'):
    response = requests.post(
        'https://api.freddy.aitronos.com/v1/model/response',
        headers={'Authorization': f'Bearer {api_key}'},
        json={
            'model': 'gpt-4.1',
            'thread': thread_id,
            'threadContextMode': mode,
            'inputs': [
                {'role': 'user', 'texts': [{'text': message}]}
            ]
        }
    )
    return response.json()

# Adaptive mode selection
def adaptive_context_mode(thread_message_count):
    if thread_message_count < 10:
        return 'full'  # Short thread, keep everything
    elif thread_message_count < 50:
        return 'smart'  # Medium thread, balanced approach
    else:
        return 'recent'  # Long thread, optimize for performance

When to Use Each Mode

Use recent When:

  • Building a chatbot with many short interactions
  • Neuron and synapse costs are a primary concern
  • Performance/speed is critical
  • Context from the beginning isn't necessary
  • Handling high-volume customer support

Example Scenarios:

  • Customer service chatbot
  • Quick Q&A applications
  • Status update bots
  • Transactional conversations

Use smart When:

  • Having multi-step conversations
  • Initial context setup matters
  • Balance between cost and context is needed
  • Users reference earlier conversation points
  • Building educational or coaching tools

Example Scenarios:

  • Code pair programming assistants
  • Technical troubleshooting
  • Educational tutoring
  • Complex customer issues
  • Creative writing assistance

Use full When:

  • Thread is short (< 20 messages)
  • Every detail matters for accuracy
  • Cost is not a concern
  • Building research or analysis tools
  • Legal or compliance requirements

Example Scenarios:

  • Document analysis
  • Legal consultation
  • Medical consultation
  • Research assistance
  • Critical decision support

Performance Impact

Neuron Usage

Scenario: 100-message thread, each message ~200 neurons

recent mode:
- Keeps last ~15 messages = ~3,000 neurons
- Cost: Low

smart mode:
- Keeps first 3 + last 7 = ~2,000 neurons
- Cost: Medium

full mode:
- Keeps all 100 messages = ~20,000 neurons
- Cost: High
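
To sanity-check these numbers yourself, here is a rough estimator (the window sizes mirror the scenario above and are assumptions, not API guarantees):

def estimate_retained_neurons(message_count, avg_neurons_per_message,
                              mode='recent', recent_window=15, head=3, tail=7):
    """Rough estimate of how many neurons each mode retains."""
    if mode == 'full':
        kept = message_count
    elif mode == 'smart':
        kept = min(message_count, head + tail)
    else:  # 'recent'
        kept = min(message_count, recent_window)
    return kept * avg_neurons_per_message

# 100-message thread at ~200 neurons per message (the scenario above)
for mode in ('recent', 'smart', 'full'):
    print(mode, estimate_retained_neurons(100, 200, mode))
# recent -> 3000, smart -> 2000, full -> 20000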

Response Time

recent mode: ~1-2 seconds
smart mode: ~1.5-3 seconds
full mode: ~3-10 seconds (depends on total length)

Best Practices

Start Conservative

// Start with recent mode
let contextMode = 'recent';

// Upgrade based on user needs. The flags below (userAsksAboutEarlierTopic,
// criticalAccuracyRequired) stand in for your own application logic.
if (userAsksAboutEarlierTopic) {
  contextMode = 'smart';
}

if (criticalAccuracyRequired) {
  contextMode = 'full';
}

Monitor Thread Length

def get_optimal_mode(thread_id):
    # get_thread is a placeholder for however you fetch thread metadata
    thread_info = get_thread(thread_id)
    message_count = len(thread_info['messages'])
    total_neurons = thread_info['total_neurons']

    if total_neurons < 4000:
        return 'full'  # Small enough to keep everything
    elif message_count < 30:
        return 'smart'  # Medium size, balanced approach
    else:
        return 'recent'  # Large thread, optimize

Provide User Control

// Let users choose their preference
const userSettings = {
  contextMode: 'smart', // User preference
  autoAdjust: true      // Allow automatic adjustment
};

function getUserContextMode(threadLength) {
  if (!userSettings.autoAdjust) {
    return userSettings.contextMode;
  }
  
  // Auto-adjust based on thread length
  if (threadLength > 100) return 'recent';
  if (threadLength > 20) return 'smart';
  return 'full';
}

Handle Edge Cases

# NeuronLimitError is a placeholder for the exception your client raises
# when the API reports a neuron capacity overflow
def send_message_safe(thread_id, message, preferred_mode='smart'):
    try:
        response = send_with_context_mode(thread_id, message, preferred_mode)
        return response
    except NeuronLimitError:
        # Fallback to more aggressive mode
        if preferred_mode == 'full':
            return send_with_context_mode(thread_id, message, 'smart')
        elif preferred_mode == 'smart':
            return send_with_context_mode(thread_id, message, 'recent')
        else:
            raise  # Already using most aggressive mode

Recommendations

Default Strategy

For most applications, use this progression:

  1. Start with recent - Fast, cost-effective, works for 80% of use cases
  2. Upgrade to smart - When users reference earlier context or conversations become complex
  3. Reserve full - Only for critical scenarios where every detail matters

Cost Optimization

// Cost-aware mode selection
function selectMode(threadLength, budget) {
  if (budget === 'low') return 'recent';
  if (budget === 'medium') return threadLength < 50 ? 'smart' : 'recent';
  if (budget === 'high') return threadLength < 20 ? 'full' : 'smart';
  return 'recent'; // Default to cheapest
}

Quality Optimization

// Quality-aware mode selection
function selectModeForQuality(conversationType) {
  const qualityMap = {
    'customer_service': 'recent',
    'technical_support': 'smart',
    'code_assistance': 'smart',
    'legal_consultation': 'full',
    'medical_consultation': 'full',
    'research': 'full'
  };
  
  return qualityMap[conversationType] || 'recent';
}

Limitations

  • Neuron Limits: Even full mode respects API neuron capacity limits
  • Message Count: Threads with 1000+ messages may still be truncated
  • Performance: full mode can be significantly slower for long threads
  • Cost: full mode can be 5-10x more expensive for long threads