Freddy's intelligent context management automatically handles conversation history when threads exceed neuron capacity limits. Each mode uses a different strategy to preserve the most important parts of conversations while staying within model constraints.

Available Context Modes

recent Mode (Default)

What it does: Keeps the most recent messages that fit within the thread context limit, discarding older messages first.

Original: [msg1, msg2, msg3, msg4, msg5, msg6, msg7, msg8] (10,000 neurons)
Limit: 6,000 neurons
Result: [msg5, msg6, msg7, msg8] (5,800 neurons)
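
The example above can be sketched as a backwards walk over the thread (an illustrative sketch, not the service's exact algorithm; each message is paired with an assumed precomputed neuron count):

```python
def truncate_recent(messages, limit):
    """Keep the most recent messages that fit within `limit` neurons.

    `messages` is a chronological list of (text, neurons) pairs.
    """
    kept = []
    total = 0
    # Walk backwards from the newest message until the budget is spent.
    for msg, cost in reversed(messages):
        if total + cost > limit:
            break
        kept.append((msg, cost))
        total += cost
    kept.reverse()  # restore chronological order
    return kept
```

With eight messages of 1,250 neurons each and a 6,000-neuron limit, this keeps msg5 through msg8, matching the example.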

Best for:

  • General conversations
  • Customer support
  • Performance-critical applications
  • Cost optimization

Pros: Fast, predictable, low cost
Cons: Loses conversation beginning


smart Mode (Balanced)

What it does: Preserves both the conversation beginning (the first 3 messages) and recent context (up to the last 7 messages), dropping middle messages as needed to stay within the limit.

Original: [msg1, msg2, msg3, msg4, msg5, msg6, msg7, msg8, msg9, msg10] (12,000 neurons)
Limit: 8,000 neurons
Result: [msg1, msg2, msg3, msg8, msg9, msg10] (7,500 neurons)
         ↑ Start preserved ↑ Recent context
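
The example above can be sketched as "pin the start, then fill the tail newest-first until the budget runs out" (an illustrative sketch under that assumption, not the service's exact algorithm; messages are (text, neurons) pairs):

```python
def truncate_smart(messages, limit, head=3, tail=7):
    """Keep the first `head` messages plus as many of the last `tail`
    messages as fit within `limit` neurons, dropping the middle."""
    start = messages[:head]
    # Recent window, excluding the preserved start so nothing overlaps.
    rest = messages[head:][-tail:]
    budget = limit - sum(cost for _, cost in start)
    kept_tail = []
    total = 0
    # Fill the tail newest-first until the remaining budget is spent.
    for msg, cost in reversed(rest):
        if total + cost > budget:
            break
        kept_tail.append((msg, cost))
        total += cost
    kept_tail.reverse()
    return start + kept_tail
```

With ten messages of 1,200 neurons each and an 8,000-neuron limit, this keeps msg1–msg3 plus msg8–msg10, matching the example.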

Best for:

  • Code assistance
  • Long conversations
  • Complex problem solving
  • Educational content

Pros: Maintains context setup and recent state
Cons: Middle messages discarded; slightly slower


full Mode (Complete Context)

What it does: Keeps all messages regardless of length (until API limits are reached).

Original: [msg1, msg2, msg3, msg4, msg5] (5,000 neurons)
Limit: 8,000 neurons
Result: [msg1, msg2, msg3, msg4, msg5] (5,000 neurons) ← Everything kept
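
A minimal sketch of the behavior above, assuming the only remaining cap is the hard API neuron limit (whether the service raises or truncates at that point is not specified here; this sketch raises):

```python
def full_context(messages, api_limit):
    """Return all messages unchanged, refusing threads that exceed the
    hard API neuron limit. `messages` is a list of (text, neurons) pairs."""
    total = sum(cost for _, cost in messages)
    if total > api_limit:
        raise ValueError(
            f"thread needs {total} neurons but the API limit is {api_limit}"
        )
    return list(messages)
```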

Best for:

  • Short conversations only
  • Critical context requirements
  • Research & analysis
  • Warning: Long threads can hit API neuron capacity limits!

Pros: Complete conversation history
Cons: Expensive, slow, can hit limits


Quick Comparison

| Mode | Speed | Cost | Context | Best For |
|------|-------|------|---------|----------|
| **recent** | Fast | Low | Recent only | General use, performance |
| **smart** | Medium | Medium | Start + recent | Complex conversations |
| **full** | Slow | High | Complete | Critical context, short threads |

Usage Examples

Setting Context Mode in API Request

curl https://api.aitronos.com/v1/model/response \
  -H "X-API-Key: $FREDDY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "thread": "thrd_conversation_123",
    "threadContextMode": "smart",
    "inputs": [
      {
        "role": "user",
        "content": "Continue where we left off"
      }
    ]
  }'

JavaScript Example

// `apiKey` must hold your Freddy API key, e.g. loaded from the environment
const apiKey = process.env.FREDDY_API_KEY;

async function sendMessageWithContextMode(threadId, message, mode = 'recent') {
  const response = await fetch('https://api.aitronos.com/v1/model/response', {
    method: 'POST',
    headers: {
      'X-API-Key': apiKey,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      model: 'gpt-4o',
      thread: threadId,
      threadContextMode: mode,
      inputs: [
        { role: 'user', texts: [{ text: message }] }
      ]
    })
  });

  return await response.json();
}

// Use different modes based on scenario
await sendMessageWithContextMode('thrd_123', 'Quick question', 'recent');
await sendMessageWithContextMode('thrd_456', 'Complex problem', 'smart');
await sendMessageWithContextMode('thrd_789', 'Critical analysis', 'full');

Python Example

import os

import requests

api_key = os.environ['FREDDY_API_KEY']

def send_with_context_mode(thread_id, message, mode='recent'):
    response = requests.post(
        'https://api.aitronos.com/v1/model/response',
        headers={'X-API-Key': api_key},
        json={
            'model': 'gpt-4o',
            'thread': thread_id,
            'threadContextMode': mode,
            'inputs': [
                {'role': 'user', 'texts': [{'text': message}]}
            ]
        }
    )
    return response.json()

# Adaptive mode selection
def adaptive_context_mode(thread_message_count):
    if thread_message_count < 10:
        return 'full'    # Short thread, keep everything
    elif thread_message_count < 50:
        return 'smart'   # Medium thread, balanced approach
    else:
        return 'recent'  # Long thread, optimize for performance

When to Use Each Mode

Use recent When:

  • Building a chatbot with many short interactions
  • Neuron and synapse costs are a primary concern
  • Performance/speed is critical
  • Context from the beginning isn't necessary
  • Handling high-volume customer support

Example Scenarios:

  • Customer service chatbot
  • Quick Q&A applications
  • Status update bots
  • Transactional conversations

Use smart When:

  • Having multi-step conversations
  • Initial context setup matters
  • Balance between cost and context is needed
  • Users reference earlier conversation points
  • Building educational or coaching tools

Example Scenarios:

  • Code pair programming assistants
  • Technical troubleshooting
  • Educational tutoring
  • Complex customer issues
  • Creative writing assistance

Use full When:

  • Thread is short (< 20 messages)
  • Every detail matters for accuracy
  • Cost is not a concern
  • Building research or analysis tools
  • Legal or compliance requirements

Example Scenarios:

  • Document analysis
  • Legal consultation
  • Medical consultation
  • Research assistance
  • Critical decision support

Performance Impact

Neuron Usage

Scenario: 100-message thread, each message ~200 neurons

recent mode:
- Keeps last ~15 messages = ~3,000 neurons
- Cost: Low

smart mode:
- Keeps first 3 + last 7 = ~2,000 neurons
- Cost: Medium

full mode:
- Keeps all 100 messages = ~20,000 neurons
- Cost: High
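
The figures above follow directly from the per-message size and the kept-message counts stated for each mode:

```python
NEURONS_PER_MESSAGE = 200
MESSAGE_COUNT = 100

recent_neurons = 15 * NEURONS_PER_MESSAGE            # last ~15 messages kept
smart_neurons = (3 + 7) * NEURONS_PER_MESSAGE        # first 3 + last 7
full_neurons = MESSAGE_COUNT * NEURONS_PER_MESSAGE   # everything

print(recent_neurons, smart_neurons, full_neurons)   # 3000 2000 20000
```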

Response Time

recent mode: ~1-2 seconds
smart mode: ~1.5-3 seconds
full mode: ~3-10 seconds (depends on total length)

Best Practices

Start Conservative

// Start with recent mode and upgrade only when the conversation demands it.
// userAsksAboutEarlierTopic and criticalAccuracyRequired are flags your
// application tracks, shown here as plain booleans.
let contextMode = 'recent';

// Upgrade based on user needs
if (userAsksAboutEarlierTopic) {
  contextMode = 'smart';
}

if (criticalAccuracyRequired) {
  contextMode = 'full';
}

Monitor Thread Length

def get_optimal_mode(thread_id):
    # get_thread is your own helper for fetching thread metadata
    thread_info = get_thread(thread_id)
    message_count = len(thread_info['messages'])
    total_tokens = thread_info['total_tokens']

    if total_tokens < 4000:
        return 'full'    # Small enough to keep everything
    elif message_count < 30:
        return 'smart'   # Medium size, balanced approach
    else:
        return 'recent'  # Large thread, optimize

Provide User Control

// Let users choose their preference
const userSettings = {
  contextMode: 'smart', // User preference
  autoAdjust: true      // Allow automatic adjustment
};

function getUserContextMode(threadLength) {
  if (!userSettings.autoAdjust) {
    return userSettings.contextMode;
  }

  // Auto-adjust based on thread length
  if (threadLength > 100) return 'recent';
  if (threadLength > 20) return 'smart';
  return 'full';
}

Handle Edge Cases

# NeuronLimitError stands in for whatever exception your client raises
# when a request exceeds the neuron capacity limit.
def send_message_safe(thread_id, message, preferred_mode='smart'):
    try:
        return send_with_context_mode(thread_id, message, preferred_mode)
    except NeuronLimitError:
        # Fall back to a more aggressive mode
        if preferred_mode == 'full':
            return send_with_context_mode(thread_id, message, 'smart')
        elif preferred_mode == 'smart':
            return send_with_context_mode(thread_id, message, 'recent')
        else:
            raise  # Already using the most aggressive mode

Recommendations

Default Strategy

For most applications, use this progression:

  1. Start with recent - Fast, cost-effective, works for 80% of use cases
  2. Upgrade to smart - When users reference earlier context or conversations become complex
  3. Reserve full - Only for critical scenarios where every detail matters

Cost Optimization

// Cost-aware mode selection
function selectMode(threadLength, budget) {
  if (budget === 'low') return 'recent';
  if (budget === 'medium') return threadLength < 50 ? 'smart' : 'recent';
  if (budget === 'high') return threadLength < 20 ? 'full' : 'smart';
  return 'recent'; // Default to cheapest
}

Quality Optimization

// Quality-aware mode selection
function selectModeForQuality(conversationType) {
  const qualityMap = {
    'customer_service': 'recent',
    'technical_support': 'smart',
    'code_assistance': 'smart',
    'legal_consultation': 'full',
    'medical_consultation': 'full',
    'research': 'full'
  };

  return qualityMap[conversationType] || 'recent';
}

Limitations

  • Neuron Limits: Even full mode respects API neuron capacity limits
  • Message Count: Threads with 1000+ messages may still be truncated
  • Performance: full mode can be significantly slower for long threads
  • Cost: full mode can be 5-10x more expensive for long threads