
🔨 In Development — This section is still being developed and may change.
Freddy's intelligent context management automatically trims conversation history when a thread exceeds the model's neuron capacity. Each mode uses a different strategy to preserve the most important parts of a conversation while staying within model limits.

Available Context Modes

🔄 recent Mode (Default)

What it does: Keeps the most recent messages that fit within the thread context limit, discarding older messages first.

Original: [msg1, msg2, msg3, msg4, msg5, msg6, msg7, msg8] (10,000 neurons)
Limit: 6,000 neurons
Result: [msg5, msg6, msg7, msg8] (5,800 neurons)
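
A minimal sketch of this strategy (an illustration only, not Freddy's actual implementation; estimate_neurons is an assumed helper that returns a message's neuron count):

def trim_recent(messages, neuron_limit, estimate_neurons):
    """Keep the newest messages that fit within neuron_limit."""
    kept, used = [], 0
    for msg in reversed(messages):
        cost = estimate_neurons(msg)
        if used + cost > neuron_limit:
            break  # older messages are discarded first
        kept.append(msg)
        used += cost
    kept.reverse()  # restore chronological order
    return kept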

Best for:

  • General conversations
  • Customer support
  • Performance-critical applications
  • Cost optimization

Pros: Fast, predictable, low cost
Cons: Loses conversation beginning


🧠 smart Mode (Balanced)

What it does: Preserves the conversation beginning (the first 3 messages) plus as many of the most recent messages as still fit within the limit, creating a balanced view of the thread.

Original: [msg1, msg2, msg3, msg4, msg5, msg6, msg7, msg8, msg9, msg10] (12,000 neurons)
Limit: 8,000 neurons
Result: [msg1, msg2, msg3, msg8, msg9, msg10] (7,500 neurons)
        ↑ Start preserved  ↑ Recent context
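
A minimal sketch of this strategy (an illustration only; the real service may split the budget differently, and estimate_neurons is an assumed helper):

def trim_smart(messages, neuron_limit, estimate_neurons, head=3):
    """Keep the first `head` messages, then fill the remaining
    budget with the newest messages."""
    start = messages[:head]
    budget = neuron_limit - sum(estimate_neurons(m) for m in start)
    recent, used = [], 0
    for msg in reversed(messages[head:]):
        cost = estimate_neurons(msg)
        if used + cost > budget:
            break  # middle messages are the first to go
        recent.append(msg)
        used += cost
    recent.reverse()
    return start + recent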

Best for:

  • Code assistance
  • Long conversations
  • Complex problem solving
  • Educational content

Pros: Maintains context setup and recent state
Cons: Middle messages discarded, slightly slower


📚 full Mode (Complete Context)

What it does: Keeps all messages regardless of length (until API limits are reached).

Original: [msg1, msg2, msg3, msg4, msg5] (5,000 neurons)
Limit: 8,000 neurons
Result: [msg1, msg2, msg3, msg4, msg5] (5,000 neurons) ← Everything kept

Best for:

  • Short conversations only
  • Critical context requirements
  • Research & analysis

⚠️ Warning: Can exceed API neuron capacity limits!

Pros: Complete conversation history
Cons: Expensive, slow, can hit limits


Quick Comparison

Mode      | Speed  | Cost   | Context        | Best For
🔄 recent | Fast   | Low    | Recent only    | General use, performance
🧠 smart  | Medium | Medium | Start + Recent | Complex conversations
📚 full   | Slow   | High   | Complete       | Critical context, short threads

Usage Examples

Setting Context Mode in API Request

curl https://api.freddy.aitronos.com/v1/model/response \
  -H "Authorization: Bearer $FREDDY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4.1",
    "thread": "thread_conversation_123",
    "threadContextMode": "smart",
    "inputs": [
      {
        "role": "user",
        "texts": [{"text": "Continue where we left off"}]
      }
    ]
  }'

JavaScript Example

// Assumes FREDDY_API_KEY is available in the environment
const apiKey = process.env.FREDDY_API_KEY;

async function sendMessageWithContextMode(threadId, message, mode = 'recent') {
  const response = await fetch('https://api.freddy.aitronos.com/v1/model/response', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${apiKey}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      model: 'gpt-4.1',
      thread: threadId,
      threadContextMode: mode,
      inputs: [
        { role: 'user', texts: [{ text: message }] }
      ]
    })
  });
  
  return await response.json();
}

// Use different modes based on scenario
await sendMessageWithContextMode('thread_123', 'Quick question', 'recent');
await sendMessageWithContextMode('thread_456', 'Complex problem', 'smart');
await sendMessageWithContextMode('thread_789', 'Critical analysis', 'full');

Python Example

import os

import requests

api_key = os.environ['FREDDY_API_KEY']  # assumes the key is set in the environment

def send_with_context_mode(thread_id, message, mode='recent'):
    response = requests.post(
        'https://api.freddy.aitronos.com/v1/model/response',
        headers={'Authorization': f'Bearer {api_key}'},
        json={
            'model': 'gpt-4.1',
            'thread': thread_id,
            'threadContextMode': mode,
            'inputs': [
                {'role': 'user', 'texts': [{'text': message}]}
            ]
        }
    )
    return response.json()

# Adaptive mode selection
def adaptive_context_mode(thread_message_count):
    if thread_message_count < 10:
        return 'full'  # Short thread, keep everything
    elif thread_message_count < 50:
        return 'smart'  # Medium thread, balanced approach
    else:
        return 'recent'  # Long thread, optimize for performance

When to Use Each Mode

Use recent When:

  • Building a chatbot with many short interactions
  • Neuron and synapse costs are a primary concern
  • Performance/speed is critical
  • Context from the beginning isn't necessary
  • Handling high-volume customer support

Example Scenarios:

  • Customer service chatbot
  • Quick Q&A applications
  • Status update bots
  • Transactional conversations

Use smart When:

  • Having multi-step conversations
  • Initial context setup matters
  • Balance between cost and context is needed
  • Users reference earlier conversation points
  • Building educational or coaching tools

Example Scenarios:

  • Code pair programming assistants
  • Technical troubleshooting
  • Educational tutoring
  • Complex customer issues
  • Creative writing assistance

Use full When:

  • Thread is short (< 20 messages)
  • Every detail matters for accuracy
  • Cost is not a concern
  • Building research or analysis tools
  • Legal or compliance requirements

Example Scenarios:

  • Document analysis
  • Legal consultation
  • Medical consultation
  • Research assistance
  • Critical decision support

Performance Impact

Neuron Usage

Scenario: 100-message thread, each message ~200 neurons

recent mode:
- Keeps last ~15 messages = ~3,000 neurons
- Cost: Low

smart mode:
- Keeps first 3 + last 7 = ~2,000 neurons
- Cost: Medium

full mode:
- Keeps all 100 messages = ~20,000 neurons
- Cost: High
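
To sanity-check these numbers yourself, here is a rough estimator (the window sizes mirror the scenario above and are assumptions, not API guarantees):

def estimate_retained_neurons(message_count, avg_neurons_per_message,
                              mode='recent', recent_window=15, head=3, tail=7):
    """Rough estimate of how many neurons each mode retains."""
    if mode == 'full':
        kept = message_count
    elif mode == 'smart':
        kept = min(message_count, head + tail)
    else:  # 'recent'
        kept = min(message_count, recent_window)
    return kept * avg_neurons_per_message

# 100-message thread at ~200 neurons per message (the scenario above)
for mode in ('recent', 'smart', 'full'):
    print(mode, estimate_retained_neurons(100, 200, mode))
# recent -> 3000, smart -> 2000, full -> 20000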

Response Time

recent mode: ~1-2 seconds
smart mode: ~1.5-3 seconds
full mode: ~3-10 seconds (depends on total length)

Best Practices

Start Conservative

// Start with recent mode
let contextMode = 'recent';

// Upgrade based on user needs. The flags below (userAsksAboutEarlierTopic,
// criticalAccuracyRequired) stand in for your own application logic.
if (userAsksAboutEarlierTopic) {
  contextMode = 'smart';
}

if (criticalAccuracyRequired) {
  contextMode = 'full';
}

Monitor Thread Length

def get_optimal_mode(thread_id):
    # get_thread is a placeholder for however you fetch thread metadata
    thread_info = get_thread(thread_id)
    message_count = len(thread_info['messages'])
    total_neurons = thread_info['total_neurons']

    if total_neurons < 4000:
        return 'full'  # Small enough to keep everything
    elif message_count < 30:
        return 'smart'  # Medium size, balanced approach
    else:
        return 'recent'  # Large thread, optimize

Provide User Control

// Let users choose their preference
const userSettings = {
  contextMode: 'smart', // User preference
  autoAdjust: true      // Allow automatic adjustment
};

function getUserContextMode(threadLength) {
  if (!userSettings.autoAdjust) {
    return userSettings.contextMode;
  }
  
  // Auto-adjust based on thread length
  if (threadLength > 100) return 'recent';
  if (threadLength > 20) return 'smart';
  return 'full';
}

Handle Edge Cases

# NeuronLimitError is a placeholder for the exception your client raises
# when the API reports a neuron capacity overflow
def send_message_safe(thread_id, message, preferred_mode='smart'):
    try:
        response = send_with_context_mode(thread_id, message, preferred_mode)
        return response
    except NeuronLimitError:
        # Fallback to more aggressive mode
        if preferred_mode == 'full':
            return send_with_context_mode(thread_id, message, 'smart')
        elif preferred_mode == 'smart':
            return send_with_context_mode(thread_id, message, 'recent')
        else:
            raise  # Already using most aggressive mode

Recommendations

Default Strategy

For most applications, use this progression:

  1. Start with recent - Fast, cost-effective, works for 80% of use cases
  2. Upgrade to smart - When users reference earlier context or conversations become complex
  3. Reserve full - Only for critical scenarios where every detail matters

Cost Optimization

// Cost-aware mode selection
function selectMode(threadLength, budget) {
  if (budget === 'low') return 'recent';
  if (budget === 'medium') return threadLength < 50 ? 'smart' : 'recent';
  if (budget === 'high') return threadLength < 20 ? 'full' : 'smart';
  return 'recent'; // Default to cheapest
}

Quality Optimization

// Quality-aware mode selection
function selectModeForQuality(conversationType) {
  const qualityMap = {
    'customer_service': 'recent',
    'technical_support': 'smart',
    'code_assistance': 'smart',
    'legal_consultation': 'full',
    'medical_consultation': 'full',
    'research': 'full'
  };
  
  return qualityMap[conversationType] || 'recent';
}

Limitations

  • Neuron Limits: Even full mode respects API neuron capacity limits
  • Message Count: Threads with 1000+ messages may still be truncated
  • Performance: full mode can be significantly slower for long threads
  • Cost: full mode can be 5-10x more expensive for long threads