🔨 In Development — This section is still being developed and may change.
Freddy's intelligent context management automatically handles conversation history when threads exceed neuron capacity limits. Each mode uses a different strategy to preserve the most important parts of conversations while staying within model constraints.
🔄 recent mode
What it does: Keeps the most recent messages that fit within the thread context limit, discarding the oldest messages first.
Original: [msg1, msg2, msg3, msg4, msg5, msg6, msg7, msg8] (10,000 neurons)
Limit: 6,000 neurons
Result: [msg5, msg6, msg7, msg8] (5,800 neurons)

Best for:
- General conversations
- Customer support
- Performance-critical applications
- Cost optimization
Pros: Fast, predictable, low cost
Cons: Loses conversation beginning
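The truncation described above can be sketched in a few lines (an illustrative sketch, not Freddy's actual implementation; per-message neuron counts are assumed to be available):

```python
def truncate_recent(messages, neuron_counts, limit):
    """Keep the newest messages whose combined neuron count fits the limit.

    Walks backwards from the newest message, accumulating neurons, and
    drops everything older than the first message that would overflow.
    """
    kept = []
    total = 0
    for msg, neurons in zip(reversed(messages), reversed(neuron_counts)):
        if total + neurons > limit:
            break
        kept.append(msg)
        total += neurons
    kept.reverse()  # restore chronological order
    return kept, total

# Mirrors the example above: 8 messages, 6,000-neuron limit
messages = [f"msg{i}" for i in range(1, 9)]
neurons = [1600, 1300, 700, 600, 1500, 1400, 1500, 1400]  # hypothetical sizes
kept, total = truncate_recent(messages, neurons, 6000)
print(kept, total)  # ['msg5', 'msg6', 'msg7', 'msg8'] 5800
```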
🧠 smart mode
What it does: Preserves the conversation beginning (the first 3 messages) together with as much recent context as fits within the limit (up to the last 7 messages), creating a balanced view of the thread.
Original: [msg1, msg2, msg3, msg4, msg5, msg6, msg7, msg8, msg9, msg10] (12,000 neurons)
Limit: 8,000 neurons
Result: [msg1, msg2, msg3, msg8, msg9, msg10] (7,500 neurons)
↑ Start preserved          ↑ Recent context

Best for:
- Code assistance
- Long conversations
- Complex problem solving
- Educational content
Pros: Maintains context setup and recent state
Cons: Middle messages discarded, slightly slower
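One possible reading of this strategy can be sketched as follows (a sketch, not the actual implementation; it assumes the recent window is filled newest-first until the neuron limit or the 7-message cap is reached):

```python
def truncate_smart(messages, neuron_counts, limit, first_n=3, last_n=7):
    """Keep the first `first_n` messages, then add recent messages
    (newest first, at most `last_n`) while the total stays within `limit`."""
    head = list(messages[:first_n])
    used = sum(neuron_counts[:first_n])
    tail = []
    for msg, n in zip(reversed(messages[first_n:]), reversed(neuron_counts[first_n:])):
        if len(tail) == last_n or used + n > limit:
            break
        tail.append(msg)
        used += n
    tail.reverse()  # restore chronological order
    return head + tail, used

# Mirrors the example above: 10 messages (12,000 neurons), 8,000-neuron limit
msgs = [f"msg{i}" for i in range(1, 11)]
counts = [1000, 1000, 1000, 1300, 1300, 1300, 600, 1500, 1500, 1500]  # hypothetical
kept, used = truncate_smart(msgs, counts, 8000)
print(kept, used)  # ['msg1', 'msg2', 'msg3', 'msg8', 'msg9', 'msg10'] 7500
```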
📚 full mode
What it does: Keeps every message in the thread regardless of length (until API capacity limits are reached).
Original: [msg1, msg2, msg3, msg4, msg5] (5,000 neurons)
Limit: 8,000 neurons
Result: [msg1, msg2, msg3, msg4, msg5] (5,000 neurons) ← Everything kept

Best for:
- Short conversations only
- Critical context requirements
- Research & analysis
- ⚠️ Warning: Can exceed API neuron capacity limits!
Pros: Complete conversation history
Cons: Expensive, slow, can hit limits
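Because full mode can hit API limits, a client-side pre-check before requesting it is cheap insurance (a sketch; the neuron counts and `api_limit` value are placeholders, not real API values):

```python
def choose_full_or_fallback(neuron_counts, api_limit):
    """Prefer 'full' when the whole thread fits the model's capacity,
    otherwise fall back to 'smart'."""
    total = sum(neuron_counts)
    return 'full' if total <= api_limit else 'smart'

# Mirrors the example above: 5 messages totalling 5,000 neurons, 8,000 limit
print(choose_full_or_fallback([1000] * 5, 8000))   # full
print(choose_full_or_fallback([1000] * 12, 8000))  # smart
```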
| Mode | Speed | Cost | Context | Best For |
|---|---|---|---|---|
| 🔄 recent | ⭐⭐⭐ | ⭐⭐⭐ | Recent only | General use, performance |
| 🧠 smart | ⭐⭐ | ⭐⭐ | Start + Recent | Complex conversations |
| 📚 full | ⭐ | ⭐ | Complete | Critical context, short threads |
curl https://api.freddy.aitronos.com/v1/model/response \
-H "Authorization: Bearer $FREDDY_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4.1",
"thread": "thread_conversation_123",
"threadContextMode": "smart",
"inputs": [
{
"role": "user",
"texts": [{"text": "Continue where we left off"}]
}
]
}'

async function sendMessageWithContextMode(threadId, message, mode = 'recent') {
  const response = await fetch('https://api.freddy.aitronos.com/v1/model/response', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${apiKey}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      model: 'gpt-4.1',
      thread: threadId,
      threadContextMode: mode,
      inputs: [
        { role: 'user', texts: [{ text: message }] }
      ]
    })
  });
  return await response.json();
}

// Use different modes based on scenario
await sendMessageWithContextMode('thread_123', 'Quick question', 'recent');
await sendMessageWithContextMode('thread_456', 'Complex problem', 'smart');
await sendMessageWithContextMode('thread_789', 'Critical analysis', 'full');

import requests

def send_with_context_mode(thread_id, message, mode='recent'):
    response = requests.post(
        'https://api.freddy.aitronos.com/v1/model/response',
        headers={'Authorization': f'Bearer {api_key}'},
        json={
            'model': 'gpt-4.1',
            'thread': thread_id,
            'threadContextMode': mode,
            'inputs': [
                {'role': 'user', 'texts': [{'text': message}]}
            ]
        }
    )
    return response.json()

# Adaptive mode selection
def adaptive_context_mode(thread_message_count):
    if thread_message_count < 10:
        return 'full'    # Short thread, keep everything
    elif thread_message_count < 50:
        return 'smart'   # Medium thread, balanced approach
    else:
        return 'recent'  # Long thread, optimize for performance

Use 🔄 recent mode when:
- Building a chatbot with many short interactions
- Neuron and synapse costs are a primary concern
- Performance/speed is critical
- Context from the beginning isn't necessary
- Handling high-volume customer support
Example Scenarios:
- Customer service chatbot
- Quick Q&A applications
- Status update bots
- Transactional conversations
Use 🧠 smart mode when:
- Having multi-step conversations
- Initial context setup matters
- Balance between cost and context is needed
- Users reference earlier conversation points
- Building educational or coaching tools
Example Scenarios:
- Code pair programming assistants
- Technical troubleshooting
- Educational tutoring
- Complex customer issues
- Creative writing assistance
Use 📚 full mode when:
- Thread is short (< 20 messages)
- Every detail matters for accuracy
- Cost is not a concern
- Building research or analysis tools
- Legal or compliance requirements
Example Scenarios:
- Document analysis
- Legal consultation
- Medical consultation
- Research assistance
- Critical decision support
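The cost comparison that follows can be reproduced with simple arithmetic (a sketch assuming ~200 neurons per message, a ~15-message window for recent, and first 3 + last 7 messages for smart):

```python
def estimate_neurons(message_count, neurons_per_msg=200,
                     recent_window=15, smart_first=3, smart_last=7):
    """Rough per-mode neuron usage for a thread of `message_count` messages."""
    return {
        'recent': min(message_count, recent_window) * neurons_per_msg,
        'smart': min(message_count, smart_first + smart_last) * neurons_per_msg,
        'full': message_count * neurons_per_msg,
    }

print(estimate_neurons(100))  # {'recent': 3000, 'smart': 2000, 'full': 20000}
```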
Scenario: 100-message thread, each message ~200 neurons
recent mode:
- Keeps last ~15 messages = ~3,000 neurons
- Cost: Low
smart mode:
- Keeps first 3 + last 7 = ~2,000 neurons
- Cost: Medium
full mode:
- Keeps all 100 messages = ~20,000 neurons
- Cost: High

Typical response times:
recent mode: ~1-2 seconds
smart mode: ~1.5-3 seconds
full mode: ~3-10 seconds (depends on total length)

// Start with recent mode
let contextMode = 'recent';

// Upgrade based on user needs
if (userAsksAboutEarlierTopic) {
  contextMode = 'smart';
}
if (criticalAccuracyRequired) {
  contextMode = 'full';
}

def get_optimal_mode(thread_id):
    thread_info = get_thread(thread_id)
    message_count = len(thread_info['messages'])
    total_tokens = thread_info['total_tokens']
    if total_tokens < 4000:
        return 'full'    # Small enough to keep everything
    elif message_count < 30:
        return 'smart'   # Medium size, balanced approach
    else:
        return 'recent'  # Large thread, optimize

// Let users choose their preference
const userSettings = {
  contextMode: 'smart',  // User preference
  autoAdjust: true       // Allow automatic adjustment
};

function getUserContextMode(threadLength) {
  if (!userSettings.autoAdjust) {
    return userSettings.contextMode;
  }
  // Auto-adjust based on thread length
  if (threadLength > 100) return 'recent';
  if (threadLength > 20) return 'smart';
  return 'full';
}

def send_message_safe(thread_id, message, preferred_mode='smart'):
    try:
        response = send_with_context_mode(thread_id, message, preferred_mode)
        return response
    except NeuronLimitError:
        # Fall back to a more aggressive mode
        if preferred_mode == 'full':
            return send_with_context_mode(thread_id, message, 'smart')
        elif preferred_mode == 'smart':
            return send_with_context_mode(thread_id, message, 'recent')
        else:
            raise  # Already using the most aggressive mode

For most applications, use this progression:
- Start with recent: fast, cost-effective, works for ~80% of use cases
- Upgrade to smart: when users reference earlier context or conversations become complex
- Reserve full: only for critical scenarios where every detail matters
// Cost-aware mode selection
function selectMode(threadLength, budget) {
  if (budget === 'low') return 'recent';
  if (budget === 'medium') return threadLength < 50 ? 'smart' : 'recent';
  if (budget === 'high') return threadLength < 20 ? 'full' : 'smart';
  return 'recent';  // Default to cheapest
}

// Quality-aware mode selection
function selectModeForQuality(conversationType) {
  const qualityMap = {
    'customer_service': 'recent',
    'technical_support': 'smart',
    'code_assistance': 'smart',
    'legal_consultation': 'full',
    'medical_consultation': 'full',
    'research': 'full'
  };
  return qualityMap[conversationType] || 'recent';
}

- Neuron Limits: Even full mode respects API neuron capacity limits
- Message Count: Threads with 1000+ messages may still be truncated
- Performance: full mode can be significantly slower for long threads
- Cost: full mode can be 5-10x more expensive for long threads