# Thread Context Modes

> 🔨 **In Development** — This section is still being developed and may change.

Freddy's intelligent context management automatically handles conversation history when threads exceed neuron capacity limits. Each mode uses a different strategy to preserve the most important parts of a conversation while staying within model constraints.

## Available Context Modes

### 🔄 `recent` Mode (Default)

**What it does:** Keeps the most recent messages that fit within the thread context limit, discarding the oldest messages first.

```
Original: [msg1, msg2, msg3, msg4, msg5, msg6, msg7, msg8]  (10,000 neurons)
Limit:    6,000 neurons
Result:   [msg5, msg6, msg7, msg8]  (5,800 neurons)
```

**Best for:**
- General conversations
- Customer support
- Performance-critical applications
- Cost optimization

**Pros:** Fast, predictable, low cost
**Cons:** Loses the conversation beginning

### 🧠 `smart` Mode (Balanced)

**What it does:** Preserves the conversation beginning (the first 3 messages) plus as many recent messages as fit within the remaining limit (up to the last 7), creating a balanced view of the thread.

```
Original: [msg1, msg2, msg3, msg4, msg5, msg6, msg7, msg8, msg9, msg10]  (12,000 neurons)
Limit:    8,000 neurons
Result:   [msg1, msg2, msg3, msg8, msg9, msg10]  (7,500 neurons)
           ↑ Start preserved   ↑ Recent context
```

**Best for:**
- Code assistance
- Long conversations
- Complex problem solving
- Educational content

**Pros:** Maintains both the initial setup and the recent state
**Cons:** Middle messages are discarded; slightly slower

### 📚 `full` Mode (Complete Context)

**What it does:** Keeps all messages regardless of length (until API limits are reached).

```
Original: [msg1, msg2, msg3, msg4, msg5]  (5,000 neurons)
Limit:    8,000 neurons
Result:   [msg1, msg2, msg3, msg4, msg5]  (5,000 neurons)  ← Everything kept
```

**Best for:**
- Short conversations only
- Critical context requirements
- Research & analysis

⚠️ **Warning:** Can exceed API neuron capacity limits!

**Pros:** Complete conversation history
**Cons:** Expensive, slow, can hit limits
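The trimming described above happens server-side, but a minimal sketch of the three strategies can make them concrete. This is illustrative only, assuming each message is a dict carrying a precomputed `neurons` count; `select_messages` and its parameters are hypothetical, not part of the Freddy SDK:

```python
def select_messages(messages, limit, mode='recent', keep_start=3, keep_recent=7):
    """Illustrative sketch of the three strategies; actual trimming is server-side.

    Each message is assumed to be a dict with a precomputed 'neurons' count.
    """
    def total(msgs):
        return sum(m['neurons'] for m in msgs)

    if mode == 'full':
        return list(messages)  # keep everything; may exceed API limits

    if mode == 'recent':
        kept = []
        for msg in reversed(messages):  # walk newest to oldest
            if total(kept) + msg['neurons'] > limit:
                break
            kept.insert(0, msg)
        return kept

    if mode == 'smart':
        start = messages[:keep_start]  # preserve the conversation setup
        budget = limit - total(start)
        recent = []
        for msg in reversed(messages[keep_start:]):  # newest first, up to keep_recent
            if len(recent) == keep_recent or total(recent) + msg['neurons'] > budget:
                break
            recent.insert(0, msg)
        return start + recent

    raise ValueError(f'unknown mode: {mode}')
```

With uniform message sizes, the `smart` branch reproduces the diagram above: the first 3 messages survive, and recent messages are added from the end until the budget runs out.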
## Quick Comparison

| Mode | Speed | Cost Efficiency | Context | Best For |
| --- | --- | --- | --- | --- |
| **🔄 `recent`** | ⭐⭐⭐ | ⭐⭐⭐ | Recent only | General use, performance |
| **🧠 `smart`** | ⭐⭐ | ⭐⭐ | Start + Recent | Complex conversations |
| **📚 `full`** | ⭐ | ⭐ | Complete | Critical context, short threads |

## Usage Examples

### Setting the Context Mode in an API Request

```bash
curl https://api.freddy.aitronos.com/v1/model/response \
  -H "Authorization: Bearer $FREDDY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4.1",
    "thread": "thread_conversation_123",
    "threadContextMode": "smart",
    "inputs": [
      {
        "role": "user",
        "texts": [{"text": "Continue where we left off"}]
      }
    ]
  }'
```

### JavaScript Example

```javascript
async function sendMessageWithContextMode(threadId, message, mode = 'recent') {
  const apiKey = process.env.FREDDY_API_KEY; // or however your app stores credentials

  const response = await fetch('https://api.freddy.aitronos.com/v1/model/response', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${apiKey}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      model: 'gpt-4.1',
      thread: threadId,
      threadContextMode: mode,
      inputs: [
        { role: 'user', texts: [{ text: message }] }
      ]
    })
  });

  return await response.json();
}

// Use different modes based on the scenario
await sendMessageWithContextMode('thread_123', 'Quick question', 'recent');
await sendMessageWithContextMode('thread_456', 'Complex problem', 'smart');
await sendMessageWithContextMode('thread_789', 'Critical analysis', 'full');
```

### Python Example

```python
import os

import requests

api_key = os.environ['FREDDY_API_KEY']  # exported as in the curl example above

def send_with_context_mode(thread_id, message, mode='recent'):
    response = requests.post(
        'https://api.freddy.aitronos.com/v1/model/response',
        headers={'Authorization': f'Bearer {api_key}'},
        json={
            'model': 'gpt-4.1',
            'thread': thread_id,
            'threadContextMode': mode,
            'inputs': [
                {'role': 'user', 'texts': [{'text': message}]}
            ]
        }
    )
    return response.json()

# Adaptive mode selection
def adaptive_context_mode(thread_message_count):
    if thread_message_count < 10:
        return 'full'    # Short thread, keep everything
    elif thread_message_count < 50:
        return 'smart'   # Medium thread, balanced approach
    else:
        return 'recent'  # Long thread, optimize for performance
```

## When to Use Each Mode

### Use `recent` When:

- Building a chatbot with many short interactions
- Neuron and synapse costs are a primary concern
- Performance/speed is critical
- Context from the beginning isn't necessary
- Handling high-volume customer support

**Example Scenarios:**
- Customer service chatbots
- Quick Q&A applications
- Status update bots
- Transactional conversations

### Use `smart` When:

- Having multi-step conversations
- The initial context setup matters
- You need a balance between cost and context
- Users reference earlier points in the conversation
- Building educational or coaching tools

**Example Scenarios:**
- Code pair-programming assistants
- Technical troubleshooting
- Educational tutoring
- Complex customer issues
- Creative writing assistance

### Use `full` When:

- The thread is short (< 20 messages)
- Every detail matters for accuracy
- Cost is not a concern
- Building research or analysis tools
- Legal or compliance requirements apply

**Example Scenarios:**
- Document analysis
- Legal consultation
- Medical consultation
- Research assistance
- Critical decision support

## Performance Impact

### Neuron Usage

```
Scenario: 100-message thread, each message ~200 neurons

recent mode:
- Keeps last ~15 messages = ~3,000 neurons
- Cost: Low

smart mode:
- Keeps first 3 + last 7 = ~2,000 neurons
- Cost: Medium

full mode:
- Keeps all 100 messages = ~20,000 neurons
- Cost: High
```

### Response Time

```
recent mode: ~1-2 seconds
smart mode:  ~1.5-3 seconds
full mode:   ~3-10 seconds (depends on total length)
```
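For rough capacity planning, you can approximate what each mode would keep before sending. The sketch below reproduces the scenario above; the per-message average and the `context_limit` default are assumptions taken from that scenario, not API constants:

```python
AVG_NEURONS_PER_MESSAGE = 200  # assumption from the scenario above

def estimate_kept_neurons(message_count, mode, context_limit=3000,
                          keep_start=3, keep_recent=7):
    """Back-of-envelope estimate only; real neuron accounting is server-side."""
    total = message_count * AVG_NEURONS_PER_MESSAGE
    if mode == 'full':
        return total  # everything is kept, which may hit API limits
    if mode == 'recent':
        return min(total, context_limit)  # newest messages up to the limit
    if mode == 'smart':
        kept_messages = min(message_count, keep_start + keep_recent)
        return min(kept_messages * AVG_NEURONS_PER_MESSAGE, context_limit)
    raise ValueError(f'unknown mode: {mode}')

for mode in ('recent', 'smart', 'full'):
    print(mode, estimate_kept_neurons(100, mode))
# recent 3000 / smart 2000 / full 20000, matching the scenario above
```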
## Best Practices

### Start Conservative

```javascript
// Start with recent mode
let contextMode = 'recent';

// Upgrade based on user needs
// (these flags are application-specific signals, not API values)
if (userAsksAboutEarlierTopic) {
  contextMode = 'smart';
}
if (criticalAccuracyRequired) {
  contextMode = 'full';
}
```

### Monitor Thread Length

```python
def get_optimal_mode(thread_id):
    # get_thread is an application-side helper returning thread metadata
    thread_info = get_thread(thread_id)
    message_count = len(thread_info['messages'])
    total_neurons = thread_info['total_neurons']

    if total_neurons < 4000:
        return 'full'    # Small enough to keep everything
    elif message_count < 30:
        return 'smart'   # Medium size, balanced approach
    else:
        return 'recent'  # Large thread, optimize
```

### Provide User Control

```javascript
// Let users choose their preference
const userSettings = {
  contextMode: 'smart',  // User preference
  autoAdjust: true       // Allow automatic adjustment
};

function getUserContextMode(threadLength) {
  if (!userSettings.autoAdjust) {
    return userSettings.contextMode;
  }

  // Auto-adjust based on thread length
  if (threadLength > 100) return 'recent';
  if (threadLength > 20) return 'smart';
  return 'full';
}
```

### Handle Edge Cases

```python
def send_message_safe(thread_id, message, preferred_mode='smart'):
    # NeuronLimitError stands in for whatever error your client raises
    # when the API rejects a request for exceeding neuron limits
    try:
        return send_with_context_mode(thread_id, message, preferred_mode)
    except NeuronLimitError:
        # Fall back to a more aggressive mode
        if preferred_mode == 'full':
            return send_with_context_mode(thread_id, message, 'smart')
        elif preferred_mode == 'smart':
            return send_with_context_mode(thread_id, message, 'recent')
        else:
            raise  # Already using the most aggressive mode
```

## Recommendations

### Default Strategy

**For most applications, use this progression:**

1. **Start with `recent`** - Fast, cost-effective, works for 80% of use cases
2. **Upgrade to `smart`** - When users reference earlier context or conversations become complex
3. **Reserve `full`** - Only for critical scenarios where every detail matters

### Cost Optimization

```javascript
// Cost-aware mode selection
function selectMode(threadLength, budget) {
  if (budget === 'low') return 'recent';
  if (budget === 'medium') return threadLength < 50 ? 'smart' : 'recent';
  if (budget === 'high') return threadLength < 20 ? 'full' : 'smart';
  return 'recent'; // Default to cheapest
}
```

### Quality Optimization

```javascript
// Quality-aware mode selection
function selectModeForQuality(conversationType) {
  const qualityMap = {
    'customer_service': 'recent',
    'technical_support': 'smart',
    'code_assistance': 'smart',
    'legal_consultation': 'full',
    'medical_consultation': 'full',
    'research': 'full'
  };
  return qualityMap[conversationType] || 'recent';
}
```

## Limitations

- **Neuron limits**: Even `full` mode respects API neuron capacity limits
- **Message count**: Threads with 1,000+ messages may still be truncated
- **Performance**: `full` mode can be significantly slower for long threads
- **Cost**: `full` mode can be 5-10x more expensive for long threads

## Related Resources

- [Threads](/docs/documentation/core-concepts/threads)
- [Create Model Response](/docs/api-reference/responses/create)
- [Best Practices](/docs/documentation/best-practices)
- [Rate Limiting](/docs/documentation/rate-limiting)