Model Selection
Choosing the right model can significantly impact cost and quality.Task-Based Recommendations
| Task | Recommended Models | Reasoning |
|---|---|---|
| Simple Q&A | gpt-4.1-mini, gemini-2.0-flash | Fast, cheap, good enough |
| Complex reasoning | o3, claude-sonnet-4-20250514, deepseek-r1 | Better logic and planning |
| Coding | claude-sonnet-4-20250514, gpt-4o, deepseek-v3 | Optimized for code |
| Creative writing | claude-3-5-sonnet-20241022, gpt-4o | Better prose quality |
| Vision/Images | gpt-4o, claude-3-5-sonnet-20241022, gemini-2.0-flash | Native vision support |
| Long context | gemini-2.5-pro, claude-3-5-sonnet-20241022 | 1M+ token windows |
| Cost-sensitive | gpt-4.1-mini, gemini-2.0-flash, deepseek-v3 | Best value |
Cost Tiers
Cost Optimization
1. Use Smaller Models First
2. Set max_tokens
Always set a reasonablemax_tokens limit:
3. Optimize Prompts
4. Enable Caching
Take advantage of semantic caching:5. Batch Similar Requests
Performance Optimization
1. Use Streaming for UX
Streaming improves perceived performance:2. Choose Fast Models for Interactive Use
| Use Case | Recommended | Latency |
|---|---|---|
| Chat UI | gpt-4.1-mini, gemini-2.0-flash | ~200ms first token |
| Tab completion | claude-3-5-haiku-20241022 | ~150ms first token |
| Background processing | gpt-4o, claude-3-5-sonnet-20241022 | ~500ms first token |
3. Set Timeouts
Reliability
1. Implement Retries
2. Handle Errors Gracefully
3. Use Fallback Models
Security
1. Protect API Keys
2. Validate User Input
3. Set API Key Limits
Create separate API keys with spending limits for:- Development/testing
- Production
- Different applications
Monitoring
1. Track Usage
Check your dashboard regularly for:- Token usage by model
- Cost breakdown
- Cache hit rates
- Error rates
2. Log Important Metrics
3. Set Up Alerts
Configure low balance alerts in your dashboard to avoid service interruption.Checklist
Cost optimization
Cost optimization
- Using appropriate model for each task
- Setting max_tokens limits
- Prompts are concise
- Caching enabled where appropriate
- Batching similar requests
Performance
Performance
- Streaming for interactive UX
- Fast models for real-time use
- Timeouts configured
Reliability
Reliability
- Retry logic implemented
- Error handling in place
- Fallback models configured
Security
Security
- API keys in environment variables
- Input validation
- Separate keys for dev/prod
- Spending limits set