# Model Selection Guide
New models drop constantly, so this guide focuses on what's working well with Caret right now. We'll keep it updated as the landscape shifts.
## Current Top Models

| Model | Context Window | Input Price* | Output Price* | Best For |
| --- | --- | --- | --- | --- |
| Claude Sonnet 4 | 1M tokens | $3-6 | $15-22.50 | Reliable tool usage, complex codebases |
| Qwen3 Coder | 256K tokens | $0.20 | $0.80 | Coding tasks, open source flexibility |
| Gemini 2.5 Pro | 1M+ tokens | TBD | TBD | Large codebases, document analysis |
| GPT-5 | 400K tokens | $1.25 | $10 | Latest OpenAI tech, three modes |
*Per million tokens
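To compare what a session will actually cost, multiply your expected token counts by each model's per-million rates. Here's a minimal TypeScript sketch; the `estimateCostUSD` helper and the example numbers are just for illustration, with prices mirroring the tables in this guide:

```typescript
// Estimate the cost of a single request given per-million-token pricing.
// Prices are in USD per 1M tokens, matching the tables in this guide.
interface ModelPricing {
  inputPerMillion: number;
  outputPerMillion: number;
}

function estimateCostUSD(
  pricing: ModelPricing,
  inputTokens: number,
  outputTokens: number
): number {
  const inputCost = (inputTokens / 1_000_000) * pricing.inputPerMillion;
  const outputCost = (outputTokens / 1_000_000) * pricing.outputPerMillion;
  return inputCost + outputCost;
}

// Example: a 50K-token prompt with a 2K-token reply on DeepSeek V3 ($0.14 / $0.28)
const deepseekV3: ModelPricing = { inputPerMillion: 0.14, outputPerMillion: 0.28 };
console.log(estimateCostUSD(deepseekV3, 50_000, 2_000).toFixed(4)); // ≈ 0.0076
```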
## Budget Options

| Model | Context Window | Input Price* | Output Price* | Notes |
| --- | --- | --- | --- | --- |
| DeepSeek V3 | 128K tokens | $0.14 | $0.28 | Great value for daily coding |
| DeepSeek R1 | 128K tokens | $0.55 | $2.19 | Budget reasoning champion |
| Qwen3 32B | 128K tokens | Varies | Varies | Open source, multiple providers |
| Z AI GLM 4.5 | 128K tokens | TBD | TBD | MIT licensed, hybrid reasoning |
*Per million tokens
## Context Window Guide

| Size | Word Count | Use Case |
| --- | --- | --- |
| 32K tokens | ~24,000 words | Single files, small projects |
| 128K tokens | ~96,000 words | Most coding projects |
| 200K tokens | ~150,000 words | Large codebases |
| 400K+ tokens | ~300,000+ words | Entire applications |
Performance note: Most models start dropping in quality around 400-500K tokens, even if they claim higher limits.
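The word counts above use the common rule of thumb of roughly 0.75 English words per token. A quick sketch of that conversion (the ratio is an approximation; real tokenizers vary by model and language):

```typescript
// Rough conversion between tokens and English words (~0.75 words per token).
// Actual tokenizers vary, so treat this as a planning estimate only.
const WORDS_PER_TOKEN = 0.75;

function tokensToWords(tokens: number): number {
  return Math.round(tokens * WORDS_PER_TOKEN);
}

function wordsToTokens(words: number): number {
  return Math.round(words / WORDS_PER_TOKEN);
}

console.log(tokensToWords(128_000)); // ≈ 96,000 words, matching the table above
console.log(wordsToTokens(24_000));  // ≈ 32,000 tokens
```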
## Open Source vs Closed Source
### Open Source Advantages
- Multiple providers compete to host them
- Cheaper pricing due to competition
- Provider choice - switch if one goes down
- Faster innovation cycles
### Open Source Models Available
- Qwen3 Coder (Apache 2.0)
- Z AI GLM 4.5 (MIT)
- Kimi K2 (Modified MIT)
- DeepSeek series (Various licenses)
## Quick Decision Matrix

| If you want... | Use this |
| --- | --- |
| Something that just works | Claude Sonnet 4 |
| To save money | DeepSeek V3 or Qwen3 variants |
| Huge context windows | Gemini 2.5 Pro or Claude Sonnet 4 |
| Open source | Qwen3 Coder, Z AI GLM 4.5, or Kimi K2 |
| Latest tech | GPT-5 |
| Speed | Qwen3 Coder on Cerebras (fastest available) |
## What Others Are Using
Check OpenRouter's Caret usage stats to see real usage patterns from the community.
## Context Management
Caret automatically handles context limits with auto-compact. When you approach your model's limit, Caret summarizes the conversation to keep working. You don't need to micromanage this.
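Conceptually, auto-compact follows the pattern sketched below. This is an illustration of the general technique, not Caret's actual implementation; the 80% threshold, the helper names, and the summarization call are all assumptions.

```typescript
// Illustrative sketch of a compaction loop (not Caret's real code).
// When the conversation nears the model's context limit, older messages
// are replaced with a model-generated summary so the session can continue.
interface Message {
  role: "user" | "assistant" | "system";
  content: string;
}

const COMPACTION_THRESHOLD = 0.8; // assumed: compact at 80% of the window

async function maybeCompact(
  messages: Message[],
  contextWindow: number,
  countTokens: (msgs: Message[]) => number,
  summarize: (msgs: Message[]) => Promise<string>
): Promise<Message[]> {
  if (countTokens(messages) < contextWindow * COMPACTION_THRESHOLD) {
    return messages; // plenty of room left, nothing to do
  }
  // Summarize everything except the most recent exchanges,
  // then continue with the summary plus the recent tail.
  const tail = messages.slice(-4);
  const summary = await summarize(messages.slice(0, -4));
  return [
    { role: "system", content: `Summary of earlier conversation: ${summary}` },
    ...tail,
  ];
}
```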
## The Bottom Line
Start with Claude Sonnet 4 if you want reliability. Once you're comfortable, experiment with the open source options to find the best fit for your workflow and budget.
The landscape moves fast - these recommendations reflect what's working now, but keep an eye on new releases.