Dictation
Note
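This document is written for Caret and follows the Caret v3.38.1 merge. Wherever a Caret-specific policy applies (supported OS and microphone permissions, authentication and routing, voice processing limits), it is marked in the body with a <Note>.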
Note
Voice feature porting: as recorded in merging/v3.38.1/attempt-2-master.md, some VoiceRecorder/PulsingBorder styles are intentionally not ported. State explicitly in this document any styles or behaviors that Caret does not support.
Dictation transforms how you work with AI. Instead of typing out complex thoughts, you speak naturally and share your complete intent. This isn't just about speed (though voice is faster); it's about enabling a fluid collaboration that typing can't match.
Why Voice Changes Everything
When you type, you edit yourself. You simplify complex ideas, skip context, and lose nuance. When you speak, you share everything on your mind: the full problem, the constraints, the edge cases you're worried about.
Use Dictation constantly in Plan mode for rapid back-and-forth discussions. Instead of typing careful, structured prompts, think through the problem out loud. When Caret asks clarifying questions, respond immediately, and keep iterating until you have a solid plan.
The friction of typing was holding back real collaboration. Voice removes that friction.
Getting Started
Enable Dictation:
- Go to Settings → Features → Dictation
- Toggle "Enable Dictation" on
- Sign into your Caret account when prompted
- Install FFmpeg if you haven't already (Caret will guide you)
Once enabled, you'll see a microphone button in the chat input area.
Using Dictation:
- Click the microphone button to start recording
- Speak naturally
- Click again to stop recording
- Wait for transcription to appear in the chat
Tip
Dictation works with any AI model you've configured. The transcription happens through Caret's service, but your conversation continues with whatever model you're using.
System Requirements
Note
Dictation is currently not available on Windows. Support for Windows is planned for a future release.
Dictation uses FFmpeg to capture your voice on all supported platforms:
- macOS: FFmpeg (via Homebrew: brew install ffmpeg)
- Linux: FFmpeg (via apt: sudo apt-get install ffmpeg)
If you don't have FFmpeg installed, Caret will automatically detect this and prompt you to install it with a single click.
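The detection step can be pictured as a PATH lookup followed by a platform-specific install suggestion. The sketch below is illustrative only, not Caret's implementation: the function name ffmpeg_install_hint and the fallback message are hypothetical, and the install commands are the ones listed above.

```python
import shutil
import sys
from typing import Callable, Optional

# Install commands taken from the platform list above. Windows is omitted
# because Dictation is not yet available there.
INSTALL_HINTS = {
    "darwin": "brew install ffmpeg",
    "linux": "sudo apt-get install ffmpeg",
}


def ffmpeg_install_hint(
    platform: str = sys.platform,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Hypothetical helper: return None if ffmpeg is on PATH, else a hint."""
    if which("ffmpeg") is not None:
        return None  # FFmpeg found, nothing to install
    for prefix, command in INSTALL_HINTS.items():
        if platform.startswith(prefix):
            return command
    return "Dictation is not supported on this platform yet"
```

Calling ffmpeg_install_hint() with no arguments checks the real PATH on the current machine; the which parameter exists only so the lookup can be swapped out when testing.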
Where Dictation Shines
Plan Mode Conversations
Dictation is perfect for Plan mode discussions. Instead of carefully crafting prompts, you can:
- Dictate your entire problem context in one go
- Respond to Caret's questions immediately
- Iterate on ideas without typing friction
- Think out loud while Caret listens
Start a planning session by speaking for 2-3 minutes straight, explaining the full context of what you're trying to build, the constraints you're working with, and the specific challenges you're facing.
Complex Problem Explanation
Some problems are hard to type out. When you're dealing with:
- Multi-step workflows with edge cases
- Integration challenges across multiple systems
- Performance issues with specific reproduction steps
- UI/UX problems that need detailed context
Speaking lets you explain the full situation naturally, including all the "oh, and also..." details that matter.
Code Review and Debugging
When reviewing code or explaining bugs, voice lets you walk through your thought process:
- "This function looks fine, but I'm worried about what happens when..."
- "The issue might be in this section, or possibly this other area..."
- "I tried X and Y, but neither worked because..."
You can share your complete debugging journey instead of just the final question.
Technical Requirements
System Requirements:
- FFmpeg installed on your system
- Active internet connection
- Caret account with transcription credits
Audio Quality:
- Records in WebM format with Opus codec
- Mono audio at 16kHz sample rate
- Optimized for voice recognition
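For reference, the audio format above (WebM container, Opus codec, mono, 16 kHz) can be produced with standard FFmpeg options. The sketch below builds such a command line; it is an assumption for illustration, not the exact command Caret runs. The helper name build_record_command is hypothetical, and the input devices chosen (AVFoundation audio device 0 on macOS, the ALSA default device on Linux) may differ from the ones Caret uses.

```python
import sys
from typing import List


def build_record_command(output_path: str, platform: str = sys.platform) -> List[str]:
    """Hypothetical helper: FFmpeg args for mono 16 kHz Opus audio in WebM."""
    if platform.startswith("darwin"):
        # AVFoundation input: ":0" means no video device, audio device 0
        capture = ["-f", "avfoundation", "-i", ":0"]
    elif platform.startswith("linux"):
        # ALSA input using the system's default capture device
        capture = ["-f", "alsa", "-i", "default"]
    else:
        raise ValueError(f"unsupported platform: {platform}")
    return [
        "ffmpeg",
        "-y",               # overwrite the output file if it already exists
        *capture,
        "-ac", "1",         # one channel (mono)
        "-ar", "16000",     # 16 kHz sample rate
        "-c:a", "libopus",  # encode with the Opus codec
        "-f", "webm",       # write a WebM container
        output_path,
    ]
```

To actually record, such a command could be started with subprocess.Popen(..., stdin=subprocess.PIPE) and stopped by writing q to FFmpeg's standard input, which lets FFmpeg finalize the file cleanly.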
Privacy:
- Audio recorded locally on your machine
- Only audio files sent for transcription
- No audio stored after transcription
- Temporary files automatically cleaned up
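A common way to guarantee that temporary recordings are cleaned up is a context manager that deletes the file even when transcription fails. This is a minimal sketch of that pattern under the assumption that recordings are written to a temp file; the name temporary_recording is hypothetical and the code is not taken from Caret.

```python
import contextlib
import os
import tempfile
from typing import Iterator


@contextlib.contextmanager
def temporary_recording(suffix: str = ".webm") -> Iterator[str]:
    """Hypothetical helper: yield a temp path, then delete the file."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    os.close(fd)  # the recorder reopens the path itself
    try:
        yield path
    finally:
        # Runs even if recording or transcription raises an exception,
        # so no audio file is left behind on disk.
        with contextlib.suppress(FileNotFoundError):
            os.remove(path)
```

Inside a with temporary_recording() as path: block, the recorder would write to path and the file would be uploaded for transcription; once the block exits, the local file is gone.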
Cost and Credits
Voice transcription costs $0.006 per minute through your Caret account. For most users, this works out to pennies per session.
A typical 5-minute planning conversation costs about 3 cents. Even heavy voice users rarely spend more than a few dollars per month.
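The arithmetic behind these figures is the per-minute rate times the recorded minutes: 5 minutes × $0.006 = $0.03. The sketch below assumes billing is strictly proportional to recorded time; this page does not say whether partial minutes are rounded up. The function name transcription_cost is hypothetical, and Decimal is used to avoid floating-point rounding in currency values.

```python
from decimal import Decimal

# USD per minute of audio, from the pricing stated above
RATE_PER_MINUTE = Decimal("0.006")


def transcription_cost(minutes: float) -> Decimal:
    """Hypothetical helper: cost in USD, assuming proportional billing."""
    return RATE_PER_MINUTE * Decimal(str(minutes))
```

Under this assumption, a 5-minute session costs $0.03, and 300 minutes of dictation in a month comes to $1.80.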
Note
Pricing is experimental and may change as we refine the service.
Best Practices
Speak Naturally
Don't try to speak like you type. Use your normal conversational tone and don't worry about perfect grammar.
Give Context First
Start with the big picture, then drill down into specifics. "I'm building a React app that needs to handle real-time data, and I'm running into performance issues with the WebSocket connection..."
Use Voice for Exploration
Dictation is perfect for exploratory conversations where you're not sure exactly what you need. Start talking through the problem and let the conversation evolve.
Combine with Text
You don't have to use voice for everything. Use voice for complex explanations and context, then switch to text for quick follow-ups or code snippets.
Troubleshooting
Microphone Not Working
- Check your IDE permissions for microphone access
- Ensure FFmpeg is properly installed
- Try reloading VS Code (or restarting your editor)
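To confirm that your microphone is visible to the system at all, you can list capture devices from a terminal. The sketch below returns a suitable listing command per platform; it is a diagnostic aid based on standard tools (FFmpeg's AVFoundation device listing on macOS, arecord from alsa-utils on Linux), not something Caret runs itself. The helper name list_input_devices_command is hypothetical.

```python
import sys
from typing import List


def list_input_devices_command(platform: str = sys.platform) -> List[str]:
    """Hypothetical helper: a command that lists audio capture devices."""
    if platform.startswith("darwin"):
        # Prints AVFoundation video and audio devices (on stderr). FFmpeg may
        # report an error for the empty input afterwards; that is expected.
        return ["ffmpeg", "-f", "avfoundation", "-list_devices", "true", "-i", ""]
    if platform.startswith("linux"):
        # arecord (from alsa-utils) lists ALSA capture hardware
        return ["arecord", "-l"]
    raise ValueError(f"unsupported platform: {platform}")
```

Running the result with subprocess.run(cmd, capture_output=True, text=True) and reading stderr (for FFmpeg) or stdout (for arecord) shows whether any input device is detected.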
Poor Transcription Quality
- Speak clearly and at normal volume
- Reduce background noise if possible
- Check your microphone settings
Connection Issues
- Verify internet connection
- Check whether a firewall is blocking Caret's servers
- Try signing out and back into your Caret account
Authentication Issues
- Sign out and back into your Caret account if you see authentication errors
- Check that your account has sufficient transcription credits
- Verify your internet connection is stable
Audio Recording Issues
- Ensure FFmpeg is properly installed and accessible
- Check that your browser/IDE has microphone permissions
- Try restarting your editor if audio capture fails
The Future of AI Collaboration
When you can speak your thoughts as fast as you think them, you stop self-editing. You share the full context, the edge cases, the "what if" scenarios that matter. This leads to better solutions and fewer back-and-forth clarifications.