
Captions vs. subtitles
Strictly speaking, captions transcribe speech in the language being spoken, which helps with accessibility and sound-off viewing, while subtitles translate it into another language. This guide covers same-language captions; for translation, see the video translation guide.
Pick a caption style
- Auto — natural rhythm; best for interviews and documentaries.
- Sentence — one full sentence per line; best for courses and films.
- Word by word — karaoke/TikTok style; best for Reels and Shorts.
- Custom — you choose the words per line.
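To make the difference between the styles concrete, here is a minimal Python sketch of how timed words could be grouped into caption lines. It is an illustration only, not VoxCaption's actual implementation: the `Word` class, the `split_words` function, and the mode names are all hypothetical, and the Auto style is left out because its rhythm rules are not described here.

```python
import re
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float    # seconds


def split_words(words, mode, words_per_line=3):
    """Group timed words into caption lines as (start, end, text) tuples."""
    if mode == "word":
        # Word by word: every word becomes its own caption.
        groups = [[w] for w in words]
    elif mode == "custom":
        # Custom: a fixed number of words per caption line.
        groups = [words[i:i + words_per_line]
                  for i in range(0, len(words), words_per_line)]
    elif mode == "sentence":
        # Sentence: close a caption whenever a word ends a sentence.
        groups, current = [], []
        for w in words:
            current.append(w)
            if re.search(r"[.!?]$", w.text):
                groups.append(current)
                current = []
        if current:
            groups.append(current)
    else:
        raise ValueError(f"unknown mode: {mode}")
    # Each caption spans from its first word's start to its last word's end.
    return [(g[0].start, g[-1].end, " ".join(w.text for w in g)) for g in groups]
```

Each caption inherits its timing from the first and last word it contains, which is why word-by-word captions change quickly while sentence captions stay on screen longer.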
Step by step
- Open the Caption tab and browse for your video.
- Set the spoken language and AI model (High is great for accuracy).
- Choose a split mode (Auto, Sentence, Word by word, or Custom) and adjust the style: font, size, color, outline, and position.
- Press Start. VoxCaption transcribes the audio, builds a timed SRT, and renders a captioned video.
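The timed SRT from the last step is a plain-text format: each cue has a running number, a `start --> end` timestamp line in `HH:MM:SS,mmm` form, and the caption text, with a blank line between cues. The sketch below shows how such a file could be assembled from timed captions. The function names are hypothetical and this is not VoxCaption's internal code.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def build_srt(cues):
    """Build SRT text from an iterable of (start_seconds, end_seconds, text)."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(blocks)


print(build_srt([(0.0, 1.2, "Hello"), (1.2, 2.5, "World")]))
```

Because the format is this simple, the generated file can be opened and corrected in any text editor.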
You get a _captioned.mp4 with captions burned in and a _captions.srt you can reuse or edit later.
Tips for high-converting captions
- For Shorts/Reels, use word-by-word with a bold outline and center position.
- Keep two lines max so captions never cover faces or key action.
- Set the language explicitly (especially for Arabic) for the most accurate timing.
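If you prepare or check caption text yourself, the two-line rule is easy to enforce in code. The following sketch uses Python's standard `textwrap` module to break long text into captions of at most two lines each. The 32-character line width is an assumed value for illustration, not a VoxCaption setting, and the function name is hypothetical.

```python
import textwrap


def split_into_captions(text, max_chars=32, max_lines=2):
    """Wrap text to max_chars per line and group the lines into captions
    of at most max_lines lines each."""
    lines = textwrap.wrap(text, width=max_chars)
    return ["\n".join(lines[i:i + max_lines])
            for i in range(0, len(lines), max_lines)]


for caption in split_into_captions(
    "Keep two lines max so captions never cover faces or key action "
    "and viewers can read them at a glance."
):
    print(caption)
    print("---")
```

This sketch only splits the text. Each resulting caption would still need its own start and end time.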
FAQ
How accurate is the speech-to-text?
It's powered by OpenAI Whisper, which handles accents and noise well. Use the High or Ultra model for the best results.
Can I edit captions before burning?
Yes — edit the generated SRT, then re-burn it from the Compiler with no re-transcription.
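For a handful of fixes a text editor is enough, but repeated corrections (a misspelled brand name, for example) can be scripted. Here is a minimal sketch, assuming a well-formed SRT file; the helper names are hypothetical and the timestamps are left untouched so the captions stay in sync.

```python
import re


def parse_srt(srt_text):
    """Parse SRT text into a list of dicts with start, end, and text."""
    cues = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        if len(lines) < 3 or "-->" not in lines[1]:
            continue  # skip blocks that are not complete cues
        start, end = (part.strip() for part in lines[1].split("-->"))
        cues.append({"start": start, "end": end, "text": "\n".join(lines[2:])})
    return cues


def fix_text(cues, replacements):
    """Apply simple find-and-replace corrections to every cue's text."""
    for cue in cues:
        for wrong, right in replacements.items():
            cue["text"] = cue["text"].replace(wrong, right)
    return cues


def to_srt(cues):
    """Serialize cues back to SRT text, renumbering from 1."""
    return "\n".join(
        f"{i}\n{c['start']} --> {c['end']}\n{c['text']}\n"
        for i, c in enumerate(cues, start=1)
    )
```

To use it, read the generated _captions.srt as UTF-8 text, pass it through `parse_srt`, `fix_text`, and `to_srt`, write the result back to disk, and then re-burn the edited file.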
Caption your first video free
Try the AI Caption Creator in VoxCaption Studio — 14-day free trial.
Further reading: W3C guidance on captions and accessibility (external).