Speech Detection: How It Works
Local Speech Detection Filters Clips Before AI Processing
Before Bitcut sends any audio to the server for AI transcription, it first runs a quick speech detection pass right on your device. This local analysis determines which clips contain speech and which contain only non-speech audio (music, ambient sound, silence). Only clips with detected speech are sent for transcription.
Why It Runs Locally
On-device speech detection provides three key benefits:
- Speed — local analysis takes just a few seconds, even for multiple clips
- Privacy — audio from clips without speech never leaves your device
- Quota savings — clips without speech are skipped entirely, so they don't count toward your AI minutes quota
How It Works
Audio analysis
When you add clips using Smart Add with AI or Clips Enhancement, Bitcut analyzes the audio waveform of each clip on your device.
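Bitcut's exact detector isn't documented here, but the idea of classifying a waveform on-device can be sketched with a simple energy-based pass. This is a hypothetical illustration only: production detectors typically use trained models that can tell speech from music, while this sketch merely separates audible frames from silence. All function names and thresholds below are invented for the example.

```python
import math

def frame_energy(samples, frame_size=512):
    """Split a mono waveform into fixed-size frames and return RMS energy per frame."""
    energies = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        rms = math.sqrt(sum(s * s for s in frame) / frame_size)
        energies.append(rms)
    return energies

def contains_speech(samples, energy_threshold=0.02, min_active_ratio=0.1):
    """Flag a clip as speech-bearing if enough frames exceed the energy threshold.

    Hypothetical stand-in for a real voice-activity model: energy alone
    cannot distinguish speech from music, only sound from silence.
    """
    energies = frame_energy(samples)
    if not energies:
        return False
    active = sum(1 for e in energies if e > energy_threshold)
    return active / len(energies) >= min_active_ratio

# A near-silent clip falls below the threshold and is classified as non-speech:
silence = [0.001] * 4096
print(contains_speech(silence))  # False
```

Because this pass touches only raw sample values, it runs in milliseconds per clip and never needs a network round trip, which is what makes the privacy and speed benefits above possible.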
Speech vs. non-speech classification
Each clip is classified as containing speech or not. Clips are processed in parallel, so even a batch of 10 clips takes only a few seconds.
Routing
Clips with speech proceed to server-side AI transcription. Clips without speech are handled differently — they're processed with smart trimming based on visual content instead.
What You See
During speech detection, clips on the timeline show a magnifying glass icon with the "Analyzing" state. This phase is fast — usually 1-2 seconds per clip. Once analysis is complete, clips either move to the transcription phase or are immediately trimmed and marked as complete.