Clip Visual Index
AI-Generated Frame Analysis: 4-Element Visual Descriptions
The Clip Visual Index is a set of AI-generated text descriptions, one for each clip on your timeline. Each description identifies the key subjects, actions, settings, and objects visible in the footage. AI Story Mode uses these descriptions as input to write narration that matches your visuals.
What the Visual Index Captures
For each clip, the AI analyzes representative frames and produces a description covering:
- Subjects — people, animals, or main objects in the frame
- Actions — what is happening (walking, cooking, talking, etc.)
- Setting — location and environment (beach, kitchen, city street)
- Notable details — text on screen, products, landmarks, weather
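As a rough illustration of the four elements listed above, a single clip's description could be modeled like this. Bitcut's internal format is not documented here, so the class name, field names, and text layout are all assumptions made for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class ClipDescription:
    """Hypothetical container for the four elements of one clip's description."""

    subjects: list[str] = field(default_factory=list)
    actions: list[str] = field(default_factory=list)
    setting: str = ""
    notable_details: list[str] = field(default_factory=list)

    def to_text(self) -> str:
        """Join the non-empty elements into one line of plain text."""
        parts = [
            ("Subjects", ", ".join(self.subjects)),
            ("Actions", ", ".join(self.actions)),
            ("Setting", self.setting),
            ("Notable details", ", ".join(self.notable_details)),
        ]
        return "; ".join(f"{label}: {value}" for label, value in parts if value)


# Illustrative values only, not real output from the product.
example = ClipDescription(
    subjects=["a woman", "a dog"],
    actions=["walking"],
    setting="beach",
    notable_details=["overcast sky"],
)
print(example.to_text())
```

Elements the analysis leaves empty are skipped in the text form, so a clip with only a recognizable setting still yields a usable one-part description.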
Generating the Visual Index
Automatic generation
When you activate AI Story Mode, Bitcut checks whether each clip already has a visual description. Any clips without one are analyzed automatically before the story script is generated.
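The check described above amounts to a simple fill-in pass over the timeline. The sketch below shows that logic under stated assumptions: the function names, the dictionary layout, and the stand-in analyzer are hypothetical and do not represent Bitcut's actual implementation.

```python
from typing import Callable, Optional


def fill_missing_descriptions(
    clips: dict[str, Optional[str]],
    analyze: Callable[[str], str],
) -> list[str]:
    """Describe every clip that has no description yet; return the IDs analyzed."""
    analyzed = []
    for clip_id, description in clips.items():
        if not description:  # None or an empty string counts as missing
            clips[clip_id] = analyze(clip_id)
            analyzed.append(clip_id)
    return analyzed


def fake_analyze(clip_id: str) -> str:
    """Stand-in for the AI frame analysis (assumption for this sketch)."""
    return f"placeholder description for {clip_id}"


# clip-1 already has a description, so only clip-2 is analyzed.
timeline = {"clip-1": "A chef chopping onions in a kitchen", "clip-2": None}
newly_analyzed = fill_missing_descriptions(timeline, fake_analyze)
print(newly_analyzed)
```

Clips that already have a description are left untouched, which is why manual edits made earlier are not overwritten by this pass in the sketch.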
Manual generation
You can also generate or regenerate the visual index for specific clips from the clip settings. This is useful if you've trimmed a clip and the old description no longer matches the visible content.
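One way to reason about when a description has gone out of date is to compare the trim range it was generated for with the clip's current trim range. The sketch below illustrates that idea only. The `Clip` fields and the `is_description_stale` helper are hypothetical, and nothing here implies that Bitcut tracks trim ranges this way or flags stale descriptions for you.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Clip:
    """Hypothetical clip record; times are in seconds."""

    trim_range: Tuple[float, float]
    description: str = ""
    # Trim range that was visible when the description was generated.
    described_range: Optional[Tuple[float, float]] = None


def is_description_stale(clip: Clip) -> bool:
    """True if a description exists but was made for a different trim range."""
    if not clip.description:
        return False  # a missing description is not stale, just absent
    return clip.described_range != clip.trim_range


clip = Clip(
    trim_range=(0.0, 12.0),
    description="A surfer paddling out, then riding a wave",
    described_range=(0.0, 12.0),
)
before_trim = is_description_stale(clip)

# Trimming to the first four seconds may cut the wave out of the visible content.
clip.trim_range = (0.0, 4.0)
after_trim = is_description_stale(clip)
print(before_trim, after_trim)
```

In this sketch a stale result is only a hint that the clip is worth regenerating from the clip settings.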
Editing Descriptions
AI-generated descriptions are a starting point. You can edit any clip's description manually to add context the AI might have missed — for example, the name of a person or a specific location. More accurate descriptions lead to better narration scripts from AI Story Mode.
Related Guides
- AI Story Mode — uses the visual index to generate narration
- Story Templates — guide the narration style