Performance / Technical
Processing model and scaling behavior.
Intervals are processed during the procedure. Final packaging and recording consolidation account for the remaining delay after surgery ends.
- Single camera: 30 sec – 4 min (15 min to 4 hr procedure)
- Dual camera: 1.5 – 25 min (15 min to 4 hr procedure)
Processing stages
Four-stage model.
Stage 1: Capture
Video and audio are recorded locally and synchronized into predefined 30-second intervals before downstream processing begins.
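A minimal sketch of how timestamped capture samples could be grouped into 30-second intervals, assuming audio and video share one wall clock. The tuple layout `(timestamp_s, stream, payload)` and the names `interval_index` and `bucket_samples` are hypothetical illustrations, not the product's actual capture format.

```python
from collections import defaultdict

INTERVAL_SECONDS = 30  # interval length stated in the text


def interval_index(timestamp_s: float, start_s: float) -> int:
    """Return the 0-based 30-second interval a timestamp falls into."""
    return int((timestamp_s - start_s) // INTERVAL_SECONDS)


def bucket_samples(samples, start_s: float) -> dict:
    """Group hypothetical (timestamp_s, stream, payload) tuples by interval.

    Because audio and video are bucketed against the same clock, samples
    from both streams that cover the same 30 seconds land in the same bucket.
    """
    buckets = defaultdict(list)
    for ts, stream, payload in samples:
        buckets[interval_index(ts, start_s)].append((ts, stream, payload))
    return dict(buckets)
```

For example, samples at 0.0 s and 12.5 s fall into interval 0, a sample at exactly 30.0 s starts interval 1, and one at 61.2 s falls into interval 2.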
Stage 2: Interval segmentation
Each 30-second interval has fixed boundaries and can be processed on its own, so a failure is contained to the affected interval.
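The failure-containment property can be sketched as a loop that catches errors per interval instead of aborting the whole run. This is an illustration under assumptions: `IntervalResult`, `process_all`, and the `process` callback are hypothetical names, and the real system may dispatch intervals concurrently rather than sequentially.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class IntervalResult:
    index: int
    ok: bool
    output: Any = None
    error: Optional[str] = None


def process_all(intervals: dict, process: Callable[[Any], Any]) -> list:
    """Run `process` on every interval; one failure never stops the rest."""
    results = []
    for idx in sorted(intervals):
        try:
            results.append(IntervalResult(idx, True, output=process(intervals[idx])))
        except Exception as exc:
            # Record the failure against this interval only and keep going.
            results.append(IntervalResult(idx, False, error=repr(exc)))
    return results
```

Because no interval depends on another's output, the same loop body could be handed to a worker pool without changing how failures are isolated.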
Stage 3: Structured extraction
ASR (automatic speech recognition) and VLM (vision-language model) processing extract transcripts, visual events, and vitals into a structured representation for each interval.
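One possible shape for a per-interval record, sketched with dataclasses and serialized to JSON. Every field name here (`interval_index`, `transcript`, `visual_events`, `t_offset_s`, `heart_rate_bpm`, `spo2_pct`) is a hypothetical assumption; the section does not publish its schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class VisualEvent:
    t_offset_s: float  # seconds from the start of the 30-second interval
    label: str         # hypothetical event label, e.g. "suction"


@dataclass
class VitalsSample:
    t_offset_s: float
    heart_rate_bpm: Optional[float] = None
    spo2_pct: Optional[float] = None


@dataclass
class IntervalRecord:
    interval_index: int
    transcript: str = ""
    visual_events: List[VisualEvent] = field(default_factory=list)
    vitals: List[VitalsSample] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() recursively converts the nested dataclasses to dicts.
        return json.dumps(asdict(self), sort_keys=True)
```

A record like this is a few hundred bytes per interval, which is the kind of compact input the synthesis stage can read without touching video.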
Stage 4: Synthesis
The timeline and report are generated from the extracted structures; raw video is not replayed during synthesis.
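A minimal sketch of timeline assembly that reads only parsed per-interval JSON, never video. It assumes each record carries hypothetical `interval_index` and `visual_events` fields, with each event's `t_offset_s` measured from the start of its interval, so absolute time is `interval_index * 30 + t_offset_s`.

```python
INTERVAL_SECONDS = 30  # interval length stated in the text


def build_timeline(records: list) -> list:
    """Merge per-interval visual events into one procedure-level timeline.

    Input is a list of parsed JSON dicts (hypothetical field names); output
    is a time-ordered list of (absolute_seconds, label) tuples.
    """
    timeline = []
    for rec in records:
        base = rec["interval_index"] * INTERVAL_SECONDS
        for ev in rec.get("visual_events", []):
            timeline.append((base + ev["t_offset_s"], ev["label"]))
    timeline.sort()
    return timeline
```

Records can arrive in any order, which fits intervals that finish processing independently during the procedure.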
Scaling
Structured compression changes the scaling curve.
By the time synthesis runs, the system operates on extracted transcripts, visual events, vitals, and aligned metadata rather than raw capture streams. Dual-camera configurations increase per-interval data volume, which is reflected in the roughly 6× processing-time ratio versus single camera for procedures of one hour or longer.
- Each 30-second interval is processed independently.
- Synthesis reads structured JSON, not video files.
- Deterministic schemas constrain output space and eliminate open-ended generation.
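The bullets above can be illustrated with a strict validator that accepts model output only if it fits a fixed schema with a closed label vocabulary. This is a sketch under assumptions: `ALLOWED_EVENT_LABELS`, `REQUIRED_KEYS`, and `parse_constrained` are hypothetical, and a production system might instead use a JSON Schema validator or constrained decoding.

```python
import json

# Hypothetical closed vocabulary and key set; the actual schema is not
# published in this section.
ALLOWED_EVENT_LABELS = {"incision", "suction", "instrument_change", "closure"}
REQUIRED_KEYS = {"interval_index", "visual_events"}


def parse_constrained(raw: str) -> dict:
    """Parse one interval's model output, rejecting anything off-schema."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("top-level value must be a JSON object")
    missing = REQUIRED_KEYS - set(data)
    extra = set(data) - REQUIRED_KEYS
    if missing or extra:
        raise ValueError(f"schema mismatch: missing={sorted(missing)} extra={sorted(extra)}")
    index = data["interval_index"]
    if not isinstance(index, int) or isinstance(index, bool) or index < 0:
        raise ValueError("interval_index must be a non-negative integer")
    events = data["visual_events"]
    if not isinstance(events, list):
        raise ValueError("visual_events must be a list")
    for ev in events:
        # Only labels from the closed vocabulary are accepted; free text is not.
        if not isinstance(ev, dict) or ev.get("label") not in ALLOWED_EVENT_LABELS:
            raise ValueError(f"event not in schema: {ev!r}")
    return data
```

Rejecting anything outside the enumerated labels is what keeps the output space finite: a free-text description or an extra key fails validation instead of flowing into synthesis.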
Validated processing times
| Procedure duration | 1 Camera | 2 Cameras |
|---|---|---|
| 15 min | ~30 sec | ~1.5 min |
| 30 min | ~45 sec | ~3 min |
| 1 hr | ~1 min | ~6 min |
| 2 hr | ~2 min | ~12 min |
| 3 hr | ~3 min | ~18 min |
| 4 hr | ~4 min | ~25 min |
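The dual-to-single ratio can be computed directly from the table values (all approximate, in minutes). The dictionary below only transcribes the table; `dual_to_single_ratio` is a hypothetical helper name.

```python
# (single-camera minutes, dual-camera minutes), transcribed from the table
PROCESSING_TIMES = {
    "15 min": (0.5, 1.5),
    "30 min": (0.75, 3.0),
    "1 hr": (1.0, 6.0),
    "2 hr": (2.0, 12.0),
    "3 hr": (3.0, 18.0),
    "4 hr": (4.0, 25.0),
}


def dual_to_single_ratio(table: dict) -> dict:
    """Ratio of dual-camera to single-camera processing time per duration."""
    return {k: round(dual / single, 2) for k, (single, dual) in table.items()}


print(dual_to_single_ratio(PROCESSING_TIMES))
# {'15 min': 3.0, '30 min': 4.0, '1 hr': 6.0, '2 hr': 6.0, '3 hr': 6.0, '4 hr': 6.25}
```

The ratio is about 6× from one hour upward and lower (3× to 4×) for the shortest procedures, where fixed per-procedure overhead likely weighs more.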