Performance / Technical

Processing model and scaling behavior.

Intervals are processed during the procedure. Final packaging and recording consolidation account for the remaining delay after surgery ends.

Single camera: 30 sec – 4 min for a 15 min to 4 hr procedure

Dual camera: 1.5 – 25 min for a 15 min to 4 hr procedure

Processing stages

Four-stage model.

Stage 1: Capture

Video and audio are recorded locally and synchronized into declared 30-second intervals before downstream processing begins.

Stage 2: Interval segmentation

Each 30-second interval is independently bounded and processable. Failures are contained to the affected interval.
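A minimal sketch of interval bounding and failure containment, assuming a recording is described only by its duration in seconds. The names `interval_bounds` and `process_all`, and the `process` callback, are hypothetical and not taken from the actual system.

```python
from typing import Callable

INTERVAL_SECONDS = 30  # interval length stated in the text


def interval_bounds(duration_s: float, step: int = INTERVAL_SECONDS) -> list[tuple[float, float]]:
    """Split a recording into consecutive [start, end) intervals; the last may be shorter."""
    bounds = []
    start = 0.0
    while start < duration_s:
        bounds.append((start, min(start + step, duration_s)))
        start += step
    return bounds


def process_all(
    bounds: list[tuple[float, float]],
    process: Callable[[float, float], dict],
) -> tuple[dict[int, dict], dict[int, str]]:
    """Run `process` on each interval; an exception marks only that interval as failed."""
    results: dict[int, dict] = {}
    failures: dict[int, str] = {}
    for index, (start, end) in enumerate(bounds):
        try:
            results[index] = process(start, end)
        except Exception as exc:  # contain the failure to this interval
            failures[index] = repr(exc)
    return results, failures
```

Because each interval carries its own bounds, a failed interval can be retried or skipped without touching its neighbors.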

Stage 3: Structured extraction

Automatic speech recognition (ASR) and a vision-language model (VLM) extract transcripts, visual events, and vitals into structured per-interval representations.
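One possible shape for a per-interval representation, sketched as a dataclass. The class name `IntervalRecord`, its field names, and the sample values are illustrative assumptions; the source does not specify the actual schema.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class IntervalRecord:
    """Hypothetical structured output for one 30-second interval."""
    index: int
    start_s: float
    end_s: float
    transcript: str = ""                                      # text from the ASR step
    visual_events: list[str] = field(default_factory=list)    # labels from the VLM step
    vitals: dict[str, float] = field(default_factory=dict)    # e.g. heart rate readings

    def to_json(self) -> str:
        # sort_keys keeps the serialized form stable across runs
        return json.dumps(asdict(self), sort_keys=True)


# Made-up sample values, for illustration only
record = IntervalRecord(
    index=4,
    start_s=120.0,
    end_s=150.0,
    transcript="clamp, please",
    visual_events=["instrument_handoff"],
    vitals={"heart_rate_bpm": 72.0},
)
print(record.to_json())
```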

Stage 4: Synthesis

The timeline and report are generated from the extracted structures. Raw video is not replayed at the synthesis stage.

Scaling

Structured compression changes the scaling curve.

By the time synthesis runs, the system operates on extracted transcripts, visual events, vitals, and aligned metadata, not on raw capture streams. Dual-camera configurations increase per-interval data volume, which is reflected in the roughly 6× ratio versus single camera for procedures of one hour or longer (3× to 4× for shorter procedures).

Each 30-second interval is processed independently.
Synthesis reads structured JSON, not video files.
Deterministic schemas constrain output space and eliminate open-ended generation.
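The three points above can be sketched as a synthesis step that reads only JSON records and rejects any record that does not match a fixed key set. The key names in `REQUIRED_KEYS` and the functions `load_record` and `build_timeline` are hypothetical, chosen for illustration.

```python
import json

# Hypothetical schema: the exact key set every interval record must have
REQUIRED_KEYS = {"index", "start_s", "end_s", "transcript", "visual_events", "vitals"}


def load_record(line: str) -> dict:
    """Parse one JSON record and enforce the fixed key set."""
    record = json.loads(line)
    if set(record) != REQUIRED_KEYS:
        mismatch = sorted(set(record) ^ REQUIRED_KEYS)
        raise ValueError(f"record does not match schema: {mismatch}")
    return record


def build_timeline(json_lines: list[str]) -> list[str]:
    """Order records by interval index and emit one timeline entry per visual event."""
    records = sorted((load_record(line) for line in json_lines), key=lambda r: r["index"])
    timeline = []
    for r in records:
        for event in r["visual_events"]:
            timeline.append(f"{r['start_s']:.0f}s: {event}")
    return timeline
```

No video file is opened anywhere in this step; the input is a few hundred bytes of JSON per interval regardless of how many cameras produced it.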

Validated processing times

Procedure duration	Single camera	Dual camera
15 min	~30 sec	~1.5 min
30 min	~45 sec	~3 min
1 hr	~1 min	~6 min
2 hr	~2 min	~12 min
3 hr	~3 min	~18 min
4 hr	~4 min	~25 min
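As a quick check on the table, the dual-to-single ratio can be computed per row. The figures are copied from the table and are approximate (each is marked "~"), so the ratios are approximate too.

```python
# (procedure minutes, single-camera seconds, dual-camera seconds), from the table above
TIMES = [
    (15, 30, 90),
    (30, 45, 180),
    (60, 60, 360),
    (120, 120, 720),
    (180, 180, 1080),
    (240, 240, 1500),
]

# Ratio of dual-camera to single-camera processing time for each procedure length
ratios = {minutes: round(dual_s / single_s, 2) for minutes, single_s, dual_s in TIMES}

for minutes, ratio in ratios.items():
    print(f"{minutes:>3} min procedure: dual/single = {ratio}x")
```

The ratio is 3× at 15 minutes and 4× at 30 minutes, then settles at about 6× from one hour onward (6.25× at four hours).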
