Diffraction
Engineering

Raw footage in, certified datasets out: inside the video-processor

A warehouse of cameras without software is just a pile of video files. The video-processor is the factory line that turns our footage into a product. It is built and verified end-to-end on real creator footage and instrumented capture sessions, and it processes a 4-hour VOD into 40–50 annotated clips in 10–15 minutes, roughly 16 to 24 times faster than real time.

The five steps

  1. Ingest — standardize. Take raw video, preserve the untouched original, make a clean working copy.
  2. Enrich — extract signal. Speech, captions, hand & body pose, depth, camera trajectory, object detection, 3D scene reconstruction — 60+ metadata fields per segment, extracted automatically.
  3. Certify — the birth certificate. Every asset ships with a cryptographically signed certificate proving who consented, under what license, and when, plus a tamper-evident fingerprint of the footage. It's revocation-aware: a creator can pull their data, and the revocation propagates to already-delivered datasets.
  4. QA — lab-readiness. The dataset is scored against the buyer's spec, and anything missing is flagged before it ships.
  5. Export — buyer format. WebDataset for frontier labs; RLDS (Reinforcement Learning Datasets), a widely used robot-training format, for robotics labs.
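
To make the Certify and QA steps above concrete, here is a minimal sketch, not the video-processor's actual code. Every name in it (issue_certificate, verify_certificate, missing_fields, SIGNING_KEY, the payload field names) is hypothetical. It also assumes SHA-256 for the fingerprint and swaps in an HMAC from the Python standard library for the signature so the example stays self-contained; a production system would more likely use an asymmetric signature so buyers can verify without holding the secret key.

```python
import hashlib
import hmac
import json

# Hypothetical demo key. Assumption: a real deployment would keep an
# asymmetric private key in a key-management service instead.
SIGNING_KEY = b"demo-key-not-for-production"


def fingerprint(data: bytes) -> str:
    """Content fingerprint: SHA-256 hex digest of the footage bytes."""
    return hashlib.sha256(data).hexdigest()


def _canonical(payload: dict) -> bytes:
    """Serialize the payload deterministically so signatures are reproducible."""
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")


def issue_certificate(data: bytes, creator_id: str, license_id: str, consented_at: str) -> dict:
    """Bind who consented, under what license, and when to the footage hash."""
    payload = {
        "creator_id": creator_id,
        "license": license_id,
        "consented_at": consented_at,
        "sha256": fingerprint(data),
    }
    signature = hmac.new(SIGNING_KEY, _canonical(payload), hashlib.sha256).hexdigest()
    return {"payload": payload, "signature": signature}


def verify_certificate(cert: dict, data: bytes, revoked_creators: set) -> bool:
    """Valid only if the signature matches, the footage is unchanged,
    and the creator has not revoked consent."""
    payload = cert["payload"]
    expected = hmac.new(SIGNING_KEY, _canonical(payload), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, cert["signature"]):
        return False  # certificate fields were altered after signing
    if payload["sha256"] != fingerprint(data):
        return False  # footage no longer matches its fingerprint
    return payload["creator_id"] not in revoked_creators


def missing_fields(segment: dict, required: list) -> list:
    """QA sketch: list the buyer-required metadata fields a segment lacks."""
    return [name for name in required if segment.get(name) is None]
```

In this sketch, revocation is modeled as a set of creator IDs consulted at verification time. That is one simple way an already-delivered dataset could learn about a revocation, assuming the buyer re-checks certificates against an updated list.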

Why certification matters

Everyone else sells found data. We sell manufactured, certified data. In an era of AI copyright litigation and EU AI Act transparency requirements, provable consent and provenance per asset is what lets buyers sleep at night — and it's something scrapers structurally cannot retrofit onto data they took.

One pipeline serves everything we ship: creator video gets the certified, licensed, curated track, and instrumented pod capture gets the heavy, fully-instrumented version. Same factory line.

Humans are at the center of everything we do.
Diffraction, Inc. · Columbus, Ohio