Pillar 01 · Ego/Exo4D Datasets

Recorded is not measured.

Almost all training video is footage that happened to exist. The POD is the opposite: a calibrated capture instrument where trained, fairly paid operators perform real tasks under synchronized egocentric and exocentric rigs. Every camera is measured. Every hand is tracked. Every consent is signed. Physical AI gets built on that difference.

Synchronized ego + exo · Calibrated multi-view · Trained operators · Signed provenance

Every supply path fails quietly. We'll say it out loud.

The AI industry needs millions of hours of real-world human video, and the honest version of the sourcing story is uncomfortable:

Scraped data

Legally radioactive

Consent cannot be retrofitted. Bot-blocking and copyright litigation closed this path, and every model trained on scraped video carries the liability forward with it.

Synthetic data

Physics is the hard part

Renderers still fail at contact, friction, and deformation — exactly the events a manipulation policy has to learn. Synthetic data is a supplement, not a source.

Found footage

Volume without ground truth

An uncalibrated camera produces pixels, not measurements. Annotation after the fact is estimation — you cannot recover signal that was never captured.

Bespoke shoots

Real but unrepeatable

One-off collection produces beautiful demos that can't scale, can't be reproduced, and can't be audited. A dataset you can't re-run is a dataset you can't trust.

The POD: a room built like an instrument.

A purpose-built capture studio with controlled lighting and a measured geometry. Because the environment is fixed and calibrated, every hour captured inside it has properties in-the-wild footage structurally cannot have.

Synchronization

Egocentric (head-mounted) and exocentric (room-mounted) rigs record the same act on a shared clock. Cross-view correspondence is a property of the capture — not a guess made later.
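A minimal sketch of what shared-clock pairing could look like, assuming each rig stamps its frames in seconds on the same clock. The function name, the 5 ms tolerance, and the sample timestamps are illustrative assumptions, not the POD's actual pipeline.

```python
from bisect import bisect_left


def match_frames(ego_ts, exo_ts, tolerance_s=0.005):
    """Pair each ego timestamp with the nearest exo timestamp.

    Both lists are assumed sorted and expressed in seconds on one shared clock.
    Returns (ego_index, exo_index) pairs; ego frames with no exo frame inside
    the tolerance are dropped rather than paired with a bad match.
    """
    pairs = []
    for i, t in enumerate(ego_ts):
        j = bisect_left(exo_ts, t)
        # The nearest neighbour is either just before or just after position j.
        candidates = [k for k in (j - 1, j) if 0 <= k < len(exo_ts)]
        if not candidates:
            continue
        best = min(candidates, key=lambda k: abs(exo_ts[k] - t))
        if abs(exo_ts[best] - t) <= tolerance_s:
            pairs.append((i, best))
    return pairs


ego = [0.000, 0.033, 0.067, 0.100]  # hypothetical head-mounted timestamps
exo = [0.001, 0.034, 0.101]         # hypothetical room-mounted timestamps
print(match_frames(ego, exo))       # [(0, 0), (1, 1), (3, 2)]
```

The third ego frame has no exo frame within 5 ms, so it is left unpaired. With a shared clock this is a lookup; without one, the same step becomes an estimation problem.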

Calibration

Known camera intrinsics and extrinsics make reconstruction metric: positions in meters, camera trajectories you can measure against, depth that is real rather than inferred.
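To make "known intrinsics and extrinsics" concrete, here is a sketch of the standard pinhole projection: a 3D point in room coordinates (meters) maps to a pixel once the camera's rotation, translation, focal lengths, and principal point are known. The numeric values are made up for illustration, and lens distortion is ignored.

```python
def project_point(point_world, rotation, translation, fx, fy, cx, cy):
    """Project a 3D world point (meters) to pixel coordinates.

    rotation is a 3x3 row-major matrix and translation a 3-vector; together
    they are the extrinsics (world to camera). fx, fy, cx, cy are the
    intrinsics of an ideal pinhole camera with no distortion.
    """
    x, y, z = (
        sum(rotation[r][c] * point_world[c] for c in range(3)) + translation[r]
        for r in range(3)
    )
    if z <= 0:
        raise ValueError("point is behind the camera")
    return (fx * x / z + cx, fy * y / z + cy)


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# Hypothetical camera at the world origin, looking down +z.
u, v = project_point(
    (0.1, -0.05, 2.0), IDENTITY, (0.0, 0.0, 0.0),
    fx=600.0, fy=600.0, cx=320.0, cy=240.0,
)
print(round(u, 3), round(v, 3))  # 350.0 225.0
```

Reconstruction runs this relationship in reverse across several calibrated views, which is why the result comes out in meters rather than in an arbitrary scale.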

Repeatability

Same room, same lighting, same rig. Tasks can be re-run, datasets re-validated, and any error traced back to a specific session — the same standard a lab applies to its own experiments.

Operators

Trained, fairly paid local performers execute structured task protocols (cooking, cleaning, manipulation, navigation) consistently enough to be a controlled variable, not a source of noise.

Provenance

Consent and licensing are captured at the door, cryptographically signed, and travel with every asset derived from the session. Revocation propagates — even to datasets already delivered.
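One way to picture propagating revocation is as a walk over a lineage graph: if every asset records what it was derived from, a withdrawn session can be traced to everything downstream. The sketch below assumes a simple mapping from asset id to parent ids; the ids and the function name are hypothetical, and what happens to a flagged dataset (rebuild, partial removal, notice to the buyer) is outside its scope.

```python
from collections import deque


def assets_to_revoke(derived_from, revoked_session):
    """Return every asset that descends from the revoked session.

    derived_from maps an asset id to the list of ids it was built from.
    """
    # Invert the mapping so we can walk from parents to children.
    children = {}
    for asset, parents in derived_from.items():
        for parent in parents:
            children.setdefault(parent, []).append(asset)

    affected = set()
    queue = deque([revoked_session])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in affected:
                affected.add(child)
                queue.append(child)
    return affected


lineage = {
    "clip-001": ["session-A"],
    "clip-002": ["session-A"],
    "clip-101": ["session-B"],
    "dataset-v1": ["clip-001", "clip-101"],  # already delivered
}
print(sorted(assets_to_revoke(lineage, "session-A")))
# ['clip-001', 'clip-002', 'dataset-v1']
```

The delivered dataset is flagged because one of its clips traces back to the withdrawn session, even though the other clip does not.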

Output, not promises.

Both frames below are unretouched pipeline output from real capture sessions — the same artifacts a buyer receives.

[Figure: metric 3D scene reconstruction showing a point cloud of the environment with the measured camera frustum and hand wrist trajectory overlaid, alongside the source egocentric manipulation frames.]

Metric scene reconstruction: point cloud, measured camera frustum, and wrist trajectory from a single session. The data robotics labs build world models from.

[Figure: grid of synchronized egocentric video frames with skeletal hand-pose and per-frame object-detection overlays during a real manipulation task.]

Per-frame skeletal hand pose and object detection across synchronized egocentric views: ground truth for manipulation policies.

Raw footage in. Certified, lab-ready datasets out.

01 · Ingest

Standardize

Raw video in. The untouched original is preserved; a clean, standardized working copy moves down the line.

02 · Enrich

Extract signal

Speech, captions, hand & body pose, depth, camera trajectory, object detection, 3D scene reconstruction — extracted automatically, 60+ metadata fields per segment.

03 · Certify

Birth certificate

A cryptographically signed certificate: who consented, under what license, and when, plus a tamper-evident fingerprint of the footage itself.
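As a sketch of the idea, a certificate can bind a consent record to a SHA-256 fingerprint of the footage and sign the pair. The example uses an HMAC from the Python standard library as a stand-in signature; that is an assumption made to keep it self-contained, and a real deployment would more likely use an asymmetric scheme so buyers can verify without holding the signing key. The field names are hypothetical.

```python
import hashlib
import hmac
import json


def issue_certificate(footage, consent, key):
    """Bind a consent record to a fingerprint of the footage and sign both."""
    body = {
        "consent": consent,
        "footage_sha256": hashlib.sha256(footage).hexdigest(),
    }
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    signature = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return {"body": body, "signature": signature}


def verify_certificate(cert, footage, key):
    """Check that neither the certificate body nor the footage has changed."""
    payload = json.dumps(cert["body"], sort_keys=True).encode("utf-8")
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    signature_ok = hmac.compare_digest(expected, cert["signature"])
    footage_ok = (
        hashlib.sha256(footage).hexdigest() == cert["body"]["footage_sha256"]
    )
    return signature_ok and footage_ok


key = b"demo-signing-key"              # placeholder, not a real key
footage = b"raw video bytes go here"   # placeholder for the original file
consent = {"performer": "P-0042", "license": "commercial", "signed_at": "2025-01-15"}

cert = issue_certificate(footage, consent, key)
print(verify_certificate(cert, footage, key))            # True
print(verify_certificate(cert, footage + b"x", key))     # False
```

Changing a single byte of the footage, or any field of the consent record, makes verification fail.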

04 · QA

Scored against your spec

Every dataset is scored against the buyer's spec before it ships. Misses are flagged by us — not discovered by you.
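A rough sketch of what scoring against a spec might involve, assuming the spec is a set of per-segment minimums. The field names, thresholds, and scoring rule are illustrative assumptions; a real spec would be richer.

```python
def score_against_spec(segments, spec):
    """Check every segment against every minimum in the spec.

    Returns (score, misses), where score is the fraction of checks passed and
    misses lists (segment_id, field, observed_value) for each failure.
    """
    misses = []
    for segment in segments:
        for field, minimum in spec.items():
            value = segment.get(field)
            if value is None or value < minimum:
                misses.append((segment["id"], field, value))
    total_checks = len(segments) * len(spec)
    score = 1.0 - len(misses) / total_checks if total_checks else 0.0
    return score, misses


spec = {"fps": 30, "hand_pose_coverage": 0.95}  # hypothetical buyer spec
segments = [
    {"id": "seg-1", "fps": 30, "hand_pose_coverage": 0.98},
    {"id": "seg-2", "fps": 30, "hand_pose_coverage": 0.90},
]

score, misses = score_against_spec(segments, spec)
print(score)   # 0.75
print(misses)  # [('seg-2', 'hand_pose_coverage', 0.9)]
```

The list of misses is the useful part: it names the segment and the field that fell short, so the problem is flagged before delivery.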

05 · Export

Buyer format

WebDataset for frontier labs. RLDS, a widely used robot-training dataset format, for robotics labs.
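For readers unfamiliar with WebDataset: a shard is a plain tar archive in which files sharing a basename form one sample, and the extension names the field. The sketch below builds such a shard in memory with the standard library only. The sample key, field names, and payloads are hypothetical, not the actual export schema.

```python
import io
import json
import tarfile


def write_shard(samples):
    """Write (key, {extension: bytes}) samples into a tar shard, in memory."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        for key, fields in samples:
            for extension, payload in fields.items():
                info = tarfile.TarInfo(name=f"{key}.{extension}")
                info.size = len(payload)
                tar.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


shard = write_shard([
    (
        "session01_seg0001",
        {
            "ego.mp4": b"<ego video bytes>",   # placeholder payload
            "exo.mp4": b"<exo video bytes>",   # placeholder payload
            "json": json.dumps({"task": "pour"}).encode("utf-8"),
        },
    ),
])

with tarfile.open(fileobj=io.BytesIO(shard)) as tar:
    names = tar.getnames()
print(names)
# ['session01_seg0001.ego.mp4', 'session01_seg0001.exo.mp4', 'session01_seg0001.json']
```

Keeping the ego view, the exo view, and the metadata under one key means a loader reads them back as a single sample.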

Not industry standard. On purpose.

Measurement at capture time, not annotation after the fact.

The industry films first and labels later. We instrument the room so pose, depth, and trajectory are measured properties of the session. Post-hoc annotation estimates; an instrument records.

Simultaneous ego + exo of the same act, commercially licensable.

Research showed that the paired ego/exo format is what embodied AI needs, then licensed it for academic use only. We manufacture it as a commercial product, with the legal chain to sell it.

Consent that can be revoked — and revocation that propagates.

A performer can withdraw, and the withdrawal reaches datasets we've already delivered. It costs us margin. We do it anyway, because provenance you can't revoke is provenance you can't trust.

Throughput we publish, not promise.

A 4-hour session becomes a certified, annotated dataset in 10–15 minutes. Those numbers are pipeline telemetry, not a sales projection.
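Put as a ratio, those figures work out to processing at roughly 16 to 24 times real time. The arithmetic is below; whether that throughput comes from one machine or many running in parallel is not stated here, so read it as end-to-end turnaround rather than per-machine speed.

```python
def realtime_factor(footage_hours, processing_minutes):
    """How many minutes of footage are processed per minute of wall-clock time."""
    return footage_hours * 60 / processing_minutes


print(realtime_factor(4, 15))  # 16.0
print(realtime_factor(4, 10))  # 24.0
```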

60+

Metadata fields per segment

150k+

Hours in pipeline

10–15 min

To process a 4-hour session

100%

Consented & certified

Tell us the spec. We'll capture it.

Robotics and frontier AI teams from pre-seed to Series B use our datasets. Every asset ships with provable consent, licensing, and provenance — built for an era of AI copyright litigation and EU AI Act transparency. And if found footage is genuinely good enough for your problem, we'll tell you that too.

Book a call