Multi-Sensor Fusion 3D Detection Patent 2024 | RobotDocket

A 2024 grant on multi-task, multi-sensor fusion for 3D detection embodies the camp that says the answer is camera AND lidar AND radar, learned together.

The vision-only versus mapped debate gets the headlines, but the fusion camp has a quieter answer: use everything, and learn it together. US12051001B2, granted to UATC in July 2024 and naming AV researcher Raquel Urtasun as an inventor, covers "Multi-task multi-sensor fusion for three-dimensional object detection."

Two words carry the design: multi-sensor and multi-task. Classified under G06N 3/084 (neural-network training), G01S 17/89 (lidar) and G06V 20/58 (object detection), the patent fuses camera, lidar and radar into one model that detects objects in 3D. That model also performs several perception tasks at once, sharing computation across them rather than running a separate network per job.

“Provided are systems and methods that perform multi-task and/or multi-sensor fusion for three-dimensional object detection in furtherance of, for example, autonomous vehicle perception and control.” (U.S. Patent No. 12,051,001)

The fusion philosophy answers the sensor wars by declining to enlist. Cameras are cheap and rich in semantics but weak on depth; lidar is precise on geometry but sparse and costly; radar sees through weather and measures velocity but is coarse. Fusing them lets each cover the others' weaknesses — the whole is more robust than any single modality the rival camps fight over.
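To make the "cover each other's weaknesses" point concrete, here is a minimal sketch using inverse-variance weighting, a classical way to combine several noisy estimates of the same quantity. It is an illustration only: the patent describes fusion learned inside a neural network, not this formula, and the function name, readings and variances below are made-up assumptions.

```python
def fuse_inverse_variance(estimates):
    """Combine (value, variance) pairs into a single (value, variance).

    Each estimate is weighted by 1 / variance, so a precise sensor
    dominates while a noisy one still contributes a little.
    """
    if not estimates:
        raise ValueError("need at least one estimate")
    if any(var <= 0 for _, var in estimates):
        raise ValueError("variances must be positive")

    weights = [1.0 / var for _, var in estimates]
    total_weight = sum(weights)
    fused_value = sum(w * value for w, (value, _) in zip(weights, estimates)) / total_weight
    fused_variance = 1.0 / total_weight
    return fused_value, fused_variance


# Hypothetical range-to-object readings in metres: (value, variance).
# Camera depth is loose, lidar is tight, radar sits in between.
READINGS = {
    "camera": (22.0, 4.0),
    "lidar": (20.0, 0.01),
    "radar": (20.5, 0.25),
}

FUSED_RANGE, FUSED_VARIANCE = fuse_inverse_variance(list(READINGS.values()))
print(f"fused range: {FUSED_RANGE:.3f} m, variance: {FUSED_VARIANCE:.5f}")
```

Under these assumed numbers the fused variance comes out below that of the best single sensor, which is the statistical core of the robustness claim. A learned fusion network aims for the same effect without hand-set weights.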

The multi-task half is an efficiency argument. Running one shared network that does detection and related perception jobs together is cheaper than a zoo of specialized models — a real concern when the compute rides in a car. The patent is as much about fitting perception in a power budget as about accuracy.
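The efficiency argument can be put in rough numbers. The sketch below counts parameters for one shared backbone with several small task heads versus a separate full network per task. Everything here is hypothetical: the layer sizes, the task names and the fully connected layers are stand-ins chosen for easy arithmetic, not the architecture the patent claims.

```python
def dense_params(n_in, n_out):
    """Parameters in one fully connected layer: weights plus biases."""
    return n_in * n_out + n_out


def mlp_params(layer_sizes):
    """Total parameters in a stack of dense layers with the given widths."""
    return sum(dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:]))


# Hypothetical shared feature extractor: 256 inputs -> 512 -> 512 features.
BACKBONE = [256, 512, 512]

# Hypothetical task heads, each reading the 512 shared features.
HEADS = {
    "detection_3d": [512, 7],      # e.g. box centre, size, heading
    "auxiliary_task_a": [512, 1],
    "auxiliary_task_b": [512, 1],
}

BACKBONE_PARAMS = mlp_params(BACKBONE)
HEAD_PARAMS = sum(mlp_params(head) for head in HEADS.values())

# One shared backbone, computed once, feeding every head.
SHARED_TOTAL = BACKBONE_PARAMS + HEAD_PARAMS

# A "zoo" of specialised models: each task carries its own backbone copy.
SEPARATE_TOTAL = len(HEADS) * BACKBONE_PARAMS + HEAD_PARAMS

print(f"shared:   {SHARED_TOTAL:,} parameters")
print(f"separate: {SEPARATE_TOTAL:,} parameters")
```

The saving is exactly the backbone copies that no longer exist. In this toy setup the heads are tiny, so nearly all of the cost sits in the part that sharing removes.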

The honest cost is complexity and calibration. Fusing three modalities means keeping three sensor types calibrated, time-synchronized and trained together — a heavier engineering burden than a vision-only stack. The fusion camp accepts that burden as the price of robustness; the vision camp rejects it as needless cost. Both are coherent positions.
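Time synchronization is one piece of that burden that is easy to show. The sketch below pairs each lidar sweep with the nearest camera frame and radar scan by timestamp, and drops the sweep if either neighbour is too far away in time. The function names, the 20 ms tolerance and the choice of lidar as the reference clock are assumptions for illustration, not details from the patent.

```python
from bisect import bisect_left


def nearest_index(sorted_stamps, t):
    """Index of the timestamp closest to t, or None if the list is empty."""
    if not sorted_stamps:
        return None
    i = bisect_left(sorted_stamps, t)
    # Only the neighbours on either side of the insertion point can be closest.
    candidates = [j for j in (i - 1, i) if 0 <= j < len(sorted_stamps)]
    return min(candidates, key=lambda j: abs(sorted_stamps[j] - t))


def synchronize(lidar_stamps, camera_stamps, radar_stamps, tolerance_s=0.02):
    """Return (lidar_idx, camera_idx, radar_idx) triples within tolerance.

    All timestamp lists are assumed sorted and expressed in seconds on a
    common clock. A lidar sweep with no close-enough partner is skipped.
    """
    bundles = []
    for k, t in enumerate(lidar_stamps):
        ci = nearest_index(camera_stamps, t)
        ri = nearest_index(radar_stamps, t)
        if ci is None or ri is None:
            continue
        camera_ok = abs(camera_stamps[ci] - t) <= tolerance_s
        radar_ok = abs(radar_stamps[ri] - t) <= tolerance_s
        if camera_ok and radar_ok:
            bundles.append((k, ci, ri))
    return bundles


# Hypothetical timestamps: the third lidar sweep has no camera frame nearby.
LIDAR = [0.0, 0.1, 0.2]
CAMERA = [0.005, 0.09, 0.25]
RADAR = [0.01, 0.11, 0.19]

BUNDLES = synchronize(LIDAR, CAMERA, RADAR)
print(BUNDLES)
```

A vision-only stack never has to run this step, which is the practical shape of the trade-off the two camps disagree on.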

For readers tired of the binary camera-versus-lidar framing, this patent is the reminder that a serious third camp exists and is winning a lot of real deployments. The answer many AV programs actually shipped was not 'pick the right sensor' but 'fuse them well' — and the IP behind that is exactly this kind of multi-task, multi-sensor model.

The Fusion Patent That Refuses to Pick a Sensor
