Trust, but Verify (TbV)
Cross-Modality Fusion for HD Map Change Detection
NeurIPS 2021
John Lambert
James Hays
[Paper]
[GitHub]

Abstract

High-definition (HD) map change detection is the task of determining when sensor data and map data are no longer in agreement with one another due to real-world changes. We collect the first dataset for the task, which we entitle the Trust, but Verify (TbV) dataset, by mining thousands of hours of data from over 9 months of autonomous vehicle fleet operations. We present learning-based formulations for solving the problem in the bird’s eye view and ego-view. Because real map changes are infrequent and vector maps are easy to synthetically manipulate, we lean on simulated data to train our model. Perhaps surprisingly, we show that such models can generalize to real world distributions. The dataset, consisting of maps and logs collected in six North American cities, is one of the largest AV datasets to date with more than 7.9 million images. We make the data available to the public, along with code and models, under the CC BY-NC-SA 4.0 license.


Dealing with real-world changes and stale maps is a constant problem for large-scale self-driving efforts today, and this is the first public dataset for the task. Here's an example of a stale map, after a bike lane was added:
HD maps have proven to be an effective way to assist self-driving vehicles in **safely** navigating difficult intersections and city streets. Unprotected left turns are not easy!
But we live in a constantly-changing world, so maps need to constantly be updated. Training and evaluating models in academia for this has been difficult without public datasets for the task.
We lean on real sensor data paired with synthetically modified maps to train new models, and evaluate on over 200 logs with real-world map changes. We mined large-scale fleet data for almost a year to collect interesting map changes. Here's an example from downtown Pittsburgh:
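To make the synthetic-perturbation idea concrete, here is a minimal sketch of two hypothetical map edits: deleting a crosswalk and laterally shifting a lane boundary to fake a repainted lane. The function names, data layout, and perturbation magnitudes are illustrative assumptions, not the released training code.

```python
import numpy as np

def delete_random_crosswalk(crosswalks, rng):
    """Synthetically 'change' a map by removing one crosswalk polygon.

    crosswalks: list of (N, 2) vertex arrays (hypothetical format).
    Returns the perturbed list and the index of the removed polygon.
    """
    idx = int(rng.integers(len(crosswalks)))
    return [c for i, c in enumerate(crosswalks) if i != idx], idx

def shift_lane_boundary(boundary, rng, max_shift_m=2.0):
    """Laterally offset a lane-boundary polyline to simulate repainting.

    boundary: (N, 2) array of waypoints in meters.
    """
    # Approximate a constant lateral direction from the first segment.
    d = boundary[1] - boundary[0]
    normal = np.array([-d[1], d[0]]) / np.linalg.norm(d)
    shift = rng.uniform(0.5, max_shift_m)  # illustrative shift range
    return boundary + shift * normal

rng = np.random.default_rng(0)
crosswalks = [
    np.array([[0.0, 0.0], [4.0, 0.0], [4.0, 2.0], [0.0, 2.0]]),
    np.array([[10.0, 0.0], [14.0, 0.0], [14.0, 2.0], [10.0, 2.0]]),
]
perturbed, deleted_idx = delete_random_crosswalk(crosswalks, rng)
boundary = np.array([[0.0, 0.0], [10.0, 0.0], [20.0, 0.0]])
shifted = shift_lane_boundary(boundary, rng)
```

A model trained to compare sensor data against such perturbed maps sees a "change" label wherever a perturbation was applied, which is how simulated changes can stand in for rare real ones.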
Each log is about 54 seconds long, and there are over 1000 of them in total, across train/val/test. This amounts to about 16 continuous hours of sensor data -- 8 million images and 1 TB of data. An example from Stanford campus:
Every log comes with a paired HD map, with lane boundaries & markings in 3D, along with a ground height map, drivable area polygons, and annotated crosswalks. By number of driving hours, this dataset is about 3x the size of nuScenes (15.5 hours vs. 5.5 hours), and comes with annotated HD maps for all the logs. It's captured in 6 cities -- Austin, Detroit, Miami, Palo Alto, Pittsburgh, and Washington DC -- across 4 seasons.
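The map layers listed above can be pictured as a per-log container like the sketch below. The class and field names are hypothetical, chosen only to show the intended shapes; the actual schema is defined by the released code on GitHub.

```python
from dataclasses import dataclass
from typing import List
import numpy as np

@dataclass
class TbvLogMap:
    """Illustrative container for the map layers paired with each log.

    Field names and shapes are assumptions for exposition, not the
    released API.
    """
    lane_boundaries: List[np.ndarray]  # each (N, 3): 3D polylines, meters
    crosswalks: List[np.ndarray]       # each (M, 3): annotated polygons
    drivable_area: List[np.ndarray]    # polygons bounding the drivable surface
    ground_height: np.ndarray          # (H, W) raster of ground elevation

# Toy instance illustrating the layout of one log's map.
log_map = TbvLogMap(
    lane_boundaries=[np.zeros((10, 3))],
    crosswalks=[np.zeros((4, 3))],
    drivable_area=[np.zeros((6, 3))],
    ground_height=np.zeros((100, 100)),
)
```

Keeping each layer as simple polylines, polygons, and a raster makes the map easy to rasterize into the bird's eye view or project into the ego-view for the learned change detectors.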


Dataset Download


 [Download link]


Code

We make our training code, model inference code, pretrained models, and rendering code publicly available on GitHub.

 [GitHub]


Paper and Supplementary Material

John Lambert and James Hays.
Trust, but Verify: Cross-Modality Fusion for HD Map Change Detection.
In Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks (NeurIPS), 2021.
(hosted on ArXiv)


[Bibtex]


Talk

Invited talk at the CVPR 2021 VOCVALC Workshop:

NeurIPS 2021 SlidesLive: Talk
[Slides]

Acknowledgements

Thanks to Phillip Isola and Richard Zhang for the page template.