AUTOPILOT

Name: AUTOPILOT Workshop @ CVPR 2026
Start: 2026-06-03
End: 2026-06-04
Location: Colorado Convention Center

Autonomous Understanding Through Open-world Perception and Integrated Language Models for On-road Tasks

In conjunction with The IEEE/CVF Conference on Computer Vision and Pattern Recognition 2026

Date: June 3, 2026

Venue: Colorado Convention Center, Hall 3A

Location: Denver, Colorado, USA

Full-Day Workshop · In-Person · 3rd Edition

About AUTOPILOT

3rd Edition · CVPR 2026 Workshop

AUTOPILOT is a workshop on safety-critical autonomous driving, spotlighting robust perception and trajectory forecasting that support reliable decision-making and motion planning. It emphasizes the practical use of foundation models, both vision-language and generative, made deployable on-vehicle through efficient distillation. A core theme is open-world learning: addressing Out-of-Distribution (OOD) and unknown hazards by detecting, predicting, and mitigating novel objects, agents, and events beyond standard taxonomies. AUTOPILOT features invited talks from leading industry experts, an open challenge, and archival proceedings, bringing academia and practitioners together to develop real-world solutions with explicit attention to societal impact, ethics, and reproducible evaluation.

Industry Speakers

Kaggle Challenges

Full-Day Workshop

CVPR Proceedings

Past workshops

2COOOL: Out-of-label hazards in autonomous driving · ICCV 2025

Invited Speakers

Leading researchers from industry and academia

Jose M. Alvarez

Director of Research

NVIDIA

Manmohan Chandraker

Professor, Department Head

UC San Diego / NEC Labs America

Bat El Shlomo

Senior Director, Perception and Foundational Models

Zoox

Matthew Alun Brown

Principal Scientist

Wayve

Workshop Schedule

June 3, 2026 · Colorado Convention Center, Hall 3A · Denver (MDT)

Morning Schedule

Time	Session	Speaker	Affiliation
9:00 – 9:15	Opening Remarks	Ali K. AlShami	NEC Labs America
9:15 – 10:00	Keynote #1	Jose M. Alvarez	NVIDIA
10:00 – 11:00	Coffee Break	—	—
11:00 – 12:00	Keynote #2	Manmohan Chandraker	UC San Diego & NEC Labs America
12:00 – 13:45	Lunch	—	—

Afternoon Schedule

Time	Session	Speaker	Affiliation
13:45 – 14:15	Oral Presentations I	—	—
14:15 – 14:45	Keynote #3	Matt Brown	Wayve
14:45 – 15:00	AUTOPILOT Competition & Winner Presentation	Ryan Rabinowitz	—
15:00 – 16:00	Coffee Break	—	—
16:00 – 16:30	Keynote #4	Bat El Shlomo	Zoox
16:30 – 17:00	Oral Presentations II	—	—
17:00 – 17:10	ACCIDENT Competition & Winner Presentation	Lukas Picek	—
17:10 – 17:20	Zero Shot Competition & Winner Presentation	Jianwu Fang	—
17:20 – 17:30	Award Presentation for Competition, Papers, and Poster	—	—
17:30	Closing	—	—

Oral Presentations I

13:45 – 14:15

13:45 – 13:52 · Paper #1
ACCIDENT: A Benchmark Dataset for Zero Shot Accident Detection from Traffic Surveillance Videos
Lukas Picek, Michal Cermak, Marek Hanzl, Vojtech Cermak
13:52 – 14:00 · Paper #2
TopoMaskV3: 3D Mask Head with Dense Offset and Height Predictions for Road Topology Understanding
Muhammet Esat Kalfaoglu, Halil İbrahim Öztürk, Ozsel Kilinc, Alptekin Temizel
14:00 – 14:07 · Paper #3
Spatial-aware Vision Language Model for Autonomous Driving
Weijie Wei, Zhipeng Luo, Feng Ling, Venice Erin Liong
14:07 – 14:15 · Paper #4
Probing the Reliability of Driving VLMs: From Inconsistent Responses to Grounded Temporal Reasoning
Chun-Peng Chang, Chen-Yu Wang, Holger Caesar, Alain Pagani

Oral Presentations II

16:30 – 17:00

16:30 – 16:37 · Paper #5
Automingo: Seeing the Unseen - Vision-Language Edge Case Dataset for Detection and Analysis of Autonomous Driving
Vaclav Divis, Íñigo Barceló Álvarez, Alejandro Fariñas Nubla, Enrique Sanchez, Ondřej Valach, Ivan Gruber, Antonio Hernandez-Ros Briales, Marek Hrúz
16:37 – 16:45 · Paper #6
Drive Like Humans, Plan Like Machines: An Explicit Sense and Safety Aware Autonomous Driving Framework
Xia Wang, Ziyan An, Yuhang Zhang, Meiyi Ma, Daniel Work, Jonathan Sprinkle
16:45 – 16:52 · Paper #7
OmniSieve: Query-Guided Adaptive Token Allocation for Efficient Multi-View Vision-Language Reasoning
Preetam Chhimpa, Indrajit Ghosh
16:52 – 17:00 · Paper #8
Beyond the Beep: Scalable Collision Anticipation and Real-Time Explainability with BADAS-2.0
Roni Goldshmidt, Hamish Scott, Lorenzo Niccolini

Call for Papers

We invite high-quality, original research submissions to the AUTOPILOT workshop at CVPR 2026

Archival Track

CVPR Proceedings

Full papers with novel contributions will be published in the official CVPR 2026 workshop proceedings. We expect high-quality submissions with significant technical contributions, rigorous evaluation, and clear presentation of results, with a strong focus on real-world autonomous driving and open-world deployment using VLLMs.

Accepted Papers

7 Papers

ACCIDENT: A Benchmark Dataset for Zero Shot Accident Detection from Traffic Surveillance Videos
TopoMaskV3: 3D Mask Head with Dense Offset and Height Predictions for Road Topology Understanding
Spatial-aware Vision Language Model for Autonomous Driving
Probing the Reliability of Driving VLMs: From Inconsistent Responses to Grounded Temporal Reasoning
Automingo: Seeing the Unseen - Vision-Language Edge Case Dataset for Detection and Analysis of Autonomous Driving
Drive Like Humans, Plan Like Machines: An Explicit Sense and Safety Aware Autonomous Driving Framework
OmniSieve: Query-Guided Adaptive Token Allocation for Efficient Multi-View Vision-Language Reasoning

Non-Archival Track

Extended Abstracts

Extended abstracts and position papers for work-in-progress or preliminary findings. Papers rejected from the Archival Track may be resubmitted to the Non-Archival Track, and workshop-related papers already published in top peer-reviewed conferences and journals may be submitted for a poster spot.

Accepted Papers

21 Papers

Beyond the Beep: Scalable Collision Anticipation and Real-Time Explainability with BADAS-2.0
DashLens: Structured and Consistent Reasoning for Incident-Centric Dashcam Video Analysis
Zero-shot 3D General Obstacle Detection via Multimodal Foundation Models and Geometry
On the Feasibility and Opportunity of Autoregressive 3D Object Detection
DriveSafer: End-to-End Autonomous Driving with Safety Guidance
Making the Discrete Continuous: Synthetic RAW Augmentations for Fine-Grained Evaluation of Person Detection Performance in Low Light
Beyond Prompting: Structured Causal Chain Reasoning for VRU Accident Understanding in Vision Language Models
Real-World On-Vehicle Evaluation of Embedding-Based Anomaly Detection
Two-Pass Zero-Shot Temporal-Spatial Grounding of Rare Traffic Events in Surveillance Video
Multi-Stage VLM Pipeline for Zero-Shot Traffic Accident Understanding
CASCADE-VLM: A Zero-Shot Cascade Pipeline for CCTV Accident Understanding
Safe2Drive: Evaluating Safe Driving Behaviors of E2E Autonomous Driving Models
STAR: Stage-wise Traffic Accident Detection via Optical-Flow-Guided Reasoning
Selective Optical-Flow Correction for Zero-Shot CCTV Accident Analysis with Vision-Language Models
Zero-Shot Traffic Accident Understanding via Two-Branch Min-Time Fusion with Multi-Scale Context Preservation
Geometry-Aware Road Damage Segmentation with Training-Time Depth Supervision
AUTOPILOT VQA: Benchmarking Vision-Language Models for Incident-Centric Dashcam Understanding
Metadata-Aware Multi-Prompt Reasoning for Zero-Shot Accident Understanding
SynCrash: A Multi-Stage Pipeline for Zero-Shot Accident Detection and Localization in Traffic Surveillance Video
Zero-Shot Traffic Accident Detection via a Coarse-to-Fine VLM-Tracking Pipeline
Motion Recency Maps for Zero-Shot Traffic Accident Localisation in Fixed Surveillance Cameras

Topics of Interest

We invite submissions on a broad range of topics related to foundation models, multimodal perception, reasoning, and decision-making for autonomous systems, including but not limited to:

Foundation Models & Multimodal Reasoning

• Foundation models for autonomous driving (VLMs, LLMs, generative and agentic models)
• Vision-language models for perception, reasoning, grounding, and scene understanding
• Embodied AI and multimodal reasoning for decision-making in autonomous vehicles

Open-World & Robust Autonomy

• Open-world learning: open-set recognition, open-vocabulary learning, and OOD detection
• Detection, prediction, and avoidance of out-of-label or novel hazards
• Domain adaptation, transfer learning, and continual learning for robust autonomy

Prediction, Planning & Interaction

• Multimodal motion forecasting, trajectory prediction, and behavior modeling
• Activity recognition, pedestrian intention prediction, and human-agent interaction
• Planning and decision-making under uncertainty

Multimodal Perception & Sensor Fusion

• Multimodal sensor fusion (camera, LiDAR, radar, maps, depth) for scene understanding
• Spatio-temporal representation learning for dynamic environments

Generative Models & Simulation

• Generative models for simulation, data augmentation, forecasting, and scenario synthesis
• Synthetic data generation and sim-to-real transfer

Systems, Deployment & Evaluation

• Resource-efficient training, model compression, and edge deployment
• Real-time inference and scalable autonomous systems
• Novel datasets, benchmarks, evaluation protocols, and safety-centric metrics

Workshop Challenges

Join our Kaggle competitions focusing on safety-critical autonomous driving tasks

Launch: Feb 15, 2026

Winners: May 1, 2026

AUTOPILOT

Visual Question Answering

VQA

Advance accident understanding through detailed video-based visual question answering. Analyze vehicle trajectories, hazards, visibility, impact zones, and outcomes.

Winners

Team 1: Trilochan Team (Top-1)

Indian Institute of Information Technology Nagpur (IIITN)

Team 2: yuxiazff

¹ Zhejiang Supcon Info Company Ltd
² China Jiliang University
³ Zhejiang University of Technology
⁴ Anhui University

ACCIDENT

Zero-shot Detection

CCTV

Benchmark accident understanding in real CCTV footage. Tackle temporal localization, spatial localization, and collision type classification.

Winners

Team 1: GOOD DRIVE Team (Top-1)

GO Drive Inc

Team 2: GAILforce Team

Generative AI Lab

Zero-shot

Zero-shot Anticipation

ZsAA

Multi-modal accident risk anticipation using RGB frames, driver gaze, and text annotations across diverse road environments.

Winners

Team 1: CVLAB (Top-1)

Seoul National University, Seoul, Korea

SNU CVLab

Team 2: BUPT MIC Lab

Beijing University of Posts and Telecommunications

BUPT MIC Lab

All challenges emphasize perception, reasoning, and robustness in open-world scenarios. Winners will present their system analyses during the CVPR 2026 AUTOPILOT workshop. For a related workshop on out-of-label hazards, see 2COOOL @ ICCV 2025.