AUTOPILOT

Autonomous Understanding Through Open-world Perception and Integrated Language Models for On-road Tasks

In conjunction with The IEEE/CVF Conference on Computer Vision and Pattern Recognition 2026

Date: June 3, 2026
Venue: Colorado Convention Center, Hall 3A
Location: Denver, Colorado, USA
Full-Day Workshop · In-Person · 3rd Edition

About AUTOPILOT

3rd Edition · CVPR 2026 Workshop

AUTOPILOT is a workshop on safety-critical autonomous driving, spotlighting robust perception and trajectory forecasting that support reliable decision-making and motion planning. It emphasizes the practical use of foundation models (vision-language and generative) through efficient distillation for on-vehicle deployment. A core theme is open-world learning, addressing out-of-distribution (OOD) and known hazards by detecting, predicting, and mitigating novel objects, agents, and events beyond standard taxonomies. AUTOPILOT features invited talks from leading industry experts, an open challenge, and archival proceedings, bringing academia and practitioners together to develop real-world solutions with explicit attention to societal impact, ethics, and reproducible evaluation.

6 Industry Speakers · 2 Kaggle Challenges · Full-Day Workshop · CVPR Proceedings

Invited Speakers

Leading researchers from industry and academia

Jose M. Alvarez

Director of Research

NVIDIA

Manmohan Chandraker

Professor, Department Head

UC San Diego / NEC Labs America

Bat El Shlomo

Senior Director, Perception and Foundational Models

ZOOX

Matthew Alun Brown

Principal Scientist

Wayve

Workshop Schedule

June 3, 2026 · Colorado Convention Center, Hall 3A · Denver (MDT)

Morning Schedule

| Time | Session | Speaker |
| --- | --- | --- |
| 9:00 – 9:15 | Opening Remarks | Ali K. AlShami |
| 9:15 – 10:00 | Keynote #1 | Jose M. Alvarez |
| 10:00 – 11:00 | Coffee Break | |
| 11:00 – 12:00 | Keynote #2 | Manmohan Chandraker |
| 12:00 – 13:45 | Lunch | |

Afternoon Schedule

| Time | Session | Speaker |
| --- | --- | --- |
| 13:45 – 14:15 | Oral Presentations I | |
| 14:15 – 14:45 | Keynote #3 | Matt Brown |
| 14:45 – 15:00 | AUTOPILOT Competition & Winner Presentation | Ryan Rabinowitz |
| 15:00 – 16:00 | Coffee Break | |
| 16:00 – 16:30 | Keynote #4 | Bat El Shlomo |
| 16:30 – 17:00 | Oral Presentations II | |
| 17:00 – 17:10 | ACCIDENT Competition & Winner Presentation | Lukas Picek |
| 17:10 – 17:20 | Zero Shot Competition & Winner Presentation | Jianwu Fang |
| 17:20 – 17:30 | Award Presentation for Competition, Papers, and Poster | |
| 17:30 | Closing | |

Oral Presentations I

13:45 – 14:15

  • 13:45 – 13:52 · Paper #1

    ACCIDENT: A Benchmark Dataset for Zero Shot Accident Detection from Traffic Surveillance Videos

    Lukas Picek, Michal Cermak, Marek Hanzl, Vojtech Cermak

  • 13:52 – 14:00 · Paper #2

    TopoMaskV3: 3D Mask Head with Dense Offset and Height Predictions for Road Topology Understanding

    Muhammet Esat Kalfaoglu, Halil İbrahim Öztürk, Ozsel Kilinc, Alptekin Temizel

  • 14:00 – 14:07 · Paper #3

    Spatial-aware Vision Language Model for Autonomous Driving

    Weijie Wei, Zhipeng Luo, Feng Ling, Venice Erin Liong

  • 14:07 – 14:15 · Paper #4

    Probing the Reliability of Driving VLMs: From Inconsistent Responses to Grounded Temporal Reasoning

    Chun-Peng Chang, Chen-Yu Wang, Holger Caesar, Alain Pagani

Oral Presentations II

16:30 – 17:00

  • 16:30 – 16:37 · Paper #5

    Automingo: Seeing the Unseen — Vision-Language Edge Case Dataset for Detection and Analysis of Autonomous Driving

    Vaclav Divis, Íñigo Barceló Álvarez, Alejandro Fariñas Nubla, Enrique Sanchez, Ondřej Valach, Ivan Gruber, Antonio Hernandez-Ros Briales, Marek Hrúz

  • 16:37 – 16:45 · Paper #6

    Drive Like Humans, Plan Like Machines: An Explicit Sense and Safety Aware Autonomous Driving Framework

    Xia Wang, Ziyan An, Yuhang Zhang, Meiyi Ma, Daniel Work, Jonathan Sprinkle

  • 16:45 – 16:52 · Paper #7

    OmniSieve: Query-Guided Adaptive Token Allocation for Efficient Multi-View Vision-Language Reasoning

    Preetam Chhimpa, Indrajit Ghosh

  • 16:52 – 17:00 · Paper #8

    Beyond the Beep: Scalable Collision Anticipation and Real-Time Explainability with BADAS-2.0

    Roni Goldshmidt, Hamish Scott, Lorenzo Niccolini

Call for Papers

We invite high-quality, original research submissions to the AUTOPILOT workshop at CVPR 2026

Archival Track

CVPR Proceedings

Full papers with novel contributions will be published in the official CVPR 2026 workshop proceedings. We expect high-quality submissions with significant technical contributions, rigorous evaluation, and clear presentation of results, with a strong focus on real-world autonomous driving and open-world deployment using VLLMs.

Accepted Papers

7 Papers
  • ACCIDENT: A Benchmark Dataset for Zero Shot Accident Detection from Traffic Surveillance Videos
  • TopoMaskV3: 3D Mask Head with Dense Offset and Height Predictions for Road Topology Understanding
  • Spatial-aware Vision Language Model for Autonomous Driving
  • Probing the Reliability of Driving VLMs: From Inconsistent Responses to Grounded Temporal Reasoning
  • Automingo: Seeing the Unseen - Vision-Language Edge Case Dataset for Detection and Analysis of Autonomous Driving
  • Drive Like Humans, Plan Like Machines: An Explicit Sense and Safety Aware Autonomous Driving Framework
  • OmniSieve: Query-Guided Adaptive Token Allocation for Efficient Multi-View Vision-Language Reasoning

Non-Archival Track

Extended Abstracts

Extended abstracts and position papers presenting work in progress or preliminary findings. Papers rejected from the Archival Track may be resubmitted here, and workshop-related papers already published in top peer-reviewed conferences and journals may be submitted to the Non-Archival Track for a poster spot.

Accepted Papers

21 Papers
  • Beyond the Beep: Scalable Collision Anticipation and Real-Time Explainability with BADAS-2.0
  • DashLens: Structured and Consistent Reasoning for Incident-Centric Dashcam Video Analysis
  • Zero-shot 3D General Obstacle Detection via Multimodal Foundation Models and Geometry
  • On the Feasibility and Opportunity of Autoregressive 3D Object Detection
  • DriveSafer: End-to-End Autonomous Driving with Safety Guidance
  • Making the Discrete Continuous: Synthetic RAW Augmentations for Fine-Grained Evaluation of Person Detection Performance in Low Light
  • Beyond Prompting: Structured Causal Chain Reasoning for VRU Accident Understanding in Vision Language Models
  • Real-World On-Vehicle Evaluation of Embedding-Based Anomaly Detection
  • Two-Pass Zero-Shot Temporal-Spatial Grounding of Rare Traffic Events in Surveillance Video
  • Multi-Stage VLM Pipeline for Zero-Shot Traffic Accident Understanding
  • CASCADE-VLM: A Zero-Shot Cascade Pipeline for CCTV Accident Understanding
  • Safe2Drive: Evaluating Safe Driving Behaviors of E2E Autonomous Driving Models
  • STAR: Stage-wise Traffic Accident Detection via Optical-Flow-Guided Reasoning
  • Selective Optical-Flow Correction for Zero-Shot CCTV Accident Analysis with Vision-Language Models
  • Zero-Shot Traffic Accident Understanding via Two-Branch Min-Time Fusion with Multi-Scale Context Preservation
  • Geometry-Aware Road Damage Segmentation with Training-Time Depth Supervision
  • AUTOPILOT VQA: Benchmarking Vision-Language Models for Incident-Centric Dashcam Understanding
  • Metadata-Aware Multi-Prompt Reasoning for Zero-Shot Accident Understanding
  • SynCrash: A Multi-Stage Pipeline for Zero-Shot Accident Detection and Localization in Traffic Surveillance Video
  • Zero-Shot Traffic Accident Detection via a Coarse-to-Fine VLM-Tracking Pipeline
  • Motion Recency Maps for Zero-Shot Traffic Accident Localisation in Fixed Surveillance Cameras

Topics of Interest

We invite submissions on a broad range of topics related to foundation models, multimodal perception, reasoning, and decision-making for autonomous systems, including but not limited to:

Foundation Models & Multimodal Reasoning

  • Foundation models for autonomous driving (VLMs, LLMs, generative and agentic models)
  • Vision-language models for perception, reasoning, grounding, and scene understanding
  • Embodied AI and multimodal reasoning for decision-making in autonomous vehicles

Open-World & Robust Autonomy

  • Open-world learning: open-set recognition, open-vocabulary learning, and OOD detection
  • Detection, prediction, and avoidance of out-of-label or novel hazards
  • Domain adaptation, transfer learning, and continual learning for robust autonomy

Prediction, Planning & Interaction

  • Multimodal motion forecasting, trajectory prediction, and behavior modeling
  • Activity recognition, pedestrian intention prediction, and human-agent interaction
  • Planning and decision-making under uncertainty

Multimodal Perception & Sensor Fusion

  • Multimodal sensor fusion (camera, LiDAR, radar, maps, depth) for scene understanding
  • Spatio-temporal representation learning for dynamic environments

Generative Models & Simulation

  • Generative models for simulation, data augmentation, forecasting, and scenario synthesis
  • Synthetic data generation and sim-to-real transfer

Systems, Deployment & Evaluation

  • Resource-efficient training, model compression, and edge deployment
  • Real-time inference and scalable autonomous systems
  • Novel datasets, benchmarks, evaluation protocols, and safety-centric metrics

Workshop Challenges

Join our Kaggle competitions focusing on safety-critical autonomous driving tasks

Launch: Feb 15, 2026
Winners: May 1, 2026

AUTOPILOT

Visual Question Answering

VQA

Advance accident understanding through detailed video-based visual question answering. Analyze vehicle trajectories, hazards, visibility, impact zones, and outcomes.

Winners

Team 1: Trilochan Team (Top-1)

Indian Institute of Information Technology Nagpur (IIITN)

Team 2: yuxiazff

  • Zhejiang Supcon Info Company Ltd
  • China Jiliang University
  • Zhejiang University of Technology
  • Anhui University

ACCIDENT

Zero-shot Detection

CCTV

Benchmark accident understanding in real CCTV footage. Tackle temporal localization, spatial localization, and collision type classification.

Winners

Team 1: GOOD DRIVE Team (Top-1)

GO Drive Inc

Team 2: GAILforce Team

Generative AI Lab

Zero-shot

Zero-shot Anticipation

ZsAA

Multi-modal accident risk anticipation using RGB frames, driver gaze, and text annotations across diverse road environments.

Winners

Team 1: CVLAB (Top-1)

Seoul National University, Seoul, Korea

SNU CVLab

Team 2: BUPT MIC Lab

Beijing University of Posts and Telecommunications

All challenges emphasize perception, reasoning, and robustness in open-world scenarios. Winners will present their system analyses during the CVPR 2026 AUTOPILOT workshop. For a related workshop on out-of-label hazards, see 2COOOL @ ICCV 2025.

Organizing Committee

Meet the team behind AUTOPILOT 2026

Ali K. AlShami

NEC Labs America

Ryan Rabinowitz

Notre Dame

Maged Shoman

UT-ORII

Jianwu Fang

Xi'an Jiaotong Univ.

Lukáš Picek

INRIA / PiVa AI

Shao-yuan Lo

National Taiwan Univ.

Steve Cruz

Notre Dame

Lei-lei Li

Xi'an Jiaotong Univ.

Nachiket Kamod

BNSF | Tech

Jugal Kalita

UCCS

Terrance E. Boult

UCCS