LongTail E2E Driving

End-to-end driving on 1,000 rare scenarios. Submissions are ranked by Multi-Maneuver Score (MMS, 0–10), a metric that correlates significantly more strongly with closed-loop DrivingScore than standard L2 error does. Only the best submission per method is shown.

2026 Few-Shot Challenge Winners

Selected challenge winners, paired with LongTail driving clips.

🥇 Rank 1

STLA-MINES

Caio Azevedo, Stefano Sabatini, Sascha Hornauer, Fabien Moutarde

Stellantis and Mines Paris

MMS

5.15

Semantic coherence

N/A

🥈 Rank 2

Rangers

Sanath Tiptur Sadashivaiah, Taehyoung, Abhishek

TH Aschaffenburg, Fraunhofer IVI, TH Ingolstadt

MMS

4.31

Semantic coherence

0.84

🥉 Rank 3

KE:SAI

Yijie Wang, Kashyap Chitta

University of Toronto, KE:SAI

MMS

4.31

Semantic coherence

0.39

Leaderboard

Live · fetched from HuggingFace · best submission per method

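
The "best submission per method" rule can be sketched as follows. This is a minimal illustration, not the leaderboard's actual code: the field names `method` and `mms` and the sample scores are hypothetical, and the page does not state how ties are broken, so tied entries simply keep their first-seen order here.

```python
def best_per_method(submissions):
    """Keep the highest-MMS submission for each method, sorted by MMS descending."""
    best = {}
    for sub in submissions:
        current = best.get(sub["method"])
        # Replace only on a strictly higher score, so the earliest of equal scores wins.
        if current is None or sub["mms"] > current["mms"]:
            best[sub["method"]] = sub
    # sorted() is stable: methods with equal MMS keep their first-seen order.
    return sorted(best.values(), key=lambda s: s["mms"], reverse=True)


# Made-up example data (hypothetical methods and scores, for illustration only).
submissions = [
    {"method": "method_a", "mms": 3.2},
    {"method": "method_a", "mms": 4.1},
    {"method": "method_b", "mms": 3.9},
]
board = best_per_method(submissions)
```

With this toy input, `board` holds one row per method, with the higher-scoring `method_a` submission listed first.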

Metrics

MMS

↑ higher is better

Multi-Maneuver Score (0–10). Composite score covering trajectory accuracy and semantic compliance across scenario types. Defined in the KITScenes LongTail paper. arXiv:2603.23607 ↗

Semantic Coherence

↑ higher is better

Semantic Coherence measures whether the driving actions described in a model's reasoning trace match its planned trajectory. It is computed via embedding-based Rocchio classification. Defined in the KITScenes LongTail paper. arXiv:2603.23607 ↗
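
For readers unfamiliar with Rocchio classification, it is a nearest-centroid scheme: each class is represented by the mean of its example embeddings, and a new embedding is assigned to the most similar centroid. The sketch below illustrates that idea only. The action labels, the toy 2-D vectors standing in for text embeddings, and the way matches are aggregated into a score are all assumptions; the exact procedure is defined in the paper, not here.

```python
import math


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def fit_rocchio(labelled):
    """labelled: dict mapping an action label to a list of example embeddings."""
    return {label: centroid(vecs) for label, vecs in labelled.items()}


def predict(centroids, vec):
    """Assign vec to the label whose centroid is most similar to it."""
    return max(centroids, key=lambda label: cosine(centroids[label], vec))


def coherence(centroids, samples):
    """Fraction of samples where the action classified from the reasoning-trace
    embedding equals the action taken from the planned trajectory.
    (Assumed aggregation, for illustration only.)"""
    hits = sum(predict(centroids, emb) == action for emb, action in samples)
    return hits / len(samples)


# Toy 2-D "embeddings"; a real pipeline would use a text embedding model.
train = {
    "turn_left": [[1.0, 0.1], [0.9, 0.2]],
    "brake": [[0.1, 1.0], [0.2, 0.8]],
}
centroids = fit_rocchio(train)

# (reasoning-trace embedding, action implied by the planned trajectory)
samples = [
    ([0.8, 0.1], "turn_left"),  # trace and trajectory agree
    ([0.1, 0.9], "brake"),      # trace and trajectory agree
    ([0.9, 0.3], "brake"),      # trace reads as a left turn, trajectory brakes
]
score = coherence(centroids, samples)
```

In this toy setup two of the three traces agree with their trajectories, so `score` comes out at two thirds.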

KIT FZI TU Delft UC3M UPM University of Toronto