TransitBench

Benchmarking LLMs for exoplanet transit detection

15/12/2025
How this benchmark works

This benchmark measures whether models can spot planetary transits in Kepler lightcurves and separate planets from non_planet examples.

We balance planet and non_planet cases using a fixed random seed, and cap each target at three Kepler quarters to keep files manageable.
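The balancing and capping step could look roughly like the sketch below. It is illustrative only, not the benchmark's actual code: the function names, the dict layout with "id" and "label" keys, and the seed value 42 are all assumptions made for the example.

```python
import random

def balance_classes(targets, seed=42):
    """Downsample the larger class so both labels have equal counts.

    `targets` is a list of dicts with hypothetical keys "id" and "label".
    The seed value is an arbitrary example, not the benchmark's seed.
    """
    rng = random.Random(seed)  # private RNG, so the result is reproducible
    planets = [t for t in targets if t["label"] == "planet"]
    others = [t for t in targets if t["label"] == "non_planet"]
    n = min(len(planets), len(others))
    picked = rng.sample(planets, n) + rng.sample(others, n)
    return sorted(picked, key=lambda t: t["id"])

def cap_segments(segments, max_segments=3):
    """Keep at most `max_segments` observation segments for one target."""
    return list(segments)[:max_segments]
```

Using a private random.Random instance rather than the module-level generator keeps the sample stable even if other code draws random numbers in between.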

KOI and stellar tables come from the NASA Exoplanet Archive. Lightcurves are sigma-clipped (5σ) to drop outliers and can be de-trended with a simple linear regression design matrix to make transits clearer.
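To make the two cleaning steps concrete, here is a minimal standard-library sketch. It is a stand-in under stated assumptions, not the benchmark's pipeline: it clips in a single pass around the median, and it replaces the design-matrix regression with an ordinary least-squares straight line in time. The function names are hypothetical.

```python
import statistics

def sigma_clip(time, flux, sigma=5.0):
    """Drop points more than `sigma` standard deviations from the median flux.

    Single pass only; an iterative clip would repeat this until nothing changes.
    """
    center = statistics.median(flux)
    spread = statistics.pstdev(flux)
    kept = [(t, f) for t, f in zip(time, flux) if abs(f - center) <= sigma * spread]
    return [t for t, _ in kept], [f for _, f in kept]

def detrend_linear(time, flux):
    """Subtract the least-squares slope in time, keeping the mean flux level."""
    t_mean = statistics.fmean(time)
    f_mean = statistics.fmean(flux)
    sxx = sum((t - t_mean) ** 2 for t in time)
    sxy = sum((t - t_mean) * (f - f_mean) for t, f in zip(time, flux))
    slope = sxy / sxx  # assumes at least two distinct time stamps
    return [f - slope * (t - t_mean) for t, f in zip(time, flux)]
```

Removing only the slope, rather than the full fitted line, leaves the flux at its original mean level, so transit depths stay comparable before and after detrending.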

Planets are KOI candidates or confirmed targets that show enough detected transits; non_planet examples draw from false positives and eclipsing binaries. Stellar metadata is included when it exists.

Each prompt includes sampled time/flux pairs plus stellar context. Models reply with planet, <period_days> or non_planet.
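A reply in this format (either planet followed by a period in days, or non_planet) can be checked with a small parser. The sketch below is an assumption about how that might be done: the function name, the case-insensitive matching, and the handling of malformed replies are illustrative choices, not documented benchmark behavior.

```python
import re

# Matches "planet, <number>", with optional spaces around the comma.
_PLANET = re.compile(r"^planet\s*,\s*([0-9]*\.?[0-9]+)$")

def parse_reply(text):
    """Return (label, period_days), or (None, None) if the reply is malformed."""
    reply = text.strip().lower()
    if reply == "non_planet":
        return "non_planet", None
    match = _PLANET.match(reply)
    if match:
        return "planet", float(match.group(1))
    return None, None
```

Returning (None, None) for anything off-format lets the caller decide whether a malformed reply counts as an incorrect classification.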

Metrics are Accuracy (percent correct classifications), Total cost (estimated USD for all trials), and Speed (average latency per trial in seconds).
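The three metrics reduce to simple aggregates over per-trial records. The sketch below assumes a hypothetical Trial record with the expected label, the predicted label, an estimated cost, and a latency; the field and key names are invented for the example.

```python
from dataclasses import dataclass

@dataclass
class Trial:
    expected: str     # ground-truth label
    predicted: str    # label parsed from the model reply
    cost_usd: float   # estimated cost of this trial
    latency_s: float  # wall-clock latency of this trial

def summarize(trials):
    """Compute accuracy (%), total cost (USD), and mean latency (s)."""
    n = len(trials)  # assumes at least one trial
    correct = sum(t.predicted == t.expected for t in trials)
    return {
        "accuracy_pct": 100.0 * correct / n,
        "total_cost_usd": sum(t.cost_usd for t in trials),
        "avg_latency_s": sum(t.latency_s for t in trials) / n,
    }
```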

Success rate by model
Percentage of correct classifications across all trials per model