Benchmarking LLMs for exoplanet transit detection
This benchmark measures whether models can spot planetary transits in Kepler lightcurves and separate planets from non_planet examples.
We balance planet and non_planet cases using a fixed random seed and cap each target at three Kepler quarters (Kepler's observing segments) to keep files manageable.
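The seeded balancing step could look roughly like the sketch below. The function name balance_classes, the default seed, and the choice to downsample the larger class are illustrative assumptions, not the benchmark's actual code.

```python
import random

def balance_classes(planets, non_planets, seed=42):
    """Downsample the larger class so both classes have equal size.

    Hypothetical helper: the seed value and the downsampling strategy
    are assumptions made for illustration.
    """
    rng = random.Random(seed)  # local RNG, so global random state is untouched
    n = min(len(planets), len(non_planets))
    return rng.sample(planets, n), rng.sample(non_planets, n)
```

Because the generator is created from the seed inside the function, calling it twice with the same inputs returns the same selection, which is what makes the dataset reproducible.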
KOI and stellar tables come from the NASA Exoplanet Archive. Lightcurves are sigma-clipped (5σ) to drop outliers and can be de-trended with a simple linear regression design matrix to make transits clearer.
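As a rough illustration of the preprocessing described above, the following NumPy sketch applies a single-pass 5σ clip and then removes a linear trend fitted with a two-column design matrix (offset and slope). The function names, the single-pass clip around the median, and dividing by the fitted trend (rather than subtracting it) are assumptions; the benchmark may rely on a library routine that behaves differently.

```python
import numpy as np

def sigma_clip(time, flux, sigma=5.0):
    """Drop points more than `sigma` standard deviations from the median flux."""
    time = np.asarray(time, dtype=float)
    flux = np.asarray(flux, dtype=float)
    keep = np.abs(flux - np.median(flux)) <= sigma * np.std(flux)
    return time[keep], flux[keep]

def detrend_linear(time, flux):
    """Fit flux ~ a + b * time by least squares and divide the trend out.

    Assumes flux is positive (for example, normalized near 1), so the
    division is well defined.
    """
    time = np.asarray(time, dtype=float)
    flux = np.asarray(flux, dtype=float)
    design = np.column_stack([np.ones_like(time), time])  # columns: offset, slope
    coeffs, *_ = np.linalg.lstsq(design, flux, rcond=None)
    trend = design @ coeffs
    return flux / trend
```

Clipping before the fit keeps isolated outliers from pulling the trend line. Note that a deep transit is itself a downward excursion, so an aggressive clip threshold could remove real signal; 5σ is the value stated above.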
Planets are KOI candidates or confirmed targets that show enough detected transits; non_planet examples draw from false positives and eclipsing binaries. Stellar metadata is included when it exists.
Each prompt includes sampled time/flux pairs plus stellar context. Models reply with either "planet, <period_days>" or "non_planet".
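A reply in this format can be parsed with a small helper such as the one below. This is a minimal sketch: the name parse_reply, the case-insensitive matching, and returning None for malformed replies are assumptions, since the source does not say how off-format answers are handled.

```python
import re

# Matches "planet, 3.52" or "planet,10"; the period must be a plain decimal number.
_PLANET_RE = re.compile(r"^planet\s*,\s*([0-9]*\.?[0-9]+)$")

def parse_reply(reply):
    """Return ("planet", period_days), ("non_planet", None), or None if malformed."""
    text = reply.strip().lower()
    if text == "non_planet":
        return ("non_planet", None)
    match = _PLANET_RE.match(text)
    if match:
        return ("planet", float(match.group(1)))
    return None
```

For example, parse_reply("planet, 3.52") yields ("planet", 3.52), while a free-form answer such as "maybe a planet" yields None and can be counted however the harness chooses.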
Metrics are Accuracy (percent correct classifications), Total cost (estimated USD for all trials), and Speed (average latency per trial in seconds).
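The three metrics can be aggregated from per-trial records along these lines. The Trial record and its field names are hypothetical, and the sketch scores only the class label; whether the reported period also counts toward accuracy is not stated here.

```python
from dataclasses import dataclass

@dataclass
class Trial:
    # Hypothetical per-trial record; field names are illustrative.
    predicted: str    # "planet" or "non_planet"
    expected: str     # ground-truth label
    cost_usd: float   # estimated cost of this trial
    latency_s: float  # wall-clock latency of this trial

def summarize(trials):
    """Compute accuracy (percent), total cost (USD), and mean latency (seconds)."""
    n = len(trials)
    if n == 0:
        raise ValueError("no trials to summarize")
    correct = sum(t.predicted == t.expected for t in trials)
    return {
        "accuracy_pct": 100.0 * correct / n,
        "total_cost_usd": sum(t.cost_usd for t in trials),
        "avg_latency_s": sum(t.latency_s for t in trials) / n,
    }
```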