A Jeopardy champion and data scientist walked into a bar chart. This is what happened next.
Board Control is built by Colin Davy, a data scientist in Chicago and Jeopardy champion. You can read the full story of how he used data science to prepare for and win on Jeopardy in this article. Find him on Bluesky at @adjbaseline.
Every day, this site automatically analyzes the latest Jeopardy game and generates win probability charts showing each player's chances of winning at every moment — like the win probability graphs ESPN shows during football games, but for Jeopardy. It covers every regular season and tournament game. Over 5,000 games and counting.
Beyond win probability, the site includes:
Board Control started with a simple question: "Was my Daily Double wager correct?" Answering that required a win probability model. Building the model opened up bigger questions — who's really the best player of all time, which games were the most dramatic, which wagers were smart and which were memorable blow-ups. One tool turned into the site you see now: a place for anyone who wants to understand the show the way serious fans understand any competitive format.
For short-form answers to common questions, see the FAQ. For questions or feedback, get in touch.
The Excitement Index is a 0–10 score built from ten measurable game-content signals — Round Tempo, Final Stakes, DD Wagering, FJ Cover Tightness, Hot Start, Buzzer Dominance, Stakes Context, Comeback Depth, FJ Swing, and Run-of-Correct. The FAQ has the short version. This section is for readers who want the underlying methodology.
The default slider weights aren't picked by feel. They're fit against actual r/Jeopardy community reaction. We pulled every available Reddit thread for each modern-era episode (~2,400 games with substantive discussion) and asked Claude Sonnet 4.6 to score each thread, together with its top 50 comments, on a strict 1–10 rubric:
Crucially, the rubric explicitly debiases against star power. A close-fought game between unknowns can score 9 or 10; a blowout featuring a famous player can score 3 or 4. The signal we're after is sentiment density and content, not comment volume or name recognition.
The 10-component formula wasn't designed up front. It was produced by twelve iterations of a structured loop:
Most candidates lost. The ones that survived twelve rounds of this gate are what's in v14. Architecturally the formula stayed simple throughout: a weighted mean of normalized primitives, one monotone transform per primitive, no multipliers, no conditional logic.
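As a rough illustration of that architecture, here is a minimal sketch of a weighted mean over transformed primitives, rescaled to 0–10. The primitive values, weights, and transforms below are made up for the example and are not the fitted v14 values; the function and helper names are hypothetical, and the sketch assumes each primitive has already been normalized to the 0–1 range.

```python
import math
from typing import Callable, Dict


def clip01(x: float) -> float:
    """Clamp a normalized primitive into [0, 1]."""
    return min(1.0, max(0.0, x))


def excitement_index(
    primitives: Dict[str, float],
    weights: Dict[str, float],
    transforms: Dict[str, Callable[[float], float]],
) -> float:
    """Weighted mean of transformed primitives, scaled to 0-10.

    No multipliers between primitives and no conditional logic:
    each primitive contributes weight * transform(value), nothing else.
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted_sum = sum(
        weights[name] * transforms[name](primitives[name]) for name in weights
    )
    return 10.0 * weighted_sum / total_weight


# Illustrative values only (assumptions, not the site's real numbers).
example_primitives = {"final_stakes": 0.5, "comeback_depth": 0.25, "round_tempo": 1.0}
example_weights = {"final_stakes": 2.0, "comeback_depth": 1.0, "round_tempo": 1.0}
example_transforms = {
    "final_stakes": clip01,                             # linear
    "comeback_depth": lambda x: math.sqrt(clip01(x)),   # concave, still monotone
    "round_tempo": clip01,                              # linear
}

score = excitement_index(example_primitives, example_weights, example_transforms)
print(score)  # 6.25
```

Because every transform is monotone and the weights are non-negative, raising any single primitive can never lower the score, which keeps slider changes easy to reason about.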
On held-out games (episodes the optimizer never saw during fitting), Spearman ρ between the formula's score and the LLM-graded community sentiment is 0.61. Squaring that (0.61² ≈ 0.37) suggests that roughly 37% of the variance in how Reddit ranks a Jeopardy game is captured by these ten game-content primitives alone, with no information about who the players are or how famous they became.
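For readers who want to see what that statistic measures, the following is a minimal sketch of Spearman's ρ computed as the Pearson correlation of average ranks, using only the standard library. The function names are hypothetical and the sample scores are invented for illustration; this is not the site's evaluation code.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    return pearson(average_ranks(x), average_ranks(y))


# Invented held-out example: formula scores vs. rubric scores for five games.
formula_scores = [2.1, 4.8, 5.5, 7.9, 9.0]
rubric_scores = [3, 4, 7, 6, 9]
rho = spearman_rho(formula_scores, rubric_scores)
print(rho, rho ** 2)  # rho, and the share of rank variance it accounts for
```

Since Spearman's ρ works on ranks, it rewards the formula for ordering games the way the community does, without requiring the two scales to line up linearly.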
A useful sanity check on whether this is data-driven or vibes-driven: several of the primitives in v14 had been rejected in earlier iterations under noisier calibration targets (an older version used a keyword-counting heuristic on Reddit comments instead of the LLM rubric). Once the target got cleaner, the same primitives passed the gate cleanly. Comeback Depth, Hot Start, Buzzer Dominance, FJ Cover Tightness, and Run-of-Correct all fall into this category — features whose signal was real but had been buried by noise in the older target.
The reverse also happened: two primitives the previous version (v9) leaned on, FJ Suspense and DD Correct Aggression, dropped out under the cleaner target. They were redundant with other features — FJ Cover Tightness measures the same thing as FJ Suspense more directly, and DD Correct Aggression correlates +0.50 with Final Stakes on raw values. Both were removed, weights re-fit without them, and the holdout score improved.
The honest limitations. The formula sees gameplay numbers: scores, wagers, buzzer wins, leads, deficits. It does not see question quality, contestant chemistry, banter, host moments, or anything else that makes an episode feel alive on the broadcast. A game can be objectively close and tactically interesting and still feel flat on TV; another can have a "vibe" the numbers will never explain. The 0.61 holdout correlation is a real number (about 37% of community-sentiment variance), but the remaining 63% is not captured by the formula, and much of it is likely outside what gameplay statistics can reach.