How it works.
OMEN is an ensemble of four engines on top of a real Census-sampled electorate. This page documents the actual algorithms and data. If a number on the /omen page says X, the source is here.
01 · Population
Every forecast is built on a voter population, not a static turnout assumption. For each race with an ACS cache key, we pull US Census Bureau ACS 5-year PUMS microdata for the relevant PUMAs (Public Use Microdata Areas, each ~100k people). The raw cache holds every record aged 18+ with citizenship status 1-4 (U.S. citizens, including naturalized). Each record carries its own PWGTP sampling weight for probability-proportional draws.
Statewide caches are downsampled to 25k rows via Efraimidis-Spirakis weighted sampling without replacement, preserving total population weight. Distribution checks against the source microdata at the default simulation draw (n=5,000) show per-bucket deltas of roughly 0.2–1.2pp across age × education × race × income. Verify any cache yourself with tsx scripts/acs-summary.ts <key> 5000.
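The downsampling step can be sketched as follows. This is a minimal illustration of Efraimidis-Spirakis sampling, not the code behind the cache: the PumsRecord shape, the mulberry32 seeded generator, and the function names are all assumptions introduced for the example, and the sketch does not show how total population weight is preserved afterwards.

```typescript
// Hypothetical record shape; only the PWGTP person weight matters here.
interface PumsRecord {
  id: number;
  pwgtp: number;
}

// Small seeded PRNG (mulberry32) so the same seed reproduces the same sample.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Efraimidis-Spirakis: give each record the key u^(1/w) with u uniform on
// (0, 1) and keep the k largest keys. Comparing log(u) / w gives the same
// ordering and avoids underflow when weights are large.
function weightedSampleWithoutReplacement(
  records: PumsRecord[],
  k: number,
  rand: () => number,
): PumsRecord[] {
  const keyed = records
    .filter((r) => r.pwgtp > 0)
    // 1 - rand() lies in (0, 1], so the logarithm is always finite.
    .map((r) => ({ record: r, key: Math.log(1 - rand()) / r.pwgtp }));
  keyed.sort((a, b) => b.key - a.key);
  return keyed.slice(0, k).map((x) => x.record);
}
```

Sorting the full list keeps the sketch short; a bounded min-heap of size k would do the same selection in one pass over a large statewide file.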
For each simulation trial we draw N voters (default 5,000) from the cache with PWGTP weights. The same seed produces the same draws, so trials are reproducible. Every voter carries their actual ACS demographics: age in years, education bucket, race/ethnicity, income bucket, citizenship.
02 · Turnout
Each voter's turnout propensity comes from a joint lookup in CPS Voter Supplement November 2020 Table 1: 30 real-world turnout rates over (5 age buckets × 6 education buckets), from 24.4% (18–24, <HS) to 87.5% (65+, grad). The rate is scaled by the race's election-cycle anchor (presidential 66%, midterm 54%, off-year 50%, primary 22%) and nudged by race/sex/income adjusters. Primaries get an additional high-engagement shape multiplier.
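A rough sketch of the lookup-then-scale step, under stated assumptions. The text does not say how the anchor is applied, so this example assumes the 2020 table reflects a presidential year and rescales by the ratio of the cycle anchor to the presidential anchor. The bucket key format, the single multiplicative adjuster, and the function name are hypothetical, and the caller has to supply the actual 30 CPS cells.

```typescript
type CycleType = "presidential" | "midterm" | "off_year" | "primary";

// Election-cycle anchors quoted in the text above.
const CYCLE_ANCHOR: Record<CycleType, number> = {
  presidential: 0.66,
  midterm: 0.54,
  off_year: 0.5,
  primary: 0.22,
};

// Turnout rates keyed by "ageBucket|educationBucket" (hypothetical key format).
type TurnoutTable = Record<string, number>;

function turnoutPropensity(
  table: TurnoutTable,
  ageBucket: string,
  educationBucket: string,
  cycle: CycleType,
  adjuster = 1, // hypothetical combined race/sex/income nudge
): number {
  const base = table[`${ageBucket}|${educationBucket}`];
  if (base === undefined) {
    throw new Error(`no turnout cell for ${ageBucket}|${educationBucket}`);
  }
  // Assumption: rescale the presidential-year rate by the ratio of anchors.
  const scaled = base * (CYCLE_ANCHOR[cycle] / CYCLE_ANCHOR.presidential) * adjuster;
  // Clamp so the result stays a valid probability.
  return Math.min(1, Math.max(0, scaled));
}
```

Under this assumed scaling, the 87.5% cell (65+, grad) would drop to roughly 72% in a midterm before any adjusters are applied.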
Per-race turnout validation is in flight — the aggregate turnout-prediction error across the 9 historical fixtures hasn't been committed as a reproducible metric yet. Rerun tsx scripts/run-omen-pol-backtest.ts to regenerate fixture-level predicted-vs-actual turnout. Race-level salience (Kari Lake-style post-election-denier mobilization) is an acknowledged unmodeled residual.
03 · Voter ideology
Demographic partisan lean on a −1 (far left) to +1 (far right) axis is computed from education, race/ethnicity, age, sex, and income — each factor a small additive term anchored in post-2016 ANES-reported partisan correlations. Fixtures declare each candidate's ideology on the same axis (Bowman −0.65 / Latimer −0.15 in NY-16; Cruz +0.80 / O'Rourke −0.55 in TX 2018).
Soft Downsian proximity routing assigns each voter a preferred candidate with probability ∝ 1 / (|ideology − lean| + 0.35). The 0.35 smoothing term keeps moderates from flipping deterministically. The routing is applied to the partisan_base, demographic_bloc, and cross_pressured segments; swing and low_info stay noise-driven on purpose.
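The proximity rule above can be illustrated with a short sketch. The Candidate type and function names are assumptions for the example; only the formula and the 0.35 constant come from the text.

```typescript
interface Candidate {
  name: string;
  ideology: number; // -1 (far left) to +1 (far right)
}

const SMOOTHING = 0.35;

// Probability of preferring each candidate, proportional to
// 1 / (|ideology - lean| + 0.35) and normalized to sum to 1.
function preferenceProbabilities(lean: number, candidates: Candidate[]): number[] {
  const weights = candidates.map((c) => 1 / (Math.abs(c.ideology - lean) + SMOOTHING));
  const total = weights.reduce((sum, w) => sum + w, 0);
  return weights.map((w) => w / total);
}

// Draw one candidate using a uniform random number in [0, 1).
function pickCandidate(
  lean: number,
  candidates: Candidate[],
  rand: () => number,
): Candidate | undefined {
  const probs = preferenceProbabilities(lean, candidates);
  const u = rand();
  let cumulative = 0;
  const index = probs.findIndex((p) => {
    cumulative += p;
    return u < cumulative;
  });
  // Floating-point round-off can leave u just above the final cumulative sum.
  return candidates[index === -1 ? candidates.length - 1 : index];
}
```

With the NY-16 fixture values, a voter with lean 0 sits 0.65 from Bowman and 0.15 from Latimer, giving weights 1/1.0 and 1/0.5, so about one third Bowman and two thirds Latimer. A voter at −0.40 is equidistant from both and splits evenly.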
04 · Four engines
Each race runs through four independent forecasters:
- Fundamentals: pregame prior + 3pp incumbency bump. Confidence scales with margin.
- Polling aggregate: recency-decayed (half-life 14d), √n sample-size weighted, and population weighted (likely voters > registered voters > adults). Pollster house effects subtracted per POLLSTER_BIASES (31 pollsters curated from the 538 Pollster Ratings archive + Silver Bulletin 2024 ratings; fallback neutral bias for unrecognized pollsters). Cross-party only.
- Synthetic electorate: the ACS-sampled population runs through the event stream with segment-specific belief-update rules. Shares at election day become the engine output.
- Event / sentiment: directional shift from recent events (trailing 14 days) applied on top of pregame prior. Capped at 0.6 confidence.
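As an illustration of the polling-aggregate weighting in the list above, here is a minimal sketch. It assumes the three weights multiply together and that house effects are expressed in margin points. The numeric population multipliers are hypothetical (the text only fixes the ordering LV > RV > A), and the houseEffects argument stands in for POLLSTER_BIASES.

```typescript
type PollPopulation = "LV" | "RV" | "A";

interface PollReading {
  pollster: string;
  margin: number; // candidate A minus candidate B, in points
  ageDays: number; // days since the poll left the field
  sampleSize: number;
  population: PollPopulation;
}

const HALF_LIFE_DAYS = 14;

// Hypothetical multipliers; only the ordering LV > RV > A comes from the text.
const POPULATION_WEIGHT: Record<PollPopulation, number> = { LV: 1.0, RV: 0.8, A: 0.6 };

function pollWeight(poll: PollReading): number {
  // Weight halves every 14 days of poll age.
  const recency = Math.pow(0.5, poll.ageDays / HALF_LIFE_DAYS);
  return recency * Math.sqrt(poll.sampleSize) * POPULATION_WEIGHT[poll.population];
}

function aggregateMargin(
  polls: PollReading[],
  houseEffects: Record<string, number>,
): number {
  let weightedSum = 0;
  let totalWeight = 0;
  for (const poll of polls) {
    // Unrecognized pollsters fall back to a neutral (zero) bias.
    const bias = houseEffects[poll.pollster] ?? 0;
    const weight = pollWeight(poll);
    weightedSum += weight * (poll.margin - bias);
    totalWeight += weight;
  }
  if (totalWeight === 0) throw new Error("no usable polls");
  return weightedSum / totalWeight;
}
```

Two otherwise identical polls taken 14 days apart end up weighted 2:1 in favor of the newer one.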
05 · Ensemble
Engine outputs combine via confidence-weighted average with race-type-specific base weights:
Each engine's effective weight is its base weight × its own reported confidence. The combined forecast then passes through 15% shrinkage toward uniform — keeping us honest until we've resolved enough live predictions to earn tighter confidence.
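The combination rule can be sketched like this. The EngineOutput shape and function name are assumptions for illustration, and the base weights are supplied by the caller rather than taken from this page. Shrinkage toward uniform is read here as a linear blend of 85% forecast and 15% equal shares.

```typescript
interface EngineOutput {
  baseWeight: number; // race-type-specific base weight
  confidence: number; // the engine's own reported confidence, 0..1
  shares: number[]; // per-candidate vote shares, summing to 1
}

const SHRINKAGE = 0.15;

function combineEngines(engines: EngineOutput[], candidateCount: number): number[] {
  // Effective weight = base weight x reported confidence.
  const totalWeight = engines.reduce((sum, e) => sum + e.baseWeight * e.confidence, 0);
  if (totalWeight <= 0) throw new Error("no engine carries weight");
  const uniform = 1 / candidateCount;
  return Array.from({ length: candidateCount }, (_, i) => {
    const weighted = engines.reduce(
      (sum, e) => sum + e.baseWeight * e.confidence * (e.shares[i] ?? 0),
      0,
    );
    const average = weighted / totalWeight;
    // Blend 15% of the way toward equal shares for every candidate.
    return (1 - SHRINKAGE) * average + SHRINKAGE * uniform;
  });
}
```

For example, a 55/45 weighted average in a two-candidate race becomes 54.25/45.75 after shrinkage, so the lead narrows but the ordering never changes.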
06 · Event scoring
Live-use events (news stories, debates, scandals, polls) are scored via Claude Haiku against a published rubric. Rubric defines valence (−1..+1) and magnitude (0..1) anchors so any scorer — human or LLM — produces consistent numbers. Regression-tested on 54 hand-authored fixture events: 85% type agreement, 87% subject agreement, mean |Δvalence| 0.31, mean |Δmagnitude| 0.14.
OMEN Event Scoring Rubric (v1)
Every political event is scored on two dimensions, plus a type:
TYPE (pick one):
- poll_release: a published poll showing a meaningful shift
- endorsement: a named individual or org publicly backing a candidate
- debate: a scheduled candidate-vs-candidate event
- scandal: a damaging story about a candidate
- news_event: everything else meaningful (speech, position change, major
policy announcement, viral moment, court ruling affecting the race)
- attack_ad: a significant ad buy targeting a candidate
- economic_data: an economic indicator release that shifts the mood
- election_day: the actual vote (only one per race, must be last)
VALENCE (-1 to +1): signed effect on the SUBJECT CANDIDATE named.
+1.0 maximally helpful to them (e.g., a unanimous positive endorsement
from a cross-cutting figure, a widely-praised debate performance)
+0.5 clearly positive (Obama endorses a Dem)
+0.2 mildly positive
0 neutral
-0.2 mildly negative
-0.5 clearly damaging (Kavanaugh-level controversy for an R)
-1.0 catastrophic (criminal indictment, withdrawal-level)
MAGNITUDE (0 to 1): how much coverage + attention this commanded.
0.1 niche story, traded in one news cycle
0.2 standard story (most events)
0.3 A-block on cable news for a day
0.5 top story for a week
0.7 re-writes the race narrative
1.0 election-day level (reserved for election_day)
Magnitudes default to EVENT_TYPE_PRIORS[type]. Override when the event
is clearly bigger or smaller than a typical instance of its type
(e.g., Access Hollywood tape is "scandal" with magnitude 0.7+, not 0.5).
SUBJECT CANDIDATE (required unless election_day):
The candidate the event is primarily about. Some events hit both sides
— pick the candidate who is the main subject (e.g., a debate is typically
scored about the weaker performer; a scandal is about the target).
07 · Uncertainty
Every live pre-registration runs 100 Monte Carlo trials with per-trial synthetic seed + bootstrap-resampled polls. Output includes per-candidate p10/p50/p90 calibrated shares, P(wins), and margin distribution. Point estimates alone don't give you what you need to size a Kalshi position — these do.
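Turning the trial outputs into p10/p50/p90 and P(wins) might look like the sketch below. The nearest-rank percentile convention, the tie handling, and all names are assumptions; the text does not say which conventions are used.

```typescript
interface CandidateSummary {
  p10: number;
  p50: number;
  p90: number;
  pWins: number;
}

// Nearest-rank percentile on an ascending-sorted array (one of several conventions).
function percentile(sorted: number[], pct: number): number {
  const rank = Math.max(1, Math.ceil((pct * sorted.length) / 100));
  return sorted[rank - 1] ?? NaN;
}

// trials[t][c] is candidate c's share in Monte Carlo trial t.
function summarizeTrials(trials: number[][], candidateCount: number): CandidateSummary[] {
  if (trials.length === 0) throw new Error("need at least one trial");
  return Array.from({ length: candidateCount }, (_, c) => {
    const shares = trials.map((t) => t[c] ?? 0).sort((a, b) => a - b);
    // A win requires a strictly larger share than every rival; ties count for nobody.
    const wins = trials.filter((t) => t.every((s, j) => j === c || s < (t[c] ?? 0))).length;
    return {
      p10: percentile(shares, 10),
      p50: percentile(shares, 50),
      p90: percentile(shares, 90),
      pWins: wins / trials.length,
    };
  });
}
```

With 100 trials, nearest-rank p10 and p90 are simply the 10th and 90th smallest shares, which is easy to audit by hand.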
Known limitation: we perturb sampling variance (seed + poll bootstrap), not model specification. True uncertainty includes "is our model right at all" — harder to quantify, requires model ensembling across different structural specs.
08 · The honest number
Every backtest is run with information frozen at T−30 (30 days before the election). We only use polls with fieldEnd ≤ cutoff and events with daysBeforeElection ≥ 30. That's the forecast OMEN would have made at the 72-hour pre-registration window, the number we're graded by.
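The freeze rule can be expressed as a small filter. The field names fieldEnd and daysBeforeElection come from the text; the record shapes, the ISO date-string format, and the function names are assumptions for this sketch.

```typescript
interface DatedPoll {
  pollster: string;
  fieldEnd: string; // ISO date, e.g. "2018-10-07" (assumed format)
}

interface RaceEvent {
  label: string;
  daysBeforeElection: number;
}

const CUTOFF_DAYS = 30;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Parse a YYYY-MM-DD date as midnight UTC so local time zones cannot shift it.
function toUtcMs(isoDate: string): number {
  return Date.parse(`${isoDate}T00:00:00Z`);
}

function cutoffDate(electionDate: string): Date {
  return new Date(toUtcMs(electionDate) - CUTOFF_DAYS * MS_PER_DAY);
}

// Keep only inputs that were available at T-30.
function freezeInputs(
  electionDate: string,
  polls: DatedPoll[],
  events: RaceEvent[],
): { polls: DatedPoll[]; events: RaceEvent[] } {
  const cutoff = cutoffDate(electionDate).getTime();
  return {
    polls: polls.filter((p) => toUtcMs(p.fieldEnd) <= cutoff),
    events: events.filter((e) => e.daysBeforeElection >= CUTOFF_DAYS),
  };
}
```

For an election on 2018-11-06 the cutoff lands on 2018-10-07, so a poll that left the field on 2018-10-08 is excluded even though it missed by a single day.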
Current honest T−30: 2.24pp mean share error and 7/9 winners correct across the 9 fixtures.