
How it works.

OMEN is an ensemble of four engines on top of a real Census-sampled electorate. This page documents the actual algorithms and data. If a number on a /omen page says X, the source is here.

01 · Population

Every forecast is built on a voter population, not a static turnout assumption. For each race with an ACS cache key, we pull US Census Bureau ACS 5-year PUMS microdata for the relevant PUMAs (Public Use Microdata Areas, each ~100k people). The raw cache is every record aged 18+ with citizenship status 1-4 (U.S. citizens, including naturalized). Records carry their own PWGTP sampling weight for probability-proportional draws.

Statewide caches are downsampled to 25k rows via Efraimidis-Spirakis weighted sampling without replacement, preserving total population weight. Distribution checks against the source microdata at the default simulation draw (n=5,000) show per-bucket deltas of roughly 0.2–1.2pp across age × education × race × income. Verify any cache yourself with tsx scripts/acs-summary.ts <key> 5000.
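
The downsampling step can be sketched as below. This is a minimal illustration of Efraimidis-Spirakis sampling (each row gets the key u^(1/w), and the n largest keys are kept), not the code in acs.ts. The PumsRow shape, the mulberry32 generator, and the function name are assumptions made for the example, and the rescaling that preserves total population weight is not shown.

```typescript
// Hypothetical row shape: only the PWGTP weight matters for sampling.
interface PumsRow {
  id: number;
  pwgtp: number; // ACS person weight
}

// Small seeded PRNG (mulberry32) so the same seed yields the same draws.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Efraimidis-Spirakis: key = u^(1/w). We rank by log(u)/w instead, which
// orders rows identically and avoids underflow for large weights.
function weightedSampleWithoutReplacement(
  rows: PumsRow[],
  n: number,
  rng: () => number
): PumsRow[] {
  const keyed = rows
    .filter((r) => r.pwgtp > 0)
    // 1 - rng() lies in (0, 1], so Math.log never receives 0.
    .map((row) => ({ row, key: Math.log(1 - rng()) / row.pwgtp }));
  keyed.sort((x, y) => y.key - x.key); // largest keys first
  return keyed.slice(0, n).map((k) => k.row);
}
```

Sorting every key is O(N log N); a size-n min-heap would do the same job in one streaming pass, which matters more for the largest statewide files.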

For each simulation trial we draw N voters (default 5,000) from the cache with PWGTP weights. Same seed, same draws, reproducible. Every voter carries their actual ACS demographics: age in years, education bucket, race/ethnicity, income bucket, citizenship.

source: src/lib/omen/demographics/acs.ts · sampleFromPUMS(entry, n, rng)

02 · Turnout

Each voter's turnout propensity comes from a joint lookup in CPS Voter Supplement November 2020 Table 1: 30 real-world turnout rates over 5 age buckets × 6 education buckets, from 24.4% (18–24, <HS) to 87.5% (65+, grad). The rate is scaled by the race's election-cycle anchor (presidential 66%, midterm 54%, off-year 50%, primary 22%) and nudged by race/sex/income adjusters. Primaries get an additional high-engagement shape multiplier.
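
A rough sketch of the lookup-then-scale step follows. The text does not say exactly how the anchor is applied, so this example assumes the simplest reading: the 2020 (presidential-year) cell rate is multiplied by anchor / 0.66. That ratio, the bucket key format, and the function name are all assumptions, and only the two cells quoted above are filled in. The race/sex/income adjusters and the primary shape multiplier are left out.

```typescript
type Cycle = "presidential" | "midterm" | "off_year" | "primary";

// Cycle anchors as quoted in the text.
const CYCLE_ANCHOR: Record<Cycle, number> = {
  presidential: 0.66,
  midterm: 0.54,
  off_year: 0.5,
  primary: 0.22,
};

// Only the two corner cells quoted in the text; the other 28 are omitted.
// The "age|education" key format is hypothetical.
const CPS_2020_RATE: Record<string, number> = {
  "18-24|<HS": 0.244,
  "65+|grad": 0.875,
};

// ASSUMPTION: scale the presidential-year cell rate by anchor / 0.66.
function baseTurnout(ageBucket: string, eduBucket: string, cycle: Cycle): number {
  const rate = CPS_2020_RATE[`${ageBucket}|${eduBucket}`];
  if (rate === undefined) {
    throw new Error(`no CPS cell for ${ageBucket}|${eduBucket}`);
  }
  const scaled = rate * (CYCLE_ANCHOR[cycle] / CYCLE_ANCHOR.presidential);
  return Math.min(1, Math.max(0, scaled)); // keep the result a valid probability
}
```

Under this assumption a presidential race returns the CPS cell unchanged, and every other cycle shrinks all cells by the same factor.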

Per-race turnout validation is in flight — the aggregate turnout-prediction error across the 9 historical fixtures hasn't been committed as a reproducible metric yet. Rerun tsx scripts/run-omen-pol-backtest.ts to regenerate fixture-level predicted-vs-actual turnout. Race-level salience (Kari Lake-style post-election-denier mobilization) is an acknowledged unmodeled residual.

source: src/lib/omen/demographics/turnout.ts · computeTurnoutPropensity(d, ctx)

03 · Voter ideology

Demographic partisan lean on a −1 (far left) to +1 (far right) axis is computed from education, race/ethnicity, age, sex, and income — each factor a small additive term anchored in post-2016 ANES-reported partisan correlations. Fixtures declare each candidate's ideology on the same axis (Bowman −0.65 / Latimer −0.15 in NY-16; Cruz +0.80 / O'Rourke −0.55 in TX 2018).

Soft Downsian proximity routing assigns each voter a preferred candidate with probability ∝ 1 / (|ideology − lean| + 0.35). Smoothing keeps moderates from flipping deterministically. The routing applies to the partisan_base, demographic_bloc, and cross_pressured segments; swing and low_info stay noise-driven on purpose.
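
The routing rule is small enough to show in full. The sketch below implements the stated formula, normalizing 1 / (|ideology − lean| + 0.35) across candidates. The type and function names are illustrative, not the signatures in population.ts.

```typescript
interface CandidatePosition {
  name: string;
  ideology: number; // -1 (far left) to +1 (far right)
}

const SMOOTHING = 0.35; // constant from the formula above

// Probability of preferring each candidate, proportional to
// 1 / (|ideology - lean| + SMOOTHING), normalized to sum to 1.
function proximityProbabilities(lean: number, candidates: CandidatePosition[]): number[] {
  const weights = candidates.map(
    (c) => 1 / (Math.abs(c.ideology - lean) + SMOOTHING)
  );
  const total = weights.reduce((sum, w) => sum + w, 0);
  return weights.map((w) => w / total);
}

// Draw one preferred candidate using a caller-supplied uniform RNG.
function pickCandidate(
  lean: number,
  candidates: CandidatePosition[],
  rng: () => number
): CandidatePosition {
  const probs = proximityProbabilities(lean, candidates);
  let u = rng();
  for (let i = 0; i < candidates.length; i++) {
    u -= probs[i];
    if (u < 0) return candidates[i];
  }
  return candidates[candidates.length - 1]; // guard against rounding
}

// NY-16 positions from the fixtures quoted above.
const ny16: CandidatePosition[] = [
  { name: "Bowman", ideology: -0.65 },
  { name: "Latimer", ideology: -0.15 },
];
```

With these positions, a voter at lean 0 has weights 1 / 1.0 and 1 / 0.5, so they prefer Latimer with probability about 2/3. A voter at lean −0.4 sits 0.25 from both candidates and splits evenly.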

source: src/lib/omen/politics/population.ts · applyIdeologyProximity(...)

04 · Four engines

Each race runs through four independent forecasters:

  • Fundamentals: pregame prior + 3pp incumbency bump. Confidence scales with margin.
  • Polling aggregate: recency-decayed (half-life 14d), √n sample-size weighted, LV > RV > A population weighted. Pollster house effects subtracted per POLLSTER_BIASES (31 pollsters curated from the 538 Pollster Ratings archive + Silver Bulletin 2024 ratings; fallback neutral bias for unrecognized pollsters). Cross-party only.
  • Synthetic electorate: the ACS-sampled population runs through the event stream with segment-specific belief-update rules. Shares at election day become the engine output.
  • Event / sentiment: directional shift from recent events (trailing 14 days) applied on top of pregame prior. Capped at 0.6 confidence.

source: src/lib/omen/politics/engines/
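
As an illustration of the polling-aggregate weighting, the sketch below multiplies the three factors named in the list: a 14-day half-life decay, √n, and a population weight. The text gives only the ordering LV > RV > A, so the numeric population weights here are placeholders. The PollRecord shape, the function names, and the houseBias argument (standing in for POLLSTER_BIASES) are also assumptions.

```typescript
type PollPopulation = "LV" | "RV" | "A";

// Hypothetical poll shape for this example.
interface PollRecord {
  pollster: string;
  share: number; // candidate share in percentage points
  sampleSize: number;
  ageDays: number; // days between field end and the forecast date
  population: PollPopulation;
}

const HALF_LIFE_DAYS = 14;

// ASSUMPTION: placeholder values that only respect LV > RV > A.
const POPULATION_WEIGHT: Record<PollPopulation, number> = {
  LV: 1.0,
  RV: 0.8,
  A: 0.6,
};

// weight = 0.5^(age / half-life) * sqrt(n) * population weight
function pollWeight(p: PollRecord): number {
  return (
    Math.pow(0.5, p.ageDays / HALF_LIFE_DAYS) *
    Math.sqrt(p.sampleSize) *
    POPULATION_WEIGHT[p.population]
  );
}

// Weighted mean of house-effect-adjusted shares. Pollsters missing from
// the bias table get a neutral (zero) adjustment.
function aggregateShare(polls: PollRecord[], houseBias: Record<string, number>): number {
  let num = 0;
  let den = 0;
  for (const p of polls) {
    const w = pollWeight(p);
    const bias = houseBias[p.pollster] ?? 0;
    num += w * (p.share - bias);
    den += w;
  }
  if (den === 0) throw new Error("no usable polls");
  return num / den;
}
```

Under these placeholder weights, a 14-day-old LV poll of 400 respondents gets weight 0.5 × 20 × 1.0 = 10, while a same-day RV poll of 900 gets 1 × 30 × 0.8 = 24.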

05 · Ensemble

Engine outputs combine via confidence-weighted average with race-type-specific base weights:

Race type   Polling   Fund.   Synth.   Event
primary     25%       20%     35%      20%
general     35%       25%       25%      15%
runoff      30%       30%     25%      15%
special     25%       35%     25%      15%
(default)   35%       25%     25%      15%

Each engine's effective weight is its base weight × its own reported confidence. The combined forecast then passes through 15% shrinkage toward uniform — keeping us honest until we've resolved enough live predictions to earn tighter confidence.
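
A minimal sketch of the combination step, under two assumptions: effective weights are renormalized to sum to 1, and "shrinkage toward uniform" means a linear blend with 1/k across the k candidates. The names here are illustrative and are not the signature of combineEngines().

```typescript
interface EngineOutput {
  shares: number[]; // one share per candidate, summing to 1
  confidence: number; // engine's self-reported confidence, 0..1
}

// baseWeights[i] is the race-type base weight for outputs[i].
function combineWithShrinkage(
  outputs: EngineOutput[],
  baseWeights: number[],
  shrink = 0.15
): number[] {
  if (outputs.length === 0) throw new Error("no engine outputs");
  const k = outputs[0].shares.length;

  // Effective weight = base weight * reported confidence.
  const effective = outputs.map((o, i) => baseWeights[i] * o.confidence);
  const total = effective.reduce((sum, w) => sum + w, 0);
  if (total <= 0) throw new Error("all engines have zero effective weight");

  // Confidence-weighted average of the engines' shares.
  const combined = new Array<number>(k).fill(0);
  outputs.forEach((o, i) => {
    o.shares.forEach((s, j) => {
      combined[j] += (effective[i] / total) * s;
    });
  });

  // ASSUMPTION: shrinkage is a linear blend toward the uniform share 1/k.
  return combined.map((s) => (1 - shrink) * s + shrink / k);
}
```

In a two-candidate race where every engine reports 60/40, this reading of the shrinkage step publishes 58.5/41.5.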

source: src/lib/omen/politics/engines/ensemble.ts · combineEngines()

06 · Event scoring

Live-use events (news stories, debates, scandals, polls) are scored via Claude Haiku against a published rubric. Rubric defines valence (−1..+1) and magnitude (0..1) anchors so any scorer — human or LLM — produces consistent numbers. Regression-tested on 54 hand-authored fixture events: 85% type agreement, 87% subject agreement, mean |Δvalence| 0.31, mean |Δmagnitude| 0.14.

OMEN Event Scoring Rubric (v1)

Every political event is scored on two dimensions, plus a type:

TYPE (pick one):
- poll_release: a published poll showing a meaningful shift
- endorsement: a named individual or org publicly backing a candidate
- debate: a scheduled candidate-vs-candidate event
- scandal: a damaging story about a candidate
- news_event: everything else meaningful (speech, position change, major
  policy announcement, viral moment, court ruling affecting the race)
- attack_ad: a significant ad buy targeting a candidate
- economic_data: an economic indicator release that shifts the mood
- election_day: the actual vote (only one per race, must be last)

VALENCE (-1 to +1): signed effect on the SUBJECT CANDIDATE named.
  +1.0  maximally helpful to them (e.g., a unanimous positive endorsement
        from a cross-cutting figure, a widely-praised debate performance)
  +0.5  clearly positive (Obama endorses a Dem)
  +0.2  mildly positive
   0    neutral
  -0.2  mildly negative
  -0.5  clearly damaging (Kavanaugh-level controversy for an R)
  -1.0  catastrophic (criminal indictment, withdrawal-level)

MAGNITUDE (0 to 1): how much coverage + attention this commanded.
  0.1   niche story, traded in one news cycle
  0.2   standard story (most events)
  0.3   A-block on cable news for a day
  0.5   top story for a week
  0.7   re-writes the race narrative
  1.0   election-day level (reserved for election_day)

Magnitudes default to EVENT_TYPE_PRIORS[type]. Override when the event
is clearly bigger or smaller than a typical instance of its type
(e.g., Access Hollywood tape is "scandal" with magnitude 0.7+, not 0.5).

SUBJECT CANDIDATE (required unless election_day):
The candidate the event is primarily about. Some events hit both sides
— pick the candidate who is the main subject (e.g., a debate is typically
scored about the weaker performer; a scandal is about the target).

source: src/lib/omen/politics/engines/event-scoring.ts
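
The rubric's hard constraints (allowed types, valence and magnitude ranges, subject required unless election_day) can be checked mechanically before a scored event enters the stream. The sketch below covers only those constraints. The ScoredEvent shape and the validator name are assumptions, and it does not reproduce EVENT_TYPE_PRIORS, whose values are not listed here.

```typescript
// Event types exactly as listed in the rubric.
const OMEN_EVENT_TYPES = [
  "poll_release",
  "endorsement",
  "debate",
  "scandal",
  "news_event",
  "attack_ad",
  "economic_data",
  "election_day",
] as const;

type OmenEventType = (typeof OMEN_EVENT_TYPES)[number];

// Hypothetical shape for one scored event.
interface ScoredEvent {
  type: OmenEventType;
  valence: number; // -1..+1, signed effect on the subject candidate
  magnitude: number; // 0..1, coverage and attention
  subject?: string; // required unless type is election_day
}

// Returns a list of rubric violations; an empty list means the event passes.
function validateScoredEvent(e: ScoredEvent): string[] {
  const errors: string[] = [];
  if (!OMEN_EVENT_TYPES.includes(e.type)) {
    errors.push(`unknown type: ${String(e.type)}`);
  }
  if (!(e.valence >= -1 && e.valence <= 1)) {
    errors.push("valence must be within [-1, 1]");
  }
  if (!(e.magnitude >= 0 && e.magnitude <= 1)) {
    errors.push("magnitude must be within [0, 1]");
  }
  if (e.type !== "election_day" && !e.subject) {
    errors.push("subject candidate is required unless election_day");
  }
  return errors;
}
```

The range checks are written as negated conjunctions so that NaN scores, which an LLM scorer can occasionally emit after a parse failure, are rejected too.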

07 · Uncertainty

Every live pre-registration runs 100 Monte Carlo trials with per-trial synthetic seed + bootstrap-resampled polls. Output includes per-candidate p10/p50/p90 calibrated shares, P(wins), and margin distribution. Point estimates alone don't give you what you need to size a Kalshi position — these do.
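
To show how the reported quantities fall out of the trials, here is a small sketch that summarizes one candidate's share across trials. It assumes each trial yields an array of per-candidate shares, uses linear interpolation for percentiles, and counts a win only when the candidate is strictly ahead of every rival. These conventions and names are illustrative and may differ from uncertainty.ts.

```typescript
interface ShareSummary {
  p10: number;
  p50: number;
  p90: number;
  pWin: number;
}

// Linear-interpolated quantile over an ascending-sorted array.
function quantile(sortedAsc: number[], q: number): number {
  if (sortedAsc.length === 0) throw new Error("no trials");
  const pos = q * (sortedAsc.length - 1);
  const lo = Math.floor(pos);
  const hi = Math.ceil(pos);
  return sortedAsc[lo] + (pos - lo) * (sortedAsc[hi] - sortedAsc[lo]);
}

// trials[t][c] is candidate c's share in trial t.
function summarizeTrials(trials: number[][], candidate: number): ShareSummary {
  const shares = trials.map((t) => t[candidate]).sort((x, y) => x - y);
  // A win means strictly beating every other candidate in that trial.
  const wins = trials.filter((t) =>
    t.every((s, j) => j === candidate || t[candidate] > s)
  ).length;
  return {
    p10: quantile(shares, 0.1),
    p50: quantile(shares, 0.5),
    p90: quantile(shares, 0.9),
    pWin: wins / trials.length,
  };
}
```

With 100 trials, P(wins) moves in steps of 0.01, which is worth keeping in mind when comparing it against a market price.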

Known limitation: we perturb sampling variance (seed + poll bootstrap), not model specification. True uncertainty includes "is our model right at all" — harder to quantify, requires model ensembling across different structural specs.

source: src/lib/omen/politics/engines/uncertainty.ts · runHybridForecastWithUncertainty(fixture, input)

08 · The honest number

Every backtest is run with information frozen at T−30 days before the election. We only use polls with fieldEnd ≤ cutoff and events with daysBeforeElection ≥ 30. That's the forecast OMEN would have made at the 72-hour pre-registration window, the number we're graded by.
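
The freeze amounts to two filters, sketched below. The field names fieldEnd and daysBeforeElection come from the text. The record shapes, the ISO date strings, and the applyHoldout function are assumptions for illustration, not the holdoutDays implementation in forecast.ts.

```typescript
// Hypothetical minimal shapes; real fixtures carry more fields.
interface DatedPoll {
  fieldEnd: string; // ISO date, e.g. "2018-10-07"
}
interface TimelineEvent {
  daysBeforeElection: number;
}

const MS_PER_DAY = 86400000;

// Keep only information available holdoutDays before the election.
function applyHoldout<P extends DatedPoll, E extends TimelineEvent>(
  electionDate: string,
  holdoutDays: number,
  polls: P[],
  events: E[]
): { polls: P[]; events: E[] } {
  // Date-only ISO strings parse as UTC midnight, so day math is exact.
  const cutoffMs = Date.parse(electionDate) - holdoutDays * MS_PER_DAY;
  return {
    polls: polls.filter((p) => Date.parse(p.fieldEnd) <= cutoffMs),
    events: events.filter((e) => e.daysBeforeElection >= holdoutDays),
  };
}
```

For an election on 2018-11-06 with a 30-day holdout, the cutoff is 2018-10-07: a poll ending that day is kept and one ending 2018-10-08 is dropped.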

Current honest T−30: 2.24pp mean share error, 7/9 winners correct across 9 fixtures.

source: src/lib/omen/politics/engines/forecast.ts · holdoutDays option