Exp 10a: Restart Allocation Baseline + High-R Tie-Break (Incumbent Promotion)

Objective

Run a systematic restart matrix to identify the best-known MLE candidate on the simulated DGP panel, resolve sub-nat rankings via a paired R=100k tie-break, and determine whether the incumbent basin is truth-connected.

Setup

Model: 54 free parameters, constrained MSRE, absorbing NBC proxy
Optimizer: BOBYQA (maxeval=600, R=100, H=150); candidates re-evaluated at R=100k for the tie-break
Comparators: 4 candidates (ep06c_polished, truth600_best, truth2000_s137, theta_true)
Acceptance: stable paired ranking across 5 seeds; truth-connectivity check

Procedure

25-elite x 3-seed restart matrix (75 BOBYQA runs, R=100, maxeval=600)
Truth-start follow-up: truth_600 and truth_2000_noftol runs from theta_true, re-evaluated at R=10k
Paired R=100k tie-break: 4 candidates x 5 shared MC seeds (42, 137, 271, 1001, 2002)
Within-seed contrasts cancel shared MC noise (~26-35 min per evaluation, 24 threads)
Late-bloom analysis across elite ranks
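The paired tie-break above can be sketched as follows. This is a minimal Python sketch, assuming each candidate's NLL is recorded once per shared MC seed. The function names (`paired_contrast`, `marginal_sem`) and the toy numbers are hypothetical, not the Exp 10a pipeline or data. The toy gives both candidates the same large per-seed MC offset to show why within-seed differences have a far smaller SEM than either candidate's marginal SEM.

```python
import math
import random
import statistics


def paired_contrast(nll_a, nll_b):
    """Within-seed contrast of candidate A minus candidate B.

    nll_a[i] and nll_b[i] must come from the SAME MC seed (common random
    numbers) so that simulation noise shared by both candidates cancels.
    Returns (mean difference, SEM, t); a negative mean favors A (lower NLL).
    """
    if len(nll_a) != len(nll_b) or len(nll_a) < 2:
        raise ValueError("need one value per candidate for each of >= 2 shared seeds")
    diffs = [a - b for a, b in zip(nll_a, nll_b)]
    mean = statistics.fmean(diffs)
    sem = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean, sem, mean / sem


def marginal_sem(nll):
    """SEM of one candidate's NLL across seeds, ignoring the pairing."""
    return statistics.stdev(nll) / math.sqrt(len(nll))


# Toy data, NOT Exp 10a results: each seed adds the same large MC offset to both
# candidates, plus a small candidate-specific term; A is truly 0.6 nats better.
rng = random.Random(0)
seeds = [42, 137, 271, 1001, 2002]
shared = {s: rng.gauss(0.0, 0.75) for s in seeds}
cand_a = [shared[s] + rng.gauss(0.0, 0.03) for s in seeds]
cand_b = [0.6 + shared[s] + rng.gauss(0.0, 0.03) for s in seeds]

mean, sem, t = paired_contrast(cand_a, cand_b)
print(f"paired: {mean:+.3f} nats (SEM {sem:.3f}, t={t:.1f})")
print(f"marginal SEMs: {marginal_sem(cand_a):.3f}, {marginal_sem(cand_b):.3f}")
```

The design choice is that the seed, not the candidate, is the unit of replication: only the candidate-specific noise survives in the differences, which is why five shared seeds can resolve sub-nat gaps.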

Results

ep06c_polished: NLL -5381.59 +/- 0.15 SEM (R=100k), best in all 5 paired seeds
Paired contrasts (ep06c_polished minus comparator): -1.13 nats vs truth600_best (SEM 0.21, t=-5.29); -1.70 nats vs truth2000_s137 (SEM 0.22, t=-7.60)
truth2000_s137’s earlier ~6-nat lead was a mixed-fidelity artifact (it compared a trace minimum at R=100 against a re-evaluation at R=10k)
MLE-DGP gap: 154.4 nats at R=100k (sample displacement, not search failure)
Restart matrix: best -5339.88, median -5129.1, worst -4876.04 (75 runs, 0 corrupt)
Late-bloom finding: worst-screening elites 6/7 produced the single largest improvements (~900-1200 nats)
Improvement non-saturating: median gain from eval 300 to eval 600 (~53 nats) is comparable to earlier intervals
Audit: not yet performed (verdicts from verification session, not independently audited)
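One mechanism consistent with the mixed-fidelity artifact is selection bias: the minimum of a trace of noisy low-R evaluations sits below the quantity it estimates, while a single high-R re-evaluation does not. This is a minimal sketch, assuming (hypothetically) that each R=100 evaluation in a flat neighborhood of the optimum returns the true NLL plus independent Gaussian MC noise. `trace_min_bias` and its noise level are illustrative assumptions, not measured values.

```python
import random
import statistics


def trace_min_bias(true_nll, noise_sd, n_evals, n_reps, seed=0):
    """Average amount by which a trace minimum undershoots true_nll.

    Hypothetical noise model: every low-R evaluation returns
    true_nll + N(0, noise_sd) independently (a flat neighborhood of the optimum).
    Taking the minimum over the trace selects favorable noise draws, so the
    result is negative; a single high-R re-evaluation has no such selection.
    """
    rng = random.Random(seed)
    minima = [
        min(true_nll + rng.gauss(0.0, noise_sd) for _ in range(n_evals))
        for _ in range(n_reps)
    ]
    return statistics.fmean(minima) - true_nll


for n in (10, 100, 600):
    print(n, round(trace_min_bias(0.0, 1.0, n, 200), 2))
```

The optimism grows with trace length, so under this model a trace minimum should only be ranked against other values of the same fidelity.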

Analysis

Pairing dramatically increases ranking power: the truth2000_s137 vs truth600_best paired SEM is 0.013 nats, against marginal SEMs of 0.34 nats (26x shrinkage). The basin is truth-connected: local descent from theta_true ends within 1.13-1.70 nats of ep06c_polished (truth600_best and truth2000_s137, respectively). The 154-nat MLE-DGP gap is the limiting question. It is sample-MLE displacement (finite sample, T=119, with 54 free parameters), not a search failure, since both truth-started and elite-started paths converge to the same basin. The late-bloom finding (improvement is non-monotone in elite rank) argues against greedy top-k restart allocation.
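Wilks' theorem gives a reference scale for the 154-nat gap. If the model were regular, well identified and unconstrained near theta_true, with an exact likelihood, 2 × (NLL(theta_true) − NLL(MLE)) would be approximately χ² with k = 54 degrees of freedom, so the NLL gap would average k/2 = 27 nats. This is a minimal sketch of that arithmetic only. Those regularity assumptions may fail here (T=119, constrained parameters, simulated likelihood), and `wilks_reference` is a hypothetical helper name.

```python
import math


def wilks_reference(nll_gap, k):
    """Locate an NLL gap relative to its regular-asymptotics distribution.

    Assumes Wilks' theorem holds: LR = 2 * nll_gap ~ chi2 with k degrees of
    freedom, so the gap has mean k/2 and standard deviation sqrt(2k)/2 nats.
    Returns (expected gap, sd of gap, z-score of the observed gap).
    """
    mean_gap = k / 2.0
    sd_gap = math.sqrt(2.0 * k) / 2.0
    return mean_gap, sd_gap, (nll_gap - mean_gap) / sd_gap


# Exp 10a inputs: 154.4-nat gap at R=100k, 54 free parameters.
mean_gap, sd_gap, z = wilks_reference(154.4, 54)
print(f"regular-case gap {mean_gap:.1f} +/- {sd_gap:.2f} nats; observed z = {z:.1f}")
```

The z-score is one way to quantify how far the observed gap sits from the regular-case benchmark when weighing the finite-sample and weak-identification explanations.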

Claim updates

forward-solution-unique-fundamental-ree-with-nbc: Truth-connectivity of the incumbent basin (local descent from theta_true reaches ep06c_polished within 1.7 nats) provides strong evidence that the forward-solution MLE is locally unique on the simulated panel; paired R=100k tie-break resolves sub-nat ranking.

Follow-up

154-nat MLE-DGP gap: sample-MLE displacement analysis (finite-sample vs weak identification)
Uniform vs greedy restart allocation comparison (pre-register for Exp10b)
Audit session needed for independent verification
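The uniform vs greedy comparison for Exp10b can be pre-registered against explicit allocation rules. This is a minimal sketch, assuming elites are indexed by 0-based screening rank and the budget is counted in BOBYQA runs. The function names and the late-bloomer ranks in the example are hypothetical. Uniform allocation of 75 runs over 25 elites reproduces the 3-seed-per-elite matrix of Exp 10a.

```python
def allocate_uniform(n_elites, budget):
    """Spread restart runs as evenly as possible over all elites (rank order)."""
    base, extra = divmod(budget, n_elites)
    return [base + (1 if i < extra else 0) for i in range(n_elites)]


def allocate_greedy_topk(n_elites, budget, k):
    """Put the whole budget on the k best-screening elites, round-robin."""
    if not 1 <= k <= n_elites:
        raise ValueError("k must be in 1..n_elites")
    alloc = [0] * n_elites
    for j in range(budget):
        alloc[j % k] += 1
    return alloc


def late_bloomers_covered(alloc, bloomer_ranks):
    """Count late-blooming elites (0-based screening ranks) given >= 1 run."""
    return sum(1 for r in bloomer_ranks if alloc[r] > 0)


uniform = allocate_uniform(25, 75)
greedy = allocate_greedy_topk(25, 75, 5)
hypothetical_bloomers = [20, 23]  # illustrative ranks only, not Exp 10a elites
print(late_bloomers_covered(uniform, hypothetical_bloomers))
print(late_bloomers_covered(greedy, hypothetical_bloomers))
```

Here the greedy top-5 rule gives 15 runs to each of the five best-screening elites and none to the rest, so any low-ranked late bloomer is never restarted; uniform allocation trades depth on the leaders for that coverage.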
