Objective
Test whether BOBYQA can close the last-mile NLL gap on simulated DGP data by increasing optimizer depth (06a: maxeval raised from 600 to 3000) and MC resolution (06b: R raised from 100 to 500), and thereby validate the two-stage R protocol for future optimization campaigns.
Setup
- Model: 54 free parameters, constrained MSRE, absorbing spectral proxy (99.1% accuracy)
- Evaluator: hybrid Hamilton+MC likelihood with per-eval MC rebuild
- Comparators: DGP truth NLL baselines at R=100/500/5000
- Acceptance: gap to DGP truth <= 20 nats, hit rate >= 2/3 starts
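The acceptance rule can be sketched as a small check. This is a minimal illustration, not the project's code: it assumes the gap is the endpoint NLL minus the DGP-truth NLL at the same R (so negative gaps lie below truth) and that timeouts are excluded from the denominator, which is how the hit rates are reported in Results. The function and constant names are hypothetical.

```python
from fractions import Fraction

GAP_TOL_NATS = 20.0            # acceptance threshold from Setup
MIN_HIT_RATE = Fraction(2, 3)  # required fraction of hits


def is_hit(gap_nats, tol=GAP_TOL_NATS):
    """A run counts as a hit when its gap to DGP truth is at most `tol` nats.

    Assumption: gap = NLL(endpoint) - NLL(DGP truth), both at the same R.
    """
    return gap_nats <= tol


def passes(n_hits, n_completed, min_rate=MIN_HIT_RATE):
    """Compare the hit rate over completed runs with the required rate.

    Assumption: timed-out runs are not counted in `n_completed`.
    """
    if n_completed == 0:
        return False
    return Fraction(n_hits, n_completed) >= min_rate


# Values taken from Results: a +35.5 nat miss, the -13.7 nat median, 6/8 hits.
print(is_hit(35.5), is_hit(-13.7))
print(passes(6, 8))
```

Using exact fractions avoids a floating-point edge case when the hit rate sits exactly at 2/3.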
Procedure
- 06a: 3 starts x 3 MC seeds (9 runs), maxeval=3000, R=100, initial_step=0.06, ftol disabled
- 06b: same 3x3 grid but R=500, maxeval=3000
- 5-seed cross-evaluation at R=100 for all endpoints
- Head-to-head comparison on matching (start, seed) pairs
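The run grid above can be enumerated as follows. This is a sketch under stated assumptions: the start labels and seed values are placeholders rather than the ones actually used, the `Stage` container is hypothetical, and 06b is assumed to inherit initial_step and the disabled ftol from 06a because the procedure lists only R and maxeval for it.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Stage:
    name: str
    R: int                # MC draws per likelihood evaluation
    maxeval: int          # optimizer evaluation budget
    initial_step: float
    ftol_enabled: bool


STAGE_06A = Stage("06a", R=100, maxeval=3000, initial_step=0.06, ftol_enabled=False)
# Assumption: 06b changes only R; the other settings carry over from 06a.
STAGE_06B = Stage("06b", R=500, maxeval=3000, initial_step=0.06, ftol_enabled=False)

STARTS = ("start_0", "start_1", "start_2")  # placeholder start labels
MC_SEEDS = (0, 1, 2)                        # placeholder MC seeds


def run_grid(stage):
    """Return one (stage, start, seed) tuple per run in the 3x3 grid."""
    return [(stage.name, start, seed) for start, seed in product(STARTS, MC_SEEDS)]


for stage in (STAGE_06A, STAGE_06B):
    print(stage.name, len(run_grid(stage)), "runs")
```

Keying every run by (start, seed) is what makes the later head-to-head comparison between 06a and 06b a simple join on matching pairs.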
Results
- 06a (R=100): 6 of 8 completed runs hit (75%); best gap -49.0 nats, worst +35.5, median -13.7; 1 timeout
- 06b (R=500): 7 of 7 completed runs hit (100%); best gap -50.4 nats, worst -19.4, median -41.5; 2 timeouts
- 06b wins every head-to-head matchup; rescues both 06a misses (trial4/271: +35.5 to -32.7; trial5/271: +34.2 to -19.4)
- Most runs converge in 800-1600 evals, well within 3000 budget
- R=500 wall time ~11.5s/eval vs ~2.3s/eval at R=100 (5x cost justified by eliminating traps)
- No separate audit session
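The head-to-head rescue and the cost figures above reduce to simple arithmetic, sketched here. Only the two (start, seed) pairs itemised in the results are included; the variable names are illustrative, and the full-budget hours are an upper bound that assumes a run uses all 3000 evaluations.

```python
# (start, seed) -> (gap at R=100, gap at R=500), in nats, from the results above.
pairs = {
    ("trial4", 271): (35.5, -32.7),
    ("trial5", 271): (34.2, -19.4),
}

improvements = {}
for (start, seed), (gap_100, gap_500) in pairs.items():
    # Positive improvement means the R=500 endpoint has the lower NLL.
    improvements[(start, seed)] = gap_100 - gap_500
    print(f"{start}/{seed}: {gap_100:+.1f} -> {gap_500:+.1f} nats "
          f"(improvement {improvements[(start, seed)]:.1f})")

SEC_PER_EVAL = {100: 2.3, 500: 11.5}  # approximate wall time per evaluation
MAXEVAL = 3000

cost_ratio = SEC_PER_EVAL[500] / SEC_PER_EVAL[100]
full_budget_hours = {R: sec * MAXEVAL / 3600 for R, sec in SEC_PER_EVAL.items()}

print(f"cost ratio R=500 / R=100: {cost_ratio:.1f}x")
for R, hours in full_budget_hours.items():
    print(f"R={R}: up to {hours:.1f} h per run at maxeval={MAXEVAL}")
```

Since most runs converge in 800-1600 evals, typical wall time per run is well below the full-budget bound.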
Analysis
R=500 is clearly superior to R=100: the smoother MC surface eliminates the ~25% local-trap rate (2 of 8 completed runs) caused by MC noise at R=100. Both R=100 misses come from the same seed (271), which suggests that certain MC realizations create harder landscapes. The v2 BOBYQA configuration (initial_step=0.06, ftol disabled, xtol_abs=6e-6) fixes the premature-convergence bug in v1, which stopped all runs after ~112 evals. The results validate the two-stage R protocol: R=100/500 for search, R=5000+ for polish.
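The link between R and surface smoothness can be illustrated with a toy estimator. This is not the hybrid Hamilton+MC evaluator: it is a plain average of R standard-normal draws, used only to show that the seed-to-seed spread of an MC average shrinks roughly as 1/sqrt(R). On that scaling, moving from R=100 to R=500 cuts the noise by about sqrt(5), or roughly 2.2x. Whether the real likelihood follows the same scaling is an assumption.

```python
import random
import statistics


def mc_estimate(R, seed):
    """Toy MC estimator: mean of R standard-normal draws (true value is 0)."""
    rng = random.Random(seed)
    return sum(rng.gauss(0.0, 1.0) for _ in range(R)) / R


def estimator_sd(R, n_seeds=200):
    """Spread of the toy estimator across independent MC seeds."""
    estimates = [mc_estimate(R, seed) for seed in range(n_seeds)]
    return statistics.pstdev(estimates)


for R in (100, 500, 5000):
    # The theoretical spread for this toy estimator is 1 / sqrt(R).
    print(f"R={R}: empirical sd {estimator_sd(R):.4f}, theory {R ** -0.5:.4f}")
```

In the toy setting the seed plays the same role as the MC seed in the run grid: a fixed seed gives a deterministic but perturbed objective, and the size of that perturbation falls as R grows.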
Claim updates
- forward-solution-unique-fundamental-ree-with-nbc: 100% hit rate at R=500 from 3 independent starts demonstrates that the forward-solution MLE basin is reliably accessible by BOBYQA on the simulated DGP panel; the optimizer finds the NBC-feasible truth basin in all completed runs.
Follow-up
- 06c high-R polish: take best 06b endpoint to R=5000 (produced ep06c_polished)
- R=500 is the recommended MC resolution for BOBYQA optimization
- maxeval=3000 is more than sufficient (most converge in 800-1600 evals)