Objective
Document the data-source correction and BOBYQA configuration changes between the stale v1 and corrected v2 runs of Program 06 (Last-Mile Probe), establishing that v1 results are invalidated and v2 can be trusted.
Setup
Two compounding changes separate v1 from v2 (data source and BOBYQA configuration), along with several secondary adjustments:
- Data source: v1 used real observed data, which has no ground truth against which to measure a gap; v2 uses data simulated from a DGP with known theta_true
- BOBYQA configuration: v1 had ftol_abs=2.0 and initial_step=0.02, which caused premature convergence at ~112 evals; v2 disables ftol and uses initial_step=0.06 and xtol_abs=6e-6
- Additional: v2 uses the absorbing spectral proxy (99.1% vs 63.3% in v1), 3 starts instead of 1, and a tighter success threshold (20 nats instead of 100)
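The optimizer settings listed above can be laid out side by side in a minimal sketch. The field names follow the memo's own parameter names; the BobyqaConfig class and the config_diff helper are hypothetical and exist only for illustration. The memo does not state a v1 value for xtol_abs, so it is left as None here rather than guessed.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass(frozen=True)
class BobyqaConfig:
    """Hypothetical container for the optimizer settings named in this memo."""
    ftol_abs: Optional[float]  # None = criterion disabled (v2)
    xtol_abs: Optional[float]  # None = not stated in the memo (v1)
    initial_step: float


V1 = BobyqaConfig(ftol_abs=2.0, xtol_abs=None, initial_step=0.02)
V2 = BobyqaConfig(ftol_abs=None, xtol_abs=6e-6, initial_step=0.06)


def config_diff(old, new):
    """Return {field: (old_value, new_value)} for every field that differs."""
    a, b = asdict(old), asdict(new)
    return {k: (a[k], b[k]) for k in a if a[k] != b[k]}


for name, (old, new) in config_diff(V1, V2).items():
    print(f"{name}: {old} -> {new}")
```

Keeping both configurations as immutable records makes it harder for a stale v1 setting to leak into a v2 run unnoticed.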
Procedure
- Side-by-side comparison of v1 vs v2 configurations and results
- Head-to-head matching on (start, seed) pairs where possible
- Reversal analysis: check whether each v1 conclusion changed direction under v2
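The matching step can be sketched as follows. This is one possible reading of the procedure, not the memo's actual code: the record layout (dicts with "start", "seed", and "gap" keys), the function names, and the run-level definition of a reversal are all assumptions, and the sample records are placeholders rather than experiment output.

```python
def match_runs(v1_runs, v2_runs):
    """Pair runs that share a (start, seed) key; unmatched runs are dropped."""
    v2_by_key = {(r["start"], r["seed"]): r for r in v2_runs}
    pairs = []
    for r1 in v1_runs:
        key = (r1["start"], r1["seed"])
        if key in v2_by_key:
            pairs.append((key, r1, v2_by_key[key]))
    return pairs


def is_reversal(gap_v1, gap_v2, threshold_nats=20.0):
    """True if exactly one of the two runs lands within the threshold."""
    return (abs(gap_v1) <= threshold_nats) != (abs(gap_v2) <= threshold_nats)


# Placeholder records for illustration only; not the experiment's actual runs.
v1_runs = [{"start": 0, "seed": 1, "gap": 170.0},
           {"start": 0, "seed": 2, "gap": 160.0}]
v2_runs = [{"start": 0, "seed": 1, "gap": 5.0},
           {"start": 1, "seed": 1, "gap": 12.0}]

for key, r1, r2 in match_runs(v1_runs, v2_runs):
    print(key, r1["gap"], r2["gap"], is_reversal(r1["gap"], r2["gap"]))
```

Applying the same threshold to both versions keeps the comparison from being confounded by the change in success criterion.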
Results
- v1: 0% hit rate (0/3; all 3 runs stopped on the FTOL criterion after ~112 evals, with gaps of +157 to +183 nats)
- v2 06a: 75% hit rate (6/8 within 20 nats, maxeval 3000, R=100)
- v2 06b: 100% hit rate (7/7, maxeval 3000, R=500)
- v1 conclusions fully reversed: “depth does not help” becomes “depth helps”; “R=500 does not help” becomes “R=500 eliminates all traps”
- The v1 bottleneck was ftol_abs=2.0, not the optimizer or the model
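The hit rates above are fractions of runs whose gap falls within the success threshold. A minimal sketch of that calculation follows, assuming "within N nats" means an absolute gap of at most N; the hit_rate function is hypothetical. The example uses the two reported endpoints of the v1 gap range plus one placeholder value inside it, and shows that v1 scores 0% under either the old or the new threshold, so the tighter threshold is not what separates the two versions.

```python
def hit_rate(gaps_nats, threshold_nats):
    """Fraction of runs whose absolute gap is within the threshold."""
    if not gaps_nats:
        raise ValueError("need at least one run")
    hits = sum(1 for g in gaps_nats if abs(g) <= threshold_nats)
    return hits / len(gaps_nats)


# Endpoints are the reported v1 range; 170.0 is a placeholder inside it.
v1_gaps = [157.0, 170.0, 183.0]

print(hit_rate(v1_gaps, threshold_nats=100.0))  # v1 threshold
print(hit_rate(v1_gaps, threshold_nats=20.0))   # v2 threshold
```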
Analysis
This is a comparison memo, not a standalone numerical experiment. The two bugs (wrong data, premature stopping) compounded: on real data there was no ground truth to detect the ftol problem, and the ftol problem prevented meaningful optimization on any data. The lesson is that simulation-study validation requires actually using simulated data, and BOBYQA ftol must be disabled when the objective is noisy (MC likelihood). See exp-06-last-mile-probe-v2-corrected for the full v2 results.
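One way an absolute function tolerance can halt a search far from the optimum is sketched below. This is a toy, not BOBYQA: it assumes a search that improves the objective by a fixed amount per step and a stopping rule that fires when the change between successive steps falls below ftol_abs. The function name and the numbers (a 170-nat starting gap, 1 nat gained per step) are illustrative assumptions.

```python
def run_until_stop(start_gap, gain_per_step, ftol_abs=None, maxeval=3000):
    """Toy descent: each step shrinks the gap by gain_per_step (floored at 0).

    Stops when the per-step change drops below ftol_abs (if enabled),
    when the gap reaches zero, or when maxeval is exhausted.
    Returns (evals_used, final_gap).
    """
    gap = start_gap
    for evals in range(1, maxeval + 1):
        new_gap = max(gap - gain_per_step, 0.0)
        change = gap - new_gap
        gap = new_gap
        if ftol_abs is not None and change < ftol_abs:
            return evals, gap  # tolerance fired, possibly far from optimum
        if gap == 0.0:
            return evals, gap  # reached the optimum
    return maxeval, gap


print(run_until_stop(170.0, 1.0, ftol_abs=2.0))   # stops at once, gap 169.0
print(run_until_stop(170.0, 1.0, ftol_abs=None))  # runs on until gap is 0.0
```

In the toy, a tolerance larger than the typical per-step improvement ends the search regardless of how much gap remains, which is consistent with the v1 runs stopping on FTOL with gaps above 150 nats.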
Claim updates
- forward-solution-unique-fundamental-ree-with-nbc: v1 apparent optimization failures were data-source and configuration artifacts, not evidence against MLE accessibility on the NBC-feasible region.
Follow-up
- All downstream experiments use v2 configuration and simulated DGP by default
- Real-data runs are opt-in via the USE_REAL_DATA=true flag
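A default-to-simulated switch of this kind could be read as in the sketch below. It assumes USE_REAL_DATA is an environment variable, which the memo does not state; the use_real_data function is hypothetical.

```python
import os


def use_real_data(env=None):
    """Return True only when USE_REAL_DATA is explicitly set to 'true'.

    Any other value, or an unset variable, falls back to the simulated DGP.
    """
    env = os.environ if env is None else env
    return env.get("USE_REAL_DATA", "false").strip().lower() == "true"


source = "real observed data" if use_real_data() else "simulated DGP"
print(f"data source: {source}")
```

Requiring an explicit opt-in means a forgotten or mistyped flag falls back to simulated data, where theta_true is available to measure the gap.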