Objective

Document the data-source correction and BOBYQA configuration changes between the stale v1 and corrected v2 runs of Program 06 (Last-Mile Probe), and establish that v1 results are invalidated while v2 results can be trusted.

Setup

Two primary changes separate v1 from v2, and their effects compounded; several secondary settings also differ:

  1. Data source: v1 used real observed data, which has no ground truth against which to measure the gap; v2 uses data simulated from a DGP with known theta_true
  2. BOBYQA configuration: v1 used ftol_abs=2.0 and initial_step=0.02, which caused premature convergence at ~112 evals; v2 disables ftol and uses initial_step=0.06 and xtol_abs=6e-6
  3. Secondary differences: v2 uses the absorbing spectral proxy (99.1% vs 63.3%), 3 starts instead of 1, and a tighter success threshold (20 nats instead of 100)
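To make the v1/v2 differences explicit and diffable, the settings can be captured as frozen records. This is a minimal sketch: the ProbeConfig class, its field names, and config_diff are hypothetical and not the Program 06 code. The v1 xtol_abs value is not recorded in this memo, so it is left as None, and the spectral proxy is omitted because the meaning of its percentages is not stated here.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class ProbeConfig:
    """Hypothetical record of the Program 06 settings listed in Setup."""
    data_source: str                # "real" or "simulated"
    ftol_abs: Optional[float]       # None means the ftol criterion is disabled
    initial_step: float
    xtol_abs: Optional[float]       # None for v1: value not recorded in this memo
    n_starts: int
    success_threshold_nats: float


# Values taken from the Setup list above.
V1 = ProbeConfig("real", 2.0, 0.02, None, 1, 100.0)
V2 = ProbeConfig("simulated", None, 0.06, 6e-6, 3, 20.0)


def config_diff(a: ProbeConfig, b: ProbeConfig) -> dict:
    """Return {field_name: (a_value, b_value)} for every field that differs."""
    return {
        f.name: (getattr(a, f.name), getattr(b, f.name))
        for f in fields(a)
        if getattr(a, f.name) != getattr(b, f.name)
    }


for name, (old, new) in config_diff(V1, V2).items():
    print(f"{name}: {old!r} -> {new!r}")
```

Every field differs between the two records, which reflects the main point of this memo: v1 and v2 do not isolate a single change, so v1 numbers cannot be read as a baseline for v2.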

Procedure

  • Side-by-side comparison of v1 vs v2 configurations and results
  • Head-to-head matching on (start, seed) pairs where possible
  • Reversal analysis: check whether any conclusion changed direction
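The head-to-head and reversal steps can be sketched as a key join plus a sign check. This assumes each run's final gap (in nats) is stored under a (start, seed) key; match_runs and reversed_direction are hypothetical helpers, not the memo's actual tooling. Because v1 used a single start, only v2 runs from the corresponding start can find a partner, which is why matching is done only "where possible".

```python
from typing import Dict, Tuple

Key = Tuple[int, int]  # (start index, seed); hypothetical keying scheme


def match_runs(
    v1: Dict[Key, float], v2: Dict[Key, float]
) -> Dict[Key, Tuple[float, float]]:
    """Pair v1 and v2 final gaps that share the same (start, seed) key.

    Keys present in only one version are dropped.
    """
    shared = sorted(v1.keys() & v2.keys())
    return {k: (v1[k], v2[k]) for k in shared}


def reversed_direction(v1_effect: float, v2_effect: float) -> bool:
    """True if a conclusion flipped between versions.

    An effect (e.g. gap reduction from more depth, or from R=500 vs R=100)
    counts as "helps" only when it is strictly positive.
    """
    return (v1_effect > 0) != (v2_effect > 0)
```

The sign check is deliberately coarse: it captures the direction of a claim such as "depth does not help" vs "depth helps", not the size of the effect.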

Results

  • v1: 0% hit rate (0/3; all 3 runs stopped at FTOL after ~112 evals, with gaps of +157 to +183 nats)
  • v2 06a: 75% hit rate (6/8 within 20 nats, maxeval 3000, R=100)
  • v2 06b: 100% hit rate (7/7, maxeval 3000, R=500)
  • v1 conclusions fully reversed: “depth does not help” becomes “depth helps”; “R=500 does not help” becomes “R=500 eliminates all traps”
  • The v1 bottleneck was ftol_abs=2.0, not the optimizer or the model
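The hit rates above are simple fractions of runs within the success threshold. The sketch below assumes the gap is signed so that positive means worse than the reference value at theta_true, and counts a run as a hit when its gap is at or below the threshold; hit_rate is a hypothetical helper. It also checks that the tighter v2 threshold is not what produced v1's 0%: the reported v1 gaps lie between +157 and +183 nats, above even v1's own 100-nat threshold.

```python
from typing import Sequence


def hit_rate(gaps_nats: Sequence[float], threshold_nats: float) -> float:
    """Fraction of runs whose final gap is within the success threshold.

    Assumes positive gaps mean "worse than the reference at theta_true";
    a run is a hit when its gap is <= threshold_nats.
    """
    if not gaps_nats:
        raise ValueError("hit_rate needs at least one run")
    hits = sum(1 for g in gaps_nats if g <= threshold_nats)
    return hits / len(gaps_nats)


# The reported extremes of the v1 gaps fail both thresholds.
print(hit_rate([157.0, 183.0], 100.0))  # 0.0
print(hit_rate([157.0, 183.0], 20.0))   # 0.0
```

As a consistency check on the v2 numbers: 6 hits out of 8 runs gives 0.75 (06a) and 7 out of 7 gives 1.0 (06b), matching the reported 75% and 100%.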

Analysis

This is a comparison memo, not a standalone numerical experiment. The two bugs (wrong data source and premature stopping) compounded: on real data there was no ground truth against which to detect the ftol problem, and the ftol problem prevented meaningful optimization on any data. The lessons are that simulation-study validation requires actually running on simulated data, and that BOBYQA's ftol criterion must be disabled when the objective is noisy (as with a Monte Carlo likelihood). With ftol_abs=2.0, a single step that changes the objective by less than 2 nats ends the run, and v1 stopped this way with gaps still at +157 to +183 nats. See exp-06-last-mile-probe-v2-corrected for the full v2 results.
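The failure mode can be illustrated with a simplified absolute function-tolerance rule: stop at the first step whose change in objective is below ftol_abs. This is a sketch, assuming the configuration names (ftol_abs, xtol_abs, initial_step, maxeval, FTOL) refer to NLopt-style settings. It does not reproduce NLopt's internal logic, and the trace is a toy, not Program 06 output. On a slow descent that improves by 1.5 nats per step, ftol_abs=2.0 fires immediately. With a Monte Carlo likelihood, noise adds random jitter to every change, so the rule can also fire by chance.

```python
from typing import Optional, Sequence


def ftol_stop_index(
    f_values: Sequence[float], ftol_abs: Optional[float]
) -> Optional[int]:
    """Return the first step k with |f[k] - f[k-1]| < ftol_abs, or None.

    Simplified model of an absolute function-tolerance stop; passing None
    (or a non-positive value) disables the criterion, as in v2.
    """
    if ftol_abs is None or ftol_abs <= 0:
        return None
    for k in range(1, len(f_values)):
        if abs(f_values[k] - f_values[k - 1]) < ftol_abs:
            return k
    return None


# Toy trace of the gap (nats): starts 200 above the optimum and improves by
# 1.5 nats per step. Illustrative numbers only.
trace = [200.0 - 1.5 * k for k in range(100)]
print(ftol_stop_index(trace, 2.0))   # 1  (stops with the gap still at 198.5)
print(ftol_stop_index(trace, None))  # None  (runs until the evaluation budget)
```

With ftol disabled, termination falls to xtol_abs and maxeval, which track progress in parameter space and total budget rather than per-step objective changes.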

Claim updates

Follow-up

  • All downstream experiments use the v2 configuration and the simulated DGP by default
  • Real-data runs require setting the USE_REAL_DATA=true flag
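A minimal sketch of how the default could be enforced, assuming USE_REAL_DATA is read from the environment. The helper names and the case-insensitive parsing are hypothetical and do not describe the project's actual loader. Any value other than "true" falls back to the simulated DGP, so a missing or mistyped flag cannot silently reintroduce the v1 data-source bug.

```python
import os
from typing import Mapping, Optional


def use_real_data(env: Optional[Mapping[str, str]] = None) -> bool:
    """Interpret the USE_REAL_DATA flag; only 'true' (any case) enables real data."""
    env = os.environ if env is None else env
    return env.get("USE_REAL_DATA", "false").strip().lower() == "true"


def select_data_source(env: Optional[Mapping[str, str]] = None) -> str:
    """Return 'real' or 'simulated'; simulated is the default."""
    return "real" if use_real_data(env) else "simulated"


print(select_data_source({}))
print(select_data_source({"USE_REAL_DATA": "true"}))
```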