Objective
Test whether an adaptive global driver (D2: elite-local resampling on top of the GBT-Sobol archive) closes the gap to the incumbent on the 54-D constrained MLE, vs the static baselines D0 (Exp 01-04 cascade) and D1 (D0 with a 2x wider archive).
Setup
- D0: GBT-Sobol → F1-0b → F2-1 (mc_R=100) → P0 top-25 → BOBYQA local optimization (maxeval=600).
- D1: same pipeline with a target of ~5,000 feasible points, to test whether more search volume helps.
- D2: D0 initial archive + elite-local resampling (70% elite, 30% background, sigma_local=0.05 in phi-space, 500K proposals); expanded archive rescored through F1/F2/P0.
- Comparator (incumbent): 5-seed mean NLL = -4956.0 under R=100, mc_H_burn=150.
- Acceptance rule: endpoint within 100 nats of incumbent (5-seed averaged); secondary 150-nat threshold.
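The D2 proposal step described above can be sketched roughly as follows. This is a minimal illustration, not the pipeline's actual sampler: the function name `propose`, the unit-cube bounds on phi-space, the clipping step, and the uniform background draw (standing in for the GBT-Sobol sampler) are all assumptions made for the example. Only the 70/30 split and sigma_local=0.05 come from the setup.

```python
import random

def propose(elites, n_proposals, dim, elite_frac=0.70, sigma_local=0.05, seed=0):
    """Mix elite-local perturbations with background draws (hypothetical sketch)."""
    rng = random.Random(seed)
    proposals = []
    for _ in range(n_proposals):
        if elites and rng.random() < elite_frac:
            # Elite-local branch: Gaussian jitter around a randomly chosen elite.
            center = rng.choice(elites)
            # Assumption: phi-space is the unit cube, so jittered points are clipped.
            point = [min(1.0, max(0.0, c + rng.gauss(0.0, sigma_local))) for c in center]
            proposals.append(("elite", point))
        else:
            # Background branch. Assumption: uniform draws replace the Sobol sampler.
            point = [rng.random() for _ in range(dim)]
            proposals.append(("background", point))
    return proposals

if __name__ == "__main__":
    demo = propose(elites=[[0.5] * 54], n_proposals=1000, dim=54)
    n_elite = sum(1 for kind, _ in demo if kind == "elite")
    print(f"elite share: {n_elite / len(demo):.2f}")
```

In the real campaign each proposal would still have to pass the feasibility check and be rescored through F1/F2/P0; the sketch covers only how proposals are generated.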
Procedure
- 3 campaign seeds per driver → 9 total campaigns → 225 BOBYQA local-optimization endpoints.
- Track best 5-seed NLL, distinct basins (epsilon=0.50), feasibility rate, wall-hours.
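One plausible way to count distinct basins at a given epsilon is a greedy pass over the endpoints, sketched below. The log does not state the distance metric or the clustering rule, so the Euclidean distance and the greedy representative scheme here are assumptions, and `count_distinct_basins` is a hypothetical name.

```python
import math

def count_distinct_basins(endpoints, epsilon=0.50):
    """Greedy count: an endpoint opens a new basin if it lies farther than
    epsilon (Euclidean, an assumption) from every existing representative."""
    representatives = []
    for point in endpoints:
        if all(math.dist(point, rep) > epsilon for rep in representatives):
            representatives.append(point)
    return len(representatives)

if __name__ == "__main__":
    toy = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0)]
    print(count_distinct_basins(toy))  # the first two points share a basin
```

The greedy rule is order-dependent in general, but when every endpoint is more than epsilon from every other, any ordering gives the same count.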
Results
- No campaign met either threshold: the hit rate is 0/3 for each of the 3 drivers (0/9 overall) at both 100 and 150 nats.
- Best 5-seed NLL: D0 -4745.5 (gap 210 nats), D1 -4631.3 (gap 325 nats), D2 -4757.3 (gap 199 nats). D2 is best, but its gap is still about twice the 100-nat threshold.
- D2 adaptive mechanism mechanically works: 24-29% feasibility on elite-local proposals vs 0.27% background (~100x enrichment), 86-103K additional feasible points.
- D2's median 5-seed NLL is far better than D0's (-4292 vs -2449, a difference of 1,843 nats), but its best endpoint never reaches the incumbent.
- All 225 endpoints land in distinct basins (epsilon=0.50) — landscape is extremely multi-modal.
- Wall-clock: D0 55.8h, D1 68.1h, D2 56.4h.
- Audit: the referee2 audit reproduces all metrics; the fragility audit and the break-this-result audit both confirm that the null result is structurally robust. The 100-nat threshold is ~1.4 sigma of single-seed MC noise (~74 nats). With 5-seed averaging the noise falls to ~33 nats (74/sqrt(5), assuming independent seeds), so the threshold corresponds to ~3.0 sigma.
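The acceptance check can be reproduced from the reported figures with a few lines of arithmetic. The sketch below assumes the rule means "best 5-seed NLL no more than the threshold above the incumbent NLL" (lower NLL is better); the helper names are illustrative.

```python
INCUMBENT_NLL = -4956.0  # 5-seed mean from the Setup section
BEST_NLL = {"D0": -4745.5, "D1": -4631.3, "D2": -4757.3}  # reported best 5-seed NLLs

def gap_to_incumbent(nll, incumbent=INCUMBENT_NLL):
    """Gap in nats; positive means worse than the incumbent."""
    return nll - incumbent

def passes(nll, threshold, incumbent=INCUMBENT_NLL):
    """Assumed reading of the acceptance rule: gap no larger than threshold."""
    return gap_to_incumbent(nll, incumbent) <= threshold

if __name__ == "__main__":
    for driver, nll in BEST_NLL.items():
        print(f"{driver}: gap={gap_to_incumbent(nll):.1f} nats, "
              f"within 100: {passes(nll, 100.0)}, within 150: {passes(nll, 150.0)}")
```

Under this reading all three drivers fail both the primary and the secondary threshold.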
Analysis
D1 (more search volume) doesn’t help, and D2 (concentrated adaptive search) gets closer but doesn’t close the gap. This points to local optimization depth, not basin discovery, as the bottleneck: the 600-eval BOBYQA budget appears insufficient to descend to the global basin from any of the 25 starts. Two competing hypotheses survive: (1) the incumbent basin’s attraction region is too narrow for a 25-start campaign to find, or (2) the per-eval MC rebuild creates a noisy loss surface that BOBYQA can’t navigate. CRN-within-BOBYQA and a deeper maxeval are the two natural follow-ups.
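To illustrate hypothesis (2) and the CRN follow-up, here is a toy sketch of common random numbers. The objective is a stand-in (a Monte Carlo estimate of E[(theta - Z)^2] with Z standard normal), not the 54-D likelihood, and the function names and seed value are hypothetical. The point is only that reusing the same draws at every evaluation makes the simulated objective a deterministic function of theta, which a derivative-free optimizer can follow.

```python
import random

def mc_objective(theta, n_draws=100, seed=None):
    """Toy Monte Carlo estimate of E[(theta - Z)^2], Z ~ N(0, 1).
    With seed=None every call uses fresh draws, so repeated calls at the
    same theta return different values (a noisy surface)."""
    rng = random.Random(seed)
    return sum((theta - rng.gauss(0.0, 1.0)) ** 2 for _ in range(n_draws)) / n_draws

def crn_objective(theta, n_draws=100, crn_seed=12345):
    """Common random numbers: the same seed, hence the same draws, on every call."""
    return mc_objective(theta, n_draws, seed=crn_seed)

if __name__ == "__main__":
    theta, step = 0.3, 1e-3
    fresh = mc_objective(theta + step) - mc_objective(theta)
    fixed = crn_objective(theta + step) - crn_objective(theta)
    print(f"fresh-draw difference: {fresh:+.4f}")
    print(f"CRN difference:        {fixed:+.4f}")
```

With fresh draws, the difference between two nearby evaluations is dominated by sampling noise; with CRN it reflects only the change in theta. CRN removes the evaluation-to-evaluation jitter but not the bias of the fixed draw set, so averaging across seeds is still needed when judging final endpoints.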
Claim updates
- forward-solution-unique-fundamental-ree-with-nbc: tested_by, strength weak. The null result does not contradict the forward-solution uniqueness claim — it only shows that finding the global optimum from cold starts is bottlenecked by local-opt depth, not by failure to identify a unique basin in principle.
Follow-up
- D0 retained as the locked global driver (per pre-registered decision rule).
- Open: deeper BOBYQA (maxeval ~3000), CRN within optimizer, deep-from-D2-best single-start experiment.
- Feeds the design brief at research/synthesis/global_opt_pipeline_brief.md.
- See also: basin-finder-complete-program-summary, sim-recovery-experiment-local-global.