Objective

Test whether an adaptive global driver (D2: elite-local resampling on top of the GBT-Sobol archive) closes the gap to the incumbent on the 54-D constrained MLE, vs the static baselines D0 (Exp 01-04 cascade) and D1 (D0 with a 2x wider archive).

Setup

  • D0: GBT-Sobol → F1-0b → F2-1 (mc_R=100) → P0 → top-25 BOBYQA local optimization (maxeval=600).
  • D1: same pipeline with a ~5,000 feasible-point archive target (2x D0), testing whether more search volume helps.
  • D2: D0 initial archive + elite-local resampling (70% elite, 30% background, sigma_local=0.05 in phi-space, 500K proposals); expanded archive rescored through F1/F2/P0.
  • Comparator (incumbent): 5-seed mean NLL = -4956.0 under R=100, mc_H_burn=150.
  • Acceptance rule: a campaign is a hit if its best endpoint's 5-seed-averaged NLL is within 100 nats of the incumbent; a secondary 150-nat threshold is also reported.
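The D2 proposal step can be sketched as a 70/30 mixture of elite-local and background draws. This is a minimal sketch, not the pipeline's code: it assumes phi-space is the unit hypercube, substitutes uniform pseudo-random draws for the Sobol background (the standard library has no Sobol generator), clips perturbed points back into [0, 1], and uses a hypothetical function name `propose`.

```python
import random

def propose(elites, n_proposals, dim, elite_frac=0.70, sigma_local=0.05, seed=0):
    """Hypothetical elite-local resampling sketch in a unit-cube phi-space.

    A fraction `elite_frac` of proposals are Gaussian perturbations of a
    randomly chosen elite; the rest are uniform background draws (a
    stand-in for the Sobol background used by the real pipeline).
    Returns a list of (kind, point) pairs.
    """
    rng = random.Random(seed)
    proposals = []
    n_elite = round(elite_frac * n_proposals)
    for _ in range(n_elite):
        center = rng.choice(elites)
        # Perturb every coordinate by N(0, sigma_local), then clip into [0, 1].
        point = [min(1.0, max(0.0, c + rng.gauss(0.0, sigma_local))) for c in center]
        proposals.append(("elite", point))
    for _ in range(n_proposals - n_elite):
        # Background: uniform over the whole cube (assumed stand-in for Sobol).
        proposals.append(("background", [rng.random() for _ in range(dim)]))
    return proposals
```

For the D2 setting above this would be called as `propose(elites, 500_000, 54)`; feasibility filtering and F1/F2/P0 rescoring would happen downstream of the proposals.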

Procedure

  • 3 campaign seeds per driver → 9 total campaigns → 225 BOBYQA local-optimization endpoints (25 per campaign).
  • Track best 5-seed NLL, distinct basins (epsilon=0.50), feasibility rate, and wall-clock hours.
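The distinct-basin count can be sketched as greedy leader clustering of the endpoints. The criterion is an assumption: here two endpoints share a basin if their Euclidean distance is below epsilon, whereas the metric and space actually used (phi-space, raw parameters, or something else) are not stated in this entry; `count_distinct_basins` is a hypothetical name.

```python
import math

def count_distinct_basins(endpoints, epsilon=0.50):
    """Hypothetical basin count via greedy leader clustering.

    An endpoint joins an existing basin if it lies within `epsilon`
    (Euclidean) of that basin's representative; otherwise it becomes
    the representative of a new basin.
    """
    representatives = []
    for x in endpoints:
        if not any(math.dist(x, r) < epsilon for r in representatives):
            representatives.append(x)
    return len(representatives)
```

Leader clustering is order-dependent near the epsilon boundary, but when every endpoint is farther than epsilon from all others (as with the 225 endpoints here) the count equals the number of endpoints regardless of order.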

Results

  • Hit rate is 0/3 for every driver at both the 100-nat and 150-nat thresholds (0/9 campaigns overall).
  • Best 5-seed NLL: D0 -4745.5 (gap 210 nats), D1 -4631.3 (gap 325 nats), D2 -4757.3 (gap 199 nats). D2 is best, but its gap is still roughly twice the 100-nat threshold.
  • The D2 adaptive mechanism works as designed: elite-local proposals are 24-29% feasible vs 0.27% for background proposals (~100x enrichment), adding 86-103K feasible points.
  • D2's median endpoint is far better than D0's (5-seed NLL -4292 vs -2449), but its best endpoint never reaches the incumbent.
  • All 225 endpoints land in distinct basins (epsilon=0.50), so the landscape is extremely multimodal.
  • Wall-clock: D0 55.8h, D1 68.1h, D2 56.4h.
  • Audit: the referee2 audit reproduces all metrics; the fragility audit and break-this-result both confirm that the null result is structurally robust. The 100-nat threshold is ~1.4 sigma of single-seed MC noise (~74 nats); assuming independent seeds, 5-seed averaging shrinks the noise to ~33 nats (74/sqrt(5)), so the threshold becomes ~3.0 sigma.
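The gaps, hit decisions and noise figures above can be rechecked with a few lines of arithmetic. The NLL values and the 74-nat single-seed noise level are taken from this entry; the helper names are hypothetical, and the sqrt(5) scaling assumes the five MC seeds are independent.

```python
import math

INCUMBENT_NLL = -4956.0                                   # 5-seed mean (Setup)
BEST_NLL = {"D0": -4745.5, "D1": -4631.3, "D2": -4757.3}  # best 5-seed NLL per driver
SINGLE_SEED_SIGMA = 74.0                                  # single-seed MC noise, nats
N_SEEDS = 5

def gap_nats(nll, incumbent=INCUMBENT_NLL):
    """Positive gap means the endpoint is worse (higher NLL) than the incumbent."""
    return nll - incumbent

def is_hit(nll, threshold):
    """Acceptance rule: within `threshold` nats of the incumbent."""
    return gap_nats(nll) <= threshold

def threshold_in_sigmas(threshold, sigma, n_seeds=1):
    # Assumes independent seeds: averaging n seeds shrinks sigma by sqrt(n).
    return threshold / (sigma / math.sqrt(n_seeds))

if __name__ == "__main__":
    for driver, nll in BEST_NLL.items():
        print(f"{driver}: gap {gap_nats(nll):.1f} nats, "
              f"hit@100={is_hit(nll, 100)}, hit@150={is_hit(nll, 150)}")
    print(f"100-nat threshold: {threshold_in_sigmas(100, SINGLE_SEED_SIGMA):.2f} sigma (1 seed), "
          f"{threshold_in_sigmas(100, SINGLE_SEED_SIGMA, N_SEEDS):.2f} sigma ({N_SEEDS}-seed mean)")
```

Running it prints gaps of 210.5, 324.7 and 198.7 nats with no hits at either threshold, and a 100-nat threshold of 1.35 sigma for a single seed versus 3.02 sigma for the 5-seed mean.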

Analysis

D1 (more search volume) doesn't help, and D2 (concentrated adaptive search) gets closer but doesn't close the gap. This points to local optimization depth, not basin discovery, as the main bottleneck: the 600-eval BOBYQA budget was insufficient to descend to the global basin from any of the 25 starts in any campaign. Two competing hypotheses survive: (1) the incumbent basin's attraction region is too narrow for a 25-start campaign to hit, or (2) the per-eval MC rebuild creates a noisy loss surface that BOBYQA can't navigate. Common random numbers (CRN) within BOBYQA and a deeper maxeval are the two natural follow-ups.
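The CRN follow-up can be illustrated on a toy problem. This sketch does not touch the pipeline or BOBYQA: `mc_objective` and `make_crn_objective` are hypothetical, and the objective is a Monte Carlo estimate of E[(x - Z)^2] with Z ~ N(0, 1), standing in for the per-eval MC rebuild of the NLL. Fresh random numbers on every call make repeated evaluations at the same point disagree; freezing the random stream (common random numbers) turns the estimate into a deterministic function of x that a derivative-free local optimizer can descend.

```python
import random

def mc_objective(x, n_draws=100, seed=None):
    """Toy Monte Carlo estimate of E[(x - Z)^2], Z ~ N(0, 1); exact value x**2 + 1.

    seed=None -> fresh random numbers on every call (per-eval rebuild).
    fixed seed -> the same draws on every call (common random numbers).
    """
    rng = random.Random(seed)  # Random(None) seeds from system entropy
    draws = [rng.gauss(0.0, 1.0) for _ in range(n_draws)]
    return sum((x - z) ** 2 for z in draws) / n_draws

def make_crn_objective(seed, n_draws=100):
    """Hypothetical CRN wrapper: every call regenerates the identical draws,
    so the optimizer sees one fixed, deterministic sample average."""
    return lambda x: mc_objective(x, n_draws=n_draws, seed=seed)
```

With a frozen stream the toy estimate is exactly the quadratic x^2 - 2x*mean(z) + mean(z^2), so its curvature matches the true objective. The price is that the optimizer converges to the minimizer of one fixed sample average, whose sampling error does not shrink during the run; re-evaluating endpoints under fresh seeds, as the 5-seed NLL does here, guards against that.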

Claim updates

  • forward-solution-unique-fundamental-ree-with-nbc: tested_by, strength weak. The null result does not contradict the forward-solution uniqueness claim — it only shows that finding the global optimum from cold starts is bottlenecked by local-opt depth, not by failure to identify a unique basin in principle.

Follow-up