Objective

Frame and execute a complete cascade pipeline for the 54-D constrained MLE, with each stage populated by a defended design choice from a separate Tier-A/B experiment. The program ran from 2026-03-26 to 2026-03-29 across 5 experiments and produced a locked end-to-end pipeline plus a definitive verdict on whether the global-optimization gap closes within the chosen evaluator family.

Setup

  • Five sub-experiments, each with its own Builder/Verifier/Auditor sessions and pre-registered acceptance rule.
  • Locked stack at completion: GBT-guided Sobol archive (Exp 01) → F1-0b 3D Hamilton-Kim + BC Riccati screen (Exp 02) → F2-1 6D Hamilton + MC cap rates at mc_R=100 (Exp 03) → P0 score-only top-N promotion (Exp 04) → BOBYQA local optimization at maxeval=600 (Exp 05).
  • Comparator: BOBYQA multi-start with no cascade; production incumbent at 5-seed mean NLL = -4956.0 under R=100, mc_H_burn=150.
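
The stage order in the locked stack can be sketched as a chain of filters. This is a minimal illustration, not the project's code: run_cascade and every toy_* function are hypothetical stand-ins (a grid instead of the Sobol archive, a hand-written rule instead of the GBT classifier, analytic scores instead of F1/F2, and a no-op instead of BOBYQA), and lower scores are assumed to be better, as with an NLL.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


def run_cascade(
    candidates: Sequence[Point],
    is_feasible: Callable[[Point], bool],
    cheap_score: Callable[[Point], float],
    medium_score: Callable[[Point], float],
    local_opt: Callable[[Point], Tuple[Point, float]],
    shortlist_size: int,
    promote_n: int,
) -> List[Tuple[Point, float]]:
    """Filter, screen, refine, promote, then locally optimize (lower = better)."""
    # Stage 1: keep only candidates the feasibility rule accepts.
    feasible = [x for x in candidates if is_feasible(x)]
    # Stage 2: rank by the cheap screen and keep a shortlist.
    shortlist = sorted(feasible, key=cheap_score)[:shortlist_size]
    # Stages 3-4: re-rank the shortlist with the medium refiner, then
    # promote the top N on score alone (no diversity constraint).
    promoted = sorted(shortlist, key=medium_score)[:promote_n]
    # Stage 5: local optimization from each promoted start.
    endpoints = [local_opt(x) for x in promoted]
    return sorted(endpoints, key=lambda e: e[1])


# Hypothetical 2-D toy problem with its minimum at (0.3, 0.6).
def toy_nll(x: Point) -> float:
    return (x[0] - 0.3) ** 2 + (x[1] - 0.6) ** 2


def toy_cheap(x: Point) -> float:
    # Coarser proxy that shares the minimum of toy_nll.
    return abs(x[0] - 0.3) + abs(x[1] - 0.6)


def toy_feasible(x: Point) -> bool:
    return x[0] + x[1] <= 1.5


def toy_local_opt(x: Point) -> Tuple[Point, float]:
    # Placeholder for a real local optimizer: returns the start unchanged.
    return x, toy_nll(x)


grid = [(i / 10, j / 10) for i in range(11) for j in range(11)]
results = run_cascade(
    grid, toy_feasible, toy_cheap, toy_nll, toy_local_opt,
    shortlist_size=20, promote_n=5,
)
best_point, best_value = results[0]
print(best_point, best_value)
```

Passing the stages in as callables keeps the ordering explicit and lets any single stage be swapped without touching the others, which mirrors the one-experiment-per-stage structure of the program.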

Procedure

  • Exp 01: train and validate the GBT feasibility classifier.
  • Exp 02: lock the F1 cheap screen at 0.9 ms/eval.
  • Exp 03: lock the F2 medium refiner at 2.2 s/eval, Spearman 0.996.
  • Exp 04: stress-test 28 promotion-policy configurations against P0; none beats it.
  • Exp 05: 9 campaigns x 25 starts = 225 BOBYQA endpoints across 3 drivers (D0/D1/D2).
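
The screening and refinement stages are judged on rank agreement with the benchmark evaluator and on top-K recall. The sketch below shows one common way to compute both; the exact definitions used in the experiments are not stated here, so the function names, the tie handling (average ranks), and the recall definition (share of the benchmark's best k found among the surrogate's best `budget` items) are assumptions.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(a: Sequence[float], b: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the rank vectors."""
    ra, rb = average_ranks(a), average_ranks(b)
    n = len(ra)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra)
    vb = sum((y - mb) ** 2 for y in rb)
    return cov / (va * vb) ** 0.5


def top_k_recall(
    surrogate: Sequence[float], benchmark: Sequence[float], k: int, budget: int
) -> float:
    """Share of the benchmark's best k (lowest scores) that the surrogate promotes."""
    best_true = set(sorted(range(len(benchmark)), key=lambda i: benchmark[i])[:k])
    promoted = set(sorted(range(len(surrogate)), key=lambda i: surrogate[i])[:budget])
    return len(best_true & promoted) / k


# Made-up scores for five candidates (lower = better).
surrogate_scores = [0.1, 0.4, 0.2, 0.9, 0.3]
benchmark_scores = [1.0, 4.0, 2.0, 5.0, 3.0]
print(spearman(surrogate_scores, benchmark_scores))
print(top_k_recall(surrogate_scores, benchmark_scores, k=2, budget=3))
```

Under these definitions a surrogate can score a high Spearman over the whole archive yet rank poorly inside a narrow shortlist, so the two metrics are worth reporting on the same candidate set.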

Results

  • Exp 01: 19.6x archive speedup at 93.4% recall (PASS).
  • Exp 02: 1550x cheap-screen speedup, Spearman 0.96, top-K recall identical to MC baseline (PASS).
  • Exp 03: F2-1 closes the F1-to-benchmark ranking gap from Spearman 0.05 to 0.996, 89.7% strong-eval savings, 100% top-10 recall at promotion budget 25 (PASS).
  • Exp 04: 28 policy configurations vs P0: no improvement; diversity constraints actively hurt under the concentrated cluster structure (NULL).
  • Exp 05: 0/225 hits at either the 100-nat or the 150-nat threshold; the D2 adaptive driver delivers ~100x feasibility enrichment and the best 5-seed NLL, -4757.3, which still sits 199 nats short of the incumbent (NULL).
  • Audit: each sub-experiment carries its own audit chain. The cascade pipeline is structurally locked; the global-optimization NULL is the binding finding.
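
The Exp 05 gap follows directly from the two reported NLL values. The sketch below reproduces that arithmetic under the assumption that a "hit" means an endpoint whose NLL lies within the stated number of nats of the incumbent; the helper names are hypothetical and the acceptance rule actually pre-registered may differ in detail.

```python
from typing import Iterable

INCUMBENT_NLL = -4956.0      # production incumbent, 5-seed mean
BEST_CASCADE_NLL = -4757.3   # best 5-seed NLL from the D2 driver
THRESHOLDS_NATS = (100.0, 150.0)


def gap_to_incumbent(nll: float, incumbent: float = INCUMBENT_NLL) -> float:
    """Positive gap means the endpoint is worse (higher NLL) than the incumbent."""
    return nll - incumbent


def count_hits(
    endpoint_nlls: Iterable[float], threshold: float, incumbent: float = INCUMBENT_NLL
) -> int:
    """Number of endpoints within `threshold` nats of the incumbent (assumed rule)."""
    return sum(1 for v in endpoint_nlls if gap_to_incumbent(v, incumbent) <= threshold)


best_gap = gap_to_incumbent(BEST_CASCADE_NLL)
# Only the best reported endpoint is checked here; if it misses, so do the rest.
hits = {t: count_hits([BEST_CASCADE_NLL], t) for t in THRESHOLDS_NATS}
print(round(best_gap, 1), hits)
```

Because the best endpoint misses the looser 150-nat threshold by roughly 49 nats, every weaker endpoint misses it too, which is consistent with the 0/225 count.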

Analysis

The cascade architecture is validated: each stage is necessary (Exp 02 confirms that the cheap screen is enough to forward elites; Exp 03 confirms that F2 is essential because F1 is near-random within the shortlist) and each stage is locked at a defended design point. The two NULLs are themselves load-bearing findings: (i) when the evaluator surrogate already tracks the benchmark, promotion-policy sophistication is wasted (Exp 04), and (ii) the bottleneck on the 54-D constrained MLE is local-optimization depth from cold starts, not basin discovery: across the 225 endpoints tested, added search coverage did not close the 199-nat gap (Exp 05). This is the empirical foundation for the Recent Arc’s pivot to deeper-local-opt designs (Exp 06/07/10a producing ep06c_polished at -5381.59).

Claim updates

  • forward-solution-unique-fundamental-ree-with-nbc: tested_by, strength moderate. The full cascade exercise stress-tests forward-solution likelihood ranking quality from end to end and confirms the surrogate stack is well-ordered on the NBC-feasible region (Exp 02/03), even though global recovery from cold starts is bottlenecked by local-opt depth.
  • asset-pricing-no-bubble-condition-regime: tested_by, strength moderate. The GBT classifier (Exp 01) and the entire cascade restrict to the NBC-feasible region; high cascade ranking quality is empirical evidence that the spectral-radius NBC test is operational.

Follow-up