Objective
Lock the medium-tier refiner that sits between the F1 cheap screen (basin-finder-02-f1-cheap-screen) and the strong 6D benchmark. The refiner must close the F1→benchmark ranking gap (F1-0b is near-random on F1-promoted candidates: Spearman 0.05) at a runtime that lets the cascade actually use it.
Setup
- Model / parameterization: 54-D constrained MSRE in phi-space.
- Evaluator: 5 candidate F2 screens (varying mc_R in {100, 300}, sparse vs full time-points, pure MC vs F1/MC hybrid F2-5).
- Comparator: F2-0 (no F2; direct F1→benchmark promotion) and the strong benchmark hamilton_loglik_cap_mcat (mc_R=1000, mc_H_burn=150).
- Acceptance rule: runtime hard cap 3 s/point (“not-F2” threshold 5 s), Spearman vs benchmark, top-K recall at promotion budget 25.
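The acceptance rule can be sketched as a small check. This is a minimal illustration, not the experiment's script: the helper names are hypothetical, higher scores are assumed to be better, top-K recall is assumed to mean the share of the benchmark's top-K that falls inside the screen's promotion budget, and the 3 s cap is assumed to apply to the median runtime per point.

```python
from statistics import median

RUNTIME_CAP_S = 3.0        # hard cap from the acceptance rule (s/point)
NOT_F2_THRESHOLD_S = 5.0   # "not-F2" threshold from the acceptance rule


def top_indices(scores, n):
    """Indices of the n highest scores (assumes higher = better)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(order[:n])


def top_k_recall(screen, bench, k, budget=25):
    """Share of the benchmark's top-k that the screen promotes within `budget`."""
    return len(top_indices(bench, k) & top_indices(screen, budget)) / k


def runtime_verdict(seconds_per_point):
    """Assumption: the cap is applied to the median runtime per point."""
    med = median(seconds_per_point)
    if med <= RUNTIME_CAP_S:
        return "within cap"
    return "over cap" if med <= NOT_F2_THRESHOLD_S else "not-F2"


# Toy data for illustration only, not experiment output.
bench = [9.0, 8.0, 7.0, 6.0, 5.0, 4.0]
screen = [8.5, 9.1, 6.9, 4.2, 5.5, 4.0]
print(top_k_recall(screen, bench, k=2, budget=3))
print(runtime_verdict([2.1, 2.3, 5.3]))
```

Because the verdict uses the median, a screen can pass the cap while individual points still exceed it; any tail would need a separate check.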
Procedure
- Evaluate all candidates on the top-100 of the F1-conditioned 1000-point pool.
- 3-seed robustness sweep: pairwise Spearman 0.994-0.995, top-25 overlap 92-96%.
- Pool-depth audit extension: re-score on top-200 to check that the result is not a top-100 artifact.
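The seed-robustness comparison can be sketched as a pairwise top-K overlap across seeds. This is a hedged illustration: the function names and the toy scores are hypothetical, higher scores are assumed to be better, and the pairwise Spearman figures would be computed over the same seed pairs.

```python
from itertools import combinations


def top_k(scores, k):
    """Set of indices holding the k highest scores (assumes higher = better)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(order[:k])


def pairwise_top_k_overlap(scores_by_seed, k):
    """Map each seed pair (a, b) to |top-k(a) & top-k(b)| / k."""
    return {
        (a, b): len(top_k(scores_by_seed[a], k) & top_k(scores_by_seed[b], k)) / k
        for a, b in combinations(sorted(scores_by_seed), 2)
    }


# Toy scores for three seeds over the same five candidates (illustration only).
scores_by_seed = {
    1: [0.9, 0.8, 0.7, 0.2, 0.1],
    2: [0.8, 0.9, 0.1, 0.7, 0.2],
    3: [0.9, 0.7, 0.8, 0.1, 0.2],
}
overlap = pairwise_top_k_overlap(scores_by_seed, k=3)
for pair, value in overlap.items():
    print(pair, round(value, 3))
```

With three seeds there are three pairs, which is consistent with reporting the sweep as a range rather than a single number.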
Results
- F2-1 (mc_R=100, full-sample): Spearman 0.996, top-10 recall 1.00, top-25 recall 0.96, median 2.24 s/point — meets the 3 s cap; F2-2 (mc_R=300) is at 0.999 Spearman but 6.72 s/point and fails the cap.
- F2 is essential, not optional: the no-F2 baseline F2-0 captures only 40% of benchmark top-10.
- mc_R=100 vs mc_R=300: Spearman 0.996 vs 0.999; the extra MC paths triple the runtime (2.24 vs 6.72 s/point) for a negligible ranking gain, so they are wasted compute at the F2 tier.
- Sparse-time variants offer no speed advantage (MC rebuild dominates runtime).
- Signal-blending hybrid F2-5 degrades performance (mixing near-zero F1 signal with strong F2 dilutes both).
- Audit: fragility audit reproduces all primary metrics; alt-methods script extension confirms top-200 robustness (Spearman 0.981, perfect recall). Both original caveats (seed sensitivity, narrow pool) closed by extended robustness checks. Auditor caught an invperm(sortperm(sortperm(x))) Spearman implementation bug; recall metrics were unaffected.
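Assuming the quoted expression was the faulty rank computation, the following sketch shows why that composition does not yield ranks. The helpers `sortperm` and `invperm` are hypothetical 0-based Python analogues of the Julia functions named above, ties are ignored, and the inputs are toy values rather than experiment data.

```python
import math


def sortperm(x):
    """0-based analogue of Julia's sortperm: indices that sort x ascending."""
    return sorted(range(len(x)), key=lambda i: x[i])


def invperm(p):
    """Inverse permutation: inv[p[i]] == i."""
    inv = [0] * len(p)
    for pos, idx in enumerate(p):
        inv[idx] = pos
    return inv


def ranks(x):
    """Ordinal ranks; sortperm(sortperm(x)) gives the same vector."""
    return invperm(sortperm(x))


def buggy_ranks(x):
    """One inversion too many: this collapses back to sortperm(x)."""
    return invperm(sortperm(sortperm(x)))


def pearson(a, b):
    """Pearson correlation; applied to rank vectors it gives Spearman."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)


x = [0.3, 0.1, 0.2]
y = [0.2, 0.1, 0.3]
print(ranks(x), buggy_ranks(x))                 # [2, 0, 1] [1, 2, 0]
print(pearson(ranks(x), ranks(y)))              # 0.5
print(pearson(buggy_ranks(x), buggy_ranks(y)))  # -1.0
```

A top-K set depends only on the sort order, not on the rank vector, which is consistent with the recall metrics being unaffected.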
Analysis
The locked F2-1 closes the F1→benchmark ranking gap at ~10x the F1 cost while remaining ~10x cheaper than the benchmark. The 89.7% strong-eval savings mean the cascade can actually afford to use it. The runtime tail (max 5.32 s/point on outliers) is the only operational caveat; it is flagged for Experiment 04’s promotion policy.
Claim updates
- forward-solution-unique-fundamental-ree-with-nbc: tested_by, strength moderate. F2-1 ranks parameter vectors via the full 6-D forward-solution likelihood with MC cap rates and matches the strong benchmark at top-K, validating that the forward-solution evaluator can be used as a refiner stage in a cascade.
- asset-pricing-no-bubble-condition-regime: tested_by, strength moderate. The refiner uses MC cap-rate pricing inside the no-bubble feasibility region; high Spearman is empirical evidence that the NBC-feasible interior is well-ordered by the production likelihood.
Follow-up
- Operational caveat: the runtime tail (5.32 s outliers) must be accounted for in Exp 04’s promotion-policy design.
- Promotes F2-1 settings (mc_R=100, mc_H_burn=150, crisis_range 57:70, crisis_scale 10) into basin-finder-04-promotion-policy.
- See also: basin-finder-complete-program-summary.