Basin Finder 01: GBT-Guided Feasible Archive Generation

Objective

Establish whether the no-bubble feasible region of the 54-D MSRE parameter space is learnable well enough to use a cheap classifier as a pre-filter for Sobol-based archive generation. If so, the classifier replaces the expensive is_feasible_pixg_cap check on the vast majority of proposals and unblocks the rest of the basin-finder cascade.

Setup

Model / parameterization: 54-D constrained MSRE in phi-space, sampled by Sobol around theta_true with a +/-5% box.
Evaluator: GBT classifier (sklearn GradientBoostingClassifier, max_depth=6, n_estimators=500) trained on a 60/20/20 train/validation/test split of Sobol-rejection labels.
Comparator: plain Sobol with is_feasible_pixg_cap evaluated on every proposal.
Acceptance rule: >=95% recall at the chosen threshold, no diversity collapse vs baseline (kmeans cluster count, NN distance), no top-decile NLL degradation on a downstream 3D Hamilton screen.
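
The sampling step in the setup can be sketched as follows. This is a minimal illustration, not the project's sampler: theta_true below is a hypothetical placeholder vector, the +/-5% box is assumed to be relative to each component of theta_true, and scrambled Sobol with a fixed seed is an assumption.

```python
import numpy as np
from scipy.stats import qmc

DIM = 54

# Hypothetical stand-in for theta_true; the real vector is not given in this note.
theta_true = np.linspace(0.5, 2.0, DIM)

# Assumed box: +/-5% of each component (min/max keeps the bounds ordered
# even if a component were negative).
lower = np.minimum(0.95 * theta_true, 1.05 * theta_true)
upper = np.maximum(0.95 * theta_true, 1.05 * theta_true)

# Scrambled Sobol sequence in the unit cube, then rescaled to the box.
sampler = qmc.Sobol(d=DIM, scramble=True, seed=0)
unit_points = sampler.random_base2(m=10)  # 2**10 = 1024 points
proposals = qmc.scale(unit_points, lower, upper)

print(proposals.shape)
```

Drawing a power-of-two number of points with random_base2 keeps the balance properties of the Sobol sequence; the 1M and 200K batch sizes quoted in this note would need a different draw call.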

Procedure

Phase 1: 1M Sobol proposals, of which 2,516 are feasible (0.252%); these supply the labels for classifier training.
Phase 2: train GBT vs Random Forest vs Logistic Regression; GBT wins on validation (AUC=0.9974, AP=0.9664); threshold 0.38 chosen at the 95% recall breakpoint.
Phase 3: 5 independent 200K-proposal held-out batches; guided method scores all proposals with the GBT and only evaluates the top ~0.8% with the real feasibility check.
Phase 4: diversity guardrail (k=20 kmeans + nearest-neighbor distances) and downstream check (3D Hamilton NLL on matched-size samples).
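
Phases 2 and 3 rest on two small mechanisms: picking a score threshold that reaches a target recall, and running the expensive check only on proposals that clear it. A minimal sketch on toy data follows. The function names, the toy scores, and the 0.75 target are illustrative assumptions; in the experiment the scores come from the GBT and the check is is_feasible_pixg_cap.

```python
import math

def threshold_for_recall(scores, labels, target_recall=0.95):
    """Largest threshold t such that 'score >= t' keeps at least
    target_recall of the positively labelled points."""
    positives = sorted((s for s, y in zip(scores, labels) if y), reverse=True)
    if not positives:
        raise ValueError("need at least one positive label")
    k = math.ceil(target_recall * len(positives))  # positives that must survive
    return positives[k - 1]

def guided_feasible(proposals, score_fn, expensive_check, threshold):
    """Run expensive_check only on proposals whose score clears threshold.
    Returns the feasible proposals and the number of expensive calls made."""
    kept, n_checked = [], 0
    for x in proposals:
        if score_fn(x) >= threshold:
            n_checked += 1
            if expensive_check(x):
                kept.append(x)
    return kept, n_checked

# Toy stand-ins for classifier scores and true feasibility labels.
scores = [0.95, 0.80, 0.60, 0.40, 0.30, 0.10]
labels = [1, 1, 0, 1, 0, 1]

thr = threshold_for_recall(scores, labels, target_recall=0.75)
kept, n_checked = guided_feasible(
    range(len(scores)),
    score_fn=lambda i: scores[i],
    expensive_check=lambda i: bool(labels[i]),
    threshold=thr,
)
print(thr, kept, n_checked)  # 0.4 [0, 1, 3] 4
```

In the toy run, four of six proposals reach the expensive check and three of the four feasible points are recovered, which is the same trade the guided method makes at scale.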

Results

Wall-hour speedup: guided 121,497 +/- 11,861 vs baseline 6,210 +/- 682 feasible/wall-hr = 19.6x. Across the 5 batches, baseline finds 2,564 feasible and guided finds 2,394 (93.4% of baseline).
Diversity guardrail: 20/20 kmeans clusters preserved, median NN distance 2.061 vs 2.064 (1.00x).
Downstream guardrail: best 3D Hamilton NLL identical at -3531.29; top-decile degradation 0.8 percentage points.
Audit: fragility audit (2026-03-26) reproduces speedup arithmetic exactly, 30/30 spot-check of feasibility labels passes, no train/test leakage. Open fragility risks: classifier is region-specific (+/-5% box only) and the diversity test is tautological by construction (guided is a subset of baseline’s Sobol stream).
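
The headline figures can be re-derived from the numbers quoted in this note. A minimal arithmetic check, using only values stated above (variable names are illustrative):

```python
# Throughput in feasible points per wall-hour, as reported in Results.
guided_rate = 121_497
baseline_rate = 6_210
speedup = guided_rate / baseline_rate

# Feasible counts across the 5 held-out batches.
baseline_found = 2_564
guided_found = 2_394
yield_ratio = guided_found / baseline_found
miss_rate = 1.0 - yield_ratio

# Phase 1 base rate: 2,516 feasible out of 1M Sobol proposals.
phase1_rate = 2_516 / 1_000_000

print(f"speedup     = {speedup:.1f}x")
print(f"yield       = {100 * yield_ratio:.1f}% of baseline")
print(f"miss rate   = {100 * miss_rate:.1f}%")
print(f"phase1 rate = {100 * phase1_rate:.3f}%")
```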

Analysis

The 19.6x speedup comes from evaluation reduction (112x fewer feasibility checks), not from parallelization. The 6.6% miss rate (guided finds 93.4% of baseline) is judged acceptable because the missed points are not systematically clustered, although it falls slightly short of the 95% recall target used to set the threshold. The result establishes that the feasible region around theta_true is smooth enough for a 54-D GBT to learn, which is important infrastructure for the rest of the basin-finder cascade. The open transferability question is what blocks promoting the classifier to a region-agnostic gate: if the search box moves, retraining is required.
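
One way to reconcile the 112x evaluation reduction with the 19.6x wall-hour speedup is a back-of-envelope cost model. This is an assumed model, not something the experiment measured: every proposal pays a fixed per-proposal overhead (scoring and anything else that is not the feasibility check), and only 1/112 of proposals pay for the check. Under that assumption, speedup = yield / (overhead_ratio + 1/reduction), which can be solved for the implied overhead.

```python
# Reported quantities from this note.
speedup = 121_497 / 6_210      # guided vs baseline feasible/wall-hr
yield_ratio = 2_394 / 2_564    # guided feasible count relative to baseline
reduction = 112                # factor fewer feasibility checks

# Assumed model (illustrative only):
#   baseline time per proposal = c_check
#   guided time per proposal   = c_overhead + c_check / reduction
#   speedup = yield_ratio / (overhead_ratio + 1 / reduction)
# where overhead_ratio = c_overhead / c_check.
overhead_ratio = yield_ratio / speedup - 1.0 / reduction

# Round-trip: plugging the implied overhead back in recovers the speedup.
implied_speedup = yield_ratio / (overhead_ratio + 1.0 / reduction)

print(f"implied per-proposal overhead = {100 * overhead_ratio:.1f}% of one check")
print(f"round-trip speedup            = {implied_speedup:.1f}x")
```

If the model holds, the per-proposal overhead rather than the remaining feasibility checks dominates guided wall time, so a further reduction in checks alone would yield diminishing returns.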

Claim updates

forward-solution-unique-fundamental-ree-with-nbc: tested_by, strength moderate. The 19.6x speedup demonstrates that no-bubble feasibility (a necessary condition for the forward solution to exist) is learnable and can be pre-screened cheaply on the relevant 54-D parameter region.
asset-pricing-no-bubble-condition-regime: tested_by, strength moderate. The classifier was trained directly on labels from is_feasible_pixg_cap, which is the production NBC test; the high AUC (0.997) is empirical evidence that the NBC region has smooth, learnable boundaries.

Follow-up

Open question: classifier transferability to other basins / box sizes (relevant for basin-finder-05-global-driver).
Sets up the cascade for basin-finder-02-f1-cheap-screen (cheap screen on classifier-passed candidates).
See also: basin-finder-complete-program-summary.
