Problem
The Leather–Sagi project needs a formal, documented global optimization pipeline that codifies the empirical lessons of the recent ~3-week arc into a reproducible recipe. The arc began with the discovery that programs 06 and 07 had been running on real data rather than the simulated DGP; once that was corrected, both programs re-opened and the numerical landscape changed materially. Before freezing a pipeline, there is a menu of design choices that depend on empirical findings rather than taste, and those findings need to be assembled in one place.
Key idea
This brief is the one-place assembly. It is written for an external scientific reasoner (ChatGPT 5.4 Pro) who does not have the codebase in front of them, and its single purpose is to bring them up to speed on what has been learned, what has been decided, and what design questions are still open. It is not a project summary and it is not a writeup — it is a design input.
Method
The brief is structured as follows:
- Section 1. Where we are: the recent-arc timeline, the numerically credible facts (the incumbent SEM, the DGP oracle, the 154-nat MLE−DGP gap), and what `CLAUDE.md` still says that is now stale.
- Section 2. Five design choices the user has already made (D1–D5), each with the empirical evidence and the one open question that must be closed before freeze:
  - D1: remove the crisis measurement-error inflation (`crisis_scale = 10` on t ∈ 57:70)
  - D2: keep absorbing NBC as the production feasibility gate (vs the deprecated `is_feasible_pixg_cap`)
  - D3: keep the 54-D `m_g` IS-stationarity constrained parameterization
  - D4: keep BOBYQA as the inner optimizer
  - D5: keep the paired-seed high-R tie-break protocol for sub-nat ranking
- Section 3. Three more design choices the user wants ChatGPT to advise on: restart allocation policy (Exp10 late-bloom non-monotonicity), Phase B R schedule, and whether to include a false-basin classifier (gap-of-gaps / label-symmetry).
- Section 4. Eleven additional empirical lessons the user did not list but that the arc established (e.g., RBPF std scaling; the 116× MC lookup speedup, with per-eval rebuild mandatory; the Hamilton filter's 2 ms evaluations but 1572-nat objective gap; CRN variance reduction of only 1.4–1.6×).
- Section 5. Open questions for ChatGPT.
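D5's paired-seed tie-break and the Section 4 CRN lesson rest on the same mechanism: evaluating rival candidates on an identical seed list so that simulation noise is shared and cancels in pairwise differences. A minimal sketch, with a hypothetical `simulate_nll(theta, seed)` objective standing in for the project's simulated likelihood:

```python
import numpy as np

def paired_seed_tiebreak(candidates, simulate_nll, seeds):
    """Rank near-tied candidates under common random numbers (CRN).

    Every candidate is evaluated on the SAME seed list, so seed-level
    noise is shared across candidates and cancels when we difference.
    """
    # nll has shape (K candidates, R seeds); column s uses the same seed
    # for every candidate -- this is the paired-seed discipline.
    nll = np.array([[simulate_nll(theta, s) for s in seeds]
                    for theta in candidates])
    order = np.argsort(nll.mean(axis=1))        # best candidate first
    best, runner = order[0], order[1]
    d = nll[best] - nll[runner]                 # paired differences
    se = d.std(ddof=1) / np.sqrt(len(seeds))    # paired standard error
    resolved = abs(d.mean()) > 1.96 * se        # crude z-test on the pairs
    return order, d.mean(), se, resolved
```

Under perfect CRN the paired standard error collapses toward zero; the brief's observed 1.4–1.6× variance reduction suggests the real objective's seed correlation is far weaker than in this idealized sketch, which is exactly why the protocol also pushes R high for sub-nat gaps.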
Results
- The brief is itself the result — not a numerical result, but a consolidated input for pipeline design.
- It documents the state at time of writing (2026-04-10). Exp 10 / 11 / 12 closed between then and 2026-04-12 and tightened the picture further (see synthesis-final-day-sprint-decisions).
- Pre-freeze decisions anchored by the brief: D1–D5 are committed; false-basin classification remains the only cheap follow-up worth doing before freeze.
- The brief explicitly declines to resolve the 154-nat MLE−DGP gap; it flags the gap as sample-specific rather than search-specific and leaves the finite-sample / weak-identification question for a separate project.
Limitations
- Written without codebase access for the reader. File paths are cited for follow-up only; the brief cannot substitute for reading the canonical memos.
- Snapshot dated 2026-04-10. Exp 10 / 11 / 12 outcomes were not yet incorporated; synthesis-final-day-sprint-decisions is the correction.
- Single audience. The style is optimized for a single external reasoner; it is too dense for a new collaborator on the project.
Open questions
Lifted from Section 5 of the brief, not yet closed by this or any later artifact:
- What is the right restart allocation rule (uniform vs greedy top-K vs diversity-weighted) given the Exp 10 late-bloom non-monotonicity?
- What is the right Phase B R schedule (R = 500 achieves a 100% hit rate but remains computationally expensive across the elite shelf)?
- Is label-symmetry / gap-of-gaps a reliable false-basin classifier on a different panel or DGP draw?
- Which, if any, of the eleven “additional lessons” should the pipeline enforce as hard invariants vs document as soft guidance?
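The restart-allocation question can be made concrete with a toy allocator. The policy names mirror the three candidates above, but the scoring convention, the softmax stand-in for diversity weighting, and the function name are illustrative assumptions, not the project's rule:

```python
import numpy as np

def allocate_restarts(scores, budget, policy="uniform", k=3, temp=1.0):
    """Split a restart budget across candidate basins.

    scores: incumbent objective per basin (lower is better); budget: total
    restarts to hand out. Returns an integer allocation summing to budget.
    """
    scores = np.asarray(scores, dtype=float)
    n = len(scores)
    alloc = np.zeros(n, dtype=int)
    if policy == "uniform":
        alloc[:] = budget // n
        alloc[: budget % n] += 1                  # spread the remainder
    elif policy == "greedy-topk":
        best = np.argsort(scores)[:k]             # all budget to the k best
        for i, b in enumerate(best):
            alloc[b] = budget // k + (1 if i < budget % k else 0)
    elif policy == "diversity-weighted":
        w = np.exp(-(scores - scores.min()) / temp)   # softmax over scores
        w /= w.sum()
        alloc = np.floor(w * budget).astype(int)
        alloc[np.argmax(w)] += budget - alloc.sum()   # fix rounding shortfall
    return alloc
```

The Exp 10 late-bloom non-monotonicity argues against pure greedy-topk: late bloomers score poorly early, so keeping some mass on the tail (uniform or diversity-weighted) hedges against truncating them before they bloom.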
My take
This brief is the most load-bearing document in the current arc. It is the input to the pipeline design and the reference against which subsequent pipeline experiments should be interpreted. It works best when read before any individual experiment writeup — the writeups fill in the mechanics; the brief gives the motivation for what mechanics were chosen.
The companion external artifact is leather-sagi-constrained-optimization-benchmark-54d, which is the same problem packaged for the optimization research community (Mykel Kochenderfer’s group specifically) rather than for an internal design question.
Related
- leather-sagi-constrained-optimization-benchmark-54d — the external benchmark-paper companion to this internal brief.
- synthesis-final-day-sprint-decisions — Exp 10/11/12 updates that post-date this brief.
- global-optimization-pipeline-msre-cre — the research idea this brief exists to support.
- exp-07-optarch-10-restart-allocation-incumbent — source of the SEM incumbent.
- sim-recovery-experiment-local-global — source of the local-PASS / global-FAIL finding that motivated the whole arc.
- basin-finder-complete-program-summary — source of the GBT classifier, F1/F2 screening cascade, and BOBYQA-dominates finding.
- leather-sagi-markov-switching-cre-asset-pricing — the underlying CRE model.
- markov-switching-rational-expectations-cre-pricing — the model class.
- david-leather, jacob-sagi — project PIs.