Problem

The Leather–Sagi project needs a formal, documented global optimization pipeline that codifies the empirical lessons of the recent ~3-week arc into a reproducible recipe. The arc began with the discovery that programs 06 and 07 had been running on real data rather than the simulated DGP; once that was corrected, both programs reopened and the numerical landscape changed materially. Before freezing a pipeline, there is a menu of design choices that depend on empirical findings, not just taste, and those findings need to be assembled in one place.

Key idea

This brief is the one-place assembly. It is written for an external scientific reasoner (ChatGPT 5.4 Pro) who does not have the codebase in front of them, and its single purpose is to bring them up to speed on what has been learned, what has been decided, and what design questions are still open. It is not a project summary and it is not a writeup — it is a design input.

Method

The brief is structured around:

  • Section 1. Where we are: the recent-arc timeline, the numerically credible facts (incumbent SEM, DGP oracle , MLE−DGP gap 154 nats at ), and which statements in CLAUDE.md are now stale.
  • Section 2. Five design choices the user has already made (D1–D5), each with the empirical evidence and the one open question that must be closed before freeze:
    • D1 remove the crisis measurement-error inflation (crisis_scale = 10 on t ∈ 57:70)
    • D2 keep absorbing NBC as the production feasibility gate (vs the deprecated is_feasible_pixg_cap)
    • D3 keep the 54-D m_g IS-stationarity-constrained parameterization
    • D4 keep BOBYQA as the inner optimizer
    • D5 keep the paired-seed high-R tie-break protocol for sub-nat ranking
  • Section 3. Three more design choices the user wants ChatGPT to advise on: restart allocation policy (Exp10 late-bloom non-monotonicity), Phase B R schedule, and whether to include a false-basin classifier (gap-of-gaps / label-symmetry).
  • Section 4. Eleven additional empirical lessons the user did not list but that the arc established (e.g., RBPF std scaling at ; a 116× MC lookup speedup, with per-eval rebuild mandatory; the Hamilton filter at 2 ms but with a 1572-nat objective gap; CRN variance reduction of only 1.4–1.6×).
  • Section 5. Open questions for ChatGPT.
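For concreteness, the paired-seed tie-break of D5 can be sketched in a few lines. This is a minimal illustration, not the project's implementation: the toy objective `noisy_loglik`, the seed list, and the decision rule below are hypothetical stand-ins. Only the idea comes from the brief — evaluate near-tied candidates on common seeds at high R and rank by the per-seed differences, so that shared Monte Carlo noise cancels in the comparison.

```python
import random
import statistics

def noisy_loglik(theta, seed, R=500):
    """Toy stand-in for a simulated log-likelihood (hypothetical).

    True value is -(theta - 1.0)**2; Monte Carlo noise shrinks with R
    and is driven by the seed, so paired evaluations share the draw."""
    rng = random.Random(seed)
    base = rng.gauss(0.0, 1.0)  # same draw for both candidates at this seed
    noise = base * (1.0 + 0.05 * abs(theta)) / R ** 0.5
    return -(theta - 1.0) ** 2 + noise

def paired_seed_tiebreak(theta_a, theta_b, seeds, R=500):
    """Rank two near-tied candidates by evaluating both on the SAME
    seeds at high R and comparing the per-seed differences."""
    diffs = [noisy_loglik(theta_a, s, R) - noisy_loglik(theta_b, s, R)
             for s in seeds]
    mean = statistics.mean(diffs)
    se = statistics.stdev(diffs) / len(diffs) ** 0.5
    return mean, se  # A wins if mean > 0 by a few standard errors

# theta_a is closer to the toy optimum, so the paired mean is positive
mean, se = paired_seed_tiebreak(1.05, 1.10, seeds=range(20), R=500)
```

Because the shared noise term mostly cancels in each difference, the standard error of the paired comparison is far smaller than the noise on either objective alone — which is what makes sub-nat ranking feasible at all.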

Results

  • The brief is itself the result — not a numerical result, but a consolidated input for pipeline design.
  • It documents the state at time of writing (2026-04-10). Exp 10 / 11 / 12 closed between then and 2026-04-12 and tightened the picture further (see synthesis-final-day-sprint-decisions).
  • Pre-freeze decisions anchored by the brief: D1–D5 are committed; false-basin classification remains the only cheap follow-up worth doing before freeze.
  • The brief explicitly declines to resolve the 154-nat MLE−DGP gap; it flags the gap as sample-specific rather than search-specific and leaves the finite-sample / weak-identification question for a separate project.

Limitations

  • Written without codebase access for the reader. File paths are cited for follow-up only; the brief cannot substitute for reading the canonical memos.
  • Snapshot dated 2026-04-10. Exp 10 / 11 / 12 outcomes were not yet incorporated; synthesis-final-day-sprint-decisions is the correction.
  • Single audience. The style is optimized for a single external reasoner; it is too dense for a new collaborator on the project.

Open questions

Lifted from Section 5 of the brief, not yet closed by this or any later artifact:

  • What is the right restart allocation rule (uniform vs greedy top-K vs diversity-weighted) given the Exp 10 late-bloom non-monotonicity?
  • What is the right Phase B R schedule (R=500 yields a 100% hit rate but remains computationally expensive across the elite shelf)?
  • Is label-symmetry / gap-of-gaps a reliable false-basin classifier on a different panel or DGP draw?
  • Which, if any, of the eleven “additional lessons” should the pipeline enforce as hard invariants vs document as soft guidance?
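To make the restart-allocation question concrete, here is a hedged sketch of the three candidate rules named above. Everything here is illustrative: the function names, the score inputs, and the softmax weighting in the diversity-weighted rule are assumptions for exposition, not the pipeline's code.

```python
import math
import random

def uniform_alloc(scores, budget):
    """Give every basin the same number of restarts (remainder to the front)."""
    n = len(scores)
    return [budget // n + (1 if i < budget % n else 0) for i in range(n)]

def greedy_topk_alloc(scores, budget, k=3):
    """Concentrate all restarts on the current top-k basins by score."""
    k = min(k, len(scores))
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    alloc = [0] * len(scores)
    for j, i in enumerate(order[:k]):
        alloc[i] = budget // k + (1 if j < budget % k else 0)
    return alloc

def diversity_weighted_alloc(scores, budget, temperature=1.0, seed=0):
    """Sample restarts in proportion to softmax(score / temperature),
    hedging against late bloomers that rank poorly early on."""
    rng = random.Random(seed)
    m = max(scores)
    w = [math.exp((s - m) / temperature) for s in scores]
    total = sum(w)
    alloc = [0] * len(scores)
    for _ in range(budget):
        r = rng.random() * total
        for i, wi in enumerate(w):
            r -= wi
            if r <= 0:
                alloc[i] += 1
                break
    return alloc
```

The Exp 10 late-bloom non-monotonicity is exactly the failure mode that breaks the greedy rule: a basin that ranks poorly at low R can overtake the shelf at high R, which is why the uniform and diversity-weighted alternatives stay on the table.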

My take

This brief is the most load-bearing document in the current arc. It is the input to the pipeline design and the reference against which subsequent pipeline experiments should be interpreted. It works best when read before any individual experiment writeup — the writeups fill in the mechanics; the brief gives the motivation for what mechanics were chosen.

The companion external artifact is leather-sagi-constrained-optimization-benchmark-54d, which is the same problem packaged for the optimization research community (Mykel Kochenderfer’s group specifically) rather than for an internal design question.