Final Day Sprint Synthesis — Exp 10/11/12 Closures and Production Freeze Decisions

Problem

Between 2026-04-11 and 2026-04-12 three experiments closed (Exp 10, 11, 12) that were originally intended to deliver operational improvements for the production pipeline. None of them did. Before freezing the pipeline, someone needs to say clearly what the three closures did establish, what they removed from the option menu, and what changed versus the 2026-04-10 design-brief-global-optimization-pipeline.

Key idea

The final-day sprint removed alternatives rather than opening better ones. The production conclusion is now more conservative but more stable: keep the hybrid Hamilton-Kim + MC cap-rate objective, absorbing NBC, the 54-D constrained parameterization, and BOBYQA multi-start + paired high-R verification; do not ship adaptive splice logic, strict θ-only tail caching, or Exp 11 curvature output for restart scaling. The single cheap follow-up still worth doing before freeze is false-basin classification.

The sprint also strengthened the interpretation that the main unresolved approximation issue is the tail operator itself (N3: η_H · r/(1-r) with a single global r), not splice timing, not θ-only caching, and not curvature-based restart scaling.
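
To make the N3 operator concrete, here is a minimal sketch, assuming the tail term is the closed form of a geometric series, η_H · Σ_{k≥1} r^k = η_H · r/(1-r). The function names, the decay path, and the geometric-mean choice of the single global r are all hypothetical illustrations, not taken from the pipeline. The sketch only shows how one global r can misstate a tail whose per-step decay varies.

```python
import math

def geometric_tail(eta_h: float, r: float) -> float:
    """Closed form eta_h * r / (1 - r), i.e. eta_h * sum_{k>=1} r**k for 0 <= r < 1."""
    if not 0.0 <= r < 1.0:
        raise ValueError("r must lie in [0, 1) for the series to converge")
    return eta_h * r / (1.0 - r)

def varying_tail(eta_h: float, ratios: list) -> float:
    """Truncated tail with step-dependent decay: eta_h * sum_k prod_{j<=k} ratios[j]."""
    total, running = 0.0, 1.0
    for rk in ratios:
        running *= rk
        total += running
    return eta_h * total

eta_h = 1.0  # hypothetical scale factor

# Sanity check: constant ratios reproduce the closed form up to truncation error.
const_sum = varying_tail(eta_h, [0.5] * 60)
closed_form = geometric_tail(eta_h, 0.5)

# Hypothetical decay path: slow decay for 5 steps, then fast decay.
ratios = [0.8] * 5 + [0.2] * 55
true_tail = varying_tail(eta_h, ratios)

# One possible "single global r": the geometric mean of the per-step ratios.
r_global = math.exp(sum(math.log(x) for x in ratios) / len(ratios))
approx_tail = geometric_tail(eta_h, r_global)

print(f"true tail {true_tail:.4f} vs single-r tail {approx_tail:.4f}")
```

In this toy setting the single-r form understates the tail because the early slow-decay steps carry most of the mass. Whether the production decay geometry behaves this way is exactly what N3 would have to test.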

Method

For each of Exp 10 / 11 / 12, the synthesis asks:

What was decided?
What was tested?
What did we learn?
What is the pipeline implication?

Section 2 then cuts across the three experiments and compares them against the pre-sprint synthesis (design-brief-global-optimization-pipeline): what changed materially, what did not change, and what new open questions appeared.

Results

Exp 10 (strict θ-only hybrid tail cache) → KILL_NEGATIVE_LEVERAGE. The strict θ-only operator cache inside the hybrid evaluator is bit-identical to the fixed-burn baseline, preserves ranking, and gives no measurable wall-time gain (mean speedup 1.0006, median 0.9956). The dominant cost remains in seed-dependent objects (decay-table build, Q-table sweep). A latent cache-contract hazard is logged: MCDynamicsCache is not actually a safe $\theta^{54}$-only object if $m_{g}$ differs across compatible $\theta^{55}$ expansions; any future cache layer must key on something stronger than the current $\theta^{54}$ hash. See exp-10-hybrid-tail-cache-theta-only.
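
The cache-contract hazard can be illustrated with a minimal sketch. It assumes, purely for illustration, that $m_{g}$ is a single float and that cache keys are SHA-256 digests of packed doubles; neither assumption comes from the source, and `theta_only_key` and `stronger_key` are hypothetical names, not the MCDynamicsCache API.

```python
import hashlib
import struct

def theta_only_key(theta54):
    """Key that depends on the 54-D vector alone (the pattern flagged as hazardous)."""
    payload = struct.pack(f"<{len(theta54)}d", *theta54)
    return hashlib.sha256(payload).hexdigest()

def stronger_key(theta54, m_g):
    """Hypothetical stronger key: also covers m_g, which may differ across expansions."""
    payload = struct.pack(f"<{len(theta54)}d", *theta54) + struct.pack("<d", m_g)
    return hashlib.sha256(payload).hexdigest()

# Two expansions that share the same 54-D vector but differ in m_g (made-up values).
theta = [0.1 * i for i in range(54)]
m_g_a, m_g_b = 1.0, 2.0

collides = theta_only_key(theta) == theta_only_key(theta)          # same key for both
separates = stronger_key(theta, m_g_a) != stronger_key(theta, m_g_b)
print("theta-only key collides:", collides, "| stronger key separates:", separates)
```

A theta-only key cannot tell the two expansions apart, so a cached object built under one $m_{g}$ would be served for the other. Any key that covers every input the cached object depends on avoids that.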

Exp 11 (block-relative curvature at incumbent) → DIAGNOSTIC_ONLY + passes_with_major_fragility. Block-relative deterministic curvature at ep06c_polished on eval_nll_hamilton gives top-2 stiff directions in the Monetary means block that are stencil-robust, but the Auditor’s central-FD pipeline revealed a spec-level forward-FD diagonal stencil bug: the formula H[i,i] = 2(f_+ − f_0)/h² is the gradient-free stencil valid only at a critical point, and ep06c_polished is polished on eval_nll_sim not eval_nll_hamilton, so the contamination term $2(\nabla f)_{i} / h_{i}$ reaches $\sim 3.8 \times 10^{8}$ in the Monetary means block. Ranks 3–5 are sign-flipped under central FD; rank-5 block dominance drops 1.000 → 0.396. rotation_scaling.json is not fit for restart preconditioning. See exp-11-curvature-incumbent-scaled-diagnostic.
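
The stencil bug follows from a second-order Taylor expansion: f(x + h e_i) ≈ f_0 + g_i h + ½ H_ii h², so 2(f_+ − f_0)/h² = H_ii + 2 g_i / h + O(h). The extra term vanishes only when g_i = 0. The sketch below reproduces the effect on a toy quadratic. The test function, step size, and helper names are illustrative assumptions and have nothing to do with eval_nll_hamilton itself.

```python
def forward_diag(f, x, i, h):
    """Gradient-free forward stencil 2(f_+ - f_0)/h^2; unbiased only where df/dx_i = 0."""
    xp = list(x)
    xp[i] += h
    return 2.0 * (f(xp) - f(x)) / h**2

def central_diag(f, x, i, h):
    """Central stencil (f_+ - 2 f_0 + f_-)/h^2; the first-order gradient term cancels."""
    xp = list(x)
    xp[i] += h
    xm = list(x)
    xm[i] -= h
    return (f(xp) - 2.0 * f(x) + f(xm)) / h**2

def toy_objective(x):
    # Coordinate 0: gradient 3 at the origin, true second derivative 2.
    # Coordinate 1: zero gradient at the origin, true second derivative 1.
    return 3.0 * x[0] + x[0] ** 2 + 0.5 * x[1] ** 2

x0 = [0.0, 0.0]
h = 1e-3

fwd_noncritical = forward_diag(toy_objective, x0, 0, h)  # H_00 + 2*g_0/h
cen_noncritical = central_diag(toy_objective, x0, 0, h)  # H_00
fwd_critical = forward_diag(toy_objective, x0, 1, h)     # H_11, no contamination

print(fwd_noncritical, cen_noncritical, fwd_critical)
```

Along the non-critical coordinate the forward estimate is dominated by 2 g_0 / h, while the central estimate recovers the true curvature. The contamination grows as h shrinks, so a smaller step makes the forward estimate worse rather than better.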

Exp 12 (per-period integer online onset detection) → KILL_DETECTOR_FAILS. Phase A Stage-0 oracle sweep fires PROCEED_TO_PHASE_B at $R = 250$, but the Auditor finds the PROCEED is seed-conditional (2/6 fresh seeds KILL G3, and the Builder’s seed is the max of 6) and $R$-conditional (G3 collapses 11.4× from $R = 250$ to $R = 2000$, ~4× faster than the $\sqrt{8}$ MC-noise-floor shrinkage, gate flips PROCEED → KILL between $R = 1000$ and $R = 2000$). The pre-registered R3 mid-Phase-B checkpoint fails C1 on both seed schedules at $R = 500$ (O1/G09 median $E_{\log Q}$ ratio 1.107 / 1.080, i.e. O1 is 8–11% worse than G09 on pricing error), even though O1 is ~10% faster, 10× smoother on chatter, and 10× lower on fallback rate. Root cause: the pre-registered calibration objective is ~180× dominated by the oracle-alignment term, implicitly turning O1 into an oracle-matcher rather than an $E_{\log Q}$-minimizer. See exp-12-online-asymptotic-onset-detection.
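
The $R$-conditionality argument rests on simple arithmetic, sketched here under the standard assumption that Monte Carlo standard error scales as $1/\sqrt{R}$. The inputs are the figures quoted above; the variable names are illustrative.

```python
import math

r_low, r_high = 250, 2000
observed_collapse = 11.4  # G3 shrinkage factor reported from R=250 to R=2000

# Under 1/sqrt(R) scaling, pure MC noise shrinks by sqrt(2000/250) = sqrt(8).
noise_shrinkage = math.sqrt(r_high / r_low)

# How much faster G3 collapses than the noise floor alone would explain.
excess_factor = observed_collapse / noise_shrinkage

# The R3 checkpoint ratios expressed as percent degradation of O1 versus G09.
ratios = (1.107, 1.080)
percent_worse = [round((ratio - 1.0) * 100.0, 1) for ratio in ratios]

print(f"noise shrinkage {noise_shrinkage:.2f}x, excess {excess_factor:.2f}x, "
      f"O1 worse by {percent_worse} percent")
```

The excess factor comes out close to 4, consistent with the "~4× faster" statement, and the two checkpoint ratios map to the quoted 8–11% range.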

Material changes vs pre-sprint synthesis.

Tail misspecification is now the leading unresolved approximation hypothesis. Exp 09 showed splice timing matters; Exp 12 shows finer onset detection does not rescue the approximation where it matters.
Strict θ-only caching is no longer a live production idea.
Curvature-based restart scaling is no longer a live production idea (until a clean central-FD follow-up 11b is run).
Four new gotchas logged from Exp 12 alone: oracle-distance penalty silently dominating a mixed calibration objective (180× ratio); chatter/flip-rate metrics must be well-defined on the comparator arm first; gate thresholds set within ~1σ of seed-wise noise floor flip on seed resample; fine-grid argmin exploits finite-$R$ MC noise against a coarse-grid argmin. Three gotchas from Exp 11: gradient-contaminated FD stencil, cross-objective non-criticality, same-stencil “verifier independence” is tautological.
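
The first Exp 12 gotcha suggests a cheap pre-flight check on any mixed calibration objective: log each term's magnitude at a reference point before optimizing. The sketch below is one hypothetical way to do that. The helper names are invented, and the term magnitudes are placeholders chosen only to reproduce a 180× ratio; they are not measured values from the experiment.

```python
import math

def term_shares(terms):
    """Fraction of the total absolute objective contributed by each named term."""
    total = sum(abs(value) for value in terms.values())
    if total == 0.0:
        raise ValueError("all terms are zero; shares are undefined")
    return {name: abs(value) / total for name, value in terms.items()}

def dominance_ratio(terms):
    """Largest term magnitude divided by the second largest."""
    magnitudes = sorted((abs(value) for value in terms.values()), reverse=True)
    if len(magnitudes) < 2 or magnitudes[1] == 0.0:
        return math.inf
    return magnitudes[0] / magnitudes[1]

# Placeholder magnitudes, chosen only to mimic the reported 180x imbalance.
example_terms = {"oracle_alignment": 180.0, "pricing_error": 1.0}

shares = term_shares(example_terms)
ratio = dominance_ratio(example_terms)

# Hypothetical guard: flag the objective if one term outweighs the next by 10x or more.
DOMINANCE_LIMIT = 10.0
flagged = ratio >= DOMINANCE_LIMIT
print(f"dominance ratio {ratio:.0f}x, shares {shares}, flagged={flagged}")
```

The 10× limit is arbitrary. The useful part is that the ratio is written down before calibration, so a term that effectively defines the objective cannot go unnoticed.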

Limitations

Single-author, single-day snapshot. Not yet cross-reviewed by a librarian or auditor session; the synthesis is a “what we know now” memo, not a referee report.
Does not resolve the 154-nat MLE−DGP gap. Still outside scope.
N3 operator-floor hypothesis is the leading interpretation, not a claim. Exp 12 does not prove the tail operator is the bottleneck; it rules out onset detection as a rescue.

Open questions

Does the geometric tail operator η_H · r/(1-r) with a single global r under-fit the state-dependent decay geometry? Testing this is operator-floor null N3, not yet opened.
Should false-basin classification (label-symmetry / gap-of-gaps) be run pre-freeze, and if so under what decision rule?
Is there a clean central-FD curvature rerun (Exp 11b) that would restore rotation_scaling.json to operational preconditioning?

My take

The value of this memo is exactly that it removes rather than adds. Three experiments that all had plausible upsides at the time of the design brief closed without producing any, and the memo is the one place where the removals are written down before they fade into undocumented tribal knowledge. It is the natural companion to design-brief-global-optimization-pipeline — the brief gives the pre-sprint menu; this memo gives the post-sprint cuts.

Related

design-brief-global-optimization-pipeline — the pre-sprint synthesis this memo updates.
exp-10-hybrid-tail-cache-theta-only — the KILL_NEGATIVE_LEVERAGE result.
exp-11-curvature-incumbent-scaled-diagnostic — the DIAGNOSTIC_ONLY + fragility result.
exp-12-online-asymptotic-onset-detection — the KILL_DETECTOR_FAILS result.
exp-09-adaptive-tail-splice-mc-pricing — the splice-timing predecessor that Exp 12 attempted to improve upon.
david-leather, jacob-sagi — project PIs.
