Objective

Build the complete robustness testing infrastructure needed to validate estimation results before publication: RBPF justification, variance stability, MCMC diagnostics, out-of-sample forecasts, model comparison (2 vs 4 regimes), MAP robustness, residual diagnostics, prior sensitivity, economic plausibility, and ESS bottleneck analysis.

Setup

  • Model: full 55-parameter CRE asset pricing model with RBPF likelihood
  • Evaluator: RBPF at N=3000 particles (~0.35 s/eval, log-likelihood std ~0.64)
  • Comparators: bootstrap PF (Task 1), 2-regime restricted model (Task 5)
  • Acceptance: all self-tests pass; infrastructure ready for full-run execution
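
The evaluator figures above (~0.35 s/eval, std ~0.64) come from repeated evaluations across seeds. A minimal sketch of that benchmark, with a hypothetical toy evaluator standing in for the real RBPF likelihood (names and the noisy-constant stand-in are illustrative assumptions, not the actual code):

```python
import random
import statistics
import time

def benchmark_evaluator(loglik_fn, n_reps=20):
    """Time an evaluator and measure the Monte Carlo std of its
    log-likelihood estimate across independent seeds (Task 2's check)."""
    times, logliks = [], []
    for seed in range(n_reps):
        t0 = time.perf_counter()
        logliks.append(loglik_fn(seed=seed))
        times.append(time.perf_counter() - t0)
    return statistics.mean(times), statistics.stdev(logliks)

# Toy stand-in for the real RBPF evaluator so the sketch is runnable:
# a noisy constant with std 0.64, mimicking the observed estimator noise.
def toy_loglik(seed):
    random.seed(seed)
    return -1234.5 + random.gauss(0.0, 0.64)

mean_time, ll_std = benchmark_evaluator(toy_loglik)
```

The same harness applied to the real evaluator at several particle counts gives the variance-stability curve.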

Procedure

  • Task 1: RBPF vs bootstrap PF head-to-head (efficiency comparison)
  • Task 2: Variance stability across seeds and particle counts
  • Tasks 3, 6, 11: MCMC infrastructure (Gelman-Rubin, ESS, trace plots, posterior summaries)
  • Tasks 4, 5, 8, 10: Model fit (out-of-sample forecasts, 2-regime LRT/BIC, residual diagnostics, economic plausibility)
  • Tasks 7, 9: Estimate stability (MAP robustness across N/seeds/starts, prior sensitivity)
  • Task 12: Integration (self-tests pass, regression tests 37/37)

Results

  • RBPF is 100x-300,000x more efficient than bootstrap PF (bootstrap particle ESS collapses to 2-3 regardless of N)
  • All 4 self-test blocks pass (residual diagnostics, ESS analysis, two-regime round-trip, posterior diagnostics)
  • Phase E.1 regression tests: 37/37 pass
  • 17 economic plausibility checks across 5 categories implemented
  • Infrastructure ready; heavy-compute experiments (Tasks 4, 5, 7, 9) await MAP completion
  • No separate audit session
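
The bootstrap ESS collapse is easy to reproduce in miniature: when a blind prior proposal must explain a high-precision observation, nearly all weight lands on a handful of particles. A toy demonstration (the model and numbers are illustrative, not taken from the actual experiment):

```python
import numpy as np

rng = np.random.default_rng(1)

def particle_ess(log_weights):
    """Effective sample size 1 / sum(w_i^2) of the normalized weights."""
    lw = log_weights - log_weights.max()   # stabilize before exponentiating
    w = np.exp(lw)
    w /= w.sum()
    return 1.0 / np.sum(w ** 2)

# Bootstrap PF: particles drawn blindly from the prior must explain a
# high-precision observation, so the weights concentrate on a few particles.
N = 100_000
particles = rng.normal(0.0, 1.0, size=N)   # prior proposal
y = 2.5                                     # observed value
sigma_obs = 1e-3                            # high-precision observation
log_w = -0.5 * ((y - particles) / sigma_obs) ** 2
ess = particle_ess(log_w)                   # collapses to O(1) despite N = 100,000
```

Raising N barely helps because the weight concentration is driven by the observation precision, not the particle count, which matches the "regardless of N" finding above.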

Analysis

This is an infrastructure experiment, not a numerical result. The key finding is that the RBPF’s advantage over the bootstrap PF is structural: particles only need to track regimes (a discrete 4-state variable) rather than the full 6D continuous state, and conditional on a regime path the model is linear-Gaussian, so Kalman filtering absorbs the high-precision macro observations exactly. The robustness suite is designed to answer the five reviewer questions: why this filter, are the estimates stable, does the model fit, how prior-sensitive are the results, and are they economically sensible?
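
The structural point can be sketched in a minimal Rao-Blackwellized step: sample only the discrete regime per particle, run an exact Kalman predict/update for the continuous state conditional on that regime, and weight by the Kalman marginal likelihood. The 2-regime, 1D toy model below is purely illustrative (all parameters and names are assumptions, not the 55-parameter CRE model):

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy 2-regime switching linear-Gaussian model (hypothetical parameters).
A = {0: 0.95, 1: 0.5}      # state transition coefficient per regime
Q = {0: 0.1, 1: 1.0}       # state noise variance per regime
H, R = 1.0, 0.05           # observation map and observation noise variance
P_trans = np.array([[0.9, 0.1],
                    [0.2, 0.8]])   # regime transition matrix

def rbpf_step(regimes, means, variances, y):
    """One RBPF step: sample discrete regimes, Kalman-filter the
    continuous state exactly, weight by the marginal likelihood of y."""
    new_regimes = np.array([rng.choice(2, p=P_trans[r]) for r in regimes])
    a = np.array([A[r] for r in new_regimes])
    q = np.array([Q[r] for r in new_regimes])
    # Kalman predict (per particle, conditional on its regime)
    m_pred = a * means
    P_pred = a**2 * variances + q
    # Marginal likelihood of y gives the particle weights; then update.
    S = H**2 * P_pred + R
    log_w = -0.5 * (np.log(2 * np.pi * S) + (y - H * m_pred) ** 2 / S)
    K = P_pred * H / S
    m_new = m_pred + K * (y - H * m_pred)
    P_new = (1 - K * H) * P_pred
    return new_regimes, m_new, P_new, log_w

N = 500
regimes = rng.choice(2, size=N)
means, variances = np.zeros(N), np.ones(N)
regimes, means, variances, log_w = rbpf_step(regimes, means, variances, y=0.3)
```

Because the observation enters through the analytic marginal likelihood rather than a point-evaluated density, a tiny R no longer annihilates the weights the way it does in the bootstrap filter.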

Claim updates

No specific claim updated (infrastructure experiment).

Follow-up

  • Run full-mode experiments once MAP is finalized (FULL_RUN=true, ~12+ hours)
  • 2-regime restricted model LRT/BIC comparison is the key model-selection deliverable
  • Prior sensitivity analysis will identify which parameters are prior-driven vs data-driven
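
The 2- vs 4-regime comparison reduces to a likelihood-ratio statistic and a BIC difference. A minimal sketch with hypothetical numbers (the log-likelihoods, parameter counts beyond the full model's 55, and sample size are placeholders); note that standard chi-square LRT asymptotics are unreliable when testing the number of regimes, which is one reason BIC is carried alongside:

```python
import math

def lrt_and_bic(ll_restricted, ll_full, k_restricted, k_full, n_obs):
    """Likelihood-ratio statistic and BIC for a nested model comparison."""
    lr_stat = 2.0 * (ll_full - ll_restricted)
    bic_restricted = k_restricted * math.log(n_obs) - 2.0 * ll_restricted
    bic_full = k_full * math.log(n_obs) - 2.0 * ll_full
    return lr_stat, bic_restricted, bic_full

# Hypothetical numbers for illustration only: 2-regime restricted model
# with 30 parameters vs the 55-parameter 4-regime model on 240 observations.
lr, bic_r, bic_f = lrt_and_bic(-1520.0, -1440.0, 30, 55, 240)
```

Lower BIC wins; with the placeholder numbers above the full model would be preferred, but the real deliverable is this comparison run on the estimated models.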