Statement
When a Switching Kalman Filter is approximated by replacing the exact mixture posterior, which grows to $K^t$ components after $t$ steps with $K$ regimes, with a fixed-size mixture via moment-matching collapse (the GPB / IMM family), the error introduced at each collapse step does not accumulate without bound. The stochasticity of the regime Markov chain and the continuous-state process noise cause the true and approximate posteriors to overlap and contract, so the long-run filtering error stays bounded, provided the regime chain is ergodic and the process noise is non-degenerate.
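As a concrete reference point, the sketch below (plain NumPy; the function name collapse_mixture is ours, not from Murphy 1998) shows the moment-matching collapse that GPB/IMM apply at every step: a weighted Gaussian mixture is replaced by the single Gaussian with the same first two moments.

```python
import numpy as np

def collapse_mixture(weights, means, covs):
    """Moment-matching collapse: replace a Gaussian mixture with the
    single Gaussian that has the same mean and covariance.

    weights: (M,) mixture weights
    means:   (M, n) component means
    covs:    (M, n, n) component covariances
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                       # guard against un-normalised weights
    means = np.asarray(means, dtype=float)
    covs = np.asarray(covs, dtype=float)
    mu = np.einsum("m,mn->n", w, means)   # overall mean: sum_m w_m mu_m
    diff = means - mu                     # (M, n) deviations of component means
    # overall covariance: within-component spread plus between-component spread
    P = np.einsum("m,mij->ij", w, covs) + np.einsum("m,mi,mj->ij", w, diff, diff)
    return mu, P

# Example: collapse two regime-conditional posteriors into one Gaussian
w = [0.7, 0.3]
means = [[0.0, 1.0], [2.0, -1.0]]
covs = [np.eye(2), 2 * np.eye(2)]
mu, P = collapse_mixture(w, means, covs)
```

The between-component term $\sum_m w_m (\mu_m - \mu)(\mu_m - \mu)^\top$ is what absorbs the discarded mode structure into the covariance; it is also where the multimodality loss noted under Counter-evidence occurs.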
Evidence summary
Moderate evidence (Murphy 1998 + Boyen-Koller). Murphy 1998 §2 invokes this as the central justification for the collapsing-class approximation family but does not re-prove it. The original argument is in Boyen-Koller [BK98a] (Approximate learning of dynamic models, NIPS 1998) and [BK98b] (Tractable inference for complex stochastic processes, UAI 1998), which prove a contraction result for projected dynamic Bayesian networks: at each step the true posterior “forgets” old information at a rate determined by the mixing time of the underlying chain, and the projection error is absorbed by this forgetting.
The Boyen-Koller results are proved for discrete-state DBNs; the extension to continuous-state SKFs is asserted by Murphy but not formally re-derived. Empirically, GPB2 and IMM have been used successfully in target tracking and econometrics for decades without observed error accumulation, which is the strongest practical support for the claim.
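The shape of the Boyen-Koller argument is worth recording, stated informally rather than as a re-derivation. Write $e_t$ for the distance (KL divergence in [BK98b]) between the exact and collapsed filtering posteriors, $\gamma \in (0,1]$ for the per-step contraction induced by the chain's mixing, and $\epsilon$ for the error a single moment-matching projection introduces:

```latex
% Informal shape of the Boyen-Koller bound (stated, not re-derived):
e_{t+1} \;\le\; (1-\gamma)\,e_t \;+\; \epsilon
\qquad\Longrightarrow\qquad
\limsup_{t\to\infty} e_t \;\le\; \frac{\epsilon}{\gamma}
```

The steady-state error is the one-step projection error amplified by the inverse mixing rate, so a slowly mixing chain (small $\gamma$) makes the bound large even though it stays finite; this is exactly the persistent-regime weakness recorded under Counter-evidence.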
The claim is rated weakly_supported (rather than supported) because: (a) the formal Boyen-Koller results are for discrete DBNs and the SKF extension is informal; (b) the bound is qualitative, giving no quantitative per-step rate that would guide the choice of collapsing order $r$; and (c) the preconditions (ergodic mixing, non-degenerate process noise) can fail silently in regime-switching macro models with very persistent regimes or near-deterministic dynamics.
Conditions and scope
- The regime Markov chain must be ergodic; the bound degrades as the mixing time grows (a cheap eigenvalue diagnostic is sketched after this list)
- The process noise covariance must be non-degenerate for at least some regimes
- Moment-matching collapse must be applied at every step (not delayed or batched)
- The number of regimes $K$ must be small enough that GPB($r$) for the chosen order $r$ is computationally feasible (GPB(2) already runs $K^2$ filters per step)
- Out of scope: the claim says nothing about the constant in the bound, which can be large. It also says nothing about EM-based learning, where the cross-variance approximation enters separately.
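A cheap diagnostic for the first two conditions in this list, sketched under the assumption that the regime transition matrix Pi and per-regime noise covariances Qs are available as arrays (names hypothetical): ergodicity shows up as a second-largest eigenvalue modulus strictly below 1, and non-degenerate noise as a strictly positive smallest eigenvalue of some Qs[k].

```python
import numpy as np

def check_skf_preconditions(Pi, Qs, tol=1e-10):
    """Heuristic check of the two main preconditions for bounded collapse error.

    Pi: (K, K) regime transition matrix, rows summing to 1
    Qs: (K, n, n) process noise covariance per regime
    """
    # Ergodic mixing: second-largest eigenvalue modulus (SLEM) of Pi < 1.
    moduli = np.sort(np.abs(np.linalg.eigvals(Pi)))[::-1]
    slem = moduli[1]
    ergodic = slem < 1 - tol
    # Rough forgetting horizon: errors shrink like slem**t, so ~1/(1-slem) steps.
    mixing_scale = np.inf if not ergodic else 1.0 / (1.0 - slem)
    # Non-degenerate process noise in at least one regime.
    min_eigs = [np.linalg.eigvalsh(Q).min() for Q in Qs]
    noise_ok = max(min_eigs) > tol
    return {"ergodic": ergodic, "slem": slem,
            "mixing_scale": mixing_scale, "noise_nondegenerate": noise_ok}
```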
Counter-evidence
- Persistent-regime macro models. In SKFs with very persistent regimes (e.g. monetary-policy chains with very small quarterly switching probabilities), the mixing time can be 20+ steps and the qualitative contraction argument provides no practical bound at horizons of interest; a back-of-envelope check of this horizon follows this list.
- Near-deterministic dynamics. When the process noise covariance satisfies $Q \approx 0$, the true posterior does not spread out, so the overlap-and-contraction mechanism described in the Statement is absent and the collapsing approximation can drift.
- Multimodal regime posteriors. When the regime posterior is genuinely multimodal (e.g. ambiguous data association), moment-matching collapses the modes into a single mean+covariance and silently destroys information.
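To put numbers on the persistent-regime concern: a two-regime chain with stay probabilities $p$ and $q$ has transition matrix $[[p, 1-p], [1-q, q]]$ with second eigenvalue $\lambda_2 = p + q - 1$, so initial-condition effects decay like $\lambda_2^t$ and the forgetting horizon is roughly $1/(1-\lambda_2)$ steps. A quick illustrative check (the stay probabilities below are hypothetical):

```python
# Two-regime chain with stay probabilities p (regime 1) and q (regime 2).
# SLEM of [[p, 1-p], [1-q, q]] is p + q - 1; forgetting horizon ~ 1/(1 - SLEM).
for p, q in [(0.90, 0.90), (0.95, 0.95), (0.98, 0.98)]:
    slem = p + q - 1
    print(f"stay={p:.2f}/{q:.2f}  SLEM={slem:.2f}  horizon~{1/(1-slem):.0f} steps")
```

At stay probabilities of 0.98 per quarter the horizon is about 25 steps, consistent with the 20+ figure above.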
Linked ideas
(none yet)
Open questions
- Can the Boyen-Koller contraction bound be sharpened for continuous-state SKFs to yield a useful per-step contraction rate $\gamma$?
- For which SKF subclasses (e.g. linear macro models with persistent regimes) is the contraction argument quantitatively meaningful, and for which is it merely qualitative?
- Does the bound interact with the cross-variance approximation used in GPB2-EM in a way that affects parameter recovery?