Exact Switching-Kalman-Filter inference is intractable: the belief state is a mixture of M^t Gaussians

Statement

The exact filtering posterior $p(x_{t} \mid y_{1:t})$ in a Switching Kalman Filter (SKF) with $M$ discrete regimes and Markov regime transitions is a mixture of $M^{t}$ Gaussians, one component per possible regime history $S_{1:t}$. Consequently, the bookkeeping cost of exact SKF inference grows as $Ω(M^{T} n^{2})$ in the time horizon $T$ and continuous-state dimension $n$, since each component carries its own $n \times n$ covariance. This makes exact inference intractable for any non-trivial $M$ and $T$, and is the central reason all practical SKF inference algorithms (GPB1, GPB2, IMM, MHT, RBPF, variational SKF) are approximations.
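
One way to write the mixture explicitly is sketched below. The symbols $\mu_{t}^{(s_{1:t})}$ and $\Sigma_{t}^{(s_{1:t})}$ are notation assumed here for illustration, not taken from the source: they denote the mean and covariance that a standard Kalman filter would return if the regime history $s_{1:t}$ were known, since conditioning on the history leaves an ordinary time-varying linear-Gaussian model.

$$
p(x_{t} \mid y_{1:t}) \;=\; \sum_{s_{1:t} \in \{1, \dots, M\}^{t}} p(s_{1:t} \mid y_{1:t}) \; \mathcal{N}\!\left(x_{t};\; \mu_{t}^{(s_{1:t})},\; \Sigma_{t}^{(s_{1:t})}\right)
$$

The sum has $M^{t}$ terms. As a rough sense of scale, $M = 2$ and $t = 50$ already give $2^{50} \approx 1.1 \times 10^{15}$ components.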

Evidence summary

Strong evidence (Murphy 1998). Section 2 of Murphy’s tech report constructs the $M^{t}$ growth by induction: given a mixture of $M^{t-1}$ Gaussians at time $t-1$, each component must be propagated through $M$ different Kalman dynamics (one per possible value of $S_{t}$), yielding $M \cdot M^{t-1} = M^{t}$ components at time $t$. The construction is exact and tight. Murphy then enumerates the four families of approximation that curb this growth (collapsing, selection, sampling, variational) and notes that any practical SKF inference algorithm must rely on one of them.
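
The inductive step can be made concrete with a small sketch. This is an illustration under stated assumptions, not Murphy's code: it assumes a scalar continuous state with dynamics $x_{t} = A_{j} x_{t-1} + w$, $w \sim N(0, Q_{j})$, and observations $y_{t} = C_{j} x_{t} + v$, $v \sim N(0, R_{j})$, when $S_{t} = j$. The function name `exact_skf_filter` and all parameter names and values are hypothetical. Every existing component is pushed through all $M$ regime-specific Kalman steps, so the component list multiplies by $M$ at each observation.

```python
import math


def gauss_pdf(y, mean, var):
    """Density of a scalar Gaussian N(mean, var) evaluated at y."""
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)


def exact_skf_filter(ys, A, Q, C, R, Z, pi0, m0, P0):
    """Exact filtering for a scalar-state SKF (illustrative, hypothetical names).

    A, Q, C, R are per-regime lists; Z[i][j] = p(S_t = j | S_{t-1} = i);
    pi0 is the prior over the first regime; x_0 ~ N(m0, P0).
    Returns, for each time step, a list of (weight, last_regime, mean, variance),
    one entry per regime history.
    """
    M = len(A)
    # Before any data there is a single Gaussian and no regime history yet.
    comps = [(1.0, None, m0, P0)]
    history = []
    for y in ys:
        expanded = []
        for w, s_prev, m, P in comps:
            for j in range(M):
                # Prior probability of entering regime j from this history.
                p_regime = pi0[j] if s_prev is None else Z[s_prev][j]
                # Kalman predict under regime j.
                m_pred = A[j] * m
                P_pred = A[j] * P * A[j] + Q[j]
                # Kalman update under regime j.
                S = C[j] * P_pred * C[j] + R[j]  # innovation variance
                K = P_pred * C[j] / S            # Kalman gain
                m_new = m_pred + K * (y - C[j] * m_pred)
                P_new = (1.0 - K * C[j]) * P_pred
                # Unnormalised weight: old weight * regime prior * likelihood.
                w_new = w * p_regime * gauss_pdf(y, C[j] * m_pred, S)
                expanded.append((w_new, j, m_new, P_new))
        total = sum(c[0] for c in expanded)
        comps = [(w / total, j, m, P) for w, j, m, P in expanded]
        history.append(comps)
    return history


# Hypothetical two-regime example: the component count doubles every step.
hist = exact_skf_filter(
    ys=[0.3, -0.1, 0.4, 0.2, 0.0],
    A=[0.9, 1.1], Q=[0.1, 0.5], C=[1.0, 1.0], R=[0.2, 0.2],
    Z=[[0.9, 0.1], [0.2, 0.8]], pi0=[0.5, 0.5], m0=0.0, P0=1.0,
)
print([len(c) for c in hist])  # [2, 4, 8, 16, 32]
```

Nothing in the loop merges or discards components. Doing so is exactly where the approximation families listed above would intervene.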

The result is folklore in target tracking (Bar-Shalom-Li 1993, Blom-Bar-Shalom 1988) and switching state-space modelling (Kim 1994, Ghahramani-Hinton 1996b), but Murphy’s tech report is the cleanest unified statement.

Conditions and scope

Requires $M \geq 2$ discrete regimes (otherwise the model is a plain LDS)
Requires Markov dynamics on the regime (semi-Markov / explicit-duration models change the bookkeeping but not the qualitative growth)
The continuous state must be latent and entangled across history, i.e. the hidden $x_{t}$ must enter the next-step distribution. (For Switching AR models with $C = I$, $x_{t}$ is observed and the only hidden variable is discrete, so exact inference is tractable in $O(M^{2} T)$ time; this is the SAR exception Murphy notes in §2.1.)
The result is a mathematical statement of fact rather than an empirical finding: it follows directly from the model definition and does not depend on any data.
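
The Switching AR exception in the conditions above can be sketched as an ordinary HMM forward pass. This is a minimal illustration under assumptions not in the source: a scalar switching AR(1) model $y_{t} = a_{j} y_{t-1} + e$, $e \sim N(0, q_{j})$, when $S_{t} = j$, with the hypothetical function name `sar_forward` and hypothetical parameter names. Because the previous observation is known, the only belief to carry forward is a length-$M$ vector over regimes, and each step costs $O(M^{2})$.

```python
import math


def gauss_pdf(y, mean, var):
    """Density of a scalar Gaussian N(mean, var) evaluated at y."""
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)


def sar_forward(ys, a, q, Z, pi0):
    """Filtered regime posteriors for a scalar switching AR(1) model (illustrative).

    a[j], q[j] are the AR coefficient and noise variance of regime j;
    Z[i][j] = p(S_t = j | S_{t-1} = i); pi0 is the prior over the first regime.
    ys[0] is treated as given, so one posterior is returned per later observation.
    """
    M = len(a)
    alpha = None  # filtered regime distribution from the previous step
    filtered = []
    for prev, cur in zip(ys, ys[1:]):
        if alpha is None:
            pred = list(pi0)
        else:
            # Regime prediction: the O(M^2) part of each step.
            pred = [sum(alpha[i] * Z[i][j] for i in range(M)) for j in range(M)]
        # Likelihood of the new observation under each regime's AR dynamics.
        lik = [gauss_pdf(cur, a[j] * prev, q[j]) for j in range(M)]
        unnorm = [pred[j] * lik[j] for j in range(M)]
        total = sum(unnorm)
        alpha = [u / total for u in unnorm]
        filtered.append(alpha)
    return filtered
```

The belief state stays at $M$ numbers for every $t$, in contrast to the $M^{t}$ Gaussians needed once the continuous state is hidden.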

Counter-evidence

None. The result is constructive and tight; there is no published counter-example. Special-case tractability (Switching AR, fully-observed switching) does not contradict the general claim — it falls outside the “continuous latent state” precondition.

Linked ideas

(none yet — populated by /ideate runs that target this claim)

Open questions

Are there structured SKFs (e.g. with sparse $A_{j}$, low-rank parameter tying, or factored regime chains) where the effective component count is provably much smaller than $M^{t}$?
Can factor-graph contraction or tensor-network methods give exact inference in $O(\mathrm{poly}(M, T))$ for restricted SKF subclasses?
