Exponential Belief State Growth (M^t) in Switching State-Space Models

Definition

In a switching Kalman filter (SKF; see switching-kalman-filter) with $M$ discrete regimes and Markov regime dynamics, the exact filtering posterior $p(x_{t} \mid y_{1:t})$ is a mixture of $M^{t}$ Gaussians, one component per possible regime history $S_{1:t} \in \{1, \dots, M\}^{t}$. This is exponential belief-state growth: the number of components in the exact posterior is multiplied by $M$ at every time step (so it at least doubles whenever $M \geq 2$), making exact inference intractable beyond a handful of steps even though the model is conditionally linear-Gaussian.

Intuition

Suppose at time $t = 1$ the prior $p(x_{1})$ is a mixture of $M$ Gaussians, one per value of $S_{1}$. To advance to $t = 2$, each of those $M$ Gaussians must be propagated through each of the $M$ regime-specific Kalman dynamics (because the regime can switch), producing $M^{2}$ Gaussians. Each of those must in turn be advanced through $M$ dynamics to time 3, giving $M^{3}$. By time $t$, the posterior has $M^{t}$ components. The $M^{t}$ components are not redundant in general: they correspond to genuinely different regime histories with different posterior means and covariances.

This is the central computational obstruction in switching state-space models, and it is the reason every practical SKF inference algorithm (GPB1, GPB2, IMM, MHT, RBPF, variational SKF) is an approximation. Each one controls the $M^{t}$ growth in one of four ways: by collapsing the mixture (gpb-imm-collapsing-filters), by selecting a subset of paths (Multiple-Hypothesis Tracking, RBPF), by sampling ($S_{1:T}$ via MCMC), or by breaking the vertical $S_{t} \to X_{t}$ links variationally.
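The collapsing idea mentioned above can be illustrated with a minimal sketch, assuming a scalar state and a mixture given as (weight, mean, variance) triples. The function name `collapse` and the example numbers are hypothetical. It shows only the moment-matching step that replaces a mixture with one Gaussian of the same mean and variance, not a full GPB or IMM filter.

```python
def collapse(components):
    """Moment-match a scalar Gaussian mixture to a single Gaussian.

    components: list of (weight, mean, var) with weights summing to 1.
    Returns the mean and variance of the mixture as a whole.
    """
    mean = sum(w * m for w, m, _ in components)
    # Total variance = within-component variance + spread of the means.
    var = sum(w * (v + (m - mean) ** 2) for w, m, v in components)
    return mean, var


# Two equally weighted components centred at -1 and +1 (illustrative values).
mean, var = collapse([(0.5, -1.0, 1.0), (0.5, 1.0, 1.0)])
print(mean, var)  # 0.0 2.0
```

The collapsed Gaussian keeps the first two moments but discards the bimodality, which is the price paid for a belief state of constant size.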

Formal notation

Let $S_{t} \in \{1, \dots, M\}$ be the regime, $x_{t} \in \mathbb{R}^{n}$ the continuous state, and $y_{1:t}$ the observations. The exact filtering posterior decomposes over regime histories as

$$p(x_{t} \mid y_{1:t}) = \sum_{S_{1:t} \in \{1, \dots, M\}^{t}} \Pr(S_{1:t} \mid y_{1:t}) \, p(x_{t} \mid y_{1:t}, S_{1:t})$$

where each $p(x_{t} \mid y_{1:t}, S_{1:t})$ is a Gaussian (the result of running the Kalman filter conditional on the regime path $S_{1:t}$). The number of components is $|\{1, \dots, M\}^{t}| = M^{t}$, and the per-component bookkeeping (Kalman mean + covariance) is $O(n^{2})$, so the total exact-inference cost at horizon $T$ is $\Omega(M^{T} n^{2})$.
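The mixture above can be made concrete with a small sketch of one exact filtering step, assuming a scalar state ($n = 1$) so that each Kalman update is plain arithmetic. The function name `exact_skf_step`, the per-regime parameters `A`, `Q`, `C`, `R`, the initial regime distribution `pi` and all numeric values are hypothetical choices for illustration. Each component stores the path weight, the last regime, and the conditional Kalman mean and variance.

```python
import math


def exact_skf_step(components, y, A, Q, C, R, pi, trans):
    """One exact filtering step of a scalar switching Kalman filter.

    components: list of (weight, last_regime, mean, var), one per regime history.
    A, Q, C, R: per-regime dynamics, process noise, observation gain, obs noise.
    pi[j]: Pr(S_1 = j); trans[i][j]: Pr(S_t = j | S_{t-1} = i).
    """
    new = []
    for w, i, m, v in components:
        # The first step has no previous regime, so use the initial distribution.
        row = pi if i is None else trans[i]
        for j in range(len(A)):
            # Kalman predict under regime j.
            m_pred = A[j] * m
            v_pred = A[j] ** 2 * v + Q[j]
            # Kalman update with observation y.
            s = C[j] ** 2 * v_pred + R[j]  # innovation variance
            k = v_pred * C[j] / s          # Kalman gain
            resid = y - C[j] * m_pred
            m_new = m_pred + k * resid
            v_new = (1.0 - k * C[j]) * v_pred
            # The likelihood of y under this history extends the path weight.
            lik = math.exp(-0.5 * resid ** 2 / s) / math.sqrt(2 * math.pi * s)
            new.append((w * row[j] * lik, j, m_new, v_new))
    total = sum(c[0] for c in new)
    return [(w / total, j, m, v) for w, j, m, v in new]


# Two hypothetical regimes; start from a single Gaussian prior on the state.
A, Q, C, R = [0.9, 0.5], [0.1, 1.0], [1.0, 1.0], [0.5, 0.5]
pi = [0.5, 0.5]
trans = [[0.95, 0.05], [0.10, 0.90]]
comps = [(1.0, None, 0.0, 1.0)]
counts = []
for y in [0.3, -0.1, 1.2, 0.8]:
    comps = exact_skf_step(comps, y, A, Q, C, R, pi, trans)
    counts.append(len(comps))
print(counts)  # [2, 4, 8, 16]
```

Nothing in the step merges or discards components, so the list length is multiplied by $M$ on every call, which is exactly the $M^{t}$ count in the formula.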

Variants

History buffer of length $r$ — keep only the last $r$ regime values, so the mixture has $M^{r}$ components (this is GPB( $r$ ); see gpb-imm-collapsing-filters)
Path selection — keep only the highest-probability $K$ regime paths (Multiple-Hypothesis Tracking, particle filters)
Compound regime explosion — if the regime has factored structure $S_{t} = (S_{t}^{(1)}, S_{t}^{(2)}, \dots, S_{t}^{(d)})$ with $M_{k}$ states each and the chains are independent, the effective regime count is $M = \prod_{k} M_{k}$ and the explosion is $(\prod_{k} M_{k})^{t}$ . This is the relevant scaling for the CRE asset-pricing model in this workspace, which uses two independent binary chains (monetary policy × wage rigidity) for $M = 4$ compound regimes.
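For the compound-regime case, a short sketch of how two independent chains combine into one $M = 4$ chain may help. Under independence the compound transition matrix is the Kronecker product of the per-chain matrices. The transition probabilities below are made up for illustration and are not taken from the CRE model, and `kron` is a hypothetical helper written in plain Python.

```python
def kron(P, Q):
    """Kronecker product of two matrices given as nested lists.

    Row (i, k) and column (j, l) of the result hold P[i][j] * Q[k][l].
    """
    return [
        [p * q for p in row_p for q in row_q]
        for row_p in P
        for row_q in Q
    ]


# Hypothetical binary chains (values are illustrative only).
P_policy = [[0.9, 0.1], [0.2, 0.8]]
P_wage = [[0.7, 0.3], [0.4, 0.6]]

P_compound = kron(P_policy, P_wage)
print(len(P_compound), len(P_compound[0]))  # 4 4
```

Each row of the compound matrix still sums to one, so it is a valid transition matrix over the $\prod_{k} M_{k}$ joint regimes, and it is this joint count that enters the $M^{t}$ growth.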

Comparison

vs HMM forward algorithm cost — an HMM with $M$ states has $O (M^{2} T)$ exact inference cost because the latent variable is discrete and can be marginalised at every step. The SKF cannot do this marginalisation in closed form because the continuous state $x_{t}$ entangles histories.
vs Linear Dynamical System (LDS): an LDS has $O(n^{3} T)$ exact Kalman cost. The SKF inherits this per regime path, so the SKF cost is $O(M^{T} n^{3})$, exponentially worse in $T$.
vs Dynamic Bayesian Network (DBN) — a DBN with all-discrete state of cardinality $D$ has $O (D^{2} T)$ junction-tree inference cost. The SKF avoids the discretisation cost of large $D$ but pays the $M^{t}$ history cost instead.
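To make the HMM comparison concrete, here is a minimal sketch of a normalised forward pass, assuming the per-step observation likelihoods are already available as a table `lik[t][j]`. The function name and the numbers are hypothetical. The point is that the belief state is a length-$M$ vector at every step, and each step costs $O(M^{2})$.

```python
def hmm_forward(pi, trans, lik):
    """Normalised HMM forward pass.

    pi[j]: initial regime probability; trans[i][j]: transition probability;
    lik[t][j]: likelihood of the observation at step t under regime j.
    Returns the filtered regime distribution at the final step.
    """
    M = len(pi)
    alpha = [pi[j] * lik[0][j] for j in range(M)]
    z = sum(alpha)
    alpha = [a / z for a in alpha]
    for t in range(1, len(lik)):
        # Summing over the previous regime marginalises the history away.
        alpha = [
            lik[t][j] * sum(alpha[i] * trans[i][j] for i in range(M))
            for j in range(M)
        ]
        z = sum(alpha)
        alpha = [a / z for a in alpha]
    return alpha


pi = [0.5, 0.5]
trans = [[0.95, 0.05], [0.10, 0.90]]
lik = [[0.8, 0.1], [0.7, 0.2], [0.1, 0.9]]
belief = hmm_forward(pi, trans, lik)
print(len(belief))  # 2
```

In the SKF the analogous sum over the previous regime would have to merge Gaussians with different means and covariances, which is not a single Gaussian, so the history cannot be marginalised away in the same manner.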

When to use

This is a negative result — it tells you when not to attempt exact inference. Specifically:

If $M \geq 3$ and $T \geq 20$, exact SKF inference is already infeasible (the mixture has at least $3^{20} \approx 3.5 \times 10^{9}$ components)
For real macro / financial time series ($T$ in the hundreds), exact inference is impossible at any $M \geq 2$ (already $2^{100} \approx 1.3 \times 10^{30}$ components at $M = 2$, $T = 100$)
For the CRE asset-pricing model ($M = 4$ compound regimes, $T = 119$ quarterly observations), the exact mixture would have $4^{119} \approx 4.4 \times 10^{71}$ components
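The component counts quoted above are easy to check with exact integer arithmetic. The helper name `exact_components` is hypothetical; the $(M, t)$ pairs are the ones discussed in this section.

```python
def exact_components(M, t):
    """Number of Gaussians in the exact filtering posterior after t steps."""
    return M ** t


for M, t in [(3, 20), (2, 100), (4, 119)]:
    n = exact_components(M, t)
    print(f"M={M}, t={t}: {n:.1e} components")

print(exact_components(3, 20))  # 3486784401
```

Python integers are arbitrary precision, so the counts are exact before formatting.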

The implication is that every production SKF must use one of the four approximation families catalogued in murphy-1998-switching-kalman-filters.

Known limitations

The $M^{t}$ bound is tight in the worst case: you cannot avoid it by exploiting symmetry unless the model has explicit parameter tying that reduces the effective regime count.
The exponential growth applies even when most regime histories have vanishing posterior mass — the difficulty is the bookkeeping, not the numerical mass.
“Spreading-out” arguments (Boyen-Koller [BK98a, BK98b]) bound the error of the collapsing approximation but do not reduce the exact-inference cost.

Open problems

Sharper conditions under which a low-dimensional $r$ -step history buffer suffices (i.e. when do far-past regime histories truly stop mattering?)
Adaptive history-buffer sizing based on regime entropy / mixing time
Exact inference via factor-graph contraction tricks for SKFs with sparse $A_{j}, C_{j}$ matrices

Key papers

murphy-1998-switching-kalman-filters — the cleanest modern statement of the $M^{t}$ explosion and the four approximation families that defeat it

My understanding

The $M^{t}$ explosion is the central reason every modern switching-state-space algorithm — GPB, IMM, MHT, particle filters, RBPF, variational SKF — exists at all. For this workspace’s CRE asset-pricing model, the relevant cost is $4^{119}$ exact mixture components, which is the entire reason the model uses a Rao-Blackwellised particle filter that selects a small bag of regime histories rather than enumerating them. Knowing that exact inference is provably infeasible (rather than merely “hard”) is also the right argument to give when defending an approximate-inference design choice in a paper or rebuttal.
