Switching Kalman Filter

Definition

A Switching Kalman Filter (SKF), also called a Switching State-Space Model, Switching Linear Dynamical System, Jump Markov Linear System, or Conditional Gaussian State-Space Model, is a state-space model in which a discrete latent mode variable $S_{t} \in \{1, \dots, M\}$, itself a Markov chain with transition matrix $Z$, selects which set of linear-Gaussian dynamics and/or observation matrices is active at time $t$:

$$
\begin{aligned}
x_{t} &= A_{S_{t}} x_{t - 1} + v_{t}, & v_{t} &\sim N (0, Q_{S_{t}}) \\
y_{t} &= C_{S_{t}} x_{t} + w_{t}, & w_{t} &\sim N (0, R_{S_{t}})
\end{aligned}
$$

Conditional on a regime path $S_{1 : t}$ , the model is exactly a (time-varying) linear-Gaussian state-space model, so the continuous state is conditionally Kalman-filterable. The full filtering posterior $p (x_{t} ∣ y_{1 : t})$ is a mixture of Gaussians, with one component per regime history.
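The generative model above can be made concrete with a small simulator. This is a minimal sketch, assuming a scalar state and observation ($n = m = 1$), modes indexed from 0 rather than 1, and a fixed starting value `x0` that the definition does not specify. The function name and all parameter values are illustrative, not taken from any source.

```python
import random

def simulate_skf(T, Z, pi, A, Q, C, R, x0=0.0, seed=0):
    """Sample (modes, states, observations) from a scalar switching
    linear-Gaussian model. Z is a row-stochastic list of lists, pi the
    initial mode distribution, and A, Q, C, R are per-mode scalars
    (Q and R are variances). Modes are 0-indexed here (an assumption)."""
    rng = random.Random(seed)
    M = len(pi)
    modes, xs, ys = [], [], []
    x_prev = x0
    s = rng.choices(range(M), weights=pi)[0]           # S_1 ~ pi
    for t in range(T):
        if t > 0:
            s = rng.choices(range(M), weights=Z[s])[0]  # S_t ~ Z[S_{t-1}, :]
        x = A[s] * x_prev + rng.gauss(0.0, Q[s] ** 0.5)  # x_t = A x_{t-1} + v_t
        y = C[s] * x + rng.gauss(0.0, R[s] ** 0.5)       # y_t = C x_t + w_t
        modes.append(s)
        xs.append(x)
        ys.append(y)
        x_prev = x
    return modes, xs, ys

# Illustrative two-mode example: a persistent mode and a fast-decaying, noisier mode.
modes, xs, ys = simulate_skf(
    T=100,
    Z=[[0.95, 0.05], [0.10, 0.90]],
    pi=[0.5, 0.5],
    A=[0.99, 0.50], Q=[0.01, 1.0],
    C=[1.0, 1.0],   R=[0.1, 0.1],
)
```

Conditioning on the sampled `modes` turns the remaining problem into an ordinary time-varying Kalman filter, which is the property the rest of this note relies on.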

Intuition

An SKF is what you build when (a) your dynamics or observation model are piecewise linear, with a discrete mode selecting the active piece (e.g. tracking a manoeuvring aircraft, where horizontal motion and vertical motion are different sub-models), (b) you want to model non-Gaussian noise as a mixture of Gaussians (e.g. robust regression, sensor failure detection), or (c) you want to model regime-dependent macroeconomic dynamics where monetary-policy or volatility states alter the linear law of motion. The SKF layers a hidden Markov model on top of a Kalman filter, gaining the HMM's expressive power while keeping closed-form Gaussian updates conditional on a regime path.

Formal notation

$S_{t} \in \{1, \dots, M\}$ : discrete mode at time $t$
$Z (i, j) = Pr (S_{t} = j ∣ S_{t - 1} = i)$ : transition matrix
$π_{j} = Pr (S_{1} = j)$ : initial regime distribution
$A_{j}, Q_{j}, C_{j}, R_{j}$ : regime- $j$ dynamics, process noise, observation, and observation-noise matrices
$x_{t} \in R^{n}$ : continuous latent state
$y_{t} \in R^{m}$ : observation
Filtering posterior: $p (x_{t}, S_{t} ∣ y_{1 : t})$ — exactly a mixture of $M^{t}$ Gaussians (see exponential-belief-state-growth)
$W_{t}^{j} = Pr (S_{t} = j ∣ y_{1 : T})$ : smoothed regime weight (E-step output)
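The $M^{t}$ count in the filtering posterior comes from having one Gaussian per regime history $S_{1 : t}$. A small enumeration makes that explicit. This is only an illustration; the helper name is hypothetical and modes are 0-indexed for convenience.

```python
from itertools import product

def regime_histories(M, t):
    """Enumerate every possible mode sequence S_1..S_t for M modes."""
    return list(product(range(M), repeat=t))

histories = regime_histories(2, 3)
print(len(histories))   # 8, i.e. 2**3
print(histories[0])     # (0, 0, 0)

# The count is M**t, so enumeration is hopeless beyond toy sizes:
print(3 ** 20)          # 3486784401 histories for M = 3, t = 20
```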

Variants

Switching dynamics only — $S_{t}$ selects $(A_{j}, Q_{j})$ , $C, R$ are fixed
Switching observations only — $S_{t}$ selects $(C_{j}, R_{j})$ ; useful for modelling outliers and sensor failure as a mixture
Both switching — fully general SKF; the dynamics and observation switches may share or have separate Markov chains
Switching AR (SAR) model — $C = I$ , so the continuous state $x_{t}$ is observed directly; the only hidden variable is the discrete $S_{t}$ , and exact inference is tractable (no approximation needed)
Data-association SKF ([Ghahramani-Hinton 1996b]) — the switch selects which sub-process is read out into $y_{t}$ ; models data-association ambiguity
Multiple-Hypothesis Tracking ([Bar-Shalom-Fortmann 1988]) — selection approximation that keeps only the highest-probability regime histories
Compound regime SKF — two or more independent Markov chains over $S_{t}$ ; the effective regime space is the Cartesian product of their states (e.g. monetary-policy chain × wage-rigidity chain in the CRE asset pricing model)
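For the compound-regime variant, independence of the chains means the joint transition matrix over the product space is the Kronecker product of the individual transition matrices. The sketch below assumes two chains with made-up transition probabilities; the labels "policy" and "wage" only echo the example above and the numbers are not from any model.

```python
def kron(Z1, Z2):
    """Kronecker product of two row-stochastic matrices (lists of lists).
    Joint state (i1, i2) maps to flat index i1 * len(Z2) + i2."""
    return [
        [a * b for a in row1 for b in row2]
        for row1 in Z1
        for row2 in Z2
    ]

# Illustrative (assumed) transition matrices for two independent chains.
Z_policy = [[0.9, 0.1], [0.2, 0.8]]
Z_wage = [[0.7, 0.3], [0.4, 0.6]]

Z_joint = kron(Z_policy, Z_wage)   # 4 x 4 matrix over the product space
for row in Z_joint:
    print([round(p, 3) for p in row])
```

The effective number of modes is the product of the chain sizes, so every per-mode cost elsewhere in this note scales with that product.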

Comparison

vs Kalman filter — adds a discrete regime; conditional on the regime path the Kalman update is unchanged. Marginal posterior is multimodal in $x_{t}$ even when each component is unimodal.
vs HMM — adds a continuous state; emission “matrix” is now a continuous Kalman update rather than a finite categorical likelihood.
vs Dynamic Bayesian Network with all-discrete state — the SKF retains closed-form Gaussian updates per regime, avoiding the discretisation cost of large continuous state spaces.
vs Particle filter for the same model — particle filters are selection approximations in the SKF taxonomy; collapsing approximations (gpb-imm-collapsing-filters) are the deterministic alternative.
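The collapsing idea mentioned in the last comparison can be shown in a few lines: a Gaussian mixture is replaced by the single Gaussian that matches its mean and variance. This is a minimal scalar sketch with a hypothetical function name; the multivariate version replaces the squared deviation with an outer product.

```python
def collapse(weights, means, variances):
    """Moment-match a scalar Gaussian mixture with a single Gaussian.
    Returns (mean, variance) of the mixture."""
    total = sum(weights)
    w = [wi / total for wi in weights]                 # normalise weights
    mean = sum(wi * mi for wi, mi in zip(w, means))
    # Mixture variance = within-component variance + spread of the means.
    var = sum(wi * (vi + (mi - mean) ** 2)
              for wi, mi, vi in zip(w, means, variances))
    return mean, var

# A symmetric bimodal mixture collapses to a single wide Gaussian at 0.
print(collapse([0.5, 0.5], [-1.0, 1.0], [1.0, 1.0]))   # (0.0, 2.0)
```

The example also shows what is lost: the collapsed Gaussian is centred between the two modes, so the multimodality noted in the Kalman-filter comparison disappears.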

When to use

Piecewise-linear dynamics with a known discrete mode structure (manoeuvring target tracking, regime-switching macro models)
Non-Gaussian noise modelled as a mixture of Gaussians (robust regression, outlier-tolerant filtering, fault detection)
Macroeconomic and financial time-series models with monetary-policy or volatility regimes (e.g. the CRE asset pricing model in this workspace, Hamilton 1989, Kim 1994, Bansal-Zhou 2002)
Any conditional-Gaussian state-space model where you want a Rao-Blackwellised representation: marginalise the continuous state analytically per regime path, then approximate or sample only over regime histories

Known limitations

Exact inference is intractable. The exact filtering posterior is a mixture of $M^{t}$ Gaussians (exponential-belief-state-growth); every practical SKF algorithm approximates this growth.
Local minima in EM. EM-based SKF learning has $M^{T}$ candidate segmentations and is notorious for converging to poor local optima (deterministic annealing helps but does not solve the problem).
Mode-mismatch sensitivity. SKFs assume the number of modes $M$ and the parameter tying scheme are known; misspecification can cause silent drift.
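One common way to tame the $M^{t}$ growth is to run $M$ Kalman updates per step and collapse the result back to a single Gaussian, which is the GPB1 scheme named under Key papers. The following is a hedged scalar sketch of one such step, assuming $n = m = 1$, 0-indexed modes, and per-mode scalars for `A`, `Q`, `C`, `R`; it is an illustration of the idea, not a reference implementation.

```python
import math

def gpb1_step(y, m, P, p_prev, Z, A, Q, C, R):
    """One GPB1-style filtering step for a scalar SKF.
    (m, P): collapsed Gaussian for x_{t-1}; p_prev[i] = Pr(S_{t-1}=i | y_{1:t-1}).
    Returns the collapsed (mean, variance) for x_t and the new mode probabilities."""
    M = len(p_prev)
    w, means, variances = [], [], []
    for j in range(M):
        prior_j = sum(p_prev[i] * Z[i][j] for i in range(M))  # predicted Pr(S_t = j)
        m_pred = A[j] * m                     # Kalman predict under mode j
        P_pred = A[j] ** 2 * P + Q[j]
        e = y - C[j] * m_pred                 # innovation
        S = C[j] ** 2 * P_pred + R[j]         # innovation variance
        K = P_pred * C[j] / S                 # Kalman gain
        lik = math.exp(-0.5 * e * e / S) / math.sqrt(2 * math.pi * S)
        w.append(prior_j * lik)               # unnormalised posterior mode weight
        means.append(m_pred + K * e)
        variances.append((1 - K * C[j]) * P_pred)
    total = sum(w)
    p_new = [wj / total for wj in w]
    # Collapse the M posterior Gaussians into one by moment matching.
    m_new = sum(pj * mj for pj, mj in zip(p_new, means))
    P_new = sum(pj * (vj + (mj - m_new) ** 2)
                for pj, mj, vj in zip(p_new, means, variances))
    return m_new, P_new, p_new

# Illustrative use: mode 1 has a much larger observation variance ("outlier" mode).
m, P, p = gpb1_step(
    y=10.0, m=0.0, P=1.0, p_prev=[0.5, 0.5],
    Z=[[0.95, 0.05], [0.05, 0.95]],
    A=[1.0, 1.0], Q=[0.1, 0.1], C=[1.0, 1.0], R=[0.01, 100.0],
)
```

With $M = 1$ the step reduces to the ordinary scalar Kalman update, which is a useful sanity check. The cost per step is $M$ Kalman updates regardless of $t$, at the price of the approximation error introduced by collapsing.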

Open problems

Optimal trade-off between collapsing order $r$ and computational cost (gpb-imm-collapsing-filters)
Tight error bounds on collapsing approximations beyond Boyen-Koller
Online learning of SKF parameters with regime change-point detection

Key papers

murphy-1998-switching-kalman-filters — unified treatment of SKF inference (GPB1, GPB2, IMM, variational, MHT) and EM-based learning

My understanding

The SKF is the right modelling choice whenever you have (a) a continuous state with linear-Gaussian dynamics conditional on a discrete mode and (b) a small enough mode space ( $M$ in the single digits to low tens) that mixing or collapsing over regime histories is tractable. The Rao-Blackwellised particle filter pattern — particles carry only the regime history, and the continuous state is exact-Kalman conditional on each particle — is the selection-class SKF approximation in Murphy’s taxonomy and is the inference engine used by the CRE asset-pricing model in this workspace (SimMdlPrices/src/rbpf.jl).
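The Rao-Blackwellised pattern described above can be sketched in a few dozen lines. This is a generic scalar illustration under stated assumptions ($n = m = 1$, 0-indexed modes, a bootstrap proposal that samples the next mode from $Z$, multinomial resampling when the effective sample size drops below half the particle count). It is not a port of, and makes no claim about, the workspace implementation; every name here is hypothetical.

```python
import math
import random

def rbpf_step(particles, weights, y, Z, A, Q, C, R, rng):
    """One Rao-Blackwellised particle filter step for a scalar SKF.
    Each particle is (mode, mean, var): the mode is sampled, while the
    continuous state is tracked exactly by a Kalman filter given the
    particle's mode history."""
    new_particles, new_weights = [], []
    for (s_prev, m, P), w in zip(particles, weights):
        s = rng.choices(range(len(Z)), weights=Z[s_prev])[0]  # propose S_t from Z
        m_pred = A[s] * m                     # Kalman predict given sampled mode
        P_pred = A[s] ** 2 * P + Q[s]
        e = y - C[s] * m_pred
        S = C[s] ** 2 * P_pred + R[s]
        K = P_pred * C[s] / S
        lik = math.exp(-0.5 * e * e / S) / math.sqrt(2 * math.pi * S)
        new_particles.append((s, m_pred + K * e, (1 - K * C[s]) * P_pred))
        new_weights.append(w * lik)           # weight by predictive likelihood
    total = sum(new_weights)
    new_weights = [w / total for w in new_weights]
    n = len(new_particles)
    ess = 1.0 / sum(w * w for w in new_weights)   # effective sample size
    if ess < 0.5 * n:                             # resample when weights degenerate
        idx = rng.choices(range(n), weights=new_weights, k=n)
        new_particles = [new_particles[i] for i in idx]
        new_weights = [1.0 / n] * n
    return new_particles, new_weights

# Illustrative run with assumed parameters and a short observation sequence.
rng = random.Random(0)
N = 200
particles = [(0, 0.0, 1.0)] * N
weights = [1.0 / N] * N
for y in [0.1, 0.3, -0.2]:
    particles, weights = rbpf_step(
        particles, weights, y,
        Z=[[0.9, 0.1], [0.1, 0.9]],
        A=[1.0, 0.5], Q=[0.1, 0.1], C=[1.0, 1.0], R=[0.5, 0.5], rng=rng,
    )
# Posterior mode probability estimate: total weight of particles in each mode.
p_mode1 = sum(w for (s, _, _), w in zip(particles, weights) if s == 1)
```

Because the continuous state is integrated out analytically, the particles only have to cover the discrete regime histories, which is why a modest particle count can be adequate when $M$ is small.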
