Problem
Gaussian mixture models grow combinatorially when used in recursive Bayesian filters: each measurement update / mode-history branch can multiply the number of mixture components, so practical filters (multi-hypothesis trackers, switching state-space estimators, jump-Markov nonlinear filters) must periodically reduce an N-component mixture to a smaller M-component mixture without distorting the density. The classical reduction algorithms of Salmond (joining) and Williams (integral square error) pick which components to merge using ad hoc dissimilarity measures that are known to produce anomalies — for example, merging components that are far apart in mean but happen to have similar covariances, or refusing to merge near-duplicate components when their weights are very unequal.
Key idea
Use the Kullback-Leibler (KL) divergence between the original mixture and the reduced mixture as the principled cost of a reduction step. The exact KL between two mixtures has no closed form, but Runnalls derives a tight, closed-form upper bound on the KL cost incurred by merging two specific components into their moment-matched single Gaussian. This upper bound depends only on the two components’ weights, means, and covariances and can be evaluated in O(d^3) time per candidate pair in d dimensions. Greedy pairwise merging that always picks the pair minimizing this upper bound yields a Gaussian mixture reduction algorithm that (i) is anomaly-free on the standard Salmond / Williams pathological examples, (ii) has the same asymptotic cost as the older heuristics, and (iii) is grounded in an information-theoretic objective rather than an arbitrary geometric one.
Method
- Moment-matched merge. For two components with weights w_i, w_j, means μ_i, μ_j, covariances P_i, P_j, the moment-matched single Gaussian has weight w_ij = w_i + w_j, mean μ_ij = (w_i μ_i + w_j μ_j)/w_ij, and covariance P_ij = (w_i (P_i + (μ_i − μ_ij)(μ_i − μ_ij)^T) + w_j (P_j + (μ_j − μ_ij)(μ_j − μ_ij)^T))/w_ij. This preserves the first two moments of the original sub-mixture exactly.
- Upper bound on merge KL cost. Runnalls shows the KL divergence from the original two-component sub-mixture to the merged single Gaussian is bounded above by B(i,j) = ½ [(w_i + w_j) log det P_ij − w_i log det P_i − w_j log det P_j]. This is the dissimilarity used to rank candidate merges. It is non-negative, zero iff the two components have equal means and covariances, and well-defined whenever the covariances are positive definite.
- Greedy pairwise reduction loop. Starting from the N-component mixture, repeatedly find the pair (i, j) minimizing B(i,j), merge it via the moment-matched formula, and repeat until M components remain. Cost is O(N^2) pair evaluations per round, identical in scaling to Salmond / Williams.
- Anomaly-free behaviour. Runnalls walks through the standard pathological examples used to criticize Salmond and Williams (well-separated components with similar covariances; near-duplicates with very unequal weights) and shows that the KL upper bound ranks the “right” pair as the cheapest merge in each case.
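A minimal NumPy sketch of the three ingredients above (moment-matched merge, the bound B(i,j), and the greedy loop). Function names are mine, not from the paper, and this is the naive O(k^2)-pairs-per-round version without cached bound values:

```python
import numpy as np

def moment_matched_merge(w_i, mu_i, P_i, w_j, mu_j, P_j):
    """Collapse two weighted Gaussians into the single Gaussian that
    preserves the sub-mixture's first two moments exactly."""
    w = w_i + w_j
    mu = (w_i * mu_i + w_j * mu_j) / w
    d_i = (mu_i - mu).reshape(-1, 1)
    d_j = (mu_j - mu).reshape(-1, 1)
    P = (w_i * (P_i + d_i @ d_i.T) + w_j * (P_j + d_j @ d_j.T)) / w
    return w, mu, P

def kl_merge_bound(w_i, mu_i, P_i, w_j, mu_j, P_j):
    """Runnalls' closed-form upper bound B(i,j) on the KL cost of merging
    components i and j; needs only weights, means, covariances."""
    w, _, P = moment_matched_merge(w_i, mu_i, P_i, w_j, mu_j, P_j)
    _, ld_ij = np.linalg.slogdet(P)
    _, ld_i = np.linalg.slogdet(P_i)
    _, ld_j = np.linalg.slogdet(P_j)
    return 0.5 * (w * ld_ij - w_i * ld_i - w_j * ld_j)

def reduce_mixture(weights, means, covs, M):
    """Greedy pairwise reduction: repeatedly merge the pair with the
    smallest bound B(i,j) until only M components remain."""
    comps = list(zip(weights, means, covs))
    while len(comps) > M:
        best = None
        for a in range(len(comps)):
            for b in range(a + 1, len(comps)):
                cost = kl_merge_bound(*comps[a], *comps[b])
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        merged = moment_matched_merge(*comps[a], *comps[b])
        comps = [c for k, c in enumerate(comps) if k not in (a, b)]
        comps.append(merged)
    return comps
```

On a toy 1-D mixture with one component at 0 and two near-duplicates at 5 and 5.1, the bound correctly ranks the near-duplicate pair as the cheapest merge, illustrating the anomaly-free behaviour the bullet above describes.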
Results
- On the canonical Salmond and Williams test mixtures, the KL upper bound algorithm produces qualitatively correct reductions where the older criteria produce visibly wrong reductions (merging across a true mode gap, or refusing to collapse a near-duplicate).
- On synthetic high-dimensional mixtures the reduction quality (measured by the true KL between original and reduced mixture, computed by Monte Carlo) is consistently better than Salmond / Williams at matched component budgets.
- Computational cost is the same order as the older heuristics, so the gain is “free” in wall time.
- The bound is reported as tight enough that the resulting greedy choice essentially never differs from greedy choice on the (intractable) exact KL.
Limitations
- The upper bound becomes loose when the two candidate components differ drastically in covariance shape; in that regime the greedy merge can in principle be sub-optimal relative to true KL.
- The method is purely greedy / pairwise. It does not consider higher-order re-balancings (splitting, three-way merges, EM-style refits) that would give a globally KL-optimal M-component approximation.
- Requires positive-definite covariances throughout; degenerate components (rank-deficient P) need a regularization preprocessing step.
- The criterion is symmetric in the two component labels but the resulting reduction is not invariant to the order of merges when the bound has ties.
Open questions
- How does the Runnalls KL-bound greedy reduction compare to modern variational or Wasserstein-based mixture reductions in high dimensions?
- Is there a principled stopping rule (data-driven choice of M) using the bound itself as an information-budget gauge?
- Can the bound be tightened to second order without losing the closed form?
My take
This paper is the canonical reference for “use KL, but make it tractable” in Gaussian mixture reduction. The key insight — that you don’t need the exact KL, you just need a closed-form upper bound that ranks merge candidates the same way the exact KL would on the cases that matter — is exactly the kind of engineering compromise that lets a Bayesian-purist objective survive contact with a recursive filter loop. It is the natural background reference for any Rao-Blackwellized particle filter or Markov-switching estimator that has to collapse a regime-history mixture between time steps, and it pairs naturally with crouse-look-closer-mixture-reduction (which surveys later refinements on the same theme).