Kalman Filtering A Far More General, Optimal Estimator for Multivariable Dynamic Systems
This page briefly describes Kalman filtering, which is a far more general solution for optimal estimation in multivariable, dynamic systems than the simple filters discussed so far. This page is part of the section on Filtering that is part of A Guide to Fault Detection and Diagnosis.
Overview
The filters described so far have been simple filters, producing a single output based on a single input. Kalman filtering provides a much more general framework and solution for finding the optimal estimate for multiple state variables in dynamic systems, given any number of measurements. The starting point is an initial estimate of state values and their uncertainty, a “state space” dynamic model for predictions (with “process noise” errors specified to account for uncertainty in the model), and a measurement model mapping states to measurements, including specification of measurement noise. The initial estimate, process noise, and measurement noise are all specified as multivariable normal distributions with a mean vector and a covariance matrix. (A covariance matrix is the multivariable generalization of the variance that specifies uncertainty for single variables).
The dynamic model can be represented either in continuous time (differential equations) or discrete time (difference equations). Although the basic model form is linear, there are numerous extensions for nonlinearities.The equations describing a Kalman filter can be derived as an application of least squares without really thinking in terms of probability distributions, or through maximum likelihood. The (discrete) least squares approach minimizes a weighted combination of measurement adjustments and model prediction adjustments at each time step to achieve an optimal balance between the two. The weighting is based on the inverses of the covariance matrices for the measurement noise and process noise. The uncertainty in the initial conditions is used in the first prediction, so it’s effects decay away over time for a stable system.
Estimation depends on the explicit assumptions on the noise  additive, and characterized by covariance matrices. These matrices can be estimated from historical data, or simply set as tuning parameters using engineering judgment and rules of thumb. For instance, for measurements, we normally assume independent measurements (diagonal covariance matrix) and “typical” variances (starting with typical standard deviations as a percent of the instrument range) for different types of sensors, possibly adjusting for the type of service they are in.
Operation of the discrete time Kalman filter
Operation of the discrete time version of the Kalman filter is perhaps the easiest to understand. You start off with initial estimate of the state variables and their uncertainty (covariance matrix) . You make a prediction of the mean value of the state for the next time step using the dynamic model. You also calculate a new value for the covariance matrix (uncertainty). The uncertainty for the predicted value will be greater than the initial estimate because of the specified model uncertainty. You then apply a correction term based on new measurements available at that time step. That improves the state estimate, and reduces the uncertainty of that estimate. The cycle repeats for each time step.
The Kalman filter balances the model uncertainty with measurement uncertainty to come up with an “optimum” estimate. If the measurements are specified as extremely accurate (low measurement variances) compared to model errors, then the model predictions will be almost ignored. If the model is considered accurate compared to the measurements (high measurement variances), the model prediction will get more weighting. In the extreme case of almost worthless measurements (very high measurement variances), you are basically using just running a dynamic simulation, starting with the initial conditions.
Many linear filters can be derived as special cases of the discrete time Kalman filter
Many simple linear filters can be shown to be special solutions of the Kalman filter under various assumptions for the process model, measurement model, and error models. Even when a filter only has a single input and output, it might have many internal states in its representation in “state space” form. For the moving average type of filters, the internal states correspond to each delayed value used in computing the result. For autoregressive filters, the “order” of the filter corresponds to the number of internal states.
In deriving the exponential filter as a special case of a Kalman filter, the process model is “brownian motion”. That is, the predicted value at the next time step is the same as the previous one, where the process noise is specified to have a variance q. The measurement is made directly of the state variable, with a variance r. The ratio q/r determines the filter parameter, to weight the old estimate vs. the new measurement.
The exponential filter as a special case of a Kalman filter is most obvious when the exponential filter equations are written in a predictorcorrector form. The previous value serves as the prediction, and a correction term is based on the difference between the new input and the prediction.
For the simple moving average filter, the process model prediction is also the previous value, and the measurement also directly measures the state with the measurement uncertainty. But unlike the brownian motion model that leads to the exponential filter, the process noise variance is taken as zero. You are really just estimating the initial value. The initial value variance is taken as infinity, so that it has no influence on the estimate. The derivation of the filter stops after the n steps. The “moving” average at each time step is just an application of the same formula each time, to the set of previous inputs.
A least squares filter can also be developed via the Kalman filter, estimating both a value and slope as the two state variables. The general approach of simultaneously estimating both the value and the slope is also done the “alphabeta” filter, and, equivalently, in “double exponential smoothing”. Although these were derived independently in two different fields (aerospace and economic forecasting), both have been shown to be limiting cases of a Kalman filter after it has been running for a while.
The static case of data reconciliation can be derived as a special case of Kalman filtering. It uses only algebraic equations, and asserting that the process noise is zero. (More general steady state estimation could also allow process noise, because of model uncertainty introduced in physical properties and other calculations).
Using Kalman filters for diagnosis
All ffilters can play a useful role in diagnosis, by reducing noise so that diagnostic conclusions don’t “chatter” from one time to the next. But because of the predictions inherent in the tool, Kalman filters can do more. By itself, the Kalman filter doesn't handle diagnosis. If there's a fault, it breaks something in the physical model or the measurement model or one of the corresponding covariance matrices, so that those assumptions under which the estimate is "optimal" no longer apply, and the Kalman filter will come up with wrong estimates. But you turn that around by including the Kalman filter in the overall diagnostic system. When the predictions of new measurement values start to deviate significantly from the observed values (the "innovation" gets bigger), that's a strong indication of a fault. So you get very sensitive fault detection. You still have additional work to do to determine the root cause, however. Then you need to look at the patterns in the deviations.
Copyright 2010  2020, Greg Stanley
Return to Filtering Next: Other Filters
Return to A Guide to Fault Detection and Diagnosis
