Simple and extended Kalman filters: an application to term structures of commodity prices

This article presents and compares two Kalman filters. These methods offer an effective way to cope with non-observable variables, a frequent problem in finance, and they remain very fast even with large volumes of data. The first filter, the simplest version of the Kalman filter, can be used only with linear models. The second, the extended filter, generalizes the first and makes it possible to deal with non-linear models. However, it also introduces an approximation into the analysis, whose influence must be assessed. The principles of the method and its advantages are presented first. It is then explained why the method is of interest for term structure models of commodity prices. Using a well-known term structure model, practical implementation problems are discussed and tested. Finally, in order to assess the impact of the approximation introduced for non-linear models, the two filters are compared.

The main principle of Kalman filters is to use time series of observable variables in order to reconstitute the values of non-observable variables. In finance, the problem of non-observable variables arises, for example, with term structure models of interest rates, term structure models of commodity prices and market portfolios in the capital asset pricing model. When associated with an optimization procedure, the Kalman filter provides a way to estimate the model parameters. Finally and most importantly, because it is very fast, the method is also suitable for large data sets.
There are different versions of Kalman filters. The simple one is also the most famous and it is quite frequently used in finance nowadays. Nevertheless, it is not suitable for non-linear models. In that case, an extended filter can be used. However, the latter relies on an approximation, whose possible influence on the model performance needs to be assessed. Apart from this distinction, the two filters rely on the same principles.
The Kalman filter is an iterative process. The model has to be expressed in a state-space form characterized by a transition equation and a measurement equation. The transition equation describes the dynamics of the state variables $\tilde{\alpha}$, for which there are no empirical data. During the first step of the iteration (the prediction phase), this equation is used to compute the values of the non-observable variables at time $t$, conditionally on the information available at time $(t-1)$. The predicted values $\hat{\alpha}_{t/t-1}$ are then substituted into the measurement equation to determine the value of the measure $\tilde{y}_t$. The measurement equation represents the relationship linking the observable variables $\tilde{y}$ with the non-observable variables $\tilde{\alpha}$. In the second iteration step (the innovation phase), the innovation $v_t$, which is the difference, at $t$, between the empirical data $y_t$ and the measure $\tilde{y}_t$, is calculated. The innovation is used in the third iteration step (the updating phase) to obtain the value of $\hat{\alpha}_t$ conditionally on the information available at $t$. Once this calculation has been made, $\hat{\alpha}_t$ is used to begin a new iteration. Thus, the Kalman filter makes it possible to evaluate the non-observable variables $\tilde{\alpha}$, and it updates their value at each step using the new information.
This brief presentation explains why the Kalman filter is a very fast method. Indeed, to reconstitute the time series of the non-observable variables, only two elements are necessary: the transition equation and the innovation $v_t$. Because there is an updating phase in each iteration, very little information has to be carried from one date to the next.
The remainder of the paper is organized as follows. Section II presents the term structure models of commodity prices and explains why their use requires Kalman filters. Section III explains how to apply the simple and the extended Kalman filters to a well-known model developed by Schwartz in 1997. Relying on the model performance, Section IV compares the two filters and discusses some practical implementation problems. Section V concludes.

II. THE TERM STRUCTURE MODELS OF COMMODITY PRICES
This section, after describing some general features characterizing the term structure models of commodity prices, presents the model used for the comparison between the simple and the extended Kalman filters: Schwartz's model.

General presentation
The term structure of commodity futures prices describes the relationships between the spot price and futures prices for different delivery dates. It thus synthesizes all the information available in the market. Several term structure models have been proposed in the literature. Their objective is first to reproduce the observed futures prices as accurately as possible, and second to extend the curve to very long maturities, even for delivery dates which are not available in the market.
Term structure models borrow from the contingent claim analysis developed in a partial equilibrium framework for options and interest rates models. Relying on arbitrage reasoning, the development of a term structure model of commodity prices follows three successive steps: identification of the state variables, specification of their dynamics and extraction of the futures prices values from a differential valuation equation.
When only one state variable is used to explain the behaviour of futures prices, as is the case, for example, in Brennan and Schwartz's model (1985), this single factor is the spot price. Recognizing the limits of such a formulation, several models based upon two state variables have been proposed (Schwartz, 1997; Hilliard and Reis, 1998; Lautier, 2000). In that case, the second factor is the convenience yield, which can be briefly defined as the comfort associated with the possession of physical stocks (Brennan, 1958). The introduction of a second state variable allows for richer shapes of curves and volatility structures. This improvement is however costly because the models are naturally more complicated. The difficulty arises from the increasing number of parameters and from the non-observable nature of the state variables. In fact, there are usually no empirical data for these two variables because there is generally a lack of reliable time series for the spot price, and convenience yield is not a traded asset. Therefore, there is a need for a method like the Kalman filter.

Schwartz's model
Schwartz's model (1997) is a well-known term structure model of commodity prices. Three reasons lead one to choose it. First, it performs well. Second, it has an analytical solution, which simplifies the application of the Kalman filters. Third, it allows for the use of a simple Kalman filter, provided some precautions are taken. Schwartz's model supposes that the spot price $S$ and the convenience yield $C$ can explain the behaviour of the futures prices $F$. The dynamics of these state variables are:
$$dS = (\mu - C)\,S\,dt + \sigma_S\,S\,dz_S$$
$$dC = \kappa(\alpha - C)\,dt + \sigma_C\,dz_C$$
where $\mu$ is the drift of the spot price, $\sigma_S$ is the volatility of the spot price, $dz_S$ is an increment to a standard Brownian motion associated with $S$, $\alpha$ is the long run mean of the convenience yield, $\kappa$ is the speed of adjustment of the convenience yield, $\sigma_C$ is the convenience yield volatility, and $dz_C$ is an increment to a standard Brownian motion associated with $C$.
As the storage theory showed, the two state variables are correlated because both the spot price and the convenience yield are inverse functions of the inventory level. Nevertheless, as Gibson and Schwartz (1990) demonstrated, the correlation between these two variables is not perfect:
$$E[dz_S\,dz_C] = \rho\,dt$$
where $\rho$ is the correlation between the two Brownian motions associated with $S$ and $C$. The convenience yield is mean reverting and is involved in the spot price dynamics. Mean reversion relies on the hypothesis that there is a level of stocks which satisfies the needs of industry under normal conditions. The behaviour of the operators in the physical market guarantees the existence of this normal level of stocks. When the convenience yield is low, the stocks are abundant and the operators sustain a high storage cost compared with the benefits related to holding the raw materials. So, if they are rational, they try to reduce these surplus stocks. Conversely, when the stocks are rare, the operators tend to reconstitute them.
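As an illustration of these dynamics, the following sketch simulates the two state variables with a first-order (Euler) discretization. It is not part of the estimation procedure of the article: the function name, the seed and all parameter values are assumptions chosen only to make the example run, and `rho` stands for the correlation between the two Brownian motions.

```python
import numpy as np

def simulate_schwartz(s0, c0, mu, sigma_s, alpha, kappa, sigma_c, rho,
                      dt, n_steps, seed=0):
    """Euler simulation of the spot price S and the convenience yield C."""
    rng = np.random.default_rng(seed)
    # Covariance of the two Brownian increments over one step of length dt.
    cov = dt * np.array([[1.0, rho], [rho, 1.0]])
    dz = rng.multivariate_normal(np.zeros(2), cov, size=n_steps)
    s = np.empty(n_steps + 1)
    c = np.empty(n_steps + 1)
    s[0], c[0] = s0, c0
    for t in range(n_steps):
        # dS = (mu - C) S dt + sigma_S S dz_S
        s[t + 1] = s[t] + (mu - c[t]) * s[t] * dt + sigma_s * s[t] * dz[t, 0]
        # dC = kappa (alpha - C) dt + sigma_C dz_C
        c[t + 1] = c[t] + kappa * (alpha - c[t]) * dt + sigma_c * dz[t, 1]
    return s, c

# Illustrative (hypothetical) parameter values, weekly time step over 5 years.
s_path, c_path = simulate_schwartz(s0=20.0, c0=0.05, mu=0.1, sigma_s=0.3,
                                   alpha=0.05, kappa=1.5, sigma_c=0.3,
                                   rho=0.8, dt=1 / 52, n_steps=260)
```

With both volatilities set to zero and the convenience yield started at its long run mean, the simulated convenience yield stays constant and the spot price grows deterministically, which gives a simple check of the discretization.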
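The solution of the term structure model can be expressed in a risk-neutral framework, using a Feynman-Kac solution. Therefore, the value of the futures prices can be written as:
$$F(t,T) = E_t^{Q_\lambda}\left[S(T)\right]$$
where $F(t,T)$ is the futures price at $t$ for delivery at $T$, and $Q_\lambda$ denotes the risk-neutral probability, which depends on an unknown value $\lambda$. The latter is the market price of convenience yield risk. The solution is:
$$F(S,C,t,T) = S(t) \times \exp\left[-C(t)\,\frac{1-e^{-\kappa\tau}}{\kappa} + B(\tau)\right]$$
with, as in Schwartz (1997),
$$B(\tau) = \left(r - \hat{\alpha} + \frac{\sigma_C^2}{2\kappa^2} - \frac{\sigma_S\sigma_C\rho}{\kappa}\right)\tau + \frac{\sigma_C^2}{4}\,\frac{1-e^{-2\kappa\tau}}{\kappa^3} + \left(\hat{\alpha}\kappa + \sigma_S\sigma_C\rho - \frac{\sigma_C^2}{\kappa}\right)\frac{1-e^{-\kappa\tau}}{\kappa^2}, \qquad \hat{\alpha} = \alpha - \frac{\lambda}{\kappa}$$
where $r$ is the risk-free interest rate, assumed constant, and $\tau = T - t$ is the maturity of the futures contract.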
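The pricing formula can be evaluated directly, as in the minimal sketch below. It assumes that $B(\tau)$ takes the closed form given by Schwartz (1997) for the two-factor model, with $\hat{\alpha} = \alpha - \lambda/\kappa$; the function names and the numerical values are illustrative assumptions.

```python
import math

def b_term(tau, r, alpha_hat, kappa, sigma_s, sigma_c, rho):
    """B(tau), assumed here to follow the closed form of Schwartz (1997)."""
    e1 = (1.0 - math.exp(-kappa * tau)) / kappa ** 2
    e2 = (1.0 - math.exp(-2.0 * kappa * tau)) / kappa ** 3
    return ((r - alpha_hat + sigma_c ** 2 / (2.0 * kappa ** 2)
             - sigma_s * sigma_c * rho / kappa) * tau
            + 0.25 * sigma_c ** 2 * e2
            + (alpha_hat * kappa + sigma_s * sigma_c * rho
               - sigma_c ** 2 / kappa) * e1)

def futures_price(s, c, tau, r, alpha, lam, kappa, sigma_s, sigma_c, rho):
    """Futures price F(S, C, t, T) for a maturity tau = T - t (in years)."""
    alpha_hat = alpha - lam / kappa          # risk-adjusted long run mean
    loading = (1.0 - math.exp(-kappa * tau)) / kappa
    b = b_term(tau, r, alpha_hat, kappa, sigma_s, sigma_c, rho)
    return s * math.exp(-c * loading + b)

# Hypothetical values: spot at USD 20, maturities of 1, 3, 6 and 9 months.
curve = [futures_price(20.0, 0.05, m / 12, r=0.05, alpha=0.05, lam=0.02,
                       kappa=1.5, sigma_s=0.3, sigma_c=0.3, rho=0.8)
         for m in (1, 3, 6, 9)]
```

Two limiting cases are easy to verify by hand: for $\tau = 0$ the futures price equals the spot price, and without volatility, with $C = \hat{\alpha}$, the formula reduces to $S\,e^{(r-\hat{\alpha})\tau}$.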
To assess the model's performance, the optimal values of the parameters are needed first. They can then be used to compute the estimated futures prices and to compare them with empirical data.

III. APPLYING THE KALMAN FILTERS
In this section, the way to transform Schwartz's model into a state-space model is explained, for the simple and for the extended filters. Then, implementation problems are discussed.

Simple filter
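The simple filter is suited to linear models. To apply it, the solution of Schwartz's model must be expressed in a linear form:
$$\ln F(S,C,t,T) = \ln S(t) - C(t)\,\frac{1-e^{-\kappa\tau}}{\kappa} + B(\tau)$$
Letting $G = \ln(S)$, one also has:
$$dG = \left(\mu - C - \tfrac{1}{2}\sigma_S^2\right)dt + \sigma_S\,dz_S$$
$$dC = \kappa(\alpha - C)\,dt + \sigma_C\,dz_C$$
The state-space form of the model is the following. The transition equation is the expression, in discrete time, of the state variables dynamics. Using the same notation as before, this equation is:
$$\begin{pmatrix}\tilde{G}_{t/t-1}\\ \tilde{C}_{t/t-1}\end{pmatrix} = c + T\begin{pmatrix}\tilde{G}_{t-1}\\ \tilde{C}_{t-1}\end{pmatrix} + R\,\eta_t$$
where:
- $c = \begin{pmatrix}(\mu - \tfrac{1}{2}\sigma_S^2)\Delta t\\ \kappa\alpha\Delta t\end{pmatrix}$ is a $(2 \times 1)$ vector, $T = \begin{pmatrix}1 & -\Delta t\\ 0 & 1-\kappa\Delta t\end{pmatrix}$ is a $(2 \times 2)$ matrix, and $R$ is the $(2 \times 2)$ identity matrix,
- $\Delta t$ is the period separating two observation dates,
- $\eta_t$ are errors that are uncorrelated with the previous values of the state variables, and have no serial correlation:
$$E[\eta_t] = 0, \qquad Var[\eta_t] = Q = \begin{pmatrix}\sigma_S^2\Delta t & \rho\sigma_S\sigma_C\Delta t\\ \rho\sigma_S\sigma_C\Delta t & \sigma_C^2\Delta t\end{pmatrix}$$
The measurement equation comes directly from the model pricing formula, which must also be discretized:
$$\tilde{y}_{t/t-1} = d + Z\begin{pmatrix}\tilde{G}_{t/t-1}\\ \tilde{C}_{t/t-1}\end{pmatrix} + \varepsilon_t$$
where:
- the $i$th line of the $N$-dimensional vector of the observable variables $\tilde{y}_{t/t-1}$ is $\ln F(\tau_i)$, with $i = 1, \ldots, N$,
- $d$ is the $(N \times 1)$ vector whose $i$th line is $B(\tau_i)$, and $Z$ is the $(N \times 2)$ matrix whose $i$th line is $\left(1, \; -\frac{1-e^{-\kappa\tau_i}}{\kappa}\right)$,
- $N$ is the number of maturities used for the estimation,
and where $\varepsilon_t$ is a white noise vector, $(N \times 1)$, with no serial correlation:
$$E[\varepsilon_t] = 0, \qquad Var[\varepsilon_t] = H$$
In continuous time, the pricing equation of a term structure model does not involve any error term $\varepsilon$. The use of a Kalman filter leads to the introduction of this term, which is difficult to estimate. This term can be interpreted as follows. First, it stands for market imperfections and arbitrage opportunities. Second, as the Kalman filter is a kind of inverse process, which is often unstable, it can be considered as a regularization term. Its addition leads to a distribution for $\tilde{y}$ which is the initial one convolved with a Gaussian kernel.

Note on the risk-neutral probability: in the case of term structure models of commodity prices, certain conditions must be respected in order to obtain a unique risk-neutral probability. For more details, see for example Lautier (2000).

Note on the state variables dynamics: this article uses the historical probability for the state variables dynamics. However, the futures price being expressed in a risk-neutral framework, it is possible to use the risk-neutral probability for the state variables as well. This method reduces the number of parameters: the drift $\mu$ and the risk premium $\lambda$ disappear. It also induces a loss of information, because one must directly estimate the parameter $\hat{\alpha}$.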
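The system matrices of the simple filter can be assembled as in the following sketch. It rests on stated assumptions: the explicit forms of $c$, $T$, $Q$, $d$ and $Z$ are those obtained from a first-order discretization of the dynamics of $G = \ln S$ and $C$ and from the logarithm of the pricing formula, $B(\tau)$ is taken from Schwartz (1997), and the function names and parameter values are hypothetical.

```python
import numpy as np

def b_term(taus, r, alpha_hat, kappa, sigma_s, sigma_c, rho):
    """B(tau) for an array of maturities (closed form assumed from Schwartz, 1997)."""
    taus = np.asarray(taus, dtype=float)
    return ((r - alpha_hat + sigma_c**2 / (2 * kappa**2)
             - sigma_s * sigma_c * rho / kappa) * taus
            + 0.25 * sigma_c**2 * (1 - np.exp(-2 * kappa * taus)) / kappa**3
            + (alpha_hat * kappa + sigma_s * sigma_c * rho - sigma_c**2 / kappa)
            * (1 - np.exp(-kappa * taus)) / kappa**2)

def simple_filter_system(mu, sigma_s, alpha, lam, kappa, sigma_c, rho, r, dt, taus):
    """Return (c, T, R, Q, d, Z) for the state vector (G, C) with G = ln S."""
    taus = np.asarray(taus, dtype=float)
    # Transition equation: state_t = c + T state_{t-1} + R eta_t
    c = np.array([(mu - 0.5 * sigma_s**2) * dt, kappa * alpha * dt])
    T = np.array([[1.0, -dt], [0.0, 1.0 - kappa * dt]])
    R = np.eye(2)
    cov_sc = rho * sigma_s * sigma_c
    Q = dt * np.array([[sigma_s**2, cov_sc], [cov_sc, sigma_c**2]])
    # Measurement equation: ln F(tau_i) = d_i + Z_i state_t + eps_t
    d = b_term(taus, r, alpha - lam / kappa, kappa, sigma_s, sigma_c, rho)
    Z = np.column_stack([np.ones_like(taus),
                         -(1 - np.exp(-kappa * taus)) / kappa])
    return c, T, R, Q, d, Z

# Four maturities (1, 3, 6 and 9 months), weekly observations, illustrative values.
taus = [1 / 12, 3 / 12, 6 / 12, 9 / 12]
c, T, R, Q, d, Z = simple_filter_system(mu=0.1, sigma_s=0.3, alpha=0.05, lam=0.02,
                                        kappa=1.5, sigma_c=0.3, rho=0.8, r=0.05,
                                        dt=1 / 52, taus=taus)
```

The measurement error covariance matrix $H$ is deliberately left out of this function, since its choice is a separate implementation decision.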

Extended filter
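In an extended filter, the previous system matrices $Z$, $T$ and $R$ are replaced with non-linear functions depending on the state variables. So there is no need to linearize Schwartz's model. The transition equation becomes:
$$\begin{pmatrix}\tilde{S}_{t/t-1}\\ \tilde{C}_{t/t-1}\end{pmatrix} = T(\tilde{S}_{t-1}, \tilde{C}_{t-1}) + R(\tilde{S}_{t-1}, \tilde{C}_{t-1})\,\eta_t$$
where, discretizing the dynamics of $S$ and $C$ at first order,
$$T(\tilde{S}_{t-1}, \tilde{C}_{t-1}) = \begin{pmatrix}\tilde{S}_{t-1}\left(1 + \mu\Delta t - \tilde{C}_{t-1}\Delta t\right)\\ \kappa\alpha\Delta t + \tilde{C}_{t-1}(1-\kappa\Delta t)\end{pmatrix}, \qquad R(\tilde{S}_{t-1}, \tilde{C}_{t-1}) = \begin{pmatrix}\tilde{S}_{t-1} & 0\\ 0 & 1\end{pmatrix}$$
The measurement equation becomes:
$$\tilde{y}_{t/t-1} = Z(\tilde{S}_{t/t-1}, \tilde{C}_{t/t-1}) + \varepsilon_t$$
where the $i$th line of $Z(\tilde{S}_{t/t-1}, \tilde{C}_{t/t-1})$ is $\tilde{S}_{t/t-1} \times \exp\left[-\tilde{C}_{t/t-1}\,\frac{1-e^{-\kappa\tau_i}}{\kappa} + B(\tau_i)\right]$, with $i = 1, \ldots, N$. In the extended filter, as the transition and measurement equations are non-linear, there is no analytical formula for the conditional expectations. Therefore, the latter must be approximated. This approximation does not appear in the simple filter.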
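To make the non-linear system functions concrete, the sketch below writes them for the state vector $(S, C)$ together with their first derivatives, which the extended filter needs. The forms used are assumptions: they come from a first-order discretization of the dynamics and from the pricing formula, the values $B(\tau_i)$ are passed in as a precomputed array `b` (set to zero here purely as a placeholder), and all names and numbers are illustrative.

```python
import numpy as np

def transition(state, mu, alpha, kappa, dt):
    """T(S, C): first-order discretization of the state dynamics."""
    s, c = state
    return np.array([s * (1.0 + (mu - c) * dt),
                     kappa * alpha * dt + c * (1.0 - kappa * dt)])

def transition_jacobian(state, mu, kappa, dt):
    """Derivative of T with respect to (S, C)."""
    s, c = state
    return np.array([[1.0 + (mu - c) * dt, -s * dt],
                     [0.0, 1.0 - kappa * dt]])

def noise_loading(state):
    """R(S, C): the spot price noise is proportional to S."""
    s, _ = state
    return np.array([[s, 0.0], [0.0, 1.0]])

def measurement(state, kappa, taus, b):
    """Z(S, C): futures prices for each maturity, b holding the B(tau_i)."""
    s, c = state
    g = (1.0 - np.exp(-kappa * taus)) / kappa
    return s * np.exp(-c * g + b)

def measurement_jacobian(state, kappa, taus, b):
    """Derivative of Z with respect to (S, C), one line per maturity."""
    s, c = state
    g = (1.0 - np.exp(-kappa * taus)) / kappa
    f = measurement(state, kappa, taus, b)
    return np.column_stack([f / s, -g * f])

taus = np.array([1 / 12, 3 / 12, 6 / 12, 9 / 12])
b = np.zeros(4)                      # placeholder for B(tau_i), hypothetical
state = np.array([20.0, 0.05])       # spot price and convenience yield
prices = measurement(state, 1.5, taus, b)
z_hat = measurement_jacobian(state, 1.5, taus, b)
```

Checking the analytical derivatives against finite differences is a cheap safeguard before running the filter, since an error in a Jacobian does not prevent the recursion from running but degrades the estimates.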

Implementation problems
Some difficulties must be overcome when using Kalman filters. First, some choices must be made to start the iterative process. Second, if the model has been expressed as the logarithm for the simple Kalman filter, some precautions must be taken. Third, the covariance matrix H influences the performances.
Starting the iterative process. To start the iterative process, initial values of the non-observable variables and of their covariance matrix are needed.
For the term structure models of commodity prices, the non-observable state variables are usually the spot price and the convenience yield. The nearest futures price is generally used as the spot price $S$, and the convenience yield $C$ can be computed from the solution of Brennan and Schwartz's model (1985). This solution requires the use of two observed futures prices, for delivery at $T_1$ and at $T_2$:
$$C = r - \frac{\ln F(t,T_2) - \ln F(t,T_1)}{T_2 - T_1}$$
where $T_1$ is the nearest delivery, and $T_2$ is the next one.
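The initial convenience yield can be computed as in the short sketch below. It assumes the one-factor solution $F = S\,e^{(r-C)\tau}$ of Brennan and Schwartz's model, from which the ratio of two futures prices gives $C$; the function name and the numerical values are illustrative.

```python
import math

def implied_convenience_yield(f1, f2, t1, t2, r):
    """Convenience yield implied by two futures prices under F = S exp((r - C) tau).

    f1, f2: futures prices for delivery at t1 < t2 (maturities in years);
    r: constant risk-free interest rate.
    """
    return r - math.log(f2 / f1) / (t2 - t1)

# Hypothetical backwardated market: the two-month price is below the one-month price.
c0 = implied_convenience_yield(f1=20.00, f2=19.90, t1=1 / 12, t2=2 / 12, r=0.05)
```

A market in backwardation (second price below the first) yields a convenience yield above the interest rate, as expected.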
The covariance matrix associated with the state variables must also be initialized. A diagonal matrix is chosen, with the spot price and the convenience yield variances on the diagonal. These variances were computed from the first 30 dates in the estimation period.
Analysing the results of the simple filter. When the model is expressed in its logarithmic form in the case of the simple filter, some precautions must be taken to measure the model's performance, because the innovations are computed with logarithms. A difficulty arises when the estimated and empirical data are rebuilt. The relationship linking the logarithm of the estimations $\tilde{y}_{t/t-1}$ with the logarithm of the observations $y_t$ is the following:
$$y_t = \tilde{y}_{t/t-1} + \sigma R$$
where $\sigma$ is the standard error of the innovations and $R$ is a Gaussian residue (zero mean, unit variance). To be more precise, when the estimated logarithm is used to obtain the estimates themselves, the relationship between $y_t$ and $\tilde{y}_{t/t-1}$ becomes:
$$e^{y_t} = e^{\tilde{y}_{t/t-1}} \times e^{\sigma R}$$
The expectation is then:
$$E\left[e^{y_t}\right] = e^{\tilde{y}_{t/t-1}} \times e^{\sigma^2/2}$$
Therefore, a corrective term should be applied to the exponential of the estimations. From a theoretical point of view, this is quite difficult, because the variance of the innovations is modified as soon as the parameters change. Empirical tests are nevertheless performed, in order to measure this bias.
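The size of this bias can be illustrated numerically. The sketch below assumes a standard Gaussian residue, so that the expectation of the exponential is the lognormal mean $e^{\tilde{y} + \sigma^2/2}$; the function name, the seed and the values of the estimate and of $\sigma$ are hypothetical.

```python
import numpy as np

def corrected_price(log_estimate, sigma):
    """Price estimate corrected for the bias created by taking the exponential."""
    return np.exp(log_estimate + 0.5 * sigma**2)

rng = np.random.default_rng(0)
log_estimate, sigma = np.log(20.0), 0.1

# Simulated observations: exp(estimated logarithm + sigma * Gaussian residue).
simulated = np.exp(log_estimate + sigma * rng.standard_normal(200_000))

naive = np.exp(log_estimate)                    # exponential without correction
corrected = corrected_price(log_estimate, sigma)
```

With these values the corrective factor is $e^{0.005}$, about half a percent, which shows why the correction matters little when the variance of the residuals is small.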
Measuring the performances. Another important choice must be made before initiating the iteration process, concerning the error covariance matrix $H$. This matrix is important because it is added to the innovations covariance matrix during the innovation phase. In the simple Kalman filter, the relationship between the innovations matrix $F_t$ and the system matrix $H$ is:
$$F_t = Z\,P_{t/t-1}\,Z' + H$$
where $P_{t/t-1}$ is the covariance matrix of the non-observable variables and $Z$ is a system matrix included in the measurement equation.
During the next iteration phase, the inverse of the innovations matrix is used to update the non-observable variables and their covariance matrix:
$$\hat{\alpha}_t = \hat{\alpha}_{t/t-1} + P_{t/t-1}\,Z'\,F_t^{-1}\,v_t$$
$$P_t = \left(I - P_{t/t-1}\,Z'\,F_t^{-1}\,Z\right)P_{t/t-1}$$
So, the matrix $H$ has an influence on the updated values of the non-observable variables. If its terms are too high, the model performance will be poor. Most of the time, this matrix is estimated from the variances and covariances of the estimation database. This method is used in this article, and it is shown how strongly this choice affects the empirical results.
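The influence of $H$ on the updating phase can be seen on a small example. The sketch below applies the standard updating equations of the linear Kalman filter with two different matrices $H$; the function name and all numerical values are assumptions made for illustration only.

```python
import numpy as np

def update(a_pred, P_pred, y, Z, d, H):
    """Updating phase of the simple Kalman filter for one date."""
    v = y - (Z @ a_pred + d)                 # innovation
    F = Z @ P_pred @ Z.T + H                 # innovation covariance matrix
    K = np.linalg.solve(F, Z @ P_pred).T     # gain P Z' F^{-1} (P and F symmetric)
    a = a_pred + K @ v
    P = (np.eye(len(a_pred)) - K @ Z) @ P_pred
    return a, P

# Hypothetical predicted state (log spot price, convenience yield) and data.
a_pred = np.array([3.0, 0.05])
P_pred = np.diag([0.01, 0.001])
Z = np.array([[1.0, -0.08], [1.0, -0.45]])
d = np.array([0.001, 0.004])
y = np.array([3.05, 3.02])

a_small, P_small = update(a_pred, P_pred, y, Z, d, H=1e-8 * np.eye(2))
a_large, P_large = update(a_pred, P_pred, y, Z, d, H=1e2 * np.eye(2))
```

With a very small $H$ the updated state reproduces the observations almost exactly, whereas with a very large $H$ the innovation is nearly ignored and the updated state stays at its predicted value.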

IV. COMPARISON BETWEEN THE TWO FILTERS
Comparing the performances of Schwartz's model measured with the two filters makes it possible to assess the influence of the linearization on the results. In this section, the empirical data are first presented. Then the performance criteria are defined. Finally, the results are presented and discussed.

Data
The data used for the empirical study are daily crude oil settlement prices for the West Texas Intermediate (WTI) futures contracts negotiated on the New York Mercantile Exchange (Nymex) from 25 September 1995 to 14 January 2002. They have been arranged so that the first futures price maturity $\tau_1$ is the one-month maturity, and the second futures price corresponds to the two-month maturity $\tau_2$. Keeping the first observation of each group of five, these daily data were transformed into weekly data. Four series of futures prices, corresponding to maturities of one, three, six and nine months, were used to estimate the parameters and to measure the model's performance. The interest rates are T-bill rates for a three-month maturity. As they are supposed to be constant in the model, the mean of all the observations between 1995 and 2002 was used.

Performance criteria
Two criteria were used to measure the model performance: the mean pricing error (MPE) and the root mean squared error (RMSE).
The MPE is defined as follows:
$$MPE = \frac{1}{N}\sum_{n=1}^{N}\left(\hat{F}(n,\tau) - F(n,\tau)\right)$$
where $N$ is the number of observations, $\hat{F}(n,\tau)$ is the estimated futures price for maturity $\tau$ at the date $n$, and $F(n,\tau)$ is the observed futures price. The MPE is expressed in US dollars. It measures the estimation bias for one given maturity. If the estimation is good, the MPE should be very close to zero. Using the same notation, the RMSE, expressed in US dollars, is, for a given maturity $\tau$:
$$RMSE = \sqrt{\frac{1}{N}\sum_{n=1}^{N}\left(\hat{F}(n,\tau) - F(n,\tau)\right)^2}$$
When there is no bias, the RMSE can be considered as an empirical standard deviation of the pricing errors. It measures the estimation stability. This second criterion is considered as more representative, because pricing errors can offset each other and the MPE can be low even if there are strong deviations.
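Both criteria are straightforward to compute for one maturity. In the minimal sketch below, the sign convention (estimated minus observed price) and the sample prices are assumptions made for the example.

```python
import math

def mpe(estimated, observed):
    """Mean pricing error: average of (estimated - observed)."""
    return sum(e - o for e, o in zip(estimated, observed)) / len(observed)

def rmse(estimated, observed):
    """Root mean squared error of the pricing errors."""
    return math.sqrt(sum((e - o) ** 2 for e, o in zip(estimated, observed))
                     / len(observed))

# Hypothetical prices in USD per barrel for one maturity.
observed = [20.0, 21.0, 19.5]
estimated = [20.5, 20.5, 19.5]

bias = mpe(estimated, observed)      # errors +0.5 and -0.5 offset each other
spread = rmse(estimated, observed)
```

In this example the MPE is zero although two of the three prices are missed by 50 cents, which is exactly the offsetting effect that makes the RMSE the more representative criterion.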

Empirical results
The parameters were estimated over the following two periods: 25 September 1995 to 11 May 1998 and 18 May 1998 to 15 October 2001. After comparing the optimal parameters obtained with the two filters, the model's ability to represent the prices curves is measured on the learning database and on an expanded one. Finally, the sensitivity of the results to the error covariance matrix is examined.
Optimal parameters. The optimal parameters were estimated with the simple and the extended filters. The results obtained for the two periods are reported in Tables 1 and 2. They lead to two remarks. First, the parameter values change with the estimation period. This was observed in several earlier studies. Considering that the parameters are constant is rather restrictive, but it significantly reduces the complexity of the analysis. Second, the optimal parameters obtained with the two filters are different. During the first period, the optimal parameters obtained with the extended filter are usually higher than those associated with the simple filter. The principal differences concern the risk premium $\lambda$ and the long run mean $\alpha$. For the second period, the differences are lower, and the most important ones concern the volatilities of the state variables.
These differences show that the linearization has had a significant influence on the parameters. Nevertheless, the latter are always of the same order of magnitude as those obtained by Schwartz in 1997 on the crude oil market and on different periods.
The model performances. A simple graphical analysis is first used to comment on the model performances obtained with the two filters. Then the MPE and RMSE criteria are used to compare them. The results associated with the simple filter are also corrected for the logarithm. Lastly, the innovations obtained with the two filters are compared. Figure 1 represents the one-month futures prices observed during 1998-2001 and compares them with the futures prices estimated with the two filters. This graph shows first that the two filters, especially the simple one, attenuate the range of price fluctuations. This phenomenon is observed for the two study periods and for every maturity. Second, the Kalman filters can be used with extremely volatile data: during 1998-2001, crude oil prices ranged from USD 11 to USD 37 per barrel. Tables 3 and 4 give the performances of Schwartz's model, measured by the MPE and the RMSE criteria. Three conclusions can be drawn from these results. First, the model is able to reproduce the prices curve quite precisely. The average MPE is always less than 18 cents per barrel and the RMSE is quite low, especially for the shorter period (1995-1998). Second, if the RMSE is the relevant criterion, then the simple filter is always more precise than the extended one. Third, these measures always decrease with maturity, which is consistent with Schwartz's results on other periods. Nevertheless, Schwartz worked with longer maturities, and showed that the root mean squared error increases again for deliveries after 15 months.
To be rigorous, the model performances associated with the simple Kalman filter should be corrected when the model is expressed in terms of logarithms. Table 5 compares the performances obtained with and without correction. The results show that the correction slightly improves the performances. Therefore, in the present case, the bias associated with the logarithm has a minor influence on the results, probably because the variance of the residuals is small for reasonable parameter values. Finally, Fig. 2 represents the behaviour of the innovations for the one-month maturity and for the second study period. It shows that for both filters, the innovations tend to return to zero. The same observation can be made for the other maturities, and for both periods. Figure 2 also shows that even if the MPE is low for the two filters, the pricing errors can be rather large at certain dates.

[Fig. 1. One-month futures prices, observed and estimated with the two filters, 1998-2001 (vertical axis: USD per barrel, 10 to 35; horizontal axis: dates from 18/05/98).]
The performance analysis shows that the linearization introduced in the extended filter clearly has an impact. Most importantly, however, even if this impact is negative, the model's ability to represent the prices curve remains good with an extended filter.
Expanding the database. The parameter estimates vary with the estimation periods. Hence, one question arises: how often is it necessary to recalculate the parameters?
In order to answer that question, the parameters previously estimated to measure the model performances were used on an expanded database. These tests were carried out on two intervals of three months immediately following the estimation periods, namely 18 May 1998 to 17 August 1998 and 21 October 2001 to 14 January 2002. Tables 6 and 7 present the results. Two conclusions can be drawn.
First, in 1998, the model is more precise with the extended filter. However, in 2001-2002, the simple filter again gives the best performances. Second, the model performances decrease strongly when the database is expanded. The RMSE and the MPE rise dramatically for the two periods. This phenomenon is particularly pronounced when the futures prices are volatile, during 2001-2002, and it will probably be even more marked as the database is increased. So, there is a strong incentive to recalculate the optimal parameters each time the model is used. This is not a major drawback, at least when there is an analytical solution for the model, because the estimation process is very fast.

V. CONCLUSION
Kalman filters are powerful tools suitable for use in many fields in finance, because they are fast even for large data sets and they can handle unobservable variables. Moreover, they can be used for linear as well as non-linear models, even if the models have no analytical solutions.
The main conclusions of this article are the following. First, the approximation introduced in the extended Kalman filter by linearizing the model clearly influences the model performances: the extended filter generally leads to less precise estimates than the simple one. Nevertheless, as the difference between the two filters is quite small, the extended filter is still acceptable in the present case. So, the approximation is not a real problem until the model becomes highly non-linear. Second, the system matrix containing the errors of the measurement equation affects the model performances and can be used to obtain more precise results. Third, as far as the term structure models of commodity prices are concerned, the parameters are not constant in time and should be recomputed regularly. This can become a problem if the model has no analytical solution, because of the computing time.
In order to improve the use of the Kalman filters, some further studies could be considered. For example, in the matrix representing the errors in the measurement equation (which is most of the time estimated with variances and covariances), one could also try to use variograms. This tool, borrowed from geostatistics, is used to describe spatial or temporal correlation. More precisely, a variogram models the variation of the correlations between a pair of points of the same variable as a function of the spatial or temporal distance. Another improvement could be made concerning the analysis of the bias associated with the logarithms in the simple Kalman filter. To reduce this bias, variance minimization could be included in the iterative process used to estimate the optimal parameters. Lastly, to face the problem of time-varying parameters in term structure models of commodity prices, one could study the sensitivity of the estimated futures prices to the parameters.

APPENDIX: THE SIMPLE AND THE EXTENDED KALMAN FILTERS
This appendix presents the simple and the extended Kalman filters, and explains how to estimate the model parameters.
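The simple Kalman filter
The state-space form model, in the simple filter, is characterized by the following equations:
- Transition equation:
$$\tilde{\alpha}_{t/t-1} = T\,\tilde{\alpha}_{t-1} + c + R\,\eta_t$$
where $\tilde{\alpha}_t$ is the $m$-dimensional vector of non-observable variables at $t$, also called state vector, $T$ is a matrix $(m \times m)$, $c$ is an $m$-dimensional vector, and $R$ is $(m \times m)$.
- Measurement equation:
$$\tilde{y}_{t/t-1} = Z\,\tilde{\alpha}_{t/t-1} + d + \varepsilon_t$$
where $\tilde{y}_{t/t-1}$ is an $N$-dimensional time series, $Z$ is an $(N \times m)$ matrix, and $d$ is an $N$-dimensional vector.
$\eta_t$ and $\varepsilon_t$ are white noises whose dimensions are respectively $m$ and $N$. They are supposed to be normally distributed, with zero mean and with $Q$ and $H$ as covariance matrices:
$$E[\eta_t] = 0, \quad Var[\eta_t] = Q, \qquad E[\varepsilon_t] = 0, \quad Var[\varepsilon_t] = H$$
The initial value of the system is supposed to be normal, with mean and variance:
$$E[\tilde{\alpha}_0] = \hat{\alpha}_0, \qquad Var[\tilde{\alpha}_0] = P_0$$
If $\hat{\alpha}_t$ is a non-biased estimator of $\tilde{\alpha}_t$, conditionally on the information available at $t$, then:
$$E_t\left[\tilde{\alpha}_t - \hat{\alpha}_t\right] = 0$$
As a consequence, the following expression defines the covariance matrix $P_t$:
$$P_t = E_t\left[(\hat{\alpha}_t - \tilde{\alpha}_t)(\hat{\alpha}_t - \tilde{\alpha}_t)'\right]$$
During one iteration, three steps are successively tackled: prediction, innovation and updating.
Prediction:
$$\hat{\alpha}_{t/t-1} = T\,\hat{\alpha}_{t-1} + c, \qquad P_{t/t-1} = T\,P_{t-1}\,T' + R\,Q\,R'$$
where $\hat{\alpha}_{t/t-1}$ and $P_{t/t-1}$ are the best estimators of $\tilde{\alpha}_{t/t-1}$ and of its covariance matrix, conditionally on the information available at $(t-1)$.
Innovation:
$$\tilde{y}_{t/t-1} = Z\,\hat{\alpha}_{t/t-1} + d, \qquad v_t = y_t - \tilde{y}_{t/t-1}, \qquad F_t = Z\,P_{t/t-1}\,Z' + H$$
where $\tilde{y}_{t/t-1}$ is the estimator of the observation $y_t$ conditionally on the information available at $(t-1)$, and $v_t$ is the innovation process, with $F_t$ as a covariance matrix.
Updating:
$$\hat{\alpha}_t = \hat{\alpha}_{t/t-1} + P_{t/t-1}\,Z'\,F_t^{-1}\,v_t, \qquad P_t = \left(I - P_{t/t-1}\,Z'\,F_t^{-1}\,Z\right)P_{t/t-1}$$
The matrices $T$, $c$, $R$, $Z$, $d$, $Q$, and $H$ are not time dependent in the simplest case that is considered in this article. They are the system matrices associated with the state-space model.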
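The three steps of one iteration can be sketched as follows for time-invariant system matrices. This is a minimal illustration of the standard linear Kalman recursion, not the code used for the empirical study: the function name is hypothetical, and the usage example is a toy one-dimensional model in which a constant level is observed with noise.

```python
import numpy as np

def kalman_filter(y, a0, P0, T, c, R, Q, Z, d, H):
    """Run the simple Kalman filter over the observations y (one row per date).

    Returns the updated states, the innovations and their covariance matrices.
    """
    a, P = np.asarray(a0, dtype=float), np.asarray(P0, dtype=float)
    states, innovations, F_all = [], [], []
    for y_t in y:
        # Prediction
        a_pred = T @ a + c
        P_pred = T @ P @ T.T + R @ Q @ R.T
        # Innovation
        v = y_t - (Z @ a_pred + d)
        F = Z @ P_pred @ Z.T + H
        # Updating (gain K = P Z' F^{-1}, using the symmetry of P and F)
        K = np.linalg.solve(F, Z @ P_pred).T
        a = a_pred + K @ v
        P = (np.eye(len(a)) - K @ Z) @ P_pred
        states.append(a)
        innovations.append(v)
        F_all.append(F)
    return np.array(states), np.array(innovations), np.array(F_all)

# Toy example: a constant level equal to 5, with a very diffuse initial state.
y = np.full((20, 1), 5.0)
states, innovations, F_all = kalman_filter(
    y, a0=np.zeros(1), P0=1e6 * np.eye(1),
    T=np.eye(1), c=np.zeros(1), R=np.eye(1), Q=np.zeros((1, 1)),
    Z=np.eye(1), d=np.zeros(1), H=np.eye(1))
```

In this toy case the first innovation equals the first observation and the following ones are close to zero, since the state is identified after a single date. Solving a linear system with `np.linalg.solve` rather than inverting $F_t$ explicitly is a design choice made for numerical stability.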

The extended Kalman filter
When the model is non-linear, it is generally impossible to obtain an optimal estimator for the non-observable variables. The simplest way to handle non-linearity is to linearize the equations. This is the idea behind the extended Kalman filter. However, because of this linearization at each step, the approximate solution may diverge in the long run.
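In the non-linear case, the measurement and transition equations of the state-space form model are the following:
- Transition equation:
$$\tilde{\alpha}_{t/t-1} = T(\tilde{\alpha}_{t-1}) + R(\tilde{\alpha}_{t-1})\,\eta_t$$
where $\tilde{\alpha}_{t/t-1}$ is the $m$-dimensional state vector at $t$, and $T(\tilde{\alpha}_{t-1})$ and $R(\tilde{\alpha}_{t-1})$ are non-linear functions defined on $\mathbb{R}^m$, depending on the values of the state variables at $(t-1)$.
- Measurement equation:
$$\tilde{y}_{t/t-1} = Z(\tilde{\alpha}_{t/t-1}) + \varepsilon_t$$
where $\tilde{y}_{t/t-1}$ represents an $N$-dimensional time series, and $Z(\tilde{\alpha}_{t/t-1})$ is a non-linear function, from $\mathbb{R}^m$ to $\mathbb{R}^N$, of the non-observable variables.
As was the case in the simple filter, the two processes $\varepsilon_t$ and $\eta_t$ are supposed to be normally distributed, with zero mean, with $H$ and $Q$ as covariance matrices, and $P_t$ is the covariance matrix associated with $\hat{\alpha}_t$.
- Linearization: if the functions $Z(\tilde{\alpha}_{t/t-1})$ and $T(\tilde{\alpha}_{t-1})$ are smooth enough, it is possible to compute their first order development around $\hat{\alpha}_{t/t-1}$ and $\hat{\alpha}_{t-1}$ respectively, where $\hat{\alpha}_{t/t-1}$ is the expectation of $\tilde{\alpha}_t$ conditionally on the information available at $(t-1)$, and $\hat{\alpha}_{t-1}$ is the value obtained for the state variable in $(t-1)$, at the end of the updating phase. Writing $\hat{Z}$ and $\hat{T}$ for the matrices of first derivatives of $Z$ and $T$ evaluated at $\hat{\alpha}_{t/t-1}$ and $\hat{\alpha}_{t-1}$, and $\hat{R} = R(\hat{\alpha}_{t-1})$, the state-space linearized model is then:
$$\tilde{\alpha}_{t/t-1} \approx \hat{T}\,\tilde{\alpha}_{t-1} + \left(T(\hat{\alpha}_{t-1}) - \hat{T}\,\hat{\alpha}_{t-1}\right) + \hat{R}\,\eta_t$$
$$\tilde{y}_{t/t-1} \approx \hat{Z}\,\tilde{\alpha}_{t/t-1} + \left(Z(\hat{\alpha}_{t/t-1}) - \hat{Z}\,\hat{\alpha}_{t/t-1}\right) + \varepsilon_t$$
In the extended version, the three iteration steps are the following:
Prediction:
$$\hat{\alpha}_{t/t-1} = T(\hat{\alpha}_{t-1}), \qquad P_{t/t-1} = \hat{T}\,P_{t-1}\,\hat{T}' + \hat{R}\,Q\,\hat{R}'$$
where $\hat{\alpha}_{t/t-1}$ and $P_{t/t-1}$ are the estimators of $\tilde{\alpha}_{t/t-1}$ and of its covariance matrix, conditionally on the information available at $(t-1)$.
Innovation:
$$\tilde{y}_{t/t-1} = Z(\hat{\alpha}_{t/t-1}), \qquad v_t = y_t - \tilde{y}_{t/t-1}, \qquad F_t = \hat{Z}\,P_{t/t-1}\,\hat{Z}' + H$$
where $\tilde{y}_{t/t-1}$ is the estimation of the observation $y_t$, conditionally on the information available at $(t-1)$, and $v_t$ is the innovation process with $F_t$ as a covariance matrix.
Updating:
$$\hat{\alpha}_t = \hat{\alpha}_{t/t-1} + P_{t/t-1}\,\hat{Z}'\,F_t^{-1}\,v_t, \qquad P_t = \left(I - P_{t/t-1}\,\hat{Z}'\,F_t^{-1}\,\hat{Z}\right)P_{t/t-1}$$
In the simplest case, the functions $Z(\tilde{\alpha}_{t/t-1})$, $T(\tilde{\alpha}_{t-1})$, and $R(\tilde{\alpha}_{t-1})$, just as the covariance matrices $H$ and $Q$, are not time dependent. $Z(\tilde{\alpha}_{t/t-1})$, $T(\tilde{\alpha}_{t-1})$, and $R(\tilde{\alpha}_{t-1})$ are the system functions. $H$ and $Q$ are the system matrices.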
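One iteration of the extended filter can be sketched in the same way, the system functions and their first derivatives being supplied by the user. The function name, the callable arguments and the usage example (a scalar log price observed through an exponential) are assumptions made for illustration; the recursion itself is the standard first-order extended Kalman filter.

```python
import numpy as np

def ekf_step(a, P, y_t, T_fn, T_jac, R_fn, Z_fn, Z_jac, Q, H):
    """One iteration (prediction, innovation, updating) of the extended filter."""
    # Prediction: the state goes through the non-linear transition function,
    # the covariance through its linearization around the last updated state.
    a_pred = T_fn(a)
    T_hat, R_hat = T_jac(a), R_fn(a)
    P_pred = T_hat @ P @ T_hat.T + R_hat @ Q @ R_hat.T
    # Innovation: non-linear measurement function, linearized covariance.
    Z_hat = Z_jac(a_pred)
    v = y_t - Z_fn(a_pred)
    F = Z_hat @ P_pred @ Z_hat.T + H
    # Updating
    K = np.linalg.solve(F, Z_hat @ P_pred).T
    a_new = a_pred + K @ v
    P_new = (np.eye(len(a)) - K @ Z_hat) @ P_pred
    return a_new, P_new, v, F

# Toy example: the state is a log price following a random walk,
# and the observation is the price itself (exponential of the state).
a, P = np.array([3.0]), np.array([[0.01]])
Q, H = np.array([[0.001]]), np.array([[0.25]])
a_new, P_new, v, F = ekf_step(
    a, P, np.array([21.0]),
    T_fn=lambda s: s, T_jac=lambda s: np.eye(1), R_fn=lambda s: np.eye(1),
    Z_fn=lambda s: np.exp(s), Z_jac=lambda s: np.diag(np.exp(s)),
    Q=Q, H=H)
```

Here the observed price of 21 is above the predicted price $e^{3}$, so the updated log price moves upwards, and the state variance falls after the update.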

The parameters estimation
Suppose that the non-observable variables and the errors are normally distributed. Then one can use maximum likelihood to estimate the model parameters, which are supposed to be constant. One therefore has to maximize the likelihood or, equivalently, to minimize the opposite of its logarithm. This implies that the likelihood must be computed for many parameter values. For that purpose, the Kalman filter is run each time with the current value of the parameters and, at each iteration, the logarithm of the likelihood function is computed for the innovation $v_t$:
$$\log l(t) = -\frac{N}{2}\ln(2\pi) - \frac{1}{2}\ln(dF_t) - \frac{1}{2}\,v_t'\,F_t^{-1}\,v_t$$
where $F_t$ is the covariance matrix associated with the innovation $v_t$, $dF_t$ its determinant, and $N$ the dimension of $v_t$. In the present case, the measurement equation admits continuous partial derivatives of first and second order with respect to the parameters. Therefore, one can use a more powerful minimization method. Once the optimal parameters have been obtained, the Kalman filter is used one last time to reconstitute the non-observable variables and the measure $\tilde{y}$.
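The quantity to be minimized can be computed from the innovations and their covariance matrices, as in the sketch below. It assumes Gaussian innovations, so that each date contributes the usual multivariate normal log-density; the function names and the sample values are hypothetical, and the optimizer itself is not shown.

```python
import numpy as np

def innovation_loglik(v, F):
    """Gaussian log-likelihood of one innovation v with covariance matrix F."""
    n = len(v)
    _, logdet = np.linalg.slogdet(F)          # logarithm of the determinant of F
    quad = v @ np.linalg.solve(F, v)          # v' F^{-1} v
    return -0.5 * (n * np.log(2.0 * np.pi) + logdet + quad)

def negative_loglik(innovations, F_all):
    """Objective to minimize: opposite of the summed log-likelihoods."""
    return -sum(innovation_loglik(v, F) for v, F in zip(innovations, F_all))

# Two hypothetical dates with two maturities each.
innovations = np.array([[0.0, 0.0], [1.0, 1.0]])
F_all = np.array([np.eye(2), 2.0 * np.eye(2)])
objective = negative_loglik(innovations, F_all)
```

In practice this objective would be evaluated by running the filter for each candidate parameter vector and passed to a numerical minimizer.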