Bayesian Monte Carlo Filtering for Stochastic Volatility Models

Modelling the evolution of financial variables is an important issue in financial econometrics. Stochastic dynamic models allow us to describe many features of financial variables more accurately, but there often exists a trade-off between modelling accuracy and complexity. Moreover, the degree of complexity is increased by the use of latent factors, which are usually introduced in time series analysis in order to capture the heterogeneous time evolution of the observed process. The presence of unobserved components makes maximum likelihood inference more difficult to apply. The Bayesian approach is thus preferable, since it allows us to treat general state space models and makes simulation-based approaches to parameter estimation and latent factor filtering easier to apply. The main aim of this work is to produce an updated review of Bayesian inference approaches for latent factor models. Moreover, we provide a review of simulation-based filtering methods in a Bayesian perspective, focusing, through some examples, on stochastic volatility models.


Introduction
The analysis of dynamic phenomena, which evolve over time, is a problem common to many fields, such as engineering, physics, biology, statistics, economics and finance. A time-varying system can be represented through a dynamic model, which consists of an observable component and an unobservable internal state. The hidden state vector represents the information that we want to extract from the observations. Several kinds of dynamic models have been proposed in the literature for time series analysis, and many approaches have been used for the estimation of these models. The seminal works of Kalman [22] and Kalman and Bucy [23] introduce filtering techniques (the Kalman-Bucy filter) for continuous-valued, linear and Gaussian dynamic systems. Other relevant work on dynamic model analysis is due to Maybeck [30], [31], [32], who motivates the use of stochastic dynamic systems in engineering and examines filtering, smoothing and estimation problems for continuous state space models, in both a continuous and a discrete time framework. Moreover, Harvey [20] extensively studies the state space representation of dynamic models for time series analysis and treats the use of the Kalman filter for state and parameter estimation in a continuous state space setting. Hamilton [19] analyzes several kinds of time series models and, in particular, introduces a filter (the Hamilton-Kitagawa filter) for discrete time, discrete-valued dynamic systems. This filter can be used for dynamic models with a finite number of state values.
Bauwens, Lubrano and Richard [2] compare maximum likelihood inference with Bayesian inference for static and dynamic econometric models. Harrison and West [21] treat the problem of dynamic model estimation from a Bayesian perspective. They give standard filtering and smoothing equations for Gaussian linear models and investigate the estimation problem for conditionally Gaussian linear models and for general nonlinear and non-Gaussian models. They review some Markov chain Monte Carlo (MCMC) simulation techniques for filtering and smoothing the hidden states and for estimating parameters, and they also examine the problem of processing data sequentially, through the use of the adaptive importance sampling algorithm. Kim and Nelson [24] analyze Monte Carlo simulation methods for nonlinear, discrete-valued models (regime switching models). More recently, Durbin and Koopman [15] propose an updated review of Markov chain Monte Carlo methods for the estimation of general dynamic models, with both a Bayesian and a maximum likelihood approach.
Sequential simulation methods for filtering and smoothing in general dynamic models have recently been developed to overcome some problems of traditional MCMC methods. As pointed out by Liu and Chen [26], the Gibbs sampler is less attractive when we consider on-line data processing. Furthermore, the Gibbs sampler may be inefficient when the simulated states are very sticky and the sampler has difficulty moving through the state space. In these situations, the use of sequential Monte Carlo techniques, and in particular of particle filter algorithms, may be more efficient. Doucet, Freitas and Gordon [13] provide the state of the art on sequential Monte Carlo methods. They discuss both applications and theoretical convergence results for these algorithms, with special attention to particle filters.
The main aim of this work is to introduce Bayesian dynamic modelling and the related inference problems of parameter estimation and hidden state filtering, with application to some basic discrete time stochastic volatility models. The second aim is to review and discuss the drawbacks of the most important simulation-based inference approaches and to show how traditional simulation methods, like the single-move Gibbs sampler and the multi-move Gibbs sampler, apply to a dynamic model. Finally, we introduce recently developed sequential Monte Carlo simulation methods, focusing on particle filters. Throughout the work, many examples illustrate how Bayesian simulation-based inference applies to basic stochastic volatility models.
The work is structured as follows. Section 2 introduces the general representation of a dynamic model in a Bayesian framework and deals with the continuous stochastic volatility (SV) model and with the Markov switching SV model, which do not admit analytical filtering and smoothing densities. Section 3 reviews simulation-based methods for inference; in particular, it reviews MCMC methods, presents an adaptive importance sampling algorithm and discusses particle filter algorithms. Section 4 concludes.

Bayesian Dynamic Models
In the following we give a quite general formulation of a probabilistic dynamic model and obtain some fundamental relations for Bayesian inference on it. This definition of a dynamic model is general enough to include the time series models analyzed in Kalman [22], Hamilton [19], Harrison and West [21] and Doucet, Freitas and Gordon [13]. Throughout this work, we use a notation similar to the one commonly used in the particle filter literature (see Doucet, Freitas and Gordon [13]).
We denote by {x_t; t ∈ N}, x_t ∈ X, the hidden state vectors of the system, by {y_t; t ∈ N_0}, y_t ∈ Y, the observable variables and by θ ∈ Θ the parameter vector of the model. We assume that the state space, the observation space and the parameter space are X ⊂ R^{n_x}, Y ⊂ R^{n_y} and Θ ⊂ R^{n_θ} respectively, where n_x, n_y and n_θ denote the dimensions of the state vector, of the observable variable and of the parameter vector. The main advantage of using the general Bayesian state space representation of a dynamic model is that it also accounts for nonlinear and non-Gaussian models. The Bayesian state space representation is given by an initial distribution p(x_0|θ), a measurement density p(y_t|x_t, y_{1:t−1}, θ) and a transition density p(x_t|x_{0:t−1}, y_{1:t−1}, θ). The dynamic model is

y_t ~ p(y_t | x_t, y_{1:t−1}, θ)    (measurement density)
x_t ~ p(x_t | x_{0:t−1}, y_{1:t−1}, θ)    (transition density)
x_0 ~ p(x_0 | θ)    (initial distribution)

where p(x_0|θ) can be interpreted as the prior distribution on the initial state of the system. By x_{0:t} ∆= (x_0, . . . , x_t) and y_{1:t} ∆= (y_1, . . . , y_t) we denote the collections of state vectors and of observable vectors up to time t, and by x_{0:T\t} ∆= (x_0, . . . , x_{t−1}, x_{t+1}, . . . , x_T) the collection of all the state vectors without the t-th element. The same notation is used for the observable variables and for the parameter vectors.
If the transition density depends on the past only through the last value of the hidden state vector, the dynamic model is called first-order Markovian. In this case the system becomes

y_t ~ p(y_t | x_t, y_{1:t−1}, θ)
x_t ~ p(x_t | x_{t−1}, y_{1:t−1}, θ)
x_0 ~ p(x_0 | θ).

Assuming that the first-order Markov property holds is not restrictive, because a Markov model of order p can always be rewritten as a first-order Markovian model by stacking the last p states. The stochastic volatility model given in Example 1 is a continuous state, discrete time dynamic model of a kind now widely used in finance.
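The order-reduction argument can be illustrated numerically. The sketch below (our own illustration, not from the paper) takes an AR(2) latent process, which is second-order Markov, and reproduces it exactly through a stacked first-order state:

```python
import numpy as np

# An AR(2) process  x_t = a1 x_{t-1} + a2 x_{t-2} + eta_t  is second-order
# Markov, but the stacked state z_t = (x_t, x_{t-1})' is first-order Markov:
# z_t = A z_{t-1} + (eta_t, 0)'.  Coefficients are illustrative.
a1, a2 = 0.5, 0.3
A = np.array([[a1, a2],
              [1.0, 0.0]])

rng = np.random.default_rng(0)
T = 200
eta = rng.normal(size=T)

# Simulate the AR(2) directly...
x = np.zeros(T)
for t in range(2, T):
    x[t] = a1 * x[t - 1] + a2 * x[t - 2] + eta[t]

# ...and through the stacked first-order representation with the same noise.
z = np.zeros((T, 2))
for t in range(2, T):
    z[t] = A @ z[t - 1] + np.array([eta[t], 0.0])

# The first component of the stacked state reproduces the AR(2) path exactly.
print(np.allclose(z[:, 0], x))  # -> True
```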

Example 1 -(Stochastic Volatility Models)
Two of the main features of financial time series are time-varying volatility and volatility clustering. Stochastic volatility models have been introduced in order to account for these features. An example of a stochastic log-volatility model is

y_t = exp(h_t / 2) ε_t,    ε_t ~ N(0, 1)
h_t = ν + φ h_{t−1} + η_t,    η_t ~ N(0, σ²),

where y_t is the observable process, with time-varying volatility, and h_t represents the stochastic log-volatility process. In Fig. 1 we exhibit a simulated path of the observable process y_t and of the stochastic log-volatility process h_t.
In the following we discuss the three main issues which arise when making inference on a dynamic model: filtering, prediction and smoothing. We present general solutions to these problems, but note that, without further assumptions on the densities which characterize the dynamic model, these solutions are not analytically tractable.

Figure 1: Simulated paths of 1,000 observations from the observable process y_t, with time-varying volatility, and of the stochastic log-volatility process h_t. We simulate the model given in Example 1 by setting φ = 0.9, ν = 0.1 and σ² = 1.
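The simulation behind Fig. 1 can be sketched as follows, assuming the standard log-volatility form y_t = exp(h_t/2) ε_t, h_t = ν + φ h_{t−1} + η_t (the initialisation at the stationary mean is our own choice):

```python
import numpy as np

# Sketch of an SV path simulation with the parameter values quoted in the
# caption of Fig. 1 (phi = 0.9, nu = 0.1, sigma2 = 1).
def simulate_sv(T, phi=0.9, nu=0.1, sigma2=1.0, seed=42):
    rng = np.random.default_rng(seed)
    h = np.zeros(T)
    y = np.zeros(T)
    h_prev = nu / (1.0 - phi)            # start at the stationary mean of h_t
    for t in range(T):
        h[t] = nu + phi * h_prev + rng.normal(scale=np.sqrt(sigma2))
        y[t] = np.exp(h[t] / 2.0) * rng.normal()
        h_prev = h[t]
    return y, h

y, h = simulate_sv(1000)
print(y.shape, h.shape)  # -> (1000,) (1000,)
```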

State Estimation
We treat here the problem of estimating the hidden state vector when the parameters are known. We are interested in the density p(x_t|y_{1:s}, θ). If t = s the density of interest is called the filtering density, if t < s it is called the smoothing density, and if t > s it is called the prediction density. For the dynamic model given in equations (4), (5) and (6) we assume that at time t the density p(x_{t−1}|y_{1:t−1}, θ) is known. Observe that if t = 1 the density p(x_0|y_{1:0}, θ) = p(x_0|θ) is the initial distribution of the dynamic model.
Applying the Chapman-Kolmogorov equation, we obtain the one-step-ahead prediction density

p(x_t | y_{1:t−1}, θ) = ∫_X p(x_t | x_{t−1}, y_{1:t−1}, θ) p(x_{t−1} | y_{1:t−1}, θ) dx_{t−1}.

As the new observation y_t becomes available it is possible, using Bayes' theorem, to update the prediction density and to filter the current state of the system. The filtering density is

p(x_t | y_{1:t}, θ) = p(y_t, x_t | y_{1:t−1}, θ) / p(y_t | y_{1:t−1}, θ) = p(y_t | x_t, y_{1:t−1}, θ) p(x_t | y_{1:t−1}, θ) / p(y_t | y_{1:t−1}, θ),

where p(x_t|y_{1:t−1}, θ) is the prediction density determined at the previous step and the density in the denominator is the marginal of the joint density of the current state and observable variable,

p(y_t | y_{1:t−1}, θ) = ∫_X p(y_t | x_t, y_{1:t−1}, θ) p(x_t | y_{1:t−1}, θ) dx_t.

At each date t, it is possible to determine the K-steps-ahead prediction density, conditional on the available information y_{1:t}. Given the dynamic model described by equations (4), (5) and (6), the K-steps-ahead prediction density of the state vector can be evaluated iteratively. The first step is

p(x_{t+1} | y_{1:t}, θ) = ∫_X p(x_{t+1} | x_t, y_{1:t}, θ) p(x_t | y_{1:t}, θ) dx_t (13)

and the k-th step, with k = 2, . . . , K, is

p(x_{t+k} | y_{1:t}, θ) = ∫_{X × Y^{k−1}} p(x_{t+k} | x_{t+k−1}, y_{1:t+k−1}, θ) p(dx_{t+k−1}, dy_{t+1:t+k−1} | y_{1:t}, θ),

where Y^k = Y × · · · × Y is the k-fold product of the observation space. Similarly, the K-steps-ahead prediction density of the observable variable y_{t+K}, conditional on the information available at time t, is

p(y_{t+K} | y_{1:t}, θ) = ∫_{X × Y^{K−1}} p(y_{t+K} | x_{t+K}, y_{1:t+K−1}, θ) p(dy_{t+1:t+K−1} | y_{1:t}, θ) p(dx_{t+K} | y_{1:t}, θ).

Due to the high number of integrals that must be solved, the previous densities may be difficult to evaluate under general dynamics. From a numerical point of view, simulation methods like MCMC algorithms or particle filters allow us to overcome these difficulties, while from an analytical point of view, in order to obtain simpler relations we need to introduce some simplifying hypotheses on the dynamics of the model. For example, if we assume that the evolution of the dynamic model does not depend on the past values of the observable variable y_{1:t}, then equations (4), (5) and (6) become

y_t ~ p(y_t | x_t, θ)
x_t ~ p(x_t | x_{t−1}, θ)
x_0 ~ p(x_0 | θ),

with t = 1, . . . , T.
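For a scalar state, the prediction and update integrals above can be approximated directly on a fixed grid. The sketch below is a deliberately simple numerical illustration on a linear Gaussian AR(1) model of our own choosing (x_t = φ x_{t−1} + η_t observed as y_t = x_t + ε_t), not a method from the paper:

```python
import numpy as np

phi, q, r = 0.9, 0.5, 0.5          # AR coefficient, state and observation variances

def npdf(x, m, v):
    # Gaussian density N(x; m, v)
    return np.exp(-0.5 * (x - m) ** 2 / v) / np.sqrt(2 * np.pi * v)

# Simulate a short path of states and noisy observations.
rng = np.random.default_rng(1)
T = 50
x = np.zeros(T); y = np.zeros(T)
for t in range(1, T):
    x[t] = phi * x[t - 1] + rng.normal(scale=np.sqrt(q))
    y[t] = x[t] + rng.normal(scale=np.sqrt(r))

grid = np.linspace(-5, 5, 400)
dx = grid[1] - grid[0]
filt = npdf(grid, 0.0, q / (1 - phi ** 2))   # prior p(x_0): stationary density
means = []
for t in range(1, T):
    # prediction step: integrate transition density against the filter density
    pred = npdf(grid[:, None], phi * grid[None, :], q) @ filt * dx
    # update step: multiply by the likelihood of y_t and renormalise
    filt = npdf(y[t], grid, r) * pred
    filt /= filt.sum() * dx
    means.append(np.sum(grid * filt) * dx)   # filtering mean E(x_t | y_{1:t})

print(np.corrcoef(means, x[1:])[0, 1] > 0.5)
```

On a one-dimensional grid this is cheap, but the cost grows exponentially with the state dimension, which is one motivation for the simulation-based methods reviewed later.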
Figure 2: The causality structure of a Markov dynamic model with hidden states, represented by means of a Directed Acyclic Graph (DAG). A box around a variable indicates that the variable is known, while a circle indicates a hidden variable.
We conclude this section with two important recursive relations, both of which can be proved starting from the definition of the joint smoothing density and assuming that the Markov property holds. The joint smoothing density factorizes as

p(x_{0:T} | y_{1:T}, θ) = p(x_T | y_{1:T}, θ) ∏_{t=0}^{T−1} p(x_t | x_{t+1}, y_{1:t}, θ). (26)

The way this relation is obtained differs from the one given in Carter and Köhn [7], because we consider the general model in equations (4)-(6), with transition and measurement densities depending on past values of y_t. Note that the density p(x_t|x_{t+1}, y_{1:t}, θ), which appears in the joint smoothing density, can be represented through the filtering and prediction densities:

p(x_t | x_{t+1}, y_{1:t}, θ) = p(x_{t+1} | x_t, y_{1:t}, θ) p(x_t | y_{1:t}, θ) / p(x_{t+1} | y_{1:t}, θ). (28)

This factorization of the smoothing density is also relevant when inference is carried out through simulation methods; see for example the multi-move Gibbs sampler of Carter and Köhn [7] and the particle filter algorithms.
Only in some well-known cases do these filtering densities admit an analytical form. For the normal linear dynamic model, the filtering and smoothing densities are given by the Kalman filter. See Harrison and West [21] for a Bayesian analysis of the Kalman filter and Harvey [20] for a frequentist approach.
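For the normal linear case, the filtering recursions can be written down in closed form. The scalar sketch below (model and notation are ours) computes the exact Gaussian filtering mean and variance for x_t = φ x_{t−1} + η_t, y_t = x_t + ε_t:

```python
import numpy as np

# Scalar Kalman filter: the filtering density p(x_t | y_{1:t}) is N(m_t, p_t)
# and the prediction/update steps of the general recursion become exact
# mean/variance updates.
def kalman_filter(y, phi, q, r, m0=0.0, p0=1.0):
    m, p = m0, p0
    means, variances = [], []
    for obs in y:
        # prediction step: p(x_t | y_{1:t-1}) = N(m_pred, p_pred)
        m_pred = phi * m
        p_pred = phi ** 2 * p + q
        # update step with the new observation
        k = p_pred / (p_pred + r)        # Kalman gain
        m = m_pred + k * (obs - m_pred)
        p = (1 - k) * p_pred
        means.append(m); variances.append(p)
    return np.array(means), np.array(variances)

rng = np.random.default_rng(2)
T, phi, q, r = 200, 0.9, 0.5, 0.5
x = np.zeros(T); y = np.zeros(T)
for t in range(1, T):
    x[t] = phi * x[t - 1] + rng.normal(scale=np.sqrt(q))
    y[t] = x[t] + rng.normal(scale=np.sqrt(r))

m, p = kalman_filter(y, phi, q, r)
# The filtering mean should track the state better than the raw observations.
print(np.mean((m - x) ** 2) < np.mean((y - x) ** 2))
```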
A class of models which does not admit a tractable analytical representation of the filtering, prediction and smoothing densities is the class of conditionally normal linear models (also called multi-process models in Harrison and West [21]), which are widely used in economic analysis (see Kim and Nelson [24]). In the next section, we introduce another example of a nonlinear dynamic model with jumps, which is used in financial applications.

Stochastic Volatility Models with Jumps
Using a Bayesian representation, a stochastic volatility model with jumps can be defined as follows:

y_t = exp(h_t / 2) ε_t,    ε_t ~ N(0, 1)
h_t = ν(s_t) + φ h_{t−1} + η_t,    η_t ~ N(0, σ²),

for t = 1, . . . , T, where s_t is the jump process. If s_t is a discrete time, finite state Markov chain with transition probabilities

P(s_t = j | s_{t−1} = i) = p_{ij},    i, j = 1, . . . , L,

with L denoting the number of unobservable states, then the model is called Markov switching. Fig. 3 exhibits simulated paths of 1,000 observations of the Markov switching process, the latent factor and the observable variable: the upper left graph exhibits the evolution of the hidden jump process, the upper right graph shows the log-volatility, and the third graph shows the observable process. In order to simulate the MS stochastic volatility model we use the parameters estimated in So, Lam and Li [40].
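A simulation of this kind can be sketched as follows, assuming the regime enters through the level of the log-volatility; the two-regime transition matrix and all parameter values below are illustrative choices of ours, not the estimates of So, Lam and Li:

```python
import numpy as np

# Sketch of a two-regime Markov switching SV simulation:
#   s_t: two-state Markov chain,  h_t = nu[s_t] + phi h_{t-1} + eta_t,
#   y_t = exp(h_t / 2) eps_t.
rng = np.random.default_rng(3)
T = 1000
P = np.array([[0.99, 0.01],        # p_ij = P(s_t = j | s_{t-1} = i)
              [0.02, 0.98]])
nu = np.array([-0.5, 0.5])         # regime-dependent log-volatility level
phi, sigma = 0.9, 0.3

s = np.zeros(T, dtype=int)
h = np.zeros(T)
y = np.zeros(T)
for t in range(1, T):
    s[t] = rng.choice(2, p=P[s[t - 1]])                    # regime transition
    h[t] = nu[s[t]] + phi * h[t - 1] + rng.normal(scale=sigma)
    y[t] = np.exp(h[t] / 2) * rng.normal()

print(y.shape)  # -> (1000,)
```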

Simulation Based Filtering
The main aim of this section is to review both traditional and recently developed inference methods for nonlinear and non-Gaussian models. We focus on the Bayesian approach and on simulation-based methods. First, Markov chain Monte Carlo methods are reviewed, with particular attention to the single-move Gibbs sampler due to Carlin, Polson and Stoffer [6] and to the multi-move Gibbs sampler due to Carter and Köhn [7] and Frühwirth-Schnatter [16]. Then some basic sequential Monte Carlo simulation methods are introduced: we examine the adaptive importance sampling algorithm due to West [44], [45], the sequential importance sampling algorithm, and more advanced sequential Monte Carlo algorithms called particle filters (see Doucet, Freitas and Gordon [13]).
Finally, note that another important issue in making inference on dynamic models is the estimation of the parameter vector. In a Bayesian MCMC-based approach, parameters and hidden states are jointly estimated by simulating from the posterior distribution of the model. In a sequential importance sampling approach, following the engineering literature, a common way to solve the parameter estimation problem is to treat the parameters θ as hidden states of the system (see Berzuini et al. [3]). The model is restated assuming time-dependent parameter vectors θ_t and imposing a constraint on their evolution: θ_t = θ_{t−1}. The constraint can also be expressed in terms of a transition probability as

p(θ_t | θ_{t−1}) = δ_{θ_{t−1}}(θ_t),

where δ_{x*}(x) denotes the Dirac delta function (a point mass at x*). The Bayesian model is then completed by assuming a prior distribution p(θ_0) on the parameter vector.

The Gibbs Sampler
In the previous sections we examined estimation algorithms for filtering, predicting and smoothing the state vector of a quite general probabilistic dynamic model. In order to examine Gibbs sampling methods, we consider the following first-order Markovian dynamic model:

y_t ~ p(y_t | x_t, θ)
x_t ~ p(x_t | x_{t−1}, θ)
x_0 ~ p(x_0 | θ).

The estimation problem is solved in a Bayesian perspective by evaluating the mean of the joint posterior density of the state and parameter vectors, p(x_{0:T}, θ|y_{1:T}). Tanner and Wong [42] motivate this solution by the data augmentation principle, which consists in treating the hidden state vectors as nuisance parameters.
If an analytical evaluation of the posterior mean is not possible, then simulation methods, and in particular Markov chain Monte Carlo, apply. The simplest solution is to implement a single-move Gibbs sampler (see Carlin, Polson and Stoffer [6] and Harrison and West [21]). This method generates the states one at a time, using the Markov property of the dynamic model to condition on the neighboring states. However, the first-order Markov dependence between adjacent states induces a high correlation between the outputs of the Gibbs sampler and causes slow convergence of the algorithm. To solve this problem, Carter and Köhn [7] and Frühwirth-Schnatter [16] independently proposed the multi-move Gibbs sampler. The main idea of this method is to generate all the state vectors simultaneously, using analytical filtering and smoothing relations.
Their approach is less general than that of Carlin, Polson and Stoffer [6], but for linear dynamic models with Gaussian mixture innovations in the observation equation it is more efficient. In particular, the multi-move Gibbs sampler converges faster to the posterior distribution and the posterior moment estimates have smaller variance. These results are supported theoretically by Liu, Wong and Kong [28], [29] and Müller [33], who show that generating variables simultaneously produces faster convergence than generating them one at a time.
The idea of grouping parameters (or hidden states) when simulating is now common in Bayesian inference on stochastic models with latent factors. For example, multi-move MCMC algorithms have been used for stochastic volatility models by Kim, Shephard and Chib [25] and extended to generalized stochastic volatility models by Chib, Nardari and Shephard [10]. Shephard [38] and Shephard and Pitt [39] extended multi-move MCMC algorithms to non-Gaussian time series models. Finally, an alternative way of simulating from the smoothing density of the state vectors is discussed in De Jong and Shephard [12].
In the following we briefly present how to implement the single-move Gibbs sampler for parameter and state estimation. On the time interval {1, . . . , T}, the conditional posterior distributions of the parameter vector and of the state vectors are

p(θ | x_{0:T}, y_{1:T}) ∝ p(y_{1:T} | x_{0:T}, θ) p(x_{0:T} | θ) p(θ) (37)
p(x_{0:T} | y_{1:T}, θ) ∝ p(y_{1:T} | x_{0:T}, θ) p(x_{0:T} | θ). (38)

A basic Gibbs sampler is obtained by simulating sequentially from the parameter vector posterior (parameter simulation step) in equation (37) and from the state vector posterior (data augmentation step) in equation (38), conditionally on the parameter vector simulated at the previous step. When the conditional distributions cannot be simulated directly, the corresponding steps of the Gibbs algorithm can be replaced by Metropolis-Hastings steps; the resulting algorithms are called hybrid sampling algorithms and are validated in Tierney [43]. A generic Gibbs sampler can be used for simulating the posterior of the parameter vector conditional on the simulated state vectors. The single-move Gibbs sampler for the state vectors, conditional on the simulated parameter vector, is then obtained by drawing each state vector conditionally on the other simulated state vectors; the single-move algorithm can be implemented for general dynamic models. Moreover, note that for the dynamic model given in equations (33)-(35) the full posterior density of the t-th state vector simplifies. Applying the independence assumption between y_{t+1:T} and x_t,

p(x_t | x_{0:T\t}, y_{1:T}, θ) = p(x_{t+1:T} | x_{0:t}, y_{1:t}, θ) p(y_t | x_t, y_{1:t−1}, θ) p(x_t | x_{t−1}, y_{1:t−1}, θ) / p(x_{t+1:T}, y_t | x_{0:t−1}, y_{1:t−1}, θ)
∝ p(y_t | x_t, θ) p(x_t | x_{t−1}, θ) p(x_{t+1} | x_t, θ),

and, due to this property of the posterior density, the implementation of the single-move Gibbs sampling algorithm becomes easier.

Example 2 -(Stochastic Volatility Model, Example 1 continued)
We illustrate, through this example, how simulation-based Bayesian inference applies to dynamic models. In particular, we develop a single-move Gibbs sampler for the stochastic volatility model given in Example 1. Note that the stochastic volatility model can be rewritten as

y_t | h_t ~ N(0, e^{h_t})
h_t | h_{t−1} ~ N(ν + φ h_{t−1}, σ²),

with t = 1, . . . , T.
The completed likelihood function of this model is

p(y_{1:T}, h_{0:T} | θ) = p(h_0 | θ) ∏_{t=1}^{T} p(y_t | h_t, θ) p(h_t | h_{t−1}, θ).

In order to complete the description of the Bayesian dynamic model, we take prior distributions on the parameters φ, ν, σ² and on h_0, where IG(·, ·) indicates the Inverse Gamma distribution used for σ². Note that for the parameter φ we use a truncated Normal distribution, but a Uniform distribution U[−1,1] could alternatively be used. In the first part of the single-move Gibbs sampler, the parameters are simulated from their full conditional posterior distributions, through the following three steps:

(i) simulate φ ~ π(φ | σ², ν, y_{1:T}, h_{0:T});
(ii) simulate ν ~ π(ν | φ, σ², y_{1:T}, h_{0:T});
(iii) simulate σ² ~ π(σ² | φ, ν, y_{1:T}, h_{0:T}).

In the second part of the single-move Gibbs sampler, the hidden variables {h_t}_{t=1}^{T} are sampled one at a time from the full conditional posterior distribution

π(h_t | h_{−t}, θ, y_{1:T}) ∝ exp{−(y_t² e^{−h_t} + h_t)/2} exp{−[(h_t − ν − φ h_{t−1})² + (h_{t+1} − ν − φ h_t)²]/(2σ²)} (50)

for t = 1, . . . , T. Due to the presence of the quantity e^{−h_t} in the exponential, no standard simulation method applies directly here. However, the posterior distribution is known up to a normalising constant, so the Metropolis-Hastings (M.-H.) algorithm can be used. A proposal distribution can be obtained by replacing the function y_t² e^{−h_t} + h_t in equation (50) with (log y_t² + 1) + (h_t − log y_t²)²/2, which is its second-order Taylor expansion around log y_t²; we then simulate h_t from the resulting Gaussian proposal density. We apply the single-move Gibbs sampler to the synthetic dataset exhibited in Fig. 1. We use a burn-in sample of 10,000 iterations to reduce the dependence on the initial conditions of the sampler and take the remaining 40,000 values to calculate the parameter estimates, which are reported in Table 1.

Figure 4: Graph of the true (solid line) and filtered (dotted line) log-volatility. The filtered values have been obtained by averaging 50,000 values simulated through the single-move Gibbs sampler given in Example 2.

Figure 4 shows the true stochastic log-volatility versus the filtered log-volatility.
Figures 5 and 6 show the raw output of the single-move Gibbs sampler and the ergodic means, respectively. Figures 7 and 8 show the autocorrelation of the simulated values and the histograms of the posterior distributions, respectively.
Simulations have been carried out on a PC with Intel 2400 MHz processor, using routines implemented in Gauss 3.2.8.
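The Metropolis-Hastings update for a single log-volatility h_t described in Example 2 can be sketched as follows. The combination of the two AR(1) prior terms into a single Gaussian (μ, s²) and the interior-time indexing are our own algebra; only the Taylor-expansion proposal follows the text:

```python
import numpy as np

def log_target(h, y, h_prev, h_next, nu, phi, sigma2):
    # full conditional log-density of h_t, up to an additive constant
    lp = -0.5 * (y ** 2 * np.exp(-h) + h)                  # likelihood part
    lp += -0.5 * (h - nu - phi * h_prev) ** 2 / sigma2     # prior from h_{t-1}
    lp += -0.5 * (h_next - nu - phi * h) ** 2 / sigma2     # prior from h_{t+1}
    return lp

def mh_step(h, y, h_prev, h_next, nu, phi, sigma2, rng):
    # Gaussian prior contribution for h_t, combining both AR(1) terms
    s2 = sigma2 / (1 + phi ** 2)
    mu = (nu * (1 - phi) + phi * (h_prev + h_next)) / (1 + phi ** 2)
    # combine with the N(log y^2, 2) likelihood approximation
    # (second-order Taylor expansion of y^2 e^{-h} + h around log y^2)
    prec = 1 / s2 + 0.5
    m = (mu / s2 + 0.5 * np.log(y ** 2)) / prec
    prop = rng.normal(m, np.sqrt(1 / prec))
    # accept/reject with the proposal density correction
    def logq(z):
        return -0.5 * prec * (z - m) ** 2
    log_alpha = (log_target(prop, y, h_prev, h_next, nu, phi, sigma2)
                 - log_target(h, y, h_prev, h_next, nu, phi, sigma2)
                 + logq(h) - logq(prop))
    return prop if np.log(rng.uniform()) < log_alpha else h

rng = np.random.default_rng(4)
h_new = mh_step(h=0.0, y=0.5, h_prev=0.1, h_next=-0.1,
                nu=0.1, phi=0.9, sigma2=0.1, rng=rng)
print(np.isfinite(h_new))  # -> True
```

In a full sampler this step would be swept over t = 1, . . . , T inside each Gibbs iteration, alternating with the parameter draws (i)-(iii).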
Although the simplification due to the first-order Markov property of the dynamic model makes the single-move Gibbs sampler easy to implement, some problems arise. In particular, the Markovian dependence between neighboring states generates correlation between the outputs of the Gibbs sampler and causes slower convergence to the posterior distribution (see Carter and Köhn [7]). A consequence is that, if adaptive importance sampling is carried out by running parallel single-move Gibbs samplers, the number of replications needed before the parameter estimates converge is high.
A general method to solve this autocorrelation problem in the output of the Gibbs sampler is to group parameters (or states) and to simulate them simultaneously. This idea has been applied independently by Carter and Köhn [7] and by Frühwirth-Schnatter [16] to dynamic models, and the resulting algorithm is the multi-move Gibbs sampler. Furthermore, Frühwirth-Schnatter [16] shows how the use of the multi-move Gibbs sampler improves the convergence rate of an adaptive importance sampling algorithm and makes a comparison with a set of parallel single-move Gibbs samplers. The implementation of the multi-move Gibbs sampler depends on the availability of the analytical form of the filtering and smoothing densities.
We give here a general representation of the algorithm, but its implementation is strictly related to the specific analytical representation of the dynamic model. Given the simulated parameter vector obtained through the Gibbs sampler of Algorithm 1, the multi-move Gibbs sampler is given in Algorithm 3.
The algorithm is derived through the recursive smoothing relation given in equation (26). Moreover, at each simulation step the posterior density is obtained by means of the estimated prediction and filtering densities, by applying the fundamental relation given in equation (28). We stress once more that the multi-move Gibbs sampler does not easily apply to nonlinear and non-Gaussian models; in those cases, within a Bayesian MCMC approach, the single-move Gibbs sampler remains the only numerical solution to the estimation problem. Moreover, the estimation performance of both the single-move and the multi-move Gibbs sampler depends on the model complexity, on the parameter setting and on the specific time series. To overcome these difficulties, adaptive sampling schemes have been proposed in the literature. Sequential sampling algorithms represent a first alternative to MCMC methods, and sequential Monte Carlo algorithms allow us to make inference on general dynamic models. One of the earliest sequential methods proposed in the literature is adaptive importance sampling, which is discussed in the next section. As an alternative to MCMC methods, Cappé et al. [5] recently proposed an adaptive importance sampling algorithm called Population Monte Carlo (PMC). Moreover, Guillin et al. [18] and Celeux, Marin and Robert [9] apply both the Gibbs sampler and the PMC algorithm to latent variable models, and in particular to the basic continuous stochastic volatility model, showing that filtering through PMC produces better results.
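For the linear Gaussian case, the multi-move idea of drawing the whole state path at once can be sketched with a forward filtering, backward sampling pass built on the smoothing factorization p(x_t | x_{t+1}, y_{1:t}); the scalar recursions below are our own derivation for the test model x_t = φ x_{t−1} + η_t, y_t = x_t + ε_t:

```python
import numpy as np

def ffbs(y, phi, q, r, rng, m0=0.0, p0=1.0):
    # forward Kalman filter: store filtering means and variances
    T = len(y)
    ms = np.zeros(T); ps = np.zeros(T)
    m, p = m0, p0
    for t in range(T):
        m_pred, p_pred = phi * m, phi ** 2 * p + q
        k = p_pred / (p_pred + r)
        m = m_pred + k * (y[t] - m_pred)
        p = (1 - k) * p_pred
        ms[t], ps[t] = m, p
    # backward sampling pass: draw the whole path x_{0:T-1} jointly
    x = np.zeros(T)
    x[-1] = rng.normal(ms[-1], np.sqrt(ps[-1]))
    for t in range(T - 2, -1, -1):
        # p(x_t | x_{t+1}, y_{1:t}) is Gaussian: combine filter and transition
        prec = 1 / ps[t] + phi ** 2 / q
        mean = (ms[t] / ps[t] + phi * x[t + 1] / q) / prec
        x[t] = rng.normal(mean, np.sqrt(1 / prec))
    return x

rng = np.random.default_rng(5)
T, phi, q, r = 100, 0.9, 0.5, 0.5
x_true = np.zeros(T); y = np.zeros(T)
for t in range(1, T):
    x_true[t] = phi * x_true[t - 1] + rng.normal(scale=np.sqrt(q))
    y[t] = x_true[t] + rng.normal(scale=np.sqrt(r))

draw = ffbs(y, phi, q, r, rng)   # one joint draw of the whole state path
print(draw.shape)  # -> (100,)
```

Drawing the entire path in one block is exactly what reduces the autocorrelation that plagues the single-move sampler; it is feasible here only because the filtering densities are available in closed form.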

Adaptive Importance Sampling
The adaptive sequential importance sampling scheme is a sequential stochastic simulation method which adapts progressively to the posterior distribution, also by means of the information contained in the samples simulated at the previous steps. The adaptation mechanism is based on the discrete posterior approximation and on a kernel density reconstruction of the prior and posterior densities. West [44] proposed this technique in order to estimate the parameters of static models; West [45] and West and Harrison [21] subsequently extended the method to the estimation of the parameters and states of dynamic models. The first key idea is to use importance sampling (see Robert and Casella [37]) in order to obtain a weighted random grid of evaluation points in the state space. Let {x_t^i, w_t^i}_{i=1}^{N} be a sample drawn from the posterior p(x_t|y_{1:t}, θ) through an importance density g_t. The prediction density, given in equation (20), can then be approximated as

p(x_{t+1} | y_{1:t}, θ) ≈ Σ_{i=1}^{N} w_t^i p(x_{t+1} | x_t^i, y_{1:t}, θ).

The second key idea, implemented in the adaptive importance sampling algorithm of West [45], is to propagate the points of the stochastic grid by means of the transition density and to build a smoothed approximation of the prior density. This approximation is obtained through kernel density estimation; West [45] suggested using Gaussian or Student-t kernels, due to their flexibility in approximating other densities. For example, the Gaussian kernel reconstruction is

p̂(x_{t+1} | y_{1:t}, θ) = Σ_{i=1}^{N} w_t^i N(x_{t+1}; a m_t^i + (1 − a) m̄_t, h² V_t),

where m_t^i is the location associated with the i-th particle, m̄_t their weighted mean, V_t the weighted covariance, and a and h are shrinkage and bandwidth parameters. The final step of the algorithm consists in updating the prior density and in producing a random sample from the resulting posterior density; the sample is obtained by using the kernel density estimate as the importance density. Adaptive importance sampling is represented in Algorithm 4. The main advantage of this algorithm lies in the smoothed reconstruction of the prior density.
This kernel density estimate of the prior makes it possible to obtain adaptive importance densities and to avoid the information loss which comes from accumulating numerical approximations over time. Furthermore, this technique can easily be extended to account for sequential estimation of the parameters (see the recent work of Liu and West [27]).

Algorithm 4 -Adaptive Sequential Importance Sampling -
3. Generate from the Gaussian kernel.

However, adaptive importance sampling requires the calibration of the parameters a and h, which determine the behavior of the kernel density estimate. The choice of these shrinkage and bandwidth parameters influences the convergence of the algorithm and heavily depends on the complexity of the model studied.
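The kernel reconstruction step with shrinkage can be sketched as follows; the weighted sample, the parameter values, and the choice h² = 1 − a² (which keeps the mixture variance equal to the sample variance) are our own illustrations of the usual a, h parametrisation:

```python
import numpy as np

# Draw from a Gaussian kernel mixture centred at the shrunk locations
# a * x_i + (1 - a) * xbar with spread h^2 * V, where V is the weighted
# sample variance.
def kernel_draw(x, w, a, h, n_draws, rng):
    xbar = np.sum(w * x)                       # weighted mean
    V = np.sum(w * (x - xbar) ** 2)            # weighted variance
    locs = a * x + (1 - a) * xbar              # shrunk kernel locations
    idx = rng.choice(len(x), size=n_draws, p=w)
    return rng.normal(locs[idx], h * np.sqrt(V))

rng = np.random.default_rng(6)
x = rng.normal(size=500)                       # equally weighted sample
w = np.full(500, 1 / 500)
a = 0.98
hh = np.sqrt(1 - a ** 2)                       # keeps the variance at V
draws = kernel_draw(x, w, a, hh, 1000, rng)

# Without shrinkage (a = 1) the mixture variance would be inflated to
# (1 + h^2) V; with a^2 + h^2 = 1 the spread of the sample is preserved.
print(abs(np.std(draws) - np.std(x)) < 0.2)
```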
Finally, adaptive importance sampling belongs to a more general class of sequential simulation algorithms, which are particularly efficient for on-line data processing and which share some common problems, such as sensitivity to outliers and degeneracy. In the next section we review some general particle filters.

Particle Filters
In the following we focus on particle filters, also referred to in the literature as bootstrap filters, interacting particle filters, condensation algorithms and Monte Carlo filters, and on the estimation of the states of the dynamic model. See Doucet, Freitas and Gordon [13] for an updated review of particle filter techniques, their applications and the main convergence results for this kind of algorithm.
The main advantages of particle filters are that they can deal with nonlinear dynamics and non-Gaussian noise and that they can be easily implemented, also in parallel. Moreover, in contrast to hidden Markov model filters, which work on a state space discretised to a fixed grid, particle filters focus sequentially on the regions of the state space with higher probability. This feature is shared with the adaptive importance sampling algorithm exhibited in the previous section.
Assume that the parameter vector θ of the dynamic model given in equations (17), (18) and (19) is known. Different versions of the particle filter exist in the literature, and different simulation approaches, like rejection sampling, MCMC and importance sampling, can be used for the construction of a particle filter. We introduce particle filters using an importance sampling argument.
At each time step t + 1, as a new observation y_{t+1} arrives, we are interested in predicting and filtering the hidden variables and the parameters of a general dynamic model. In particular, we investigate how to approximate the prediction and filtering densities given in equations (20) and (21) by means of sequential Monte Carlo methods.
Assume that the weighted sample {x_t^i, w_t^i}_{i=1}^{N} has been drawn from the filtering density at time t, p(x_t|y_{1:t}, θ). Each simulated value x_t^i is called a particle, and the particle set {x_t^i, w_t^i}_{i=1}^{N} can be viewed as a random discretization of the state space X, with associated probability weights w_t^i. By means of this particle set it is possible to approximate the prediction density given in Eq. (20) as

p̂(x_{t+1} | y_{1:t}, θ) = Σ_{i=1}^{N} w_t^i p(x_{t+1} | x_t^i, y_{1:t}, θ),

which is called the empirical prediction density. By applying Bayes' theorem it is also possible to obtain an approximation of the filtering density given in Eq. (21),

p̂(x_{t+1} | y_{1:t+1}, θ) ∝ p(y_{t+1} | x_{t+1}, y_{1:t}, θ) p̂(x_{t+1} | y_{1:t}, θ),

which is called the empirical filtering density.

Figure 9: Particle evolution in the SIS particle filter.

Assume now that the quantity E(f(x_{t+1})|y_{1:t+1}) is of interest. It can be evaluated numerically from a Monte Carlo sample {x_{t+1}^i, w_{t+1}^i}_{i=1}^{N} simulated from the filtering distribution. A simple way to obtain a weighted sample from the filtering density at time t + 1 is to apply importance sampling to the empirical filtering density given in equation (57). This step corresponds to propagating the initial particle set (see Fig. 9) through the importance density q(x_{t+1}|x_t^i, y_{t+1}, θ). Moreover, if each particle of the set is propagated through the transition density p(x_{t+1}|x_t^i, θ) of the dynamic model, then the particle weights update as

w_{t+1}^i ∝ w_t^i p(y_{t+1} | x_{t+1}^i, θ).

This is the natural choice for the importance density, because the transition density represents a sort of prior at time t for the state x_{t+1}. However, as underlined in Pitt and Shephard [35], this strategy is sensitive to outliers. See also Crisan and Doucet [11] for a discussion on the choice of the importance density; they focus on the properties of the importance density which are necessary for the a.s. convergence of the sequential Monte Carlo algorithm.
The basic particle filter developed through the previous equations is called Sequential Importance Sampling (SIS). Algorithm 5 gives a pseudo-code representation of this method.
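As an illustration of the structure described by Algorithm 5, the following sketch implements one generic SIS step. The interface (the `propagate` and `log_obs_density` callables) is our own choice for exposition, not the paper's notation; we work in log-space for numerical stability.

```python
import numpy as np

def sis_step(particles, weights, propagate, log_obs_density, y_next, rng):
    """One Sequential Importance Sampling step.

    Propagates each particle through the transition density (used here
    as the importance density) and reweights by the observation density,
    so that w_{t+1}^i is proportional to w_t^i * p(y_{t+1} | x_{t+1}^i).
    """
    new_particles = propagate(particles, rng)          # x_{t+1}^i ~ p(.|x_t^i)
    log_w = np.log(weights) + log_obs_density(y_next, new_particles)
    log_w -= log_w.max()                               # numerical stabilisation
    new_weights = np.exp(log_w)
    new_weights /= new_weights.sum()                   # normalise
    return new_particles, new_weights
```

Note that no re-sampling is performed: the weights are carried forward, which is precisely what leads to the degeneracy discussed below.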
Sequential importance sampling makes it possible to update the particle weights recursively; it relies on the sequential decomposition of the joint filtering density and on a particular choice of the importance density. To evidence these aspects we consider the following smoothing problem.
We want to approximate the smoothing density p(x_{0:t+1}|y_{1:t+1}, \theta) of the state vectors by simulating {x_{0:t+1}^i}_{i=1}^N from a proposal distribution q(x_{0:t+1}|y_{1:t+1}, \theta) and by correcting the weights of the resulting empirical density. The correction step comes from an importance sampling argument; thus the unnormalised particle weights§ are defined as

w_{t+1}^i \propto \frac{p(x_{0:t+1}^i|y_{1:t+1}, \theta)}{q(x_{0:t+1}^i|y_{1:t+1}, \theta)}.

§ Note that importance sampling requires knowing the importance and the target distributions only up to a proportionality constant; thus the unnormalised weights may not sum to one. However, normalised importance sampling weights can easily be obtained by dividing each weight by the sum of all the weights.
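The normalisation described in the footnote can be written, working in log-space to avoid overflow, as the following small routine (a sketch, not the paper's code):

```python
import numpy as np

def normalise_weights(unnorm_log_w):
    """Turn unnormalised log importance weights into probabilities.

    Importance sampling only requires the target and the proposal up to
    a proportionality constant, so the raw weights need not sum to one;
    dividing by their sum yields the normalised filter weights.
    """
    log_w = unnorm_log_w - np.max(unnorm_log_w)  # guard against overflow
    w = np.exp(log_w)
    return w / w.sum()
```

Because any additive constant in the log-weights cancels in the ratio, the result is invariant to the unknown proportionality constant.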
The key idea used in SIS consists in obtaining a recursive relation for the weight updating. This property makes SIS algorithms particularly appealing for on-line applications.
Thus for the i-th particle the recursive weight-updating relation is

w_{t+1}^i \propto w_t^i \, \frac{p(y_{t+1}|x_{t+1}^i, \theta) \, p(x_{t+1}^i|x_t^i, \theta)}{q(x_{t+1}^i|x_t^i, y_{t+1}, \theta)}. \quad (63)

Moreover, if we assume that the importance density for the state x_{t+1} is the transition density, q(x_{t+1}|x_t, y_{t+1}, \theta) = p(x_{t+1}|x_t, \theta), then equation (63) simplifies to

w_{t+1}^i \propto w_t^i \, p(y_{t+1}|x_{t+1}^i, \theta).

Note that the normalisation procedure causes the loss of the unbiasedness property, because the quantity at the denominator is a random variable.

Example 3 - (SV Model and SIS algorithm, Example 1 continued)
The first aim of this example is to show how sequential importance sampling applies to a specific Bayesian dynamic model. We consider the SV model introduced in Example 1. In the following we assume that the parameters are known, because we want to study how the SIS algorithm works purely as a filter for the log-volatility. The second aim of this example is to evidence the degeneracy problem which arises when using SIS algorithms. Given the initial weighted particle set {h_t^i, w_t^i}_{i=1}^N, the SIS filter performs the following steps: (i) propagate each particle through the transition density, h_{t+1}^i ~ p(h_{t+1}|h_t^i, \theta); (ii) update the weights as w_{t+1}^i \propto w_t^i \, p(y_{t+1}|h_{t+1}^i, \theta). By applying the SIS algorithm to the synthetic data simulated in Example 1, we obtain the filtered log-volatility represented in Fig. 10. Note that after some iterations the filtered log-volatility no longer fits the true log-volatility well. The Root Mean Square Error (RMSE), which measures the distance between the true and the filtered series, cumulates rapidly over time, as shown in Fig. 11. Moreover, the same figure exhibits the estimated variance of the particle weights. This indicator shows how the SIS algorithm degenerates after 430 iterations: the discrete probability mass concentrates on one particle, the other particles having null probability.
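The experiment above can be reproduced with the following self-contained sketch. We assume an SV model of the standard form h_t = φ h_{t-1} + σ η_t, y_t = exp(h_t/2) ε_t with Gaussian noise; the parameter values and the random seed are illustrative, not those of Example 1.

```python
import numpy as np

# Illustrative settings (not the actual values used in Example 1)
PHI, SIGMA, T, N = 0.95, 0.3, 500, 1000
rng = np.random.default_rng(0)

# Simulate synthetic data from the SV model
h_true = np.zeros(T)
for t in range(1, T):
    h_true[t] = PHI * h_true[t - 1] + SIGMA * rng.normal()
y = np.exp(h_true / 2) * rng.normal(size=T)

# SIS: propagate through the transition density, reweight by the
# measurement density; with no re-sampling the weights degenerate.
particles = rng.normal(size=N)
log_w = np.full(N, -np.log(N))
h_filt = np.zeros(T)
w_var = np.zeros(T)
for t in range(T):
    if t > 0:
        particles = PHI * particles + SIGMA * rng.normal(size=N)
    # log of N(y_t; 0, exp(h)) up to an additive constant
    log_w = log_w - 0.5 * (particles + y[t] ** 2 * np.exp(-particles))
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    h_filt[t] = np.sum(w * particles)      # filtered log-volatility
    w_var[t] = np.var(w)                   # grows as the filter degenerates

rmse = np.sqrt(np.mean((h_true - h_filt) ** 2))
```

Plotting `w_var` against t reproduces qualitatively the behaviour in Fig. 11: the weight variance grows until almost all probability mass sits on a single particle.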
As evidenced in Example 3, and as is well known in the literature (see for example Arulampalam, Maskell, Gordon and Clapp [1]), basic SIS algorithms suffer from a degeneracy problem: after some iterations the empirical distribution degenerates into a single particle, because the variance of the particle weights increases over time.

[Figure 11: The Root Mean Square Error between the true and filtered log-volatility, and the estimated variance of the particle weights, which allows us to detect degeneracy of the particle weights.]
Some filters perturb the particles after the re-sampling step; particle values thus change and the impoverishment is partially avoided. Finally, Pitt and Shephard [35] introduce the Auxiliary Particle Filter (APF) and apply it to a Gaussian ARCH-type stochastic volatility model. They find that the auxiliary particle filter works well and that its sensitivity to outliers is lower than that of the basic filters. In the following we focus on the APF algorithm. In order to reduce the loss of information due to re-sampling, the APF algorithm uses an auxiliary variable to select the most representative particles and to mutate them through a simulation step. The weights of the regenerated particles are then updated through an importance sampling argument. In this way particles with low probability do not survive the selection and the information contained in the particle set is not wasted. In particular, each particle x_t^i is associated with an auxiliary variable µ^i, a summary statistic (for example the mean or the mode) of the transition density p(x_{t+1}|x_t^i, \theta). Note that the empirical filtering density given in Eq. (57) is a mixture of distributions, which can be reparameterised by introducing an auxiliary variable i ∈ {1, ..., N} indicating the component of the mixture. The joint distribution of the hidden state and of the index i is

p(x_{t+1}, i|y_{1:t+1}, \theta) = \frac{p(y_{t+1}|y_{1:t}, x_{t+1}, i)}{p(y_{t+1}|y_{1:t}, \theta)} \, p(x_{t+1}, i|y_{1:t}, \theta) = \frac{p(y_{t+1}|x_{t+1}, \theta)}{p(y_{t+1}|y_{1:t}, \theta)} \, p(x_{t+1}|i, y_{1:t}, \theta) \, p(i|y_{1:t}, \theta) = \frac{p(y_{t+1}|x_{t+1}, \theta)}{p(y_{t+1}|y_{1:t}, \theta)} \, p(x_{t+1}|x_t^i, \theta) \, w_t^i. \quad (66)

The basic idea of the APF is to refresh the particle set while reducing the loss of information due to this operation. Thus the algorithm generates a new set of particles by jointly simulating the particle index i (selection step) and the selected particle value x_{t+1} (mutation step) from the reparameterised empirical filtering density, according to the importance density

q(x_{t+1}^j, i^j|y_{1:t+1}, \theta) = q(x_{t+1}^j|i^j, y_{1:t+1}, \theta) \, q(i^j|y_{1:t+1}, \theta), \quad j = 1, ..., N.
Note that the index is sampled using weights proportional to the observation density conditional on a summary statistic of the initial particle set. In this way less informative particles are discarded: the information contained in each particle is evaluated with respect to both the observable variable and the initial particle set. By following the usual importance sampling argument, the updating relation for the particle weights is

w_{t+1}^j \propto \frac{p(y_{t+1}|x_{t+1}^j, \theta)}{p(y_{t+1}|\mu^{i^j}, \theta)}.

In order to implement the SIR algorithm we introduce a re-sampling step after the propagation of the initial set of particles {h_t^i, w_t^i}_{i=1}^N; the steps of the SIR are thus the SIS propagation and weight-updating steps, followed by a re-sampling of the particles according to their weights. By applying the SIR algorithm to the synthetic data simulated in Example 1, we obtain the filtered log-volatility represented in Fig. 13. In Fig. 14, the first graph on the left compares the SIS and SIR algorithms in terms of RMSE. Note how the re-sampling (selection) step effectively improves the performance of the basic SIS filter, avoiding the degeneracy of the particle weights and reducing the RMSE.

[Figure 14: The first graph on the left compares the SIS and SIR algorithms in terms of Root Mean Square Error between the true and filtered log-volatility. The second graph shows the estimated variance of the particle weights, which allows us to detect degeneracy of the particle weights.]

Note that, for simulating the auxiliary variables i^j, Pitt and Shephard [35] suggest using another proposal distribution, based on a Taylor expansion of the measurement distribution. By applying the APF algorithm to the synthetic data simulated in Example 1, we obtain the filtered log-volatility represented in Fig. 15. In Fig. 16, the first graph on the left compares the SIS, SIR and APF algorithms in terms of RMSE.

[Figure 16: The first graph on the left compares the SIS, SIR and APF algorithms in terms of Root Mean Square Error between the true and filtered log-volatility. The second graph shows the estimated particle weight variance, which allows us to detect degeneracy of the filter.]
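One SIR step for the same SV model differs from the SIS step only by the final multinomial re-sampling. The sketch below assumes the same parameterisation h_t = φ h_{t-1} + σ η_t, y_t = exp(h_t/2) ε_t used above; the function name and interface are ours.

```python
import numpy as np

def sir_step(particles, weights, phi, sigma_eta, y_next, rng):
    """One SIR step for the SV model: propagate, reweight, resample.

    Multinomial re-sampling after the weight update resets the weights
    to 1/N, which prevents the degeneracy observed with plain SIS.
    """
    N = particles.size
    # propagation through the transition density
    particles = phi * particles + sigma_eta * rng.normal(size=N)
    # weight update by the measurement density y ~ N(0, exp(h))
    log_w = np.log(weights) - 0.5 * (particles
                                     + y_next ** 2 * np.exp(-particles))
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    # multinomial re-sampling (selection step)
    idx = rng.choice(N, size=N, p=w)
    return particles[idx], np.full(N, 1.0 / N)
```

After this step the particle set is unweighted (all weights equal 1/N), at the price of some sample impoverishment.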
Although the variability of the particle weights and the RMSE are greater than for the SIR algorithm, there is no degeneracy, and the performance of the APF algorithm is superior to that of the basic SIS algorithm. The poorer performance of the APF with respect to the SIR algorithm evidences a well-known problem of this kind of algorithm: when the transition density exhibits a high noise variance, the use of the APF does not improve the filtering results.
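The APF selection, mutation and correction steps described above can be sketched as follows, again for the illustrative SV parameterisation h_t = φ h_{t-1} + σ η_t, y_t = exp(h_t/2) ε_t. Here the summary statistic µ^i is taken to be the transition mean φ h_t^i, one common choice; the interface is ours.

```python
import numpy as np

def apf_step(particles, weights, phi, sigma_eta, y_next, rng):
    """One auxiliary particle filter step for the SV model.

    Selection: indices are drawn with probabilities proportional to
    w_t^i * p(y_{t+1} | mu^i), where mu^i = phi * h_t^i is the mean of
    the transition density (the summary statistic).
    Mutation: selected particles are propagated through the transition.
    Correction: new weights are p(y|h) / p(y|mu) for the chosen indices.
    """
    N = particles.size
    mu = phi * particles                                  # summary statistic
    log_first = np.log(weights) - 0.5 * (mu + y_next ** 2 * np.exp(-mu))
    g = np.exp(log_first - log_first.max())
    g /= g.sum()
    idx = rng.choice(N, size=N, p=g)                      # selection step
    new_p = mu[idx] + sigma_eta * rng.normal(size=N)      # mutation step
    log_w = (-0.5 * (new_p + y_next ** 2 * np.exp(-new_p))
             + 0.5 * (mu[idx] + y_next ** 2 * np.exp(-mu[idx])))
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    return new_p, w
```

Particles whose predicted location µ^i is incompatible with the new observation are unlikely to be selected, which is the source of the APF's robustness to outliers.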
We conclude this section with another example of the application of the APF, to a stochastic volatility model with jumps. As in the previous examples, we assume that the parameters are known. One of the main issues in particle filtering is the inclusion of parameter estimation in the sequential learning process. We refer to Berzuini et al. [3] and Storvik [41] for a general discussion of the problem, and to Liu and West [27] for an example of the joint use of adaptive importance sampling and the auxiliary particle filter.
Example 6 - (APF and MSSV Models, example in Section 2.2 continued) The aim of this example is to show how particle filter algorithms apply to Markov switching stochastic volatility models. In these models the latent factor represents the log-volatility of the observed variable, and the switching states are the volatility phases (high or low) of the financial market. We apply the APF algorithm to synthetic data in order to show the efficiency of the algorithm and to detect possible degeneracy of the weights. We use a set of N = 3,000 particles to obtain the empirical filtering and prediction densities. All computations have been carried out on a Pentium IV 2.4 GHz, and the APF algorithm has been implemented in GAUSS 3.2.8. Figure 17 shows the on-line estimation of the latent factor x_t and of the switching process s_t. In order to detect the absence of degeneracy in the output of the APF algorithm, we evaluate the particle weight variance at each time step. Figure 18 shows the RMSE and the weight variance at each time step.

[Figure 18: Root Mean Square Error between the true and filtered log-volatility and the estimated variance of the particle weights, which allows us to detect degeneracy of the filter.]
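An APF step on the augmented state (h_t, s_t) of an MSSV model can be sketched as below. The parameterisation h_t = α_{s_t} + φ h_{t-1} + σ η_t with a two-state Markov chain s_t, the numerical values, and the simple variant used (the next regime is drawn first, then the summary statistic is the transition mean given that regime) are all illustrative assumptions, not the paper's actual specification.

```python
import numpy as np

# Illustrative MSSV settings (not the paper's actual values):
# h_t = alpha[s_t] + phi * h_{t-1} + sigma * eta_t,
# y_t = exp(h_t / 2) * eps_t, s_t a 2-state Markov chain.
ALPHA = np.array([-0.5, 0.5])        # low / high volatility levels
PHI, SIGMA = 0.9, 0.2
P = np.array([[0.98, 0.02],          # regime transition matrix
              [0.05, 0.95]])

def mssv_apf_step(h, s, w, y_next, rng):
    """One APF step on the augmented state (h_t, s_t).

    A simple variant: the next regime s' is drawn first, the summary
    statistic mu = alpha[s'] + phi * h is the transition mean given s',
    and selection / mutation / correction then proceed as in the APF.
    """
    N = h.size
    s_new = np.array([rng.choice(2, p=P[si]) for si in s])   # draw regimes
    mu = ALPHA[s_new] + PHI * h                              # summary statistic
    log_first = np.log(w) - 0.5 * (mu + y_next ** 2 * np.exp(-mu))
    g = np.exp(log_first - log_first.max())
    g /= g.sum()
    idx = rng.choice(N, size=N, p=g)                         # selection
    h_new = mu[idx] + SIGMA * rng.normal(size=N)             # mutation
    log_w = (-0.5 * (h_new + y_next ** 2 * np.exp(-h_new))
             + 0.5 * (mu[idx] + y_next ** 2 * np.exp(-mu[idx])))
    w_new = np.exp(log_w - log_w.max())
    w_new /= w_new.sum()
    return h_new, s_new[idx], w_new
```

Iterating this step over the observations yields the on-line estimates of both the log-volatility and the regime, as in Fig. 17; the regime probability at each t can be read off as the weighted fraction of particles with s = 1.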