Bayesian Inference for Generalised Markov Switching Stochastic Volatility Models

We study a Markov switching stochastic volatility model with heavy-tailed innovations in the observable process. Because the hidden volatility regimes have a clear economic interpretation, these models have many financial applications, such as asset allocation, option pricing and risk management. The Markov switching process captures clustering effects and jumps in volatility, while heavy-tailed innovations account for extreme variations in the observed process. Accurate modelling of the tails is important when quantile estimation is the main interest, as in risk management applications. We follow a Bayesian approach to filtering and estimation, focusing on recently developed simulation-based filtering techniques known as particle filters. Simulation-based filters are recursive techniques that are useful for non-linear and non-Gaussian latent variable models and for processing data sequentially: they allow parameter estimates and state filtering to be updated as new observations become available.


Introduction
Stochastic volatility (SV) models find many financial applications, for example option pricing, asset allocation and risk management. The first work on time series with time-changing volatility is due to Clark [12]. The simplest continuous SV model was proposed by Taylor [61], [62], while Hamilton [35] considers a simple discrete SV model. Hull and White [39] introduce stochastic volatility in a continuous-time option pricing framework.
Our work is based on particle filter techniques and belongs to the more general Bayesian framework for time series analysis. Harrison and West [36] provide an introduction to estimation methods for dynamic Bayesian models. Bayesian inference represents an alternative to the estimation methods cited above, and in the following we discuss the main estimation approaches within this framework.
A first approach is the Markov Chain Monte Carlo Expectation-Maximization method (MCMC-EM). It uses MCMC simulation techniques to evaluate the likelihood function and to compute the expectation with respect to the latent variables. The resulting approximated expectation is then maximized to obtain the ML estimator. Shephard [56] and Geyer [30], [31] apply MCMC-EM to stochastic volatility models. Andrieu and Doucet [3] propose and compare different on-line MCMC-EM algorithms, which allow data to be processed sequentially. On-line MCMC-EM proves efficient also for non-linear models if a set of sufficient statistics exists. As an example, they evaluate the efficiency of this estimation method on a basic continuous SV model.
A second approach, in a Bayesian framework, is the Markov Chain Monte Carlo (MCMC) method. It is based on a data completion (or augmentation) principle and yields a simulated sample from the posterior distribution of parameters and hidden states, given the available information. Jacquier, Polson and Rossi [40] develop a Bayesian approach to SV model estimation. Their method is based on a hybrid MCMC algorithm, and the superiority of the Bayes estimator is shown through a comparison with QML and MM estimation methods. De Jong and Shephard [20] apply the MCMC approach to SV models and propose a simulation smoother and a multi-move Gibbs sampler that simulates the disturbances of a time series rather than the hidden states. The algorithm effectively improves the efficiency of MCMC methods for time series. Shephard and Pitt [57] provide estimation methods for non-Gaussian time series models with applications to SV. They analyse MCMC methods for simulation smoothing and parameter estimation and compare them with maximum likelihood estimation, where the likelihood function is approximated through importance sampling. Kim, Shephard and Chib [44] compare continuous SV models with ARCH models and with a GARCH Student-t model. They also analyse the MCMC method for parameter inference and volatility filtering when applied to an approximated likelihood function. In particular, they linearize the measurement equation by taking the logarithm of the square and approximate the resulting innovation distribution with a mixture of distributions. The same approximation technique is used by So, Lam and Li [59], who generalize the usual continuous SV model by introducing a Markov jump process in the volatility mean. Through this switching process the model accounts for both persistence effects and tilts in volatility.
They adopt an MCMC approach with a data augmentation principle, building on the works of Harvey, Ruiz and Shephard [38] and of De Jong and Shephard [20]. Recently, Chib, Nardari and Shephard [13] introduce GSV models with Student-t innovations and a jump process in the mean of the measurement equation. They use an MCMC approach for estimating parameters and a particle filter for approximating the likelihood function in order to perform model diagnostics. Many recent papers focus on the use of MCMC methods in financial model estimation. Johannes and Polson [41] review financial applications of MCMC methods: they discretize the continuous-time diffusion process and apply MCMC for parameter estimation and hidden state filtering, with particle filters then used for model diagnostics. Eraker [24] follows the same framework. See Johannes, Polson and Stroud [42] for a Bayesian approach to state filtering and parameter estimation in jump-diffusion stochastic processes.
In this work, we follow a third, recently developed Bayesian approach, which proves efficient for general dynamic models: sequential simulation-based filtering, known as particle filtering. It is particularly useful in financial applications where data are processed sequentially. As a new observation becomes available, the hidden states and the parameters of the dynamic model can be updated and a new prediction can be performed. Particle filters also allow model diagnostics and parameter inference. For a review of the state of the art see Doucet, Freitas and Gordon [22]. Pitt and Shephard [52] improve standard Sequential Importance Sampling filtering techniques by introducing the Auxiliary Particle Filter (APF). They apply the APF to stochastic volatility models and find that the method performs better than other simulation-based techniques, although it is particularly sensitive to outliers. Kim, Shephard and Chib [44] and Chib, Nardari and Shephard [13] apply particle filters for stochastic volatility extraction but not for parameter estimation. Polson, Stroud and Müller [54] apply a practical filter for sequential parameter estimation and state filtering. They show the superiority of their method when compared to the APF combined with the sequential parameter learning algorithm due to Storvik [60]. Lopes and Marino [48] and Lopes [47] apply the APF to a MSSV model for sequential parameter learning and state filtering.
The first aim of our work is to develop the idea of Chib, Nardari and Shephard [13], who propose extending their jump GSV model by introducing a Markov jump process in the volatility.
The second aim is to develop the joint estimation of the states and the parameters of the Markov switching SV model. Recently Storvik [60] analyses this problem and reviews the main approaches in the literature. Our work builds on the algorithm of Liu and West [46], who suggest combining the APF algorithm with a kernel reconstruction of the parameter posterior distribution. Sequential filtering techniques introduce approximation errors in the estimation of states and parameters, and these errors accumulate over time. Thus, for financial applications of dynamic Bayesian models and particle filtering, it is necessary to account for and correct approximation errors.
The work is structured as follows. In section 2 we state the SV models, discuss some useful reparameterisations and provide stationarity conditions for the MSSV. Section 3 focuses on the particle filter for the joint estimation of states and parameters. Section 4 presents some simulation results. Section 5 concludes.

The Markov Switching Stochastic Volatility Models
Financial time series are often characterised by heavy tails, asymmetry and time-varying volatility. In particular, they may exhibit jumps in volatility, volatility persistence effects (also called volatility clustering) and leverage effects. In this work we focus on the joint modelling of the heavy tails of the observable process and of the clustering effects in the volatility dynamics.
The hypothesis of a Gaussian evolution of the observable process seems quite restrictive in many financial applications; thus some authors have proposed generalised stochastic volatility models (see Harvey, Ruiz and Shephard [38], Shephard and Pitt [57] and Chib, Nardari and Shephard [13]). In our work we consider heavy-tailed MSSV processes and make a comparison with the Gaussian model.
Another aspect of interest is volatility clustering. It is possible to capture volatility persistence by introducing a jump component in the volatility dynamics. So, Lam and Li [59] extend the simple continuous volatility model of Taylor [62] by adding a Markov jump process to the drift of the stochastic volatility. Following them, in our Markov switching stochastic volatility model we assume that the log-volatility, h_t, is a continuous Markov process, conditionally on a discrete homogeneous Markov process, s_t. This process is called the switching process and determines the regime of volatility. Moreover, we assume that the switching process takes values in a finite and known set of states. See Chopin [14] for an application of particle filters to switching models with a varying number of states. In the following we give some examples of MSSV models under different assumptions on the distribution of the observable process. We will consider both a Gaussian innovation process and heavy-tailed processes, namely Student-t and α-stable innovation processes, with unknown degrees of freedom and unknown characteristic exponent, respectively.
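To fix ideas, the switching mechanism described above can be simulated in a few lines. This is a minimal sketch, assuming the common MSSV form in which the log-volatility follows an AR(1) with a regime-dependent drift α_{s_t}; all function and parameter names are ours, not the paper's.

```python
import numpy as np

def simulate_mssv(T, P, alpha, phi, sigma, h0=0.0, s0=0, rng=None):
    """Simulate a Gaussian MSSV path: s_t is a Markov chain with transition
    matrix P, h_t = alpha[s_t] + phi*h_{t-1} + sigma*eta_t is the
    log-volatility, and y_t = exp(h_t/2)*eps_t is the observable return."""
    rng = np.random.default_rng(rng)
    s = np.empty(T, dtype=int)
    h = np.empty(T)
    y = np.empty(T)
    s_prev, h_prev = s0, h0
    for t in range(T):
        s[t] = rng.choice(len(P), p=P[s_prev])                  # regime switch
        h[t] = alpha[s[t]] + phi * h_prev + sigma * rng.standard_normal()
        y[t] = np.exp(h[t] / 2) * rng.standard_normal()         # observation
        s_prev, h_prev = s[t], h[t]
    return s, h, y
```

A low-volatility regime with negative drift and a high-volatility regime with positive drift, together with a persistent transition matrix, reproduce the clustering and jump behaviour discussed in the text.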

The Gaussian MSSV Model
The assumption of Gaussian innovations is quite common in practice; thus in this section we define a basic MSSV model (M1), which is completely Gaussian.

Figure 1: the upper left graph exhibits the evolution of the hidden jump process, the upper right graph shows the log-volatility of the observable process, and the third graph represents the observable process itself.

Heavy tail MSSV Models
Due to the high degree of heterogeneity of financial time series, the assumption of a Gaussian observable process seems restrictive in many real contexts, and for this reason it has been relaxed by many recent studies. A common way to model heterogeneous dynamics in time series is to include a stochastic latent structure in the model. For example, Chib, Nardari and Shephard [13] propose a Student-t discrete-time GSV model and a similar model with a jump component in the mean of the observable process. In a continuous-time setting, Barndorff-Nielsen and Shephard [6] study heavy-tailed processes.
Financial time series often exhibit volatility tilts and clustering behaviour. In order to capture these features of the volatility dynamics, we study the following non-Gaussian Markov switching stochastic volatility models. We assume that the observable variable follows a heavy-tailed process, which will alternatively be a Student-t process or an α-stable process. Note that both of them include the Gaussian model as a particular case.
The first GSV model (M2) is defined, for t = 1, . . . , T, in terms of the multinomial distribution ML and the density T_ν(y|δ, σ) of a Student-t distribution. This distribution is characterised by three parameters: the degrees of freedom ν, the location δ and the scale σ. Note that the heaviness of the tails is controlled by ν and that when ν → ∞ the distribution converges to a Gaussian distribution.

The second GSV model (M3) is also characterised by a heavy-tailed observable process, for t = 1, . . . , T, where ML is the multinomial distribution and S_α(y|β, δ, σ) represents the density of a stable distribution, which is completely characterised by four parameters: the characteristic exponent α, the skewness parameter β, the location parameter δ and the scale parameter σ. We assume for simplicity that β = 0. Moreover, we take α ∈ (1, 2] in order to have a finite first-order moment. Note that stable distributions include the Gaussian distribution as a particular case, when α = 2. The stable density cannot generally be written in analytic form, thus it is conveniently defined through its characteristic function, with ϑ ∈ R. The best known parametrisation is defined in Samorodnitsky and Taqqu [55]; in the parameter setting of our model the characteristic function reduces to a simpler form. In order to obtain an analytic representation of the density of a stable random variable, an auxiliary variable has to be introduced. The same strategy is used in Buckle [10] for α-stable distributions and in Godsill [32] for inference on time series with α-stable innovations.

For the sake of simplicity we introduce the following notation. The parameter vector is θ = (ν, (α_1, . . . , α_L), φ, σ², (p_1, . . . , p_L)) for model M2 and θ = (α, (α_1, . . . , α_L), φ, σ², (p_1, . . . , p_L)) for model M3.
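Although the stable density has no analytic form, exact draws are easy to generate with the Chambers-Mallows-Stuck method. Below is a minimal sketch for the symmetric case β = 0 assumed in model M3, with unit scale; the function name is ours.

```python
import numpy as np

def sample_sym_stable(alpha, size, rng=None):
    """Chambers-Mallows-Stuck sampler for symmetric alpha-stable draws
    (beta = 0, unit scale). For alpha = 2 this reduces to N(0, 2)."""
    rng = np.random.default_rng(rng)
    U = rng.uniform(-np.pi / 2, np.pi / 2, size)   # uniform angle
    W = rng.exponential(1.0, size)                  # unit exponential
    return (np.sin(alpha * U) / np.cos(U) ** (1 / alpha)
            * (np.cos((1 - alpha) * U) / W) ** ((1 - alpha) / alpha))
```

For α = 2 the formula collapses to 2·√W·sin U, a Gaussian draw with variance 2, consistent with the Gaussian being the α = 2 member of the stable family.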
In order to constrain the parameter α to lie in (1, 2] we consider the following invertible transformation: log((α − 1)/(2 − α)). For the Student-t distribution we let ν vary uniformly in the interval [2, 100], and we use the logit-type transformation log(((ν − 2)/98)/(1 − ((ν − 2)/98))).
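Both maps are logit-type bijections onto the real line, so unconstrained proposal moves can be made in the transformed space. A small sketch, with our own function names (we read the ν transform as the standard logit):

```python
import numpy as np

# Logit-type bijections mapping constrained parameters to the real line.
def alpha_to_real(a):
    """alpha in (1, 2) -> R."""
    return np.log((a - 1) / (2 - a))

def real_to_alpha(x):
    """Inverse map: alpha = 1 + logistic(x)."""
    return 1 + 1 / (1 + np.exp(-x))

def nu_to_real(nu):
    """nu in (2, 100) -> R, via u = (nu - 2)/98 and the logit of u."""
    u = (nu - 2) / 98
    return np.log(u / (1 - u))

def real_to_nu(x):
    """Inverse map: nu = 2 + 98 * logistic(x)."""
    return 2 + 98 / (1 + np.exp(-x))
```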

Stationarity Conditions for MSSV models
The standard continuous SV process is often used to model the excess kurtosis of the unconditional distribution of the observable process. Continuous SV is also able to capture volatility clustering, but many financial time series exhibit in addition a multi-modal unconditional distribution. This feature can be explained by a volatility process whose mean changes over time. In MSSV models a hidden jump process (i.e. a Markov switching process) is added to the mean parameter of the log-volatility process. A first consequence of including a hidden Markov switching process in the log-volatility is to further increase the degree of kurtosis of the observable process. Moreover, the MSSV model is able to capture both volatility persistence and volatility tilts.
Financial applications of the MSSV model make sense only if stationarity of the model is guaranteed, so some considerations on stationarity are needed. Define the following reparameterisation of the MSSV model, with p_ij ≥ 0, ∀ i, j ∈ E, and Σ_{l=1}^{L} p_il ≤ 1. Moreover, {s_t}_{t∈N} is a Markov jump process taking values in the finite state space E = {0, . . . , L}. In the following we assume that E = {0, 1}, that the initial state s_0 of the process has probability measure µ_0 and that s_t has transition matrix P. Note that the transition matrix and the initial probability measure completely define the Markov jump process.
As stated in Theorem 2 (Appendix A), the second-order stationarity of the process ln(y_t²) is guaranteed by the second-order stationarity of the process h_t. In the following we focus on the stationarity conditions for the hidden Markov process {h_t, s_t}_{t∈N}. Due to the causality relations between s_t and h_t, it is possible to study first the unconditional stationarity of {s_t}_{t∈N} and then the stationarity of {h_t}_{t∈N} conditionally on {s_t}_{t∈N}. Stationarity conditions for {s_t}_{t∈N} follow from the properties of the n-fold composition of the transition matrix: when n → +∞, the transition probability P^n tends to a finite limit if and only if |1 − p_10 − p_01| < 1, and by Theorem 2 we conclude that these are sufficient conditions for the second-order stationarity of ln(y_t²). The autoregressive structure of the log-volatility process (see Equation (14)) makes it dependent on the past history of the Markov jump process. This feature becomes evident from the ergodic solution of the system of stochastic difference equations (14), (15) and (16), which is derived in Appendix A under the assumption |φ| < 1.
In Appendix A, we find that the first- and second-order stationary moments of h_t exist if |φ| < 1 and |1 − p_10 − p_01| < 1.
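These sufficient conditions are straightforward to check numerically for the two-state case; the sketch below (function names ours) also returns the ergodic distribution of the chain, π = (p_10, p_01)/(p_01 + p_10).

```python
import numpy as np

def mssv_is_stationary(P, phi):
    """Check the sufficient conditions from the text: |phi| < 1 for the AR
    part of h_t and |1 - p10 - p01| < 1 for the 2-state chain s_t."""
    p01, p10 = P[0][1], P[1][0]
    return abs(phi) < 1 and abs(1 - p10 - p01) < 1

def stationary_dist(P):
    """Ergodic distribution of the 2-state chain: pi = (p10, p01)/(p01+p10)."""
    p01, p10 = P[0][1], P[1][0]
    return np.array([p10, p01]) / (p01 + p10)
```

Note that the periodic chain with p_01 = p_10 = 1 fails the condition, since |1 − p_10 − p_01| = 1.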
For further details on asymptotic second-order stationarity and strict stationarity of switching non-linear AR and switching ARMA models, see Francq and Roussignol [26] and Francq and Zakoian [27].

Particle Filters
Particle filters, also referred to in the literature as bootstrap filters, interacting particle filters, condensation algorithms or Monte Carlo filters, are sequential Monte Carlo algorithms. They prove quite useful for filtering in dynamic models like M1, M2 and M3, which have elements of non-linearity and non-Gaussianity, and provide a significant advantage over traditional filtering techniques. In particular, in many real situations data are processed on-line: when a new observation arrives, the estimates of the states and of the parameters have to be updated, so recursive techniques like sequential Monte Carlo filters are well suited. Moreover, simulation-based filtering allows the likelihood function of complex dynamic models to be evaluated and model diagnostics to be performed.
In the following we focus on the joint estimation of the states and parameters of the dynamic model. We state a quite general formulation of the filtering problem from a Bayesian perspective, which does not usually admit an analytical solution. Denote by {x_t; t ∈ N}, x_t ∈ X, the hidden states of the system, by {y_t; t ∈ N_0}, y_t ∈ Y, the observable variables, and by θ ∈ Θ the parameters of the densities. In this section we suppose that the parameters are known. The Bayesian state-space representation of a non-linear, non-Gaussian dynamic model is given by an initial distribution p(x_0), a measurement density p(y_t|x_t; θ) and a transition density p(x_t|x_t−1; θ). Moreover, we assume that the Bayesian dynamic model is Markovian, that is, the transition density depends on the past only through the last value of the hidden state, while the measurement density is a function of the current value of the hidden state. Fig. 4 shows the causality structure of the Bayesian dynamic model given in equations (18), (19) and (20); note that models M1, M2 and M3 exhibit this structure. When processing data on-line, at each time t two quantities of interest are the estimate of the current hidden state of the system and the prediction of the state of the system at time t + 1. In order to predict the future value of the state of the system, given the information available at time t, we use the Chapman-Kolmogorov equation, which characterises the hidden state evolution and gives the prediction density.

Figure 4: Causality structure of a Markovian dynamic model with hidden states. A box around a variable indicates that the variable is known, while a circle indicates a hidden variable.
Moreover, the assumption of Markovian dynamics for the hidden states yields a recursive relation, which is useful when solving a filtering problem while processing data sequentially. In the following we introduce some basic particle filter algorithms, with particular attention to the auxiliary particle filter, and we treat the problem of the joint estimation of the hidden states and of the parameters of the model.
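On a finite state space both the Chapman-Kolmogorov prediction and the Bayes update are exactly computable, which makes the recursion concrete. A generic sketch, with our own names:

```python
import numpy as np

def grid_filter_step(p_filt, trans, lik):
    """One prediction/update cycle on a finite grid of states.
    p_filt[i] = p(x_t = i | y_{1:t}); trans[i, j] = p(x_{t+1} = j | x_t = i);
    lik[j] = p(y_{t+1} | x_{t+1} = j). Returns prediction and new filtering pmf."""
    p_pred = p_filt @ trans          # Chapman-Kolmogorov prediction step
    p_new = lik * p_pred             # Bayes update with the new observation
    return p_pred, p_new / p_new.sum()
```

Iterating this step over the observations is exactly the recursion that particle filters approximate by simulation when the state space is continuous.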

State Filtering
Assume the parameters θ of the dynamic model given in equations (18), (19) and (20) are known. Different versions of the particle filter exist in the literature, and different simulation approaches, such as rejection sampling, MCMC and importance sampling, can be used for the construction of a particle filter. To introduce particle filters, we apply an importance sampling argument to the smoothing problem. At each time step t + 1, as a new observation y_t+1 arrives, we are interested in predicting and filtering the hidden variables and the parameters of a general dynamic model. In particular we show how to approximate the prediction and filtering densities given in Equations (21) and (22) by means of sequential Monte Carlo methods.
Assume that the weighted sample {x_t^i, w_t^i}_{i=1}^N has been drawn from the filtering density at time t, p(x_t|y_1:t; θ). Each simulated value x_t^i is called a particle, and the particle set can be viewed as a random discretisation of the state space X with associated probability weights w_t^i. By means of this particle set it is possible to approximate the prediction density given in Eq. (21); the result is called the empirical prediction density and is denoted by p̂(x_t+1|y_1:t; θ). By applying Bayes' theorem it is also possible to obtain an approximation of the filtering density given in Eq. (22), called the empirical filtering density and denoted by p̂(x_t+1|y_1:t+1; θ). Assume now that the quantity E(f(x_t+1)|y_1:t+1) is of interest; it can be evaluated numerically by a Monte Carlo sample. A simple way to obtain a weighted sample from the filtering density at time t + 1 is to apply importance sampling to the empirical filtering density given in equation (27). This step corresponds to propagating the initial particle set through the importance density q(x_t+1|x_t^i, y_t+1; θ). In particular, if we propagate each particle of the set through the transition density p(x_t+1|x_t^i; θ) of the dynamic model, then the particle weights§ update as follows. The transition density is the natural choice for the importance density, because it represents a sort of prior at time t for the state x_t+1. However, as underlined in Pitt and Shephard [52], this strategy is sensitive to outliers. See also Crisan and Doucet [16] for a discussion of the choice of the importance density; they focus on the properties of the importance density which are necessary for the a.s. convergence of the sequential Monte Carlo algorithm.
The generic particle filter developed through the previous equations is called Sequential Importance Sampling (SIS). See Doucet, Freitas and Gordon [22] for an updated review of particle filter techniques and of the main convergence results for this kind of algorithm. It is well known in the literature (see for example Arulampalam, Maskell, Gordon and Clapp [4]) that basic SIS algorithms suffer from a degeneracy problem: after some iterations the empirical distribution degenerates into a single particle, because the variance of the importance weights is non-decreasing over time (see Doucet [21]). In order to solve the degeneracy problem, the Sampling Importance Resampling (SIR) algorithm was introduced by Gordon, Salmond and Smith [33]. This algorithm belongs to a wider class of bootstrap filters, which use a re-sampling step to generate a new set of particles with uniform weights. This step introduces diversity in the particle set, avoiding degeneracy. Note however that the basic SIR algorithm produces a progressive impoverishment of the information contained in the particle set, because of the resampling step and of the fact that particle values do not change over filter iterations. Many solutions have been proposed in the literature. We recall here the Regularised Particle Filter proposed by Musso, Oudjane and LeGland [50], which resamples from a kernel-regularised, continuous approximation of the filtering distribution. Moreover, Gilks and Berzuini [8] propose the SIR-Move algorithm, which moves particles after the re-sampling step; thus particle values change and impoverishment is partially avoided. Finally, Pitt and Shephard [52] introduce the Auxiliary Particle Filter (APF) and apply it to a Gaussian ARCH-type stochastic volatility model. They find the filter works well, although it is highly sensitive to outliers. In the following we focus on the APF algorithm.
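One SIR (bootstrap filter) iteration, with the transition density as importance density followed by multinomial resampling, can be sketched as follows; `transition` and `likelihood` stand for model-specific callables and all names are ours.

```python
import numpy as np

def sir_step(particles, weights, transition, likelihood, rng):
    """One SIR (bootstrap filter) step: propagate each particle through the
    transition density, reweight by the likelihood of the new observation,
    then resample to equal weights to fight weight degeneracy."""
    x_new = transition(particles, rng)            # propagate: x_{t+1} ~ p(.|x_t)
    w = weights * likelihood(x_new)               # importance weight update
    w = w / w.sum()
    idx = rng.choice(len(x_new), size=len(x_new), p=w)   # multinomial resampling
    return x_new[idx], np.full(len(x_new), 1.0 / len(x_new))
```

For an MSSV application, `transition` would simulate the regime and the AR(1) log-volatility, and `likelihood` would evaluate the measurement density of the return at each propagated particle.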
Instead of a blind re-sampling step, the APF algorithm uses an auxiliary variable to select the most representative particles and to mutate them through a simulation step. Then the weights of the regenerated particles are updated through an importance sampling argument. In this way particles with low probability do not survive the selection and the information contained in the particle set is not wasted. In particular, the auxiliary variable is a random particle index, which is used in the selection step to sample new particles. The random index is simulated from a distribution which contains and summarises the information in the previous particle set; this feature is due to the use of µ_t^i in the measurement density. Note that the empirical filtering density given in Eq. (27) is a mixture of distributions, which can be reparameterised by introducing the allocation variable i ∈ {1, . . . , N}, giving the joint distribution of the hidden state and the index i. The basic idea of the APF is to refresh the particle set while reducing the loss of information due to this operation. Thus the algorithm generates a new set of particles by jointly simulating the particle index i (selection step) and the selected particle value x_t+1 (mutation step) from the reparameterised empirical filtering density, according to the following importance density, for j = 1, . . . , N. By following the usual importance sampling argument, the updating relation for the particle weights follows.

§ Note that importance sampling only requires the importance and target distributions to be known up to a proportionality constant, thus the unnormalised weights may not sum to one. However, normalised importance sampling weights can easily be obtained; the normalisation causes the loss of the unbiasedness property.
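The selection and mutation steps can be sketched generically. We take µ_t^i to be a point estimate (e.g. the conditional mean) of x_t+1 given the i-th particle, as is common with the APF; all names are ours.

```python
import numpy as np

def apf_step(particles, weights, mu_fn, transition, likelihood, rng):
    """One auxiliary particle filter step in the style of Pitt and Shephard.
    mu_fn gives a point estimate mu_t^i of x_{t+1} for each particle."""
    mu = mu_fn(particles)
    lam = weights * likelihood(mu)                 # first-stage (selection) weights
    lam = lam / lam.sum()
    idx = rng.choice(len(particles), size=len(particles), p=lam)  # auxiliary index
    x_new = transition(particles[idx], rng)        # mutation of selected particles
    w_new = likelihood(x_new) / likelihood(mu[idx])  # second-stage reweighting
    return x_new, w_new / w_new.sum()
```

Because the selection step already accounts for the new observation through µ_t^i, the second-stage weights are close to uniform when µ_t^i predicts x_t+1 well, which is the source of the APF's efficiency.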
In many applications of particle filter techniques, parameters are treated as known and MCMC parameter estimates are used instead of the true values. MCMC is typically an off-line approach: it does not allow parameter estimates to be updated sequentially as new observations arrive. Moreover, when applied sequentially, the MCMC estimation method is more time consuming than particle filter algorithms. Thus in the next section we consider the filtering problem in the presence of unknown static parameters, from a Bayesian perspective.

Parameter Estimation
When processing data sequentially, the problems of hidden state filtering and of parameter estimation arise jointly. In engineering, a common way to address this is to treat the parameters as hidden states of the system; Berzuini et al. [9] develop this approach in a Bayesian framework. Standard particle filtering techniques then apply to estimate the joint posterior density p(x_0:t, θ|y_1:t), and the approximated posterior p(θ|y_1:t) is obtained by marginalisation.
Observe that the parameters are fixed over time, thus the particles representing the parameter posterior distribution do not change, while the particles approximating the hidden states are allowed to vary over filter iterations. As pointed out by Storvik [60], the degeneracy of the parameter weights has a negative effect on the whole posterior distribution, which degenerates to a Dirac mass. Different solutions to the degeneracy problem have been proposed in the literature. For example, Kitagawa [43] explicitly assumes an artificial evolution of the parameters, which are still considered hidden states of the dynamic model. The assumption of time-varying parameters introduces diversity in the particle set, avoiding the degeneracy problem, but produces higher variability in the parameter estimates. Liu and West [46] use a kernel density estimate of the parameter posterior distribution as importance density to refresh the particle set. This method produces slowly time-varying parameters and thus adds noise to the parameter estimates; in order to reduce the effect of this artificial variability, the authors adopt a kernel shrinkage technique.
An alternative approach can be found in Storvik [60], who proposes a quite general particle filter for the joint estimation of hidden states and static parameters. The algorithm requires a set of sufficient statistics for the posterior distribution to be known. Note however that the existence of sufficient statistics for the parameter θ is not necessary in principle, because the posterior distribution of the parameters p(θ|x_0:t, y_0:t) can always be evaluated at each time step. A sequential algorithm, called the practical filter, is proposed by Polson, Stroud and Müller [53]. The joint filtering distribution of parameters and states is represented as a mixture of fixed-lag filtering distributions; they simulate from the joint filtering distribution by simulating sequentially from the parameter posterior and from the fixed-lag smoothing distribution. The method is particularly useful when a set of sufficient statistics for the posterior is known. A comparison (see Polson, Stroud and Müller [54]) with Storvik's [60] algorithm shows the superiority of the practical filter when applied to the basic continuous SV model.
Sequential methods alternative to particle filters can be found in Andrieu and Doucet [3], who propose online Expectation-Maximization type algorithms, which do not degenerate but require knowledge of the ergodic distribution of the hidden Markov process and of a set of sufficient statistics for the posterior distribution.
In the following we refer to the algorithm due to Liu and West [46] and to the works of Lopes [47] and Lopes and Marino [48] for applications of particle filter algorithms to MSSV models.
The posterior distribution is written as the product of two components: the first is the filtering distribution and the second is the full posterior distribution of the parameters given the hidden states and the observations. The completed posterior of the parameters is proportional to a function which can always be written in a recursive form and evaluated at the simulated hidden states as a by-product of the particle filter algorithm.
The filtering problem can thus be treated conditionally on the parameter values. It is possible, for example, to use the Kalman filter or HMM filtering algorithms to filter the states and the particle filter to estimate the parameters (see for example Chopin [14]). In the MSSV model neither the Kalman filter nor the HMM filter can be used, thus Monte Carlo filters must be employed for the joint estimation of parameters and states of the dynamic system. However, in a fully simulation-based approach, treating the parameters as fixed causes the degeneracy of the filter. To solve this problem, Liu and West [46] propose to approximate the posterior distribution p(θ|y_1:t) with a particle set {x_t^i, θ_t^i, w_t^i} and to reconstruct the parameter posterior distribution at time (t + 1) through a Gaussian kernel density estimate. In this context, the time index t on the parameters means that they are updated sequentially. Note that, on top of the particle approximation, another approximation has been introduced: the kernel reconstruction of the posterior implies the substitution of the parameter transition density δ_{θ_t^i}(θ_t+1) by a Gaussian transition density N(θ_t+1|m_t^i, b²V_t). After the kernel reconstruction of the posterior density, a new set of particles can be generated by applying the APF algorithm to the states and the parameters, using the kernel posterior density estimate as the parameter importance density. The reconstruction of the posterior distribution through Gaussian kernel density estimation is a technique introduced by West [64], [65] in order to obtain an Adaptive Importance Sampling algorithm. The use of an adaptive importance function is particularly useful in dynamic models, where the probability density function of the system can change over time.
Note that the posterior distribution is a mixture of distributions, which can be reparameterised using an allocation variable i to indicate the mixture component. The main idea of the APF applies here, and the particle selection step is obtained by sampling the mixture index i together with the states x_t+1 and the parameters θ_t+1. Sampling from the joint density (42) is carried out through importance sampling, where the instrumental density used to sample the random index is q(i|y_1:t+1) = p(y_t+1|µ_t+1^i, m_t^i) w_t^i. From these assumptions on the proposal distribution, the weight updating equation follows,
with j = 1, . . . , N. The algorithm avoids degeneracy by introducing diversity in the particles. It is known, however, that this diversity produces an impoverishment of the information contained in the particles. Thus Liu and West [46] propose a kernel shrinkage technique in order to reduce the effect of the artificial variability. The kernel density at time t + 1 depends on the density at time t through the constraint on the conditional variance. As a result, each component of the kernel density estimate of the posterior distribution is not centred on the particle θ_t^i, but on the linear combination m_t^i = a θ_t^i + (1 − a) θ̄_t between the particle and the empirical average θ̄_t of the particle values at the previous step. In Appendix B we give a proof of the kernel shrinkage relations given in equation (45), using standard theorems on the conditional normal distribution. The resulting APF for joint state and parameter estimation is given in Algorithm 1.
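A quick numerical check of why the shrinkage is needed: jittering the particles with a plain Gaussian kernel inflates the posterior variance, while shrinking the kernel locations towards the mean first leaves it unchanged. The parameter values below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
delta = 0.95
a = (3 * delta - 1) / (2 * delta)

# equally weighted parameter particles (assumed one-dimensional for clarity)
theta = rng.normal(2.0, 1.5, size=200_000)
theta_bar, V = theta.mean(), theta.var()

# naive jittering inflates the variance to (2 - a^2) V ...
naive = theta + rng.normal(0.0, np.sqrt((1 - a**2) * V), size=theta.size)
# ... while shrinking towards the mean first preserves it: a^2 V + (1 - a^2) V = V
shrunk = a * theta + (1 - a) * theta_bar \
         + rng.normal(0.0, np.sqrt((1 - a**2) * V), size=theta.size)

print(round(V, 2), round(naive.var(), 2), round(shrunk.var(), 2))
```

The third printed value matches the first, while the second exceeds it, which is exactly the artificial variability the shrinkage removes.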

Algorithm 1 (see Liu and West [46])
Given the initial set of particles {x_t^j, θ_t^j, w_t^j}_{j=1}^N, for j = 1, . . . , N .
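Schematically, one iteration of this filter can be sketched as follows. The functions state_mean and lik are placeholders for the model-specific transition mean and measurement density (illustrative assumptions, not the MSSV specification of this paper), and the state noise is taken to have unit variance:

```python
import numpy as np

def liu_west_step(x, theta, w, y_next, state_mean, lik, delta=0.99, rng=None):
    """One iteration of the Liu and West auxiliary particle filter (sketch).

    x: (N,) state particles; theta: (N, d) parameter particles; w: (N,) weights.
    state_mean(x, theta) and lik(y, x, theta) are user-supplied model functions.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    N, d = theta.shape
    a = (3 * delta - 1) / (2 * delta)
    theta_bar = np.average(theta, axis=0, weights=w)
    V = np.atleast_2d(np.cov(theta.T, aweights=w))
    m = a * theta + (1 - a) * theta_bar           # shrunk kernel locations m_t^j
    mu = state_mean(x, theta)                     # point predictions mu_{t+1}^j
    # selection: draw the auxiliary index with first-stage weights
    g = w * lik(y_next, mu, m)
    idx = rng.choice(N, size=N, p=g / g.sum())
    # mutation: propagate parameters (Gaussian kernel) and states
    theta_new = m[idx] + rng.multivariate_normal(np.zeros(d), (1 - a**2) * V, size=N)
    x_new = state_mean(x[idx], theta_new) + rng.standard_normal(N)  # unit-variance noise assumed
    # second-stage (correction) weights
    w_new = lik(y_next, x_new, theta_new) / lik(y_next, mu[idx], m[idx])
    return x_new, theta_new, w_new / w_new.sum()
```

A usage example for a toy AR(1)-plus-noise model would pass state_mean = lambda x, th: th[:, 0] * x and a Gaussian likelihood in x; the returned weights are already normalized for the next iteration.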
In Appendix C we give a proof of the weights updating relation. Although this filtering approach does not explicitly assume that the parameters vary over time, the dynamic nature of the parameters results implicitly from the structure of the filtering algorithm. It is possible to show (see Appendix C) that the proposed filtering approach assumes time varying parameters with a Gaussian transition density. Note, however, that the particle filter algorithm uses an approximation of the parameter posterior distribution and maintains this approximation both in the importance density and in the weight updating relation. In principle an exact weight updating relation should be determined, and the approximation errors should be taken into account before they accumulate and produce poor parameter estimates. Thus a weight correction step would be needed, which can be considered a variant of the Rao-Blackwellization argument (Casella and Robert [11]). In particular, if we consider the true parameter posterior distribution and regard the kernel density approximation as a way to obtain an adaptive importance function, then the exact weights updating can be determined as follows, where the parameter posterior distribution is known from relation (39) and can be approximated through the particle filter. This approximated weight updating is computationally unfeasible because of the high number of times (t × N) the transition and the measurement equations must be evaluated. In the next section we propose an APF algorithm for generalised MSSV models.

Particle Filter Algorithms for Generalised MSSV Models
The general algorithm exhibited in the previous section applies both to the Gaussian model M_1 and to the heavy tail models M_2 and M_3. Lopes [47] gives a version of the algorithm for the Gaussian model M_1. In the following we exhibit the algorithm for the Student-t model. Recall that θ = (α_1, α_2, φ, ν, p_12, p_22, σ²); the APF is given in Algorithm 2.
Algorithm 2 (APF for the Student-t MSSV model) Given an initial set of particles, for j = 1, . . . , N, update the following variables: (a) s̃_{t+1}^j = arg max_{l∈{1,...,k}} … 3. For j = 1, . . . , N: … Note that the model M_2 is more difficult to estimate because the degrees of freedom ν determine the tail heaviness of the observable process. This makes the weight updating relation more sensitive to the evolution of the parameters. For the model M_3 we propose the following adaptation of the algorithm of Liu and West [46]. In order to obtain an integral representation of the α-stable density, we introduce an auxiliary (or completing) variable z_t. We then suggest approximating the integral by simulating z_t from its conditional distribution.
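The completion device used for model M_3 is analogous to the familiar scale-mixture-of-normals representation of the Student-t distribution underlying model M_2, which can be sketched as follows (the value of ν and the sample size are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
nu = 5.0            # degrees of freedom (assumed value)
n = 200_000

# Student-t innovations via the scale mixture of normals representation:
# eps | lam ~ N(0, 1/lam) with lam ~ Gamma(nu/2, rate = nu/2)
lam = rng.gamma(shape=nu / 2, scale=2 / nu, size=n)
eps = rng.standard_normal(n) / np.sqrt(lam)

# the variance of a t_nu variate is nu / (nu - 2); check it empirically
print(round(eps.var(), 2))
```

Conditioning on the mixing variable restores Gaussianity, which is what makes auxiliary-variable schemes of this kind convenient inside a particle filter.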
For j = 1, . . . , N, update the following variables: (a) s̃_{t+1}^j = arg max_{l∈{1,...,k}} … Note, however, that the numerical approximation of the stable density introduces further errors into the algorithm, and parameter estimation becomes difficult.

Convergence of the Particle Filter Algorithms
If we assume the parameter vector is a stochastic process with a Markovian transition kernel, then the particle filters developed for joint state filtering and parameter estimation converge almost surely. In fact, the dynamic models and particle filters studied in the previous sections satisfy the assumptions required for the almost sure convergence of the empirical posterior density to the true posterior:

p^N(x_{0:t}, θ_{0:t}|y_{1:t}) →a.s. p(x_{0:t}, θ_{0:t}|y_{1:t}), as N → ∞.

The assumptions needed for the almost sure convergence of quite general sequential Monte Carlo algorithms are in Crisan and Doucet [16]. The proofs of these results are based on the convergence analysis of empirical densities developed by Crisan [15]. See also Crisan and Doucet [17] for a useful survey of convergence results for particle filters.

Simulation Study
In the following we verify the efficiency of the auxiliary particle filter algorithm exhibited in the previous section through some applications to synthetic data. Tab. 1 shows the effect of the number of particles on the parameter estimates. A higher number of particles improves the precision of the estimates, above all for the parameters α_1, α_2 and φ. For all the experiments we use as prior a multivariate Gaussian distribution centred near the true parameter values. We tried other initial values and found that, for almost all starting values, the APF estimates are close to the true parameter values. The result is not robust with respect to all parameter settings; in such cases the choice of the parameter δ, related to the kernel shrinkage, becomes important. There is a trade-off between a high level of artificial noise (controlled by the parameter δ), which allows the parameter space to be explored, and the efficiency of the parameter estimates. Tab. 1 gives the results for the Gaussian model M_1, on a sample of T = 1,000 observations and with a constant-size set of N = 5,000 particles. The filtered hidden states are represented in Fig. 5 and the evolution of the particle weights in Fig. 7. The absence of degeneracy has been verified by estimating both the survival rate and the effective sample size indicator (see Fig. 6). The survival rate measures the fraction of particles that survive the selection step with respect to the total number of particles in the set; it reveals particle degeneracy when a persistently high number of particles die from one generation to the next. We compute the survival rate as the fraction of non-empty sets I_{i,t} = {j ∈ {1, . . . , N} | i_t^j = i}, where I_{i,t} collects the random index values selecting, at time t, the i-th particle. Note that if at time t the particle k does not survive the selection step, then the set I_{k,t} is empty. Fig. 6 exhibits the evolution over time of the survival rate for a set of N = 5,000 particles.
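The survival rate computation amounts to counting the distinct values among the resampled indices; a minimal sketch, with deliberately uneven toy weights as an assumption:

```python
import numpy as np

def survival_rate(resampled_indices, N):
    """Fraction of distinct particles surviving a selection step.

    resampled_indices: array of the N auxiliary indices i_t^j drawn at time t.
    A particle k is 'dead' when no index selects it (the set I_{k,t} is empty).
    """
    return np.unique(resampled_indices).size / N

# toy check: multinomial resampling from very skewed weights kills many particles
rng = np.random.default_rng(2)
N = 5_000
w = rng.dirichlet(np.full(N, 0.1))      # assumed, deliberately uneven weights
idx = rng.choice(N, size=N, p=w)
print(survival_rate(idx, N))
```

With uniform weights the rate stays close to 1 − 1/e ≈ 0.63 for multinomial resampling, so persistent values far below that signal degeneracy.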
Although for some filter iterations the rate falls below the 30% level, it does not remain persistently below that level. We can conclude that the filter does not show degeneracy problems. In order to complete the degeneracy analysis we evaluate the effective sample size (ESS) indicator. This degeneracy measure has been introduced by Liu and Chen [45] and, for the general dynamic system of equations (18)-(20), is defined in terms of the weights w_k^{*i} = p(x_k^i|y_{1:k}; θ)/q(x_k^i|x_{k−1}^i; θ), which cannot be calculated explicitly. Thus the following estimator has been used: ÊSS_t = 1/Σ_{i=1}^N (w̃_t^i)², where the normalized weights w̃_t^i have been defined in equation (29). Observe that this degeneracy measure is less than or equal to N. It equals N when the importance function is exactly equal to the filtering density, and tends to zero when the variance of the importance weights tends to infinity, that is, when the particle filter degenerates. Fig. 6 shows the estimated ESS for the particle filter applied to the Gaussian model. Observe that the effective sample size varies over time but never stabilizes at zero. Thus we conclude again in favour of the non-degeneracy of our particle filter for the Gaussian model.
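The ESS estimator above is a one-liner on the normalized weights; the two extreme cases (uniform weights, a single dominant weight) bracket its range:

```python
import numpy as np

def effective_sample_size(w):
    """ESS estimator 1 / sum(w_i^2) from (possibly unnormalized) weights."""
    w = np.asarray(w, dtype=float)
    w = w / w.sum()
    return 1.0 / np.sum(w ** 2)

print(effective_sample_size(np.ones(5000)))            # uniform weights: ESS = N
print(effective_sample_size([1.0] + [1e-12] * 4999))   # one dominant weight: ESS ~ 1
```

In practice the filter is often monitored by resampling whenever ÊSS_t falls below a fixed fraction of N (one half is a common choice).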
We apply the particle filter to estimate the Student-t model M_2. Estimation results for an increasing number of particles are reported in Tab. 2. Fig. 8 exhibits both the filtered hidden jump Markov process and the filtered stochastic log-volatility. The absence of degeneracy has been detected through the survival rate, which is represented in Fig. 9. The same figure gives the evolution over the filter iterations of the estimated parameter ν, which governs the tail heaviness of the distribution. We conclude that the algorithm needs a higher number of particles to produce better parameter estimates. Moreover, the results obtained for both the Gaussian and the Student-t models need further evaluation studies. In particular, the sensitivity of the parameter estimates to the values of the transition probabilities p 11 and p 00 needs to be studied.

Conclusion
Following some suggestions in the literature on SV models, in this work we develop heavy tail Markov switching stochastic volatility (MSSV) models. We discuss stationarity conditions for MSSV models and, in order to make inference, follow the recent literature on simulation based approaches. In particular we focus on the sequential learning problem for parameters and states. We show that estimation errors are due to the approximation errors that arise when the auxiliary particle filter and the adaptive posterior reconstruction are applied simultaneously, and we suggest a theoretical remedy. Moreover, we assume time varying parameters, apply the auxiliary particle filter algorithm to heavy tail MSSV models and verify, through simulation studies, the efficiency of the algorithm for the Student-t innovation model.

Appendix A -Stationarity Conditions
In this appendix we derive some stationarity conditions for the stochastic volatility process with jumps given in equations (13)-(16). Note that, through the transition matrix and the initial probability measure, the Markov jump process is well defined on the canonical space. See Baldi et al. [5] for further details. The stationarity analysis of the MSSV model is also based on the following property of the transition matrix.

Theorem 1 (Transition matrix composition)
Given the transition matrix P, the n-fold composition, denoted by P^n = P ∘ P ∘ · · · ∘ P (n times), is defined by the following equation. A sufficient condition for the stationarity of the observable process with stochastic volatility is the stationarity of the hidden stochastic log-volatility process, as stated in the following.
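For an ergodic two-state chain, the n-fold composition can be checked numerically: the rows of P^n converge to the invariant distribution (π_0, π_1) = (p_10, p_01)/(p_01 + p_10). The transition probabilities below are illustrative assumptions:

```python
import numpy as np

# two-state transition matrix of the volatility regime (assumed values)
p01, p10 = 0.02, 0.05
P = np.array([[1 - p01, p01],
              [p10, 1 - p10]])

# n-fold composition P^n; for an ergodic chain its rows converge to
# the invariant distribution (pi_0, pi_1) = (p10, p01) / (p01 + p10)
Pn = np.linalg.matrix_power(P, 500)
pi = np.array([p10, p01]) / (p01 + p10)
print(Pn.round(4))
print(pi.round(4))
```

The convergence rate is governed by the second eigenvalue of P, which here equals 1 − p_01 − p_10; this same quantity reappears in the stationarity conditions below.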
Theorem 2 (Second order stationarity conditions) Given the MSSV process defined in equations (13)-(16), if the innovation process ε_t is stationary and the hidden process h_t is second order stationary, then the process log(y_t²) is second order stationary.
Proof Consider the logarithmic transformation of y_t² and the independence assumption between h_t and ε_t. Then, by Jensen's inequality, if the moments of h_t are finite when t → ∞, the previous quantities are finite.
In the following we discuss stationarity of the first and second order moments of the hidden switching log-volatility process.
Observe that the autoregressive structure of the log-volatility process (see equation (14)) makes it dependent on the past history of the Markov jump process. This feature becomes evident after some recursive substitutions. The system of stochastic difference equations (14), (15) and (16) admits an ergodic solution. In particular it is possible to find the ergodic solution for the process h_t.

Theorem 3 (Ergodic solution)
Assume that h_0 = 0 and |φ| < 1; then the system of equations (14), (15) and (16) has the following ergodic solution h_t. Proof Consider the process h_t and suppose it is a solution of the system (14)-(16); we then show that it is still a solution of the system at time t + 1. We evaluate the asymptotic stationarity of the ergodic solution by calculating the moments of the process and taking limits. Take the expectation of the process defined in (54) with respect to the ergodic probability π and consider the limit when t → +∞, where the expected value of the jump process is calculated with respect to the ergodic probability: E_π(s_{t−i}) = 0·π_0 + 1·π_1 = p_01/(p_01 + p_10).
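The limiting mean can be checked by simulation: under the ergodic probability, E(h_t) → (α + β p_01/(p_01 + p_10))/(1 − φ). The code below simulates the switching log-volatility recursion h_t = α + β s_t + φ h_{t−1} + η_t with illustrative, assumed parameter values:

```python
import numpy as np

rng = np.random.default_rng(3)
alpha, beta, phi, sigma_eta = -0.2, 0.5, 0.9, 0.1   # assumed parameter values
p01, p10 = 0.02, 0.05
T = 500_000

s = np.empty(T, dtype=int)
h = np.empty(T)
s[0], h[0] = 0, 0.0
for t in range(1, T):
    u = rng.random()
    # regime transition: 0 -> 1 with prob p01, 1 -> 1 with prob 1 - p10
    s[t] = (u < p01) if s[t - 1] == 0 else (u >= p10)
    h[t] = alpha + beta * s[t] + phi * h[t - 1] + sigma_eta * rng.standard_normal()

pi1 = p01 / (p01 + p10)                     # ergodic probability of regime 1
mean_theory = (alpha + beta * pi1) / (1 - phi)
print(round(h[T // 2:].mean(), 2), round(mean_theory, 2))
```

Discarding the first half of the path as burn-in, the empirical mean of h_t agrees with the theoretical limit to within Monte Carlo error.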
In order to evaluate the second order asymptotic stationarity of the log-volatility process, consider the variance of the process under the ergodic probability and take the limit when t → +∞. Under the assumption that |φ| < 1, the first and third terms of the sum have finite limits. In the limit the covariance term reduces to 2β² p_01 p_10/(p_10 + p_01)². The last equation has been obtained under the following stationarity conditions: |φ| < 1 and |φ(1 − p_01 − p_10)| < 1. The first condition is required for the stationarity of the variance term.
The second condition is satisfied due to the existence of the ergodic probability of the jump process. Note that the auto-covariance of the jump Markov process, Cov(s_i, s_j) with i ≤ j, has been calculated through equation (53). Finally we check the stationarity of the autocovariance function of the process. Assume that τ ≤ t − 1. The resulting quantity depends on t, thus the process is not second order stationary. However, its limit when t → +∞ is finite and depends only on τ, under the assumption that |φ(1 − p_01 − p_10)| < 1. It is possible to prove that the covariance is finite also in the case τ > t − 1. After the previous considerations, we conclude that the jump log-volatility process is asymptotically second order stationary.
Finally, we show that the second order stationarity conditions obtained in the previous sections are necessary conditions for the existence and uniqueness of the ergodic distribution of the hidden Markov process {h_t, s_t}_{t∈N}. On the stationarity conditions of a Markov switching functional autoregressive process, the only available results are due to Francq and Roussignol [26]. Francq and Zakoian [27] analyse the stationarity conditions of a Markov switching multivariate autoregressive moving average process.
In the following we refer mainly to the work of Francq and Roussignol [26]. Introduce the following multivariate functional autoregressive process with values in R^S, where {η_t}_{t∈N} is a sequence of i.i.d. random processes and θ ∈ Θ collects the parameters of the model. Assume that: (ii) for all i ∈ E the random vector G(η_t, i) has density f_i(·) with respect to the Lebesgue measure on R^S and E(||G(η_t, i)||) < ∞, where ||·|| is the Euclidean norm; (iii) there exist a_1, a_2, . . . , a_L such that, for all (x, y) ∈ R^S, ||F(x, i) − F(y, i)|| ≤ a_i ||x − y||, and such that the matrix

Q = ( p_{1,1} a_1   p_{2,1} a_1   · · ·   p_{L,1} a_1
      p_{1,2} a_2   p_{2,2} a_2   · · ·   p_{L,2} a_2
      ...           ...                   ...
      p_{1,L} a_L   p_{2,L} a_L   · · ·   p_{L,L} a_L )
has spectral radius strictly less than 1. If these assumptions are satisfied, then the Markov chain defined by Eq. (67) admits a unique invariant probability µ. The second marginal of µ is equal to the invariant probability of {s_t}_{t∈N}. A stationary Markov process {h_t, s_t}_{t∈N} satisfying (67) with µ as initial distribution is an aperiodic ergodic Harris process. Moreover, the ergodic theorem holds for every process {h_t, s_t}_{t∈N} satisfying (67) and every µ-integrable function g from R^S × E to R. Proof For a proof see Francq and Roussignol [26]. The theorem applies to the hidden log-volatility process. In particular, assumption (ii) is satisfied because the random variable G(η_t, s) has a normal density with mean zero and finite variance σ_η². The third assumption is also satisfied because ||F(x, s) − F(y, s)|| = ||α + βs + φy − (α + βs + φx)|| = |φ| ||y − x||.
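For the log-volatility process all the Lipschitz constants coincide, a_i = |φ|, so the spectral radius condition on Q reduces to |φ| < 1. This can be verified numerically (the parameter values are illustrative assumptions):

```python
import numpy as np

# check assumption (iii) for the log-volatility process, where the Lipschitz
# constants are a_i = |phi| for every regime (assumed parameter values below)
phi = 0.95
p01, p10 = 0.02, 0.05
P = np.array([[1 - p01, p01],
              [p10, 1 - p10]])

a = np.array([abs(phi), abs(phi)])
Q = P.T * a[:, None]                 # Q[j, i] = p_{i,j} * a_j
rho = max(abs(np.linalg.eigvals(Q)))
print(round(rho, 4))                 # equals |phi| here, since all a_i coincide
```

When the a_i coincide, Q = |φ| Pᵀ, whose spectral radius is |φ| times that of a stochastic matrix, i.e. |φ| itself; with regime-dependent Lipschitz constants the eigenvalue computation above remains the general check.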

Appendix B -States and Parameters Joint Estimation
In the following we show some analytical aspects of the joint estimation of the parameters and the states of the Bayesian dynamic model given in equations (34), (35), (36) and (37). We use the following notation for the conditional moments: V_t(·) = Var(·|y_{1:t}), C_t(·,·) = Cov(·,·|y_{1:t}) and E_t(·) = E(·|y_{1:t}). Denote by I the identity matrix. Assume that the parameters evolve over time. Note that the noise component produces artificial variability in the posterior distribution of the parameters. In order to reduce this variability, Liu and West [46] suggest imposing the following constraint on the variance-covariance matrix of the parameters: V_t(θ_{t+1}) = V_t(θ_t) = V_t. In order to control the transition of the parameters between time t and t + 1, they use a shrinkage technique between Gaussian kernels; the resulting parameter transition density is Gaussian. The shrinkage technique has already been used by West [65] in order to reconstruct the posterior distribution in an adaptive importance sampling scheme. In the following we prove the result given in Eq. (45).

Proof (Kernel Shrinkage Relation)
The joint density of θ_{t+1} and θ_t is Gaussian and is characterised by the following moments; by straightforward calculations, the distribution of θ_{t+1} conditional on θ_t is Gaussian, with the conditional mean and variance given below. In order to simplify the estimation problem, Liu and West [46] assume that the variance-covariance matrix of the noise is proportional to V_t through a discount factor δ. The previous quantities then become A_{t+1} = ((3δ−1)/(2δ)) I, V_t(θ_{t+1}|θ_t) = (1 − ((3δ−1)/(2δ))²) V_t and E_t(θ_{t+1}|θ_t) = ((3δ−1)/(2δ)) θ_t + ((1−δ)/(2δ)) θ̄_t. Denote a = (3δ−1)/(2δ); then the distribution in equation (80) simplifies accordingly. In the SIS particle filter, the new set of particles {x_{t+1}^i, θ_{t+1}^i, w_{t+1}^i}_{i=1}^N is generated by simulating each pair (x_{t+1}^i, θ_{t+1}^i) from the instrumental density q(x_{t+1}, θ_{t+1}|y_{1:t+1}). The weights updating equation is determined by an importance sampling argument. Choosing the instrumental density to be the product of the priors of θ_{t+1} and x_{t+1}, q(x_{t+1}, θ_{t+1}|y_{1:t+1}) = p(x_{t+1}|x_t, θ_{t+1}) p(θ_{t+1}|θ_t), the weights updating equation is given by the correction step w_{t+1}^i = w_t^i p(y_{t+1}|x_{t+1}^i, θ_{t+1}^i). The auxiliary particle filter can be derived from the basic SIS algorithm exhibited in the previous appendix. The APF uses the auxiliary variable j to randomly select particles and to mutate the selected particles. The auxiliary variable is simulated from a distribution which summarizes and conserves the information contained in the previous particle set; this feature is obtained also by using the variable µ_t. In this way the resampling step does not cause the impoverishment of the information contained in the current particle set.
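The conditional-normal derivation can be summarised as follows. This is a sketch: the proportionality constant (1 − δ)/δ linking the noise variance W_{t+1} to V_t is inferred here from the stated value of a, and should be checked against the source.

```latex
\begin{aligned}
&\theta_{t+1}=\theta_t+\zeta_{t+1},\qquad \zeta_{t+1}\sim N(0,W_{t+1}),\qquad
V_t(\theta_{t+1})=V_t \;\Rightarrow\; C_t(\theta_t,\zeta_{t+1})=-\tfrac{1}{2}W_{t+1},\\[4pt]
&\begin{pmatrix}\theta_{t+1}\\ \theta_t\end{pmatrix}\bigg|\,y_{1:t}
 \sim N\!\left(\begin{pmatrix}\bar\theta_t\\ \bar\theta_t\end{pmatrix},
 \begin{pmatrix}V_t & V_t-\tfrac{1}{2}W_{t+1}\\[2pt] V_t-\tfrac{1}{2}W_{t+1} & V_t\end{pmatrix}\right),
\qquad A_{t+1}:=\bigl(V_t-\tfrac{1}{2}W_{t+1}\bigr)V_t^{-1},\\[4pt]
&E_t(\theta_{t+1}\mid\theta_t)=\bar\theta_t+A_{t+1}(\theta_t-\bar\theta_t),\qquad
V_t(\theta_{t+1}\mid\theta_t)=V_t-A_{t+1}V_tA_{t+1}^{\top},\\[4pt]
&\text{with } W_{t+1}=\tfrac{1-\delta}{\delta}V_t:\quad
A_{t+1}=aI,\;\; a=\tfrac{3\delta-1}{2\delta},\qquad
E_t(\theta_{t+1}\mid\theta_t)=a\,\theta_t+(1-a)\,\bar\theta_t,\qquad
V_t(\theta_{t+1}\mid\theta_t)=(1-a^2)\,V_t .
\end{aligned}
```

The negative correlation between θ_t and the noise is exactly what enforces the variance constraint V_t(θ_{t+1}) = V_t and produces the shrinkage form of equation (45).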
Observe that the joint transition density is decomposed into the product of the state transition density conditional on the parameters and the parameter transition density. Liu and West [46] use a Gaussian kernel shrinkage technique, which provides more stable estimates; the resulting parameter transition density is the Gaussian distribution exhibited in equations (80) and (82). Assume that, at time t, the available information is contained in the particle set {x_t^j, θ_t^j, w_t^j}_{j=1}^N, and that m_t = E_t(θ_{t+1}|θ_t) is the mean of the parameter transition density. Given the index i, the new set of particles {x_{t+1}^j, θ_{t+1}^j, w_{t+1}^j}_{j=1}^N is generated by simulating (x_{t+1}^j, θ_{t+1}^j) from the instrumental density q(x_{t+1}, θ_{t+1}|i, y_{1:t+1}). The weights updating equation is determined by an importance sampling argument. Choosing the conditional instrumental density to be the product of the priors of θ_{t+1} and x_{t+1} given i, with i = 1, . . . , N, q(x_{t+1}, θ_{t+1}|i, y_{1:t+1}) = p(x_{t+1}, θ_{t+1}|x_t^i, θ_t^i) = p(x_{t+1}|x_t^i, θ_{t+1}) p(θ_{t+1}|θ_t^i), the weights updating equation is given by the correction step w_{t+1}^j ∝ p(y_{t+1}|x_{t+1}^j, θ_{t+1}^j) / p(y_{t+1}|µ_{t+1}^{i_j}, m_t^{i_j}).