Variational level set methods: from continuous to discrete setting, applications in video segmentation and tracking

In this paper, we investigate some recent active contour models used in image and video segmentation and we transpose them into a discrete form in order to apply the iterated conditional modes (ICM) algorithm. This work can be seen as an extension of the recent work of T. Chan and B. Song to functionals other than the Mumford-Shah/Chan-Vese one. We investigate it for video segmentation and tracking applications.


INTRODUCTION
For many years, variational level set methods have been used to tackle segmentation problems (detection of static or moving objects in images and video [4], image segmentation or texture discrimination [9]). Solving a segmentation problem is usually achieved by minimizing a given energy functional of the general form

$$J(\Omega) = \int_{\Omega} k_{in}(x)\,dx + \int_{D \setminus \Omega} k_{out}(x)\,dx + \int_{\partial\Omega} k_b(x)\,ds.$$

Thus the aim is to minimize $J$ with respect to an open subset $\Omega$ of the image domain $D$. Optimization is performed either with the shape sensitivity analysis techniques introduced in [10] or with Heaviside function techniques relying on an implicit representation of $\Omega$ by a function $u$ that is positive on $\Omega$ and negative on its complement, following the Osher-Sethian level set method [8]. This leads to a new functional which depends on $u$:

$$J(u) = \int_D k_{in}(x)\,H(u(x))\,dx + \int_D k_{out}(x)\,\bigl(1 - H(u(x))\bigr)\,dx + \int_D k_b(x)\,|\nabla H(u(x))|\,dx.$$

In order to compute the Gâteaux derivative of $J$, the Heaviside distribution $H$ is approximated by a differentiable function $H_\varepsilon$. Whichever method is used, we obtain a partial differential equation that is discretized with classical numerical schemes.
The aim of this paper is to transpose continuous variational level set methods into a discrete setting. The function $u$ is taken as a binary image taking the values $+1$ and $-1$.
In this work, we suppose $k_b = \mu > 0$, which means that we look for a region whose boundary is smooth enough. The discrete functional that corresponds to $J$ is then given by

$$J_d(u) = \sum_{(i,j):\,u_{i,j} = +1} k_{in}(i,j) + \sum_{(i,j):\,u_{i,j} = -1} k_{out}(i,j) + \mu L(u),$$

where $L(u)$ is a discretization of the length of the boundary of the region $\{u = +1\}$. This way of discretizing functionals was first used in [11] for the minimization of the discrete version of the piecewise constant Mumford and Shah functional [7].

Our purpose in this work is to apply the Iterated Conditional Modes (ICM) algorithm to the minimization of discrete functionals arising in segmentation problems, using the length discretization $L(u)$ introduced above. First, we recall the principles of this algorithm. Then, we apply it to video segmentation and tracking models in which $k_{in}$ and $k_{out}$ do not depend on $u$. In the last section, we apply the ICM to a slightly generalized version of the Mumford-Shah/Chan-Vese functional [4] for a video segmentation application, and we discuss the application to other functionals involving the mean and variance of the image.
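
As an illustration of the discrete setting described above, the following minimal Python sketch evaluates a discrete energy made of a region term and a length term for a label image with values +1 and -1. The exact length discretization is not reproduced here: the forward-difference formula in `length` is an assumption, chosen because each of its terms involves three neighboring sites, and the function names are hypothetical.

```python
import math

def length(u):
    """Assumed discrete length: half the norm of the forward differences of u,
    summed over all pixels (differences across the image border are set to 0)."""
    h, w = len(u), len(u[0])
    total = 0.0
    for i in range(h):
        for j in range(w):
            dx = u[i + 1][j] - u[i][j] if i + 1 < h else 0
            dy = u[i][j + 1] - u[i][j] if j + 1 < w else 0
            total += 0.5 * math.hypot(dx, dy)
    return total

def discrete_energy(u, k_in, k_out, mu):
    """Sum of k_in over {u = +1}, plus sum of k_out over {u = -1}, plus mu * length(u)."""
    data = 0.0
    for i, row in enumerate(u):
        for j, label in enumerate(row):
            data += k_in[i][j] if label == 1 else k_out[i][j]
    return data + mu * length(u)
```

With this assumed length, a 2x2 image whose top row is +1 and bottom row is -1 has a boundary of length 2, so the regularization adds 2µ to the data term.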

BAYESIAN FORMULATION, ICM ALGORITHM
In this section, to simplify notation, we denote a pixel by $s = (i, j)$ and the set of pixels by $S$. From a Bayesian point of view, the functional $J_d$ can be seen as an a posteriori energy conditionally on the image data $I$ if $k_{in} = -\log P(I \mid u = 1)$ and $k_{out} = -\log P(I \mid u = -1)$. We assume that $u$ is a sample of a Gibbs random variable $U$, that is to say

$$P(U = u) = \frac{1}{Z} \exp(-\mu L(u)), \qquad Z = \sum_{u \in X} \exp(-\mu L(u)),$$

where $X$ is the set of all possible configurations $u$, in our case $X = \{-1, +1\}^S$. $I$ is assumed to be a sample of a random variable $\mathcal{I}$, and $\mathcal{I} \mid U = +1$ (resp. $\mathcal{I} \mid U = -1$) is assumed to be a Gibbs random variable of energy $k_{in}$ (resp. $k_{out}$).
Then the energy $E(u \mid I) = J_d(u)$ can be decomposed in the following way:

$$E(u \mid I) = E(I \mid u) + E(u) = \sum_{s \in S} E(I(s) \mid u(s)) + \mu L(u),$$

with $E(I(s) \mid u(s)) = k_{in}(s)$ if $u(s) = +1$ and $E(I(s) \mid u(s)) = k_{out}(s)$ if $u(s) = -1$. Thus, by minimizing $J_d$, we perform the classical Maximum A Posteriori (MAP) estimation, since it consists in maximizing the posterior probability $P(u \mid I) \propto P(I \mid u) P(u)$ with respect to $u$. For MAP estimation, Geman and Geman [6] proposed to use the simulated annealing algorithm. Unfortunately, the convergence of this algorithm is very slow, and it is often used with a geometric temperature decrease instead of the theoretical logarithmic one. Here we use the ICM algorithm introduced by Besag [3]. It is a suboptimal procedure, in the sense that it converges only towards a local minimum, but it is known to be very fast.

The ICM algorithm consists in minimizing $E(u(s) \mid I, u(s'), s' \neq s)$ at each pixel $s$. By the relations above, and since $u$ is a sample of a Markov random field ($L(u)$ is a sum of terms that each involve the values of $u$ at only three neighboring sites), we obtain

$$E(u(s) \mid I, u(s'), s' \neq s) = E(I(s) \mid u(s)) + E(u(s) \mid u(s'), s' \in V(s))$$

up to an additive constant, where $V(s)$ denotes the neighborhood of $s$. The ICM is then performed by sweeping over all the pixels $s$ and minimizing $E(I(s) \mid v) + E(v \mid u(s'), s' \in V(s))$ with respect to $v \in \{-1, +1\}$, given the values of $u$ at the other pixels. We iterate this procedure until convergence.
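
The sweep described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the implementation used for the experiments: the length term is assumed to be a sum of forward-difference terms (each involving three neighboring sites), the descriptors `k_in` and `k_out` are assumed to be fixed arrays, and the names `icm`, `local_prior` and `max_sweeps` are hypothetical.

```python
import math

def _length_term(u, i, j):
    """Assumed length contribution at pixel (i, j): half the forward-difference norm."""
    h, w = len(u), len(u[0])
    dx = u[i + 1][j] - u[i][j] if i + 1 < h else 0
    dy = u[i][j + 1] - u[i][j] if j + 1 < w else 0
    return 0.5 * math.hypot(dx, dy)

def local_prior(u, i, j):
    """Sum of the length terms that involve the value of u at (i, j)."""
    total = _length_term(u, i, j)
    if i > 0:
        total += _length_term(u, i - 1, j)
    if j > 0:
        total += _length_term(u, i, j - 1)
    return total

def icm(u, k_in, k_out, mu, max_sweeps=100):
    """Sweep over the pixels, keeping at each one the label of lowest local energy."""
    u = [row[:] for row in u]  # work on a copy of the initialization
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(u)):
            for j in range(len(u[0])):
                current = u[i][j]
                energy = {}
                for v in (1, -1):
                    u[i][j] = v
                    data = k_in[i][j] if v == 1 else k_out[i][j]
                    energy[v] = data + mu * local_prior(u, i, j)
                # flip only on a strict decrease, so the energy never increases
                if energy[-current] < energy[current]:
                    u[i][j] = -current
                    changed = True
                else:
                    u[i][j] = current
        if not changed:
            break  # no label changed during a full sweep: local minimum reached
    return u
```

Flipping only on a strict decrease means every accepted change lowers the total energy, which is why the procedure stops at a local minimum that depends on the initialization.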

Region independent descriptors
The three functions $k_{in}$, $k_{out}$ and $k_b$ are called region and boundary "descriptors" by Barlaud et al. in [2]. In this section, we consider region independent descriptors, that is, $k_{in}$ and $k_{out}$ do not depend on $u$, and a constant boundary descriptor $k_b = \mu$. Such a model arises in video segmentation and tracking:
• For video segmentation [2], Aubert, Barlaud and Jehan-Besson consider a video sequence taken by a static camera and an extracted background image $B$, and they use $k_{in} = \alpha$ and $k_{out} = |B - I_t|$ for each image $I_t$ of the video. The aim is thus to have $B$ and $I_t$ well matched outside the moving regions that are sought, $\alpha$ representing a minimal contrast between $B$ and $I_t$ over the moving regions.
• For tracking, Aron, Mansouri and Mitiche [1] consider the following problem: given an object associated with a reference region $\Omega_{ref}$ in image $I_t$, we want to find its location in the image $I_{t+1}$. They use two matching errors $\xi_1$ and $\xi_2$, which should be minimal on $\Omega$ and on its complement respectively. It is then natural to take $k_{in} = \xi_1$ and $k_{out} = \xi_2$.
One step of the ICM is equivalent to computing the energy variation obtained when the value of $u$ at $s$ is changed from its current value $u(s)$ to the opposite one. In our case of region independent descriptors, the variation of the data term is very easily computed:

$$\Delta E(I(s) \mid u(s)) = E(I(s) \mid -u(s)) - E(I(s) \mid u(s)) = u(s)\,\bigl(k_{out}(s) - k_{in}(s)\bigr).$$

The variation of the a priori conditional energy $E(u(s) \mid u(s'), s' \in V(s))$ can also be computed easily, since it involves only pixels which are neighbors of $s$ in the sense of $V$. So the implementation of the ICM is quite straightforward here.
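For the video segmentation model with $k_{in} = \alpha$ and $k_{out} = |B - I_t|$, the change of the data term when one label is flipped can be sketched as follows. The function names are hypothetical and images are represented as plain nested lists for illustration; the variation of the a priori (length) term, which must be added before deciding on a flip, is deliberately left out.

```python
def abj_descriptors(frame, background, alpha):
    """Region-independent descriptors: k_in = alpha, k_out = |B - I_t| at each pixel."""
    k_in = [[alpha for _ in row] for row in frame]
    k_out = [[abs(b - x) for b, x in zip(b_row, f_row)]
             for b_row, f_row in zip(background, frame)]
    return k_in, k_out

def data_variation(u_s, k_in_s, k_out_s):
    """Change of the data term when the label u_s (+1 or -1) is replaced by -u_s."""
    return u_s * (k_out_s - k_in_s)

frame = [[10, 200]]
background = [[12, 20]]
k_in, k_out = abj_descriptors(frame, background, alpha=30)
# first pixel matches the background: labelling it as moving raises the data term
delta_static = data_variation(-1, k_in[0][0], k_out[0][0])
# second pixel differs strongly from the background: labelling it as moving lowers it
delta_moving = data_variation(-1, k_in[0][1], k_out[0][1])
```

In this toy example a pixel is worth labelling as moving, as far as the data term is concerned, exactly when $|B - I_t|$ exceeds $\alpha$.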
We give some experimental results on the Aubert-Barlaud-Jehan-Besson model in figure 2. Appropriate initialisations help to avoid irrelevant local minima. For the tracking model of Mansouri et al., some results are given in figure 3 (the initialization on the first image is performed with a classical snake).

Region dependent descriptors
In the case of region dependent descriptors, the expression of the energy variation is more complicated than in the region independent case. However, the particular case of the Mumford-Shah/Chan-Vese functional has been studied by Chan and Song in [11]. In that case, we have $k_{in}(x) = (I(x) - \mu_1)^2$ and $k_{out}(x) = (I(x) - \mu_2)^2$, where $\mu_1$ and $\mu_2$ are the mean values of $I$ over $\{u = +1\}$ and $\{u = -1\}$.
When changing $u(s)$ from $+1$ to $-1$ (for the change from $-1$ to $+1$, the indices are exchanged), we get ([11]) the energy variation

$$\Delta J_d = (I(s) - \mu_2)^2\,\frac{n_2}{n_2 + 1} - (I(s) - \mu_1)^2\,\frac{n_1}{n_1 - 1} + \Delta E(u(s) \mid u(s'), s' \in V(s)),$$

where $n_1$ (resp. $n_2$) denotes the number of pixels in the region $\{u = +1\}$ (resp. $\{u = -1\}$) and $\mu_1$ (resp. $\mu_2$) the mean value of $I$ over this region, both taken before the change.

A slightly generalized version of the Mumford-Shah/Chan-Vese functional relies on the descriptors $k_{in}(x) = \log(\sigma_1^2) + \left(\frac{I(x) - \mu_1}{\sigma_1}\right)^2$ and $k_{out}(x) = \log(\sigma_2^2) + \left(\frac{I(x) - \mu_2}{\sigma_2}\right)^2$, where $\sigma_1^2$ and $\sigma_2^2$ are the variances of $I$ over the two regions. It has been used by Deriche and Rousson in [9] for texture segmentation and tracking. From a probabilistic point of view, this model relies on the hypothesis that the image is a mixture of two Gaussians with different means and variances. Actually, we can discard the terms $\left(\frac{I(x) - \mu_i}{\sigma_i}\right)^2$: summed over region $i$, they give exactly $n_i$, and thus the summation of these two terms gives $n_1 + n_2 = n_{tot}$, the total number of pixels in the image, which is constant. The energy can thus be expressed very easily as

$$n_1 \log(\sigma_1^2) + n_2 \log(\sigma_2^2) + E(u(s) \mid u(s'), s' \in V(s)).$$

In comparison, the Chan-Vese energy is given by

$$n_1 \sigma_1^2 + n_2 \sigma_2^2 + E(u(s) \mid u(s'), s' \in V(s)).$$

As an application, video segmentation with a given background image is still possible (replace $I$ by $|B - I|$, the difference between the image and the background, in the computations above, see [5]). The advantage over the model of Aubert et al. is that it no longer depends on a parameter $\alpha$. We display the result in figure 4 for $\mu = 50$.
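
To make the region dependent case more concrete, the following Python sketch compares a closed-form variation of the Chan-Vese data term, of the kind given in [11], with a brute-force recomputation when one pixel value moves from the first region to the second, and it evaluates the change of the Gaussian data term $n_1 \log(\sigma_1^2) + n_2 \log(\sigma_2^2)$ in the same way. The pixel values and function names are hypothetical; regions are represented as plain lists of intensities, variances are the empirical (biased) ones, and the length term is left out.

```python
import math

def stats(values):
    """Return (n, mean, variance) using the empirical (biased) variance."""
    n = len(values)
    mean = sum(values) / n
    var = sum((x - mean) ** 2 for x in values) / n
    return n, mean, var

def chan_vese_data(inside, outside):
    """n1*sigma1^2 + n2*sigma2^2: sum of squared deviations over both regions."""
    n1, _, v1 = stats(inside)
    n2, _, v2 = stats(outside)
    return n1 * v1 + n2 * v2

def gaussian_data(inside, outside):
    """n1*log(sigma1^2) + n2*log(sigma2^2); both variances must be non-zero."""
    n1, _, v1 = stats(inside)
    n2, _, v2 = stats(outside)
    return n1 * math.log(v1) + n2 * math.log(v2)

def chan_vese_variation(x, n1, mu1, n2, mu2):
    """Closed-form change of the Chan-Vese data term when the value x moves
    from region 1 to region 2 (statistics taken before the move, n1 > 1)."""
    return (x - mu2) ** 2 * n2 / (n2 + 1) - (x - mu1) ** 2 * n1 / (n1 - 1)

inside = [10.0, 12.0, 11.0, 30.0]   # hypothetical intensities in {u = +1}
outside = [29.0, 31.0, 33.0]        # hypothetical intensities in {u = -1}
x = inside[-1]                      # pixel value whose label is flipped

n1, mu1, _ = stats(inside)
n2, mu2, _ = stats(outside)
closed_form = chan_vese_variation(x, n1, mu1, n2, mu2)
brute_force = chan_vese_data(inside[:-1], outside + [x]) - chan_vese_data(inside, outside)
gaussian_delta = gaussian_data(inside[:-1], outside + [x]) - gaussian_data(inside, outside)
```

Both variations are negative here, so moving the outlying value 30 to the second region lowers the data term under either model. In a practical implementation the means and variances would be updated incrementally after each accepted flip rather than recomputed from scratch.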

CONCLUSION
In this paper, we have shown that the Chan and Song algorithm can be seen as a particular case of the well-known ICM algorithm. We have applied it to other functionals involving either region dependent or region independent descriptors. We have shown applications in video segmentation and tracking, but texture segmentation is also possible, since the Deriche-Rousson model was also designed for this purpose.

Fig. 1. One image of the sequence and the background computed by a temporal median filter.

Fig. 2. Aubert, Barlaud and Jehan-Besson video segmentation model, based on the comparison of the images with a given background. The initialization (the first image) is chosen as $\{|B - I| \leq 16\}$. The result obtained with the ICM is superimposed on the original image.
and thus the expression of the energy variation for the Rousson-Deriche model is given by log

Fig. 4. Application of the ICM with the Rousson-Deriche model on the difference image $|B - I|$ for $\mu = 50$. At the top of the figure, the initialization of the algorithm: we threshold $|B - I|$ at the level 20; the left image is the mask and the right image is the mask superimposed on the image.