Strong Uniform Value in Gambling Houses and Partially Observable Markov Decision Processes
Venel, Xavier; Ziliotto, Bruno (2016), Strong Uniform Value in Gambling Houses and Partially Observable Markov Decision Processes, SIAM Journal on Control and Optimization, 54, 4, pp. 1983-2008. doi:10.1137/15M1043340
Type: Article accepted for publication or published
Journal name: SIAM Journal on Control and Optimization
Centre d'économie de la Sorbonne [CES]
CEntre de REcherches en MAthématiques de la DEcision [CEREMADE]
Abstract (EN): In several standard models of dynamic programming (gambling houses, MDPs, POMDPs), we prove the existence of a robust notion of value for the infinitely repeated problem, namely the strong uniform value. This solves two open problems. First, this shows that for any ε > 0, the decision-maker has a pure strategy σ which is ε-optimal in any n-stage problem, provided that n is big enough (this result was only known for behavior strategies, that is, strategies which use randomization). Second, for any ε > 0, the decision-maker can guarantee the limit of the n-stage value minus ε in the infinite problem where the payoff is the expectation of the inferior limit of the time average payoff.
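As a rough illustration of the n-stage value mentioned in the abstract (this is not code from the paper), the sketch below computes it by backward induction. It assumes a finite, fully observed MDP with known transitions, which is a much simpler setting than the gambling houses and POMDPs the article treats. The function name `n_stage_value` and the two-state toy model are hypothetical, chosen only for this example.

```python
def n_stage_value(transitions, payoffs, n):
    """Return the n-stage value (best expected average payoff over n stages)
    for each initial state of a finite MDP.

    transitions[s][a] is a list of probabilities over next states;
    payoffs[s][a] is the stage payoff for playing action a in state s.
    Both structures are assumptions made for this sketch.
    """
    num_states = len(payoffs)
    # total[s] = best expected total payoff over the stages still to play
    total = [0.0] * num_states
    for _ in range(n):
        total = [
            max(
                payoffs[s][a]
                + sum(p * total[t] for t, p in enumerate(transitions[s][a]))
                for a in range(len(payoffs[s]))
            )
            for s in range(num_states)
        ]
    # Divide by n to turn the total payoff into a time average.
    return [w / n for w in total]


# Hypothetical toy model: state 0 pays 0 and offers "stay" or "move to state 1";
# state 1 is absorbing and pays 1 at every stage.
toy_transitions = [
    [[1.0, 0.0], [0.0, 1.0]],  # state 0: action 0 stays, action 1 moves
    [[0.0, 1.0]],              # state 1: single action, stays in state 1
]
toy_payoffs = [
    [0.0, 0.0],
    [1.0],
]

for n in (1, 10, 100):
    print(n, n_stage_value(toy_transitions, toy_payoffs, n))
```

In this toy model the pure strategy "move immediately" is optimal in every n-stage problem, so the n-stage value from state 0 is (n - 1)/n and tends to 1 as n grows. The article's contribution concerns far more general models, where the existence of such a pure strategy that is nearly optimal for all large n is not obvious.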
Subjects / Keywords: dynamic programming; Markov decision processes; partial observation; uniform value; long-run average payoff