Strong Uniform Value in Gambling Houses and Partially Observable Markov Decision Processes

Venel, Xavier; Ziliotto, Bruno (2016), Strong Uniform Value in Gambling Houses and Partially Observable Markov Decision Processes, SIAM Journal on Control and Optimization, 54, 4, p. 1983-2008. doi:10.1137/15M1043340

Type
Article accepted for publication or published
Date
2016
Journal name
SIAM Journal on Control and Optimization
Volume
54
Number
4
Publisher
SIAM - Society for Industrial and Applied Mathematics
Pages
1983-2008
Publication identifier
10.1137/15M1043340
Author(s)
Venel, Xavier
Centre d'économie de la Sorbonne [CES]
Ziliotto, Bruno
CEntre de REcherches en MAthématiques de la DEcision [CEREMADE]
Abstract (EN)
In several standard models of dynamic programming (gambling houses, MDPs, POMDPs), we prove the existence of a robust notion of value for the infinitely repeated problem, namely the strong uniform value. This solves two open problems. First, this shows that for any ε > 0, the decision-maker has a pure strategy σ which is ε-optimal in any n-stage problem, provided that n is big enough (this result was only known for behavior strategies, that is, strategies which use randomization). Second, for any ε > 0, the decision-maker can guarantee the limit of the n-stage value minus ε in the infinite problem where the payoff is the expectation of the inferior limit of the time average payoff.
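
The two statements in the abstract can be written out as follows. This is only a sketch of one possible formalization: the symbols x_1 (initial state), g_m (payoff at stage m), γ_n (expected n-stage average payoff) and v_n (n-stage value) are notation assumed here for illustration and do not appear in the record, and stage payoffs are assumed bounded so that the expectations are well defined.

```latex
% Assumed notation (not taken from the record):
%   x_1     initial state,   \sigma   a strategy of the decision-maker,
%   g_m     payoff at stage m.
\[
  \gamma_n(x_1,\sigma) = \mathbb{E}^{x_1}_{\sigma}\!\left[\frac{1}{n}\sum_{m=1}^{n} g_m\right],
  \qquad
  v_n(x_1) = \sup_{\sigma}\,\gamma_n(x_1,\sigma).
\]

% First statement: a single pure strategy is \varepsilon-optimal
% in every sufficiently long n-stage problem.
\[
  \forall \varepsilon > 0,\ \exists\, \sigma \text{ pure},\ \exists\, N \ge 1,\ \forall n \ge N:
  \quad \gamma_n(x_1,\sigma) \ \ge\ v_n(x_1) - \varepsilon .
\]

% Second statement: the limit of the n-stage values can be guaranteed,
% up to \varepsilon, when the payoff is the expected liminf of the time average.
\[
  \forall \varepsilon > 0,\ \exists\, \sigma:
  \quad \mathbb{E}^{x_1}_{\sigma}\!\left[\liminf_{n\to\infty}\frac{1}{n}\sum_{m=1}^{n} g_m\right]
  \ \ge\ \lim_{n\to\infty} v_n(x_1) - \varepsilon .
\]
```

In the first statement the strategy σ and the threshold N depend on ε but not on n, which is what distinguishes it from ε-optimality in each n-stage problem taken separately.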
Subjects / Keywords
dynamic programming; Markov decision processes; partial observation; uniform value; long-run average payoff

Related items

Showing items related by title and author.

  • Knowledge-Based Programs as Succinct Policies for Partially Observable Domains
    Zanuttini, Bruno; Lang, Jérôme; Saffidine, Abdallah; Schwarzentruber, François (2019) Article accepted for publication or published
  • Generalisation of alpha-beta search for AND-OR graphs with partially ordered values
    Li, Junkang; Cazenave, Tristan; Zanuttini, Bruno; Ventos, Veronique (2022) Communication / Conference
  • Learning opening books in partially observable games: using random seeds in Phantom Go
    Cazenave, Tristan; Liu, Jialin; Teytaud, Fabien; Teytaud, Olivier (2016) Communication / Conference
  • General limit value in zero-sum stochastic games
    Ziliotto, Bruno (2016) Article accepted for publication or published
  • Existence of the uniform value in zero-sum repeated games with a more informed controller
    Gensbittel, Fabien; Oliu-Barton, Miquel; Venel, Xavier (2014) Article accepted for publication or published