
History-dependent evaluations in POMDPs
Venel, Xavier; Ziliotto, Bruno (2021), History-dependent evaluations in POMDPs, SIAM Journal on Control and Optimization, 59, 2, p. 1730–1755. 10.1137/20M1332876
Type
Article accepted for publication or published
Date
2021
Journal name
SIAM Journal on Control and Optimization
Volume
59
Number
2
Publisher
SIAM - Society for Industrial and Applied Mathematics
Pages
1730–1755
Publication identifier
Author(s)
Venel, Xavier
Centre d'économie de la Sorbonne [CES]
Ziliotto, Bruno
CEntre de REcherches en MAthématiques de la DEcision [CEREMADE]
Abstract (EN)
We consider POMDPs in which the weight of the stage payoff depends on the past sequence of signals and actions occurring in the infinitely repeated problem. We prove that for all epsilon > 0, there exists a strategy that is epsilon-optimal for any sequence of weights satisfying a property that can be interpreted as "the decision-maker is patient enough". This unifies and generalizes several results in the literature, and applies notably to POMDPs with limsup payoffs.
Subjects / Keywords
Markov decision process; partial observation; long-run average payoff
Related items
Showing items related by title and author.
- Chatterjee, Krishnendu; Saona, Raimundo; Ziliotto, Bruno (2022) Article accepted for publication or published
- Chatterjee, Krishnendu; Saona, Raimundo; Ziliotto, Bruno (2022) Article accepted for publication or published
- Venel, Xavier; Ziliotto, Bruno (2016) Article accepted for publication or published
- Kwon, Joon; Ziliotto, Bruno (2023) Working paper
- Renault, Jérôme; Ziliotto, Bruno (2020) Article accepted for publication or published