History-dependent evaluations in POMDPs
Venel, Xavier; Ziliotto, Bruno (2021), History-dependent evaluations in POMDPs, SIAM Journal on Control and Optimization, 59, 2, p. 1730–1755. 10.1137/20M1332876
Type: Article accepted for publication or published
Journal name: SIAM Journal on Control and Optimization
Publisher: SIAM - Society for Industrial and Applied Mathematics
Centre d'économie de la Sorbonne [CES]
CEntre de REcherches en MAthématiques de la DEcision [CEREMADE]
Abstract (EN): We consider POMDPs in which the weight of the stage payoff depends on the past sequence of signals and actions occurring in the infinitely repeated problem. We prove that for every ε > 0, there exists a strategy that is ε-optimal for any sequence of weights satisfying a property that can be interpreted as "the decision-maker is patient enough". This unifies and generalizes several results in the literature, and applies notably to POMDPs with limsup payoffs.
Subjects / Keywords: Markov decision process; partial observation; long-run average payoff