Finite-Memory Strategies in POMDPs with Long-Run Average Objectives
Chatterjee, Krishnendu; Saona, Raimundo; Ziliotto, Bruno (2022), Finite-Memory Strategies in POMDPs with Long-Run Average Objectives, Mathematics of Operations Research, 47, 1, p. 100-119. 10.1287/moor.2020.1116
TypeArticle accepté pour publication ou publié
Journal nameMathematics of Operations Research
MetadataShow full item record
Institute of Science and Technology [Klosterneuburg, Austria] [IST Austria]
CEntre de REcherches en MAthématiques de la DEcision [CEREMADE]
Abstract (EN)We study the problem of approximation of optimal values in partially-observable Markov decision processes (POMDPs) with long-run average objectives. POMDPs are a standard model for dynamic systems with probabilistic and nondeterministic behavior in uncertain environments. In long-run average objectives rewards are associated with every transition of the POMDP and the payoff is the long-run average of the rewards along the executions of the POMDP. We establish strategy complexity and computational complexity results. Our main result shows that finite-memory strategies suffice for approximation of optimal values, and the related decision problem is recursively enumerable complete.
Subjects / Keywordsfinite state; Markov; dynamic programming; computational complexity; analysis of algorithms
Showing items related by title and author.