From concentrability coefficients to maximum state entropy exploration

We explain why maximum entropy data distributions are minimax optimal for approximate value iteration algorithms under uncertainty about the underlying Markov decision process (MDP). We also investigate connections between these minimax optimal solutions and maximum state entropy exploration methods.

June 2024 · Pedro P. Santos (based on joint work with Diogo S. Carvalho, Alberto Sardinha, and Francisco S. Melo)