Abstract: We study agents acting in an unknown environment where the agent’s goal is to find a robust policy. We consider robust policies as policies that achieve high cumulative rewards for all ...