Reinforcement learning (RL) has become one of the central problems in machine learning, with remarkable success in recommendation systems, robotics, and superhuman-level game playing. Yet the existing literature predominantly focuses on (almost) fully observable environments, overlooking the complexities of real-world scenarios where crucial information remains hidden.
In this talk, we consider reinforcement learning in partially observable systems through the framework of Latent Markov Decision Processes (LMDPs). In an LMDP, an MDP is randomly drawn from a set of possible MDPs at the beginning of the interaction, but the context, i.e., the latent factors identifying the chosen MDP, is never revealed to the agent. This opacity poses new challenges for decision-making, particularly in settings such as recommendation systems that cannot access sensitive user data, or medical treatment of undiagnosed illnesses. Despite the significant relevance of LMDPs to real-world problems, existing theory relies on restrictive separation assumptions, a constraint that is unrealistic in practical applications. We present a series of new results addressing this gap: from leveraging higher-order information to develop sample-efficient RL algorithms, to establishing lower bounds and improved guarantees under more realistic assumptions for Latent MDPs.
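For concreteness, one minimal way to formalize this setting (with notation chosen here for illustration rather than taken from the talk) is as a finite mixture of MDPs sharing a common state and action space:
\[
\mathcal{L} = \bigl(\{\mathcal{M}_m\}_{m=1}^{M},\ \{w_m\}_{m=1}^{M}\bigr),
\qquad
\mathcal{M}_m = (\mathcal{S}, \mathcal{A}, P_m, R_m, H),
\qquad
\sum_{m=1}^{M} w_m = 1,
\]
where at the beginning of the interaction an index \(m\) is drawn with probability \(w_m\) and held fixed, and the agent must act based only on the observed history of states, actions, and rewards, without ever observing \(m\).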