Level design as an MDP
Researchers and designers are framing level design as a Markov Decision Process (MDP): a designer or agent edits a level iteratively over rich observation and action spaces, with video demos serving as reward signals. The framing is useful if you're building procedural editors or AI-assisted tools. Horror-leaning pitches (e.g., 'El diseñador de niveles') are already applying the model to reality-warping layouts. (x.com) (x.com)
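To make the framing concrete, here is a minimal sketch of level editing as an MDP: the state is a tile grid, an action overwrites one tile, and the reward is the change in a quality score. Everything named here (`LevelEditMDP`, `largest_open_region`, the two tile types, the edit budget) is a hypothetical illustration, not code from any of the cited work, and the hand-written connectivity score merely stands in for a reward learned from video demos.

```python
from dataclasses import dataclass
from typing import Tuple

EMPTY, WALL = 0, 1
Grid = Tuple[Tuple[int, ...], ...]
Action = Tuple[int, int, int]  # (row, col, new_tile)


def largest_open_region(grid: Grid) -> int:
    """Size of the largest 4-connected region of EMPTY tiles.

    Hypothetical quality metric, used only as a stand-in reward source.
    """
    rows, cols = len(grid), len(grid[0])
    seen = set()
    best = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != EMPTY or (r, c) in seen:
                continue
            # Flood-fill one region with an explicit stack.
            stack, size = [(r, c)], 0
            seen.add((r, c))
            while stack:
                y, x = stack.pop()
                size += 1
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and grid[ny][nx] == EMPTY and (ny, nx) not in seen:
                        seen.add((ny, nx))
                        stack.append((ny, nx))
            best = max(best, size)
    return best


@dataclass
class LevelEditMDP:
    """Toy editing MDP: state = grid, action = one tile overwrite."""
    grid: Grid
    max_edits: int = 10  # assumed episode length
    edits_made: int = 0

    def step(self, action: Action):
        row, col, tile = action
        before = largest_open_region(self.grid)
        rows = [list(r) for r in self.grid]
        rows[row][col] = tile
        self.grid = tuple(tuple(r) for r in rows)
        self.edits_made += 1
        # Reward is the improvement in the quality score caused by this edit.
        reward = largest_open_region(self.grid) - before
        done = self.edits_made >= self.max_edits
        return self.grid, reward, done


# A wall column splits the level in two; removing its middle tile joins both halves.
env = LevelEditMDP(grid=((0, 1, 0), (0, 1, 0), (0, 1, 0)))
state, reward, done = env.step((1, 1, EMPTY))
print(reward)  # 4: the largest open region grows from 3 tiles to 7
```

Rewarding the change in score rather than the score itself means each edit is credited only for what it improves, which is one common choice for iterative editing setups.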
Colan F. Biemer and Seth Cooper posted "Level Assembly as a Markov Decision Process" to arXiv on April 27, 2023. They solve an MDP formulation of level generation with adaptive dynamic programming (ADP) and report that ADP outperforms two baselines across two case studies. (arxiv.org)
The PCGRL framework, authored by Ahmed Khalifa, Philip Bontrager, Sam Earle and Julian Togelius and presented at AIIDE 2020, explicitly casts procedural level generation as a reinforcement learning (RL) problem and evaluates three observation/action representations: Narrow, Turtle and Wide. (cdn.aaai.org) PCGRL also shipped an OpenAI Gym interface and reference implementations that train level-generating agents with PPO2; the gym-pcgrl package and multiple GitHub forks reproduce those experiments for the Zelda, Sokoban and binary grid environments. (github.com)
Reward-from-video research offers candidate pipelines for turning designer demos into dense reward signals. REDS (REward learning from Demonstration with Segmentations) uses segmented, action-free videos as ground-truth reward signals; its OpenReview record shows an acceptance decision posted January 21, 2025. (openreview.net) Complementary work such as Diffusion Reward uses conditional video diffusion models to learn reward functions from expert videos for complex visual RL tasks, demonstrating a route from raw video to trainable reward models. (link.springer.com)
LLM-driven and hybrid approaches have emerged to automate reward specification for procedural content generation (PCG): ChatPCG (June 2024) and PCGRLLM (February 2025) propose generating reward code or prompts with language models to speed up reward design for procedural content systems. (arxiv.org)
Taken together, the recent literature and tool releases form an actionable stack: MDP/ADP formulations for assembling level progression (Biemer & Cooper 2023), RL generators and Gym toolkits from PCGRL (2020), and video-to-reward methods such as REDS (2025) and diffusion-based reward learners. Many come with public code or proceedings papers for replication. (arxiv.org)
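To illustrate what solving a level-assembly MDP can look like, the sketch below runs plain value iteration on a toy progression problem. It is not the ADP method from Biemer and Cooper, and the segment names, success probabilities, rewards and discount factor are all invented for the example: a director picks the next segment to serve, and the player either clears it and advances or fails and drops back.

```python
from typing import Dict, List, Tuple

# transitions[state][action] = list of (probability, next_state, reward)
Outcomes = List[Tuple[float, str, float]]
Transitions = Dict[str, Dict[str, Outcomes]]


def q_value(outcomes: Outcomes, values: Dict[str, float], gamma: float) -> float:
    """Expected return of one action given the current value estimates."""
    return sum(p * (r + gamma * values[nxt]) for p, nxt, r in outcomes)


def value_iteration(transitions: Transitions, gamma: float = 0.9, tol: float = 1e-8):
    """Standard in-place value iteration; states with no actions are terminal."""
    values = {s: 0.0 for s in transitions}
    while True:
        delta = 0.0
        for s, actions in transitions.items():
            if not actions:
                continue
            best = max(q_value(o, values, gamma) for o in actions.values())
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            break
    # Greedy policy with respect to the converged values.
    policy = {
        s: max(actions, key=lambda a: q_value(actions[a], values, gamma))
        for s, actions in transitions.items()
        if actions
    }
    return values, policy


# Hypothetical progression: harder segments pay more but fail more often.
toy_mdp: Transitions = {
    "easy": {
        "serve_easy": [(1.0, "easy", 0.0)],
        "serve_medium": [(0.8, "medium", 1.0), (0.2, "easy", -1.0)],
    },
    "medium": {
        "serve_easy": [(1.0, "easy", 0.0)],
        "serve_hard": [(0.5, "hard", 2.0), (0.5, "easy", -1.0)],
    },
    "hard": {"finish": [(1.0, "end", 3.0)]},
    "end": {},  # terminal
}

values, policy = value_iteration(toy_mdp)
print(policy)
```

With these made-up numbers the greedy policy escalates difficulty at every step; lowering the success probabilities or raising the failure penalty shifts it toward repeating easier segments, which is the kind of trade-off an MDP formulation makes explicit.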