How Sensitive Is PPO to Reward Shaping?
Your PPO agent’s impressive performance might be a fragile artifact of reward design. We show how small, seemingly innocuous changes to the reward function can dramatically alter both the learning curve and the quality of the final policy. Using a standard MuJoCo benchmark, we test PPO’s sensitivity by implementing three […]