Investigating Offline Reinforcement Learning

Two friends and I trained an offline RL agent on D4RL datasets using the stationary distribution correction estimation approach from OptiDICE.
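To make "stationary distribution correction" concrete, here is a minimal sketch on a made-up two-state MDP. It is not our training code. The idea is to reweight each state-action pair in the dataset by the ratio w(s, a) = d_pi(s, a) / d_data(s, a), so that averages over the dataset behave like averages under the target policy. OptiDICE has to estimate these ratios from samples. In this toy example the dynamics are known, so the ratios are computed exactly. Every name below (step, reward, occupancy, the two policies) is hypothetical and chosen only for illustration.

```python
# Toy illustration of stationary distribution correction.
# Assumption: a tiny, fully known MDP, so the ratios can be computed exactly.
# In real offline RL the dynamics are unknown and the ratios must be estimated.

GAMMA = 0.9
STATES = (0, 1)
ACTIONS = (0, 1)
P0 = (1.0, 0.0)  # the agent always starts in state 0


def step(s, a):
    """Deterministic dynamics: action 0 stays, action 1 switches state."""
    return s if a == 0 else 1 - s


def reward(s, a):
    """Reward 1 for every step spent in state 1, regardless of action."""
    return 1.0 if s == 1 else 0.0


def occupancy(policy, iters=2000):
    """Discounted state-action occupancy d(s, a), by fixed-point iteration."""
    d_state = list(P0)
    for _ in range(iters):
        nxt = [(1 - GAMMA) * P0[s] for s in STATES]
        for s in STATES:
            for a in ACTIONS:
                nxt[step(s, a)] += GAMMA * d_state[s] * policy[s][a]
        d_state = nxt
    return {(s, a): d_state[s] * policy[s][a] for s in STATES for a in ACTIONS}


# policy[s][a] is the probability of taking action a in state s.
behavior = [[0.5, 0.5], [0.5, 0.5]]  # policy that generated the dataset
target = [[0.0, 1.0], [1.0, 0.0]]    # policy we care about: reach state 1, stay

d_data = occupancy(behavior)
d_pi = occupancy(target)

# Correction ratios; well defined because the behavior policy covers every pair.
w = {sa: d_pi[sa] / d_data[sa] for sa in d_data}

# Average reward under the data distribution, without and with correction.
uncorrected = sum(d_data[sa] * reward(*sa) for sa in d_data)
corrected = sum(d_data[sa] * w[sa] * reward(*sa) for sa in d_data)
target_value = sum(d_pi[sa] * reward(*sa) for sa in d_pi)

print(uncorrected, corrected, target_value)
```

The corrected average matches the target policy's normalized return, while the uncorrected one only describes the behavior policy. The same weights can also be used to weight a behavior-cloning loss when extracting a policy.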

We evaluated the stability and accuracy of the trained model in MuJoCo environments, on the HalfCheetah and Maze2D tasks.