"RL with KL penalties is better viewed as Bayesian inference."

Tomasz Korbak, Ethan Perez, Christopher L. Buckley (2022)

DOI: 10.18653/V1/2022.FINDINGS-EMNLP.77

access: open

type: Conference or Workshop Paper

metadata version: 2023-08-10