"COPR: Continual Human Preference Learning via Optimal Policy Regularization."

DOI: 10.48550/ARXIV.2402.14228

access: open

type: Withdrawn Item

metadata version: 2025-01-16