Reuse Your Rewards: Reward Model Transfer for Zero-Shot Cross-Lingual Alignment
Paper: arXiv:2404.12318
We propose to perform reward optimization using a reward model (RM) trained for a different language. Assuming model generation quality transfers cross-lingually
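
As an illustration only, the idea of optimizing against an RM from another language can be sketched as best-of-n reranking: sample several target-language responses and keep the one the source-language RM scores highest. This is a minimal sketch under stated assumptions, not the paper's procedure. The function names, the toy scoring rule, and the choice of best-of-n as the optimization method are all hypothetical; a real setup would replace `toy_source_language_rm` with a trained reward model.

```python
from typing import Callable, Sequence


def best_of_n(
    prompt: str,
    candidates: Sequence[str],
    reward_model: Callable[[str, str], float],
) -> str:
    """Return the candidate that the reward model scores highest."""
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=lambda response: reward_model(prompt, response))


def toy_source_language_rm(prompt: str, response: str) -> float:
    """Hypothetical stand-in for an RM trained on another language's data.

    Scores a response as (distinct words) minus (repeated words), so that
    informative, non-repetitive text wins. A real RM would be a learned model.
    """
    words = response.split()
    distinct = len(set(words))
    repeats = len(words) - distinct
    return float(distinct - repeats)


# Target-language (Spanish) candidates scored by the stand-in RM.
prompt = "¿Cuál es la capital de Francia?"
candidates = [
    "Sí.",
    "La capital de Francia es París.",
    "París París París París",
]
chosen = best_of_n(prompt, candidates, toy_source_language_rm)
print(chosen)
```

The reranker never inspects the language of the candidates; it relies entirely on the RM's scores, which is where the cross-lingual transfer assumption enters.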