Events
Towards Effective RL Training for LLMs
Speaker: Yi Wu
Location: 60 Fifth Avenue, Room 206
Date: Monday, December 2, 2024
Reinforcement learning (RL) is a widely adopted LLM post-training approach for improving alignment and reasoning capabilities. This talk will present our recent progress in designing effective RL algorithms and systems for training LLMs. On the algorithm side, we will first discuss the pros and cons of two popular RLHF methods, DPO and PPO, and show that properly configured PPO training can substantially improve LLM performance on challenging competitive coding benchmarks. We will then discuss some common pitfalls in LLM reward design, which can easily lead to undesired failures in LLM RL training, and suggest simple tricks that can stabilize RL training and improve LLM math reasoning capabilities. On the system side, we will present our distributed RLHF training system, ReaLHF, which serves as the system foundation for all of our algorithmic work. ReaLHF specializes in LLM RL training and can achieve over 10x speedup compared with other open-source RLHF systems.