Statistics Department Seminar Series: Will Wei Sun, Associate Professor, Department of Quantitative Methods, Department of Statistics (by courtesy), Purdue University
"Aligning Large Language Models with Heterogeneous Human Feedback: When Statistics Meets LLMs"
Abstract: Reinforcement learning from human feedback (RLHF) has emerged as the leading approach to aligning large language models (LLMs) with human preferences. Despite its success, two fundamental challenges remain: feedback is costly and heterogeneous across annotators, and the resulting reward models often lack principled measures of uncertainty. This talk presents recent advances that address these challenges by integrating tools from optimal design and statistical inference into the RLHF framework. First, I introduce a dual active learning approach, inspired by optimal design, that adaptively selects both conversations and annotators to maximize information gain, improving the efficiency of limited feedback budgets. Second, I present a framework for uncertainty quantification in reward learning, enabling valid statistical comparisons across LLMs and more reliable best-of-n alignment policies. Together, these results illustrate how statistics can enable trustworthy and data-efficient LLM alignment.
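To give a rough sense of the dual active learning idea described in the abstract, the sketch below selects (conversation, annotator) pairs under a D-optimality criterion for a linear reward model. Everything here is an illustrative assumption, not the talk's actual algorithm: the linear reward parameterization, the per-annotator noise scales, and the noise-weighted Fisher information update are all stand-ins chosen to show how jointly choosing what to label and who labels it can maximize information gain per query.

```python
import numpy as np

# Minimal sketch of dual active learning via D-optimal design.
# Assumed setup (hypothetical, for illustration only):
#   - reward r(x) = theta^T phi(x), so each labeled preference pair
#     contributes a rank-one term to the Fisher information
#   - each annotator has a known noise scale; noisier annotators
#     contribute less information per query
rng = np.random.default_rng(0)

d = 8              # feature dimension of a response pair
n_pairs = 50       # candidate conversations awaiting labels
n_annotators = 5   # available annotators

# Feature differences phi(x, y_w) - phi(x, y_l) for each candidate pair.
phi_pairs = rng.normal(size=(n_pairs, d))

# Hypothetical per-annotator noise scales: smaller = more reliable.
annotator_noise = rng.uniform(0.5, 2.0, size=n_annotators)

# Current Fisher information; start from a ridge prior for invertibility.
fisher = np.eye(d)

budget = 10
for _ in range(budget):
    _, logdet_cur = np.linalg.slogdet(fisher)
    best_gain, best_choice = -np.inf, None
    for i in range(n_pairs):
        for a in range(n_annotators):
            # Information from querying annotator a on pair i,
            # down-weighted by that annotator's noise (a common heuristic).
            contrib = np.outer(phi_pairs[i], phi_pairs[i]) / annotator_noise[a] ** 2
            # D-optimal gain: increase in log-det of the information matrix.
            _, logdet_new = np.linalg.slogdet(fisher + contrib)
            gain = logdet_new - logdet_cur
            if gain > best_gain:
                best_gain, best_choice = gain, (i, a)

    i, a = best_choice
    fisher += np.outer(phi_pairs[i], phi_pairs[i]) / annotator_noise[a] ** 2
    print(f"query pair {i} with annotator {a} (log-det gain {best_gain:.3f})")
```

In the same spirit, the uncertainty quantification results the abstract mentions could plausibly feed into best-of-n selection by ranking candidate responses on a lower confidence bound of the learned reward rather than its point estimate, though the talk's specific construction may differ.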
| Building: | West Hall |
|---|---|
| Event Type: | Workshop / Seminar |
| Tags: | seminar |
| Source: | Happening @ Michigan from Department of Statistics |
