About
My research centers on reinforcement learning for language models and on test-time compute (Best-of-N / Minimum Bayes Risk decoding). I aim to bridge theoretical guarantees with practical pipelines for real-world text generation tasks.
Interests
- Multi-objective RLHF / RLVR, reward-hacking mitigation
- Test-time compute (Best-of-N / Minimum Bayes Risk decoding)