Publications

Last updated:

About

My research centers on reinforcement learning for language models and test-time compute (Best-of-N / Minimum Bayes Risk decoding). I aim to bridge theoretical guarantees and practical pipelines for real-world text generation tasks.

Interests

  • Multi-objective RLHF / RLVR, reward hacking mitigation
  • Test-time compute (Best-of-N / Minimum Bayes Risk decoding)

Education

Scholarships

Internships

Contact

y.ichihara3406@gmail.com

github.com/yu-ki3406