Yuki Ichihara

Starting in August 2026, I will join MBZUAI as a Ph.D. student, supervised by Junpei Komiyama.

My recent interests include Best-of-N and Minimum Bayes Risk decoding, alignment under multiple objectives, and theoretical questions that help explain practical generation behavior.

Research Interests

Reinforcement learning for language models, alignment, decoding, reward design, evaluation, and multi-objective generation.

News

I will join MBZUAI as a Ph.D. student, supervised by Junpei Komiyama.
I joined MBZUAI as a Visiting Student.
Consensus Group Relative Policy Optimization for Text Generation was posted on arXiv.

Selected Publications

  1. Consensus Group Relative Policy Optimization for Text Generation
    arXiv preprint, 2026
  2. MO-GRPO: Mitigating Reward Hacking of Group Relative Policy Optimization on Multi-Objective Problems
    arXiv preprint, 2025
  3. Auto-Weighted Group Relative Preference Optimization for Multi-Objective Text Generation Tasks
    EMNLP Industry Track, 2025
  4. Theoretical Guarantees for Minimum Bayes Risk Decoding
    ACL, 2025
  5. Evaluation of Best-of-N Sampling Strategies for Language Model Alignment
    Transactions on Machine Learning Research, 2025

Background

Education

MBZUAI: Ph.D. (incoming), supervised by Junpei Komiyama
Nara Institute of Science and Technology: Ph.D. in Engineering
Nara Institute of Science and Technology: M.S. in Engineering
Tokyo University of Science: B.S. in Electrical Engineering

Research Experience

MBZUAI: Visiting Student
CyberAgent AI Lab: Research Intern
ATR: Research Assistant
JST SPRING Fellowship: Doctoral fellowship