Orchestrating Tokens and Sequences: Dynamic Hybrid Policy Optimization for RLVR

Research Areas
Publications
In Proc. of ACL 2026 Findings