Our new work on multi-turn behavior elicitation with RL is now on [arXiv].