RPBench-Auto is an automated pipeline for evaluating the role-playing
ability of large language models. It covers two settings: character-based
role-playing, with 80 personae, and scene-based role-playing, with 80 scenes.
Baseline: GPT-4o (2024-05-13) / Judge model: GPT-4-Turbo (2024-04-09)