RPBench Leaderboard

RPBench-Auto is an automated pipeline for evaluating the role-playing ability of large language models. It covers two settings: character-based role-playing, with 80 personae, and scene-based role-playing, with 80 scenes.

Baseline: GPT-4o (2024-05-13) / Judge model: GPT-4-Turbo (2024-04-09)