What happens when you make LLMs play 2048? We tested several models to see which one achieves the highest score.
Each model plays multiple games, and we track its average score, maximum tile reached, and win rate.
| Model | Avg. Score | Max Tile | Win Rate |
|---|---|---|---|
| GPT-4 | 56.80 | 11.2 | 0.00% |
| GPT-3.5-Turbo | 55.20 | 10.4 | 0.00% |
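The per-model aggregation described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the harness that produced these numbers. `GameResult`, `summarize`, and `WIN_TILE` are hypothetical names. The sketch assumes a game counts as a win once the 2048 tile appears, which is the standard win condition of the game. The text does not say whether Max Tile is averaged per game or taken across all games, so the sketch reports both.

```python
from dataclasses import dataclass
from statistics import mean

# Assumption: a game is a win once the 2048 tile appears,
# the standard win condition of the original game.
WIN_TILE = 2048


@dataclass
class GameResult:
    """Outcome of one game (hypothetical record format)."""
    score: int     # final score reported by the game
    max_tile: int  # largest tile on the board when the game ended


def summarize(results: list[GameResult]) -> dict[str, float]:
    """Aggregate one model's per-game results into summary metrics."""
    if not results:
        raise ValueError("need at least one game result")
    return {
        "avg_score": mean(r.score for r in results),
        # "Maximum tile" is ambiguous, so report both readings.
        "avg_max_tile": mean(r.max_tile for r in results),
        "best_tile": max(r.max_tile for r in results),
        # Fraction of games that reached the win tile.
        "win_rate": sum(r.max_tile >= WIN_TILE for r in results) / len(results),
    }


# Usage with illustrative numbers (not results from the table).
stats = summarize([GameResult(score=1200, max_tile=128),
                   GameResult(score=2800, max_tile=256)])
print(stats)
```

Keeping one record per game and deriving every metric from those records makes it easy to add further statistics later without rerunning any games.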