Create BGG ranking from "rating matches" using Elo scores

From ChatGPT:

An Elo-style adjustment for ranking board games based on user ratings would adapt the Elo rating system, originally designed for competitive matchmaking (e.g., chess), to pairwise game comparisons, while handling missing data in a more principled way than the Schulze method.


1. How Elo Works in Head-to-Head Comparisons

Elo assigns each game a rating (e.g., starting at 1500), then updates it based on the outcome of matchups:

  • If Game A beats Game B, A’s rating increases, and B’s rating decreases.
  • The amount of change depends on how expected the result was:
    • If A was already highly rated, it gains little from beating a low-rated B.
    • If B upsets a much higher-rated A, it gains a lot of points.

The Elo update formula for a game \(X\) (against \(Y\)) is:

\[ R_X' = R_X + K \cdot (S - E) \]

Where:

  • \( R_X \) = current rating of game \(X\)

  • \( K \) = adjustment factor (higher values make ratings fluctuate more)

  • \( S \) = actual outcome (1 if \(X\) wins, 0 if \(X\) loses, 0.5 for a tie)

  • \( E \) = expected probability of \(X\) winning against \(Y\), given by:

    \[ E = \frac{1}{1 + 10^{(R_Y - R_X)/400}} \]
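
The two formulas above translate almost directly into code. The following is a minimal sketch, not a reference implementation: the function names and the default K = 32 are assumptions chosen for illustration and do not come from the text.

```python
def expected_score(r_x: float, r_y: float) -> float:
    """Expected probability E that X beats Y under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_y - r_x) / 400.0))


def elo_update(r_x: float, r_y: float, s: float, k: float = 32.0) -> float:
    """Return X's new rating R_X' after one matchup against Y.

    s is the actual outcome for X: 1 for a win, 0 for a loss, 0.5 for a tie.
    k=32 is an illustrative default, not a value taken from the text.
    """
    return r_x + k * (s - expected_score(r_x, r_y))


# Two games at 1500: E = 0.5, so a win moves the winner up by k/2.
print(elo_update(1500, 1500, 1.0))            # 1516.0
# A 1900 favourite beating a 1500 game gains far less (E is about 0.909).
print(round(elo_update(1900, 1500, 1.0), 2))  # 1902.91
```

The second call shows the "expected result" effect described earlier: the favourite's win was already priced in, so it earns under 3 points instead of 16.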


2. Applying Elo to Board Game Ratings

In a board game ranking context, you can define "matchups" between games based on user preferences:

  • If a user rates Game A higher than Game B, count it as A "winning" against B.
  • If a user rates them equally, count it as a "draw."
  • Missing comparisons (unplayed games) are simply ignored instead of assuming a loss.
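
The three rules above can be sketched as a function that expands one user's ratings into matchups. The name `user_matchups`, the tuple layout, and the sample data are hypothetical; the key point is that a game the user never rated is simply absent from the input, so it cannot generate a loss.

```python
from itertools import combinations


def user_matchups(ratings: dict[str, float]) -> list[tuple[str, str, float]]:
    """Turn one user's ratings into (game_a, game_b, score_for_a) matchups.

    Only games the user actually rated appear in `ratings`, so unplayed
    games never produce a matchup (they are ignored, not counted as losses).
    """
    matchups = []
    # Sort by game name so the output order is deterministic.
    for (a, ra), (b, rb) in combinations(sorted(ratings.items()), 2):
        if ra > rb:
            score = 1.0   # A "wins"
        elif ra < rb:
            score = 0.0   # B "wins"
        else:
            score = 0.5   # equal ratings count as a draw
        matchups.append((a, b, score))
    return matchups


# Hypothetical user who rated three games.
print(user_matchups({"Catan": 7.0, "Azul": 8.0, "Root": 8.0}))
# [('Azul', 'Catan', 1.0), ('Azul', 'Root', 0.5), ('Catan', 'Root', 0.0)]
```

A user who rated n games contributes n(n-1)/2 matchups, so prolific raters carry much more weight than casual ones unless their contribution is capped or down-weighted.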

Each game’s rating would be updated iteratively across many users, leading to a stable ranking where games settle into their appropriate positions based on collective preferences.


3. Advantages Over Schulze

Handles missing data better

  • The Elo system only considers explicit preferences and ignores unranked games instead of treating them as losses.
  • This reduces the bias toward games with more ratings that arises when unrated games are counted as losses.

Adaptive to new data

  • As new ratings come in, the system updates dynamically, making it more robust to shifts in popularity.

Allows uncertainty modeling

  • Plain Elo tracks only a single number per game, but an extension such as the Glicko system adds a rating deviation, so newer or lesser-known games could start with a wider range of possible ratings to reflect uncertainty.

4. Potential Issues & Solutions

⚠️ Initial Seeding Matters

  • All games start at the same rating unless seeded with prior information (e.g., BGG's Bayesian average).
  • A hybrid approach could use Elo + a prior based on existing ratings.
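
One way to picture the hybrid approach is a simple linear map from a prior average on a 1 to 10 scale to a starting Elo rating. This is only a sketch under stated assumptions: the function name `seed_rating`, the linear form, the centre of 5.5 (the midpoint of the scale), and the 100 points per rating unit are all illustrative choices, not something the text or BGG prescribes.

```python
def seed_rating(prior_avg: float, center: float = 5.5,
                points_per_unit: float = 100.0, base: float = 1500.0) -> float:
    """Map a prior average on a 1-10 scale to a starting Elo rating.

    The linear map and all three constants are illustrative assumptions.
    """
    return base + points_per_unit * (prior_avg - center)


print(seed_rating(5.5))  # 1500.0, a game at the scale midpoint starts at the base
print(seed_rating(7.5))  # 1700.0, two rating units above the midpoint
```

A seeded start only shifts where each game begins; subsequent matchups still move the rating, so a poor prior is corrected over time at a speed governed by K.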

⚠️ Cold Start for New Games

  • Until a game has been compared against others, it sits at its initial rating, which says nothing about its actual standing.
  • Possible fix: Use collaborative filtering to estimate early rankings.

⚠️ Sensitivity to K-Factor Choice

  • If K is too high, ratings fluctuate too much. If too low, they take too long to adjust.
  • A dynamic K (higher for new games, lower for established ones) could help.
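
A dynamic K could look like the sketch below, where K decays smoothly as a game accumulates matchups. The function name, the decay shape, and the constants (40 for new games, 10 as the floor, 30 matchups as the halfway point) are hypothetical choices for illustration and would need tuning against real data.

```python
def dynamic_k(matches_played: int, k_new: float = 40.0,
              k_established: float = 10.0, halfway: int = 30) -> float:
    """K that decays from k_new toward k_established as matchups accumulate.

    After `halfway` matchups, K sits exactly midway between the two bounds.
    All constants are illustrative assumptions.
    """
    weight = halfway / (halfway + matches_played)
    return k_established + (k_new - k_established) * weight


print(dynamic_k(0))    # 40.0, a brand-new game moves quickly
print(dynamic_k(30))   # 25.0, halfway between the bounds
print(dynamic_k(300))  # approaches the established floor of 10
```

Note that with a per-game K the update is no longer zero-sum: in a single matchup the newer game may move further than the established one.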

5. Implementation Ideas

A Python implementation could:

  • Parse BGG ratings into pairwise comparisons per user.
  • Run an iterative Elo update algorithm.
  • Tune parameters (K-factor, initial rating) to match empirical ranking trends.
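
The steps above can be sketched end to end as follows. This is a toy under stated assumptions, not a tested pipeline: the function name `rank_games`, the input shape `{user: {game: rating}}`, the defaults (K = 16, five shuffled passes), and the sample data are all hypothetical, and fetching real BGG ratings is out of scope.

```python
import random
from itertools import combinations


def rank_games(user_ratings, k=16.0, initial=1500.0, passes=5, seed=0):
    """Run Elo over every pairwise preference implied by per-user ratings.

    user_ratings: {user: {game: rating}}; games a user did not rate are absent.
    Returns a list of (game, elo) pairs sorted from best to worst.
    """
    # Step 1: expand each user's ratings into (a, b, score_for_a) matchups.
    matchups = []
    for ratings in user_ratings.values():
        for (a, ra), (b, rb) in combinations(sorted(ratings.items()), 2):
            score = 1.0 if ra > rb else 0.0 if ra < rb else 0.5
            matchups.append((a, b, score))

    # Step 2: iterate Elo updates. The result depends on matchup order,
    # so each pass uses a fresh (seeded, reproducible) shuffle.
    elo = {game: initial for ratings in user_ratings.values() for game in ratings}
    rng = random.Random(seed)
    for _ in range(passes):
        rng.shuffle(matchups)
        for a, b, score in matchups:
            expected_a = 1.0 / (1.0 + 10 ** ((elo[b] - elo[a]) / 400.0))
            delta = k * (score - expected_a)
            elo[a] += delta
            elo[b] -= delta  # zero-sum: B's change mirrors A's
    return sorted(elo.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data, not real BGG ratings.
toy = {
    "u1": {"Azul": 8, "Catan": 6, "Root": 9},
    "u2": {"Azul": 7, "Catan": 7},
    "u3": {"Catan": 5, "Root": 8},
}
for game, rating in rank_games(toy):
    print(f"{game}: {rating:.0f}")
```

On this toy data Root wins every comparison it appears in and Catan never wins one, so the ranking comes out Root, Azul, Catan. Step 3, tuning K, the initial rating, and the number of passes, would mean rerunning this with different arguments and comparing the output against whatever empirical ranking is being targeted.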

This approach would likely produce a more user-sensitive ranking than BGG’s Bayesian method while avoiding Schulze’s bias against unranked games. 🚀