I added player 1’s error and player 2’s error in each matchup to my total error.

So the real Brier score total error for Elo is around 0.12.

Much better!

What I’ve learned:

Brier scores are great at identifying bad predictions.

However, when actual game outcomes are near 50/50, they don’t offer much feedback.
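To make the 50/50 point concrete, here is a minimal sketch (not the code from my simulation) using the usual binary Brier score, the squared gap between the predicted probability and the 0/1 outcome. The helper names `brier` and `expected_brier` are made up for this example.

```python
def brier(prob, outcome):
    """Squared error of one prediction; outcome is 1 for a win, 0 for a loss."""
    return (prob - outcome) ** 2


def expected_brier(prob, true_prob):
    """Average Brier score of always predicting `prob` when the real win chance is `true_prob`."""
    return true_prob * brier(prob, 1) + (1 - true_prob) * brier(prob, 0)


# Toss-up game: a perfect 50% forecast and a sloppy 60% forecast score almost the same.
toss_up_gap = expected_brier(0.60, 0.50) - expected_brier(0.50, 0.50)

# Lopsided game: guessing 50% when the favorite really wins 95% of the time is punished hard.
lopsided_gap = expected_brier(0.50, 0.95) - expected_brier(0.95, 0.95)

print(f"toss-up penalty:  {toss_up_gap:.4f}")
print(f"lopsided penalty: {lopsided_gap:.4f}")
```

The penalty for a wrong forecast works out to the squared distance between the forecast and the true probability, so being 10 points off on a coin-flip game costs only about 0.01, while missing a 95% favorite by 45 points costs about 0.2025.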

When game outcomes have less variation, i.e. a team can win 95% of the time, Win-Loss records perform much better.

In fact, I’m not sure Elo can beat them over long seasons unless there is skill change over time.

When there is more true skill mobility (the rtg_nudge setting), Elo outperforms Win-Loss.

In the real world, where teams gain and lose relative skill, this is essential.

Elo autocorrelates.

Over super long seasons, it gets out of control and really good teams approach unreasonable win probabilities near 100%.

Likewise, bad teams are given a 0% chance to win at the end of the season, despite having a reasonable chance to do so.

[Edit] Me again, from the future.

This is also because in 1,000-game seasons, the true ratings change to a point where some players actually are winning 100% of their games and others are losing 100%.
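For a sense of how fast the predictions saturate, here is a small sketch of the textbook Elo expectation formula. The base 10 and the 400-point scale are the standard chess convention and are an assumption here; the simulation in this post may use different constants.

```python
def expected_score(rating_a, rating_b):
    """Standard logistic Elo expectation: A's predicted chance of beating B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Predicted win chance for the stronger team as the rating gap widens.
for gap in (0, 200, 400, 800, 1200):
    print(f"gap {gap:>4}: {expected_score(1500 + gap, 1500):.3f}")
```

Under those constants, a 400-point gap already means about a 91% forecast, 800 points about 99%, and 1200 points about 99.9%. Nothing in the formula caps the gap, so ratings that keep drifting apart push the prediction toward 100%.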

Win-Loss records, by contrast, are obviously stable.

They will always offer reasonable (possibly outdated) predictions.

The limit to how good they can be depends on how much randomness is involved and how much skill change there is over time.

In order to quickly get accuracy at the extremes (i.e., when the best team plays the worst team), you have to raise your K-value.

Raising the K-value hurts the accuracy of the mid-tier teams.

So there is a trade-off: either be accurate on the uneven matchups or be accurate on the toss-up games.
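The trade-off can be sketched with the standard Elo update, where each rating moves by K times the gap between the result and the forecast. This is an illustration under assumed constants (400-point scale, both teams starting at 1500), not the simulation code behind the numbers in this post; `games_until_confident` and `drift_after_one_game` are hypothetical helpers.

```python
def expected_score(rating_a, rating_b):
    """Standard logistic Elo expectation: A's predicted chance of beating B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def play(rating_a, rating_b, a_won, k):
    """Apply one standard Elo update and return both new ratings."""
    delta = k * ((1.0 if a_won else 0.0) - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


def games_until_confident(k, target=0.90):
    """Wins a dominant team needs in a row before Elo predicts it at `target` or better."""
    a, b, games = 1500.0, 1500.0, 0
    while expected_score(a, b) < target:
        a, b = play(a, b, True, k)
        games += 1
    return games


def drift_after_one_game(k):
    """How far a single result pushes the forecast for two truly equal teams away from 50%."""
    a, b = play(1500.0, 1500.0, True, k)
    return expected_score(a, b) - 0.5


for k in (10, 40):
    print(f"K={k}: {games_until_confident(k)} wins to reach 90%, "
          f"toss-up drift {drift_after_one_game(k):.3f}")
```

A higher K gets the lopsided matchup to a confident forecast in fewer games, but the same higher K also knocks an even matchup further off 50% after every single result.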

All in all, I feel like I’ve learned a lot.

Coming in, I thought Elo was more stable, and I thought there would be a limit to how high the ratings for a team could get.

I also didn’t see the tradeoff of lopsided matchups vs even matchups.

Future posts and goals:

Improve Elo through various means: margin of victory, keeping it stable, etc.

Test regressions that use both Win-Loss and Elo

Perhaps use Win-Loss differential as another baseline tool

Operate on real data

Can’t wait!