My ELO rating system explained

For a while now, I’ve been wanting to try my hand at building a rating system to predict AFL results. I’ve decided to begin with a relatively simple ELO rating system. The ELO rating system was originally developed to rank chess players, but has more recently been used for many sports, including AFL, to assess the relative strengths of teams within a competition.

For a really good explainer on how to build an ELO rating system, I highly recommend the Matter of Stats (MAFL) posts on the topic, which I’ll refer back to throughout this piece.

Anything by FiveThirtyEight discussing their ELO ratings for the NFL, baseball and basketball is also useful reading. I’ll attempt to explain my system below, but I can’t recommend those readings enough!

Typical ELO

The nice thing about ELO rating systems is their simplicity. Essentially, our only inputs are the rating of the home team, the rating of the away team and the result. From here, we can assess how the result panned out relative to what we would expect given the relative strengths of each side, and then adjust our rating for each team accordingly. If a team wins by more than expected, we give them a bump in ratings. The system is very objective in this sense and doesn’t take into account things like the players within a team, weather, emotional responses to recent events and the myriad of subjective factors that a typical punter might inherently use.

The typical ELO rating system uses the following formula:[1]

ELO_{new} = ELO_{old} + k(Result_{actual} - Result_{predicted})

The mechanics of this equation are basically: take the team’s ELO rating before the match, assess whether the result was better or worse than predicted, and add the scaled difference between the predicted and actual result. My main design decisions now are a) how to define the predicted result, b) how to define the actual result, and c) what to set the scaling parameter k to.
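To make the mechanics concrete, here is a minimal sketch of the update rule in Python (the function name and the example figures are my own illustration; k is discussed further below, and 20 is the value I eventually settled on):

def elo_update(elo_old, result_actual, result_predicted, k=20):
    # New rating = old rating, plus k times how much the team
    # over- or under-performed relative to expectation
    return elo_old + k * (result_actual - result_predicted)

# A team expected to 'score' 0.64 that manages 0.80 gains 20 * 0.16 = 3.2 points
print(elo_update(1550, 0.80, 0.64))  # 1553.2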

Actual Result

In chess, the actual result is typically a 0 for a loss, 0.5 for a draw and 1 for a win, while the predicted result is the probability of winning, based on the difference in ELO ratings between the two opponents. This means that if someone wins in an upset (i.e. against a higher-rated opponent), they get a bigger boost in ratings than someone who wins when they are expected to win. Ratings are also zero-sum: the winner gains exactly as many points as the loser drops.

For an AFL system, we could use a similarly simple approach to the chess example above, assigning a 1 for a win and a 0 for a loss, but I’ve decided to use the Margin of the match as the indicator of performance. To do this, we need to convert the Margin to a scale comparable with our predicted result (i.e. between 0 and 1). For a good example of how this can be done, I refer you once again to MAFL.

In order to get our Actual Result on a scale of 0 to 1, I need to transform it in some way. I could use a simple rescaling along the range of all results in the AFL; however, I’ve decided to get a little more complex by using the following equation. It is essentially the Generalised Logistic function with most parameters set to 0 or 1.[2]

Result_{actual} = \frac{1}{1 + e^{-0.05 \times Margin}}

The mapping of those results is below. Essentially, my model rewards winning by bigger margins, but with diminishing returns: the difference between a 70 and an 80 point win is smaller than the difference between a 5 and a 15 point win. Again, this isn’t based on anything more than some creative license on my part, guided by the shape of the curve and some sage advice in the previously mentioned Matter of Stats post.
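To see the diminishing returns in numbers, here is a minimal Python sketch of the mapping (the function name is my own):

import math

def actual_result(margin):
    # Map the match Margin (positive for a win, negative for a loss)
    # onto a 0-1 scale via the logistic curve above
    return 1 / (1 + math.exp(-0.05 * margin))

print(actual_result(15) - actual_result(5))   # ~0.117: a 15-point win is worth notably more than a 5-point win
print(actual_result(80) - actual_result(70))  # ~0.011: an 80-point win is worth barely more than a 70-point one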

[Figure: Margin mapped to an Actual Result between 0 and 1]

Predicted Result

My predicted result formula is below. It is taken from FiveThirtyEight’s NFL model, which is similar to traditional ELO models used in chess. It essentially takes the ELO difference, calculated as the home team’s ELO plus the Home Ground Advantage minus the away team’s ELO, and converts it to a value between 0 and 1. Positive differences in ELO ratings (i.e. for the higher-rated team) give an expected result greater than 0.5.

Result_{predicted} = \frac{1}{1 + 10^{\frac{-eloDiff}{M}}}

eloDiff = ELO_{home} - ELO_{away} + HGA

The two main parameters we can set are M and HGA. M is a constant that scales eloDiff and its influence on expected outcomes; I’ve used a value of 400, again borrowing from FiveThirtyEight. HGA (Home Ground Advantage) gives the home team’s chances a bump. For my initial ELO rating system, I’m setting this to a constant of 35, which equates to about 8 points of Margin, the long-term average for home team outcomes. I hope to eventually parameterise this by updating based on recent experience at a ground, or perhaps travel distance, but that’s it for now.
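As a small sketch in Python (the function name is mine; parameter defaults as above):

def predicted_result(elo_home, elo_away, hga=35, m=400):
    # Expected outcome for the home team, between 0 and 1
    elo_diff = elo_home - elo_away + hga
    return 1 / (1 + 10 ** (-elo_diff / m))

# Two evenly rated sides: the HGA alone makes the home team a slight favourite
print(predicted_result(1500, 1500))  # ~0.55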

Below you can see the expected outcome for a range of rating differences (including HGA). This expected outcome is also what I currently use as my probability,[3] so an expected outcome of 0.75 equates to a 75% chance of winning.

[Figure: expected outcome for a range of ELO rating differences]

Special K

Now that I have mapped Expected and Actual results to values between 0 and 1, I just have to decide on k. The k value lets me scale how much my ratings are affected by new results: large values of k are really impressed by new information, while low values require more information to move the ratings much. It is hard to know what to pick for k without performing some kind of optimisation of this parameter.[4] For now, I tested a few different values and found that 20 gave me the best results – around the same as FiveThirtyEight’s NFL and NBA models.
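Putting the three pieces together, a single match update looks something like the following sketch (names are mine; margin is home score minus away score, and the update is zero-sum, so the away team’s change is the negative of the home team’s):

import math

def rate_match(elo_home, elo_away, margin, k=20, hga=35, m=400):
    expected = 1 / (1 + 10 ** (-(elo_home - elo_away + hga) / m))  # predicted result
    actual = 1 / (1 + math.exp(-0.05 * margin))                    # actual result
    change = k * (actual - expected)
    return elo_home + change, elo_away - change

# A home favourite (1550 v 1500) that only wins by 2 points drops slightly:
print(rate_match(1550, 1500, 2))  # (~1548.1, ~1501.9)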

Iterating the ratings

The only other issue to solve is what to do a) when a new team is introduced and b) at the start of a new season. For the first issue, I simply start any new team at 1500 points in their first season. Because ELO ratings are zero-sum after each match, this means that my league average is always 1500. For continuity, I treat Sydney as the same team as South Melbourne and the Brisbane Lions as the same team as the Brisbane Bears.

After each season, I regress each team’s ELO rating towards the league average of 1500. This helps to account for things that are inherently built into the AFL, such as equalisation, whereby good teams aren’t meant to stay good for too long and bad teams aren’t meant to stay bad for too long,[5] as well as the myriad of other things that happen during an AFL offseason. I do this at a rate of 60%, in that a team will move 60% of the distance between its current rating and 1500. If a team was rated 1600, they would regress to 1540, while a team on 1400 would regress to 1460. This seems high, but I tried a few different values and this worked best. I plan to optimise it in the next implementation.

ELO_{New Seas} = ((1-0.6)\times ELO_{Old Seas}) + (0.6\times 1500)
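In code, the between-season step is simply (a sketch using the 60% regression rate above):

def regress_to_mean(elo, rate=0.6, mean=1500):
    # Move `rate` of the distance from the current rating back to the league average
    return elo + rate * (mean - elo)

print(regress_to_mean(1600))  # 1540.0
print(regress_to_mean(1400))  # 1460.0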

Results!

Now that I’ve got that out of the way, I can go through each match in VFL/AFL history and get ELO ratings for each team! Below I’ve plotted each team’s ratings over their history. There will be a few posts later on (and, one day, an interactive page) to explore these ratings, but this gives a bit of an idea. I’ve added a line at 1500 to show where a team is rated compared to the league average.

[Figure: ELO rating history for every team, with a line at the league average of 1500]

I can also use the ELO difference before each game to predict the binary outcome (win/loss), the margin and a probability. I plan to write another piece on that, but for now I can report that across the entire history of the VFL/AFL, the model has tipped 68.6% of games correctly, with a Mean Absolute Error in the Margin of 26.8 points. We can see that the performance is in general getting worse over time – possibly due to expansion of the league (i.e. more games to get wrong).

[Figure: correct tip percentage and Margin MAE over VFL/AFL history]

Notes:

  1. One aspect of my rating system that does slightly differ from FiveThirtyEight is that teams don’t always gain points for a win. Their model uses the simple 1, 0.5 and 0 point system for actual results, so a team that wins can never lose points. In my system, a team needs to win by a greater margin than expected to gain points. An issue FiveThirtyEight discuss with margin-based systems is autocorrelation: strong teams tend to keep winning by large margins, so a margin-based system keeps rewarding them and their ratings become inflated relative to their true strength. They account for this using a Margin of Victory (MOV) multiplier, which scales the k value based on the margin of victory and the pre-game rating gap, so that it is reduced for blowout wins by heavy favourites.

    In my haste to develop my system, I thought this would be cool to have in my model, and so I implemented it. While writing this up, I’ve realised that it probably isn’t needed, since my mapping of Margin onto a scale of 0 to 1 already “squashes” bigger results. Interestingly, however, including it does slightly improve my historical performance at predicting results (68.6% correct tips, MAE of 26.8) versus taking it out (67.0% correct tips, MAE of 27.6) over the entire history of the AFL. For continuity, given I’ve included it in my predictions so far this season, I’ll leave it in, but I’ll likely revisit it after the season. The equation for including the MOV is below, and a small code sketch follows these notes.

    MOV = \ln(|Margin| + 1) \times \frac{2.2}{0.001\times eloDiff + 2.2}

  2. For clarity: A = lower asymptote = 0, K = upper asymptote = 1, B = growth rate = 0.05, v = 1, Q = 1, C = 1
  3. I’m looking to change this for next year, likely by actually modelling the probability of winning from historical ELO rating differences
  4. which I plan to do for my next implementation
  5. as far as I know, not empirically tested?
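For reference, here is a minimal sketch of the MOV multiplier from note 1, assuming (as in FiveThirtyEight’s NFL model) that eloDiff is taken from the winning team’s perspective:

import math

def mov_multiplier(margin, elo_diff):
    # Grows logarithmically with the margin, but shrinks when the winner
    # was already a heavy favourite (large positive elo_diff)
    return math.log(abs(margin) + 1) * (2.2 / (0.001 * elo_diff + 2.2))

print(mov_multiplier(60, 0))    # ~4.11: a 60-point win between even teams
print(mov_multiplier(60, 200))  # ~3.77: the same win from a strong favourite counts for less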

2 thoughts on “My ELO rating system explained”

  1. Hi, very interested in this and in adapting it for Basketball – and multiple teams across multiple grades. HOWEVER, if I use the formula:
    Result = 1/(1 + e(-0.05*Margin))
    – (I assume e=Euler’s Constant AND the Margin is NEGATIVE for a losing Team ?)

    for 2 Teams whose Rating is identical (Start of the Season) and Team 1 loses by 5 goals, they have a NEW Rating HIGHER than Team 2 who WON by 5 goals?

    eg Team 1 Rating = 1500 – Loses by 30 points
    Team 2 Rating = 1500 – Wins by 30 points

    T1 Result = 1/(1+(e*(-0.05*-30))) = 0.19695
    T2 Result = 1/(1+(e*(-0.05*30))) = -0.32494

    Based on your Predicted Result Formula – HOWEVER, do not use HGA as no court advantages in Basketball when every team plays on every court as often.
    T1 Expected Result = 0.5
    T2 Expected Result = 0.5

    Using the ELO Formula to Calculate NEW Ratings (k=20)
    T1 NEW Rating = T1PrevRtg + k*(T1Result – T2Result)
    T2 NEW Rating = T2PrevRtg + k*(T2Result – T1Result)

    T1 NEW Rating = 1500 + 20*(0.19695 – (-0.32495)) = 1506.061
    T2 NEW Rating = 1500 + 20*(-0.32495 – 0.19695) = 1483.501

    HOW can Team 1 get a Higher Rating when it lost by 30 points?

    Also, if I use the same formulas as above, I would also assume that IF Team 1 had a Higher Rating than Team 2, and they Drew, then Team 1’s rating should go down slightly, and Team 2’s should go up? This also does not happen – Both Teams actually go up?

    Very interested in thoughts and possible solutions.

    1. Hi Paul,

      Good to hear from you and thanks for the interest in the site!

      I believe you are on the right track; you just have the maths a little wrong.

      In your step where you get the T1 Result, the formula calls for taking the exponential of -0.05 * Margin. This uses the natural exponential function: it is basically Euler’s constant raised to the power of whatever is in the brackets. I believe you are multiplying by e instead of using the natural exponential function?

      If you are using Excel, you can use the function EXP and put whatever is in the brackets (i.e. -0.05 * Margin) into the formula. Similarly, R has a built-in exp function. Otherwise, you are just using roughly 2.71828182845904^(-0.05 * Margin).

      With this new formula you won’t get any negative numbers!

      So,
      T1 Result = 1/(1 + e^(-0.05 * -30)) = 0.1824255
      T2 Result = 1/(1 + e^(-0.05 * 30)) = 0.8175745

      Your Expected Results are fine provided both teams go in with the same rating and no HGA. That obviously changes after the first round, since most teams will no longer have the same rating; in that case, you’ll need to use the Predicted Results formula.

      In the final equation, we actually want to put the difference between the “Actual Result” and the “Expected Result”. In your example,
      T1 New Rating = 1500 + 20*(0.1824255 – 0.5) = 1493.649
      T2 New Rating = 1500 + 20*(0.8175745 – 0.5) = 1506.351

      The formula is designed so that winning teams can actually lose rating points if they don’t win by as much as expected. In your example, however, since we ‘expect’ a draw because their ratings are the same, the winning team will always gain rating points. Similarly, a draw in this specific case will lead to no changes in the ratings (although ratings will change after a draw if we didn’t expect one!).

      Hope that makes sense. Feel free to email me at jamesthomasday [ at ] gmail [dot] com if you’d like more info!
