## My ELO rating system explained

I’ve been wanting to try my hand at building a rating system to predict AFL results for a while. I’ve decided to begin with a relatively simple ELO rating system. The ELO rating system was originally developed to rank chess players, but more recently has been used for a lot of sports, including AFL, to assess the relative strengths of teams within a competition.

For a super good explainer on how to build an ELO rating system, I highly recommend the following readings

Also anything by FiveThirtyEight where they discuss their ELO ratings for NFL, Baseball and Basketball is useful reading. I’ll attempt to explain my system below but cannot recommend those readings enough!

###### Typical ELO

The nice thing about ELO rating systems is their simplicity. Essentially, our only inputs are the rating of the home team, the rating of the away team and the result. From here, we can assess how the result panned out relative to what we would expect given the relative strengths of each side, and then adjust our rating for each team accordingly. If a team wins by more than expected, we give them a bump in ratings. The system is very objective in this sense and doesn’t take into account things like the players within a team, weather, emotional responses to recent events and the myriad of subjective factors that a typical punter might inherently use.

The typical ELO rating system uses the following formula 1

$ELO_{new} = ELO_{old} + k(Result_{actual} - Result_{predicted})$

The mechanics of this equation are basically: take the team’s ELO rating before the match, assess whether the result was better or worse than predicted, and then add the scaled difference between the actual and predicted result. My main design decisions now are a) how to define the predicted result, b) how to define the actual result and c) what to set the scaling parameter k to.
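As a rough sketch of the update step described above (not my actual script), the formula can be written as a small Python function. The function name, the example ratings and the example results are made up for illustration; the default k of 20 is the value I settle on further down.

```python
def update_elo(elo_old: float, actual: float, predicted: float, k: float = 20.0) -> float:
    """Return the post-match rating: old rating plus k times the 'surprise'."""
    return elo_old + k * (actual - predicted)


# Illustrative numbers only: a home team expected to score 0.60 that actually scores 0.75
home_new = update_elo(1550.0, actual=0.75, predicted=0.60)

# The away team sees the mirror image (1 - actual, 1 - predicted),
# so whatever the home team gains, the away team loses
away_new = update_elo(1450.0, actual=0.25, predicted=0.40)

print(round(home_new, 1), round(away_new, 1))
```

Because both teams are updated with the same k and mirrored results, the two changes cancel out and the sum of the ratings is unchanged.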

###### Actual Result

In chess, the actual result is typically a 0 for a loss, 0.5 for a draw and 1 for a win, while the predicted result is the probability of winning, based on the difference in ELO ratings between two opponents. This means that if someone wins in an upset (i.e. against an opponent with a higher rating than them), they get a bigger boost in ratings than someone who wins when they are expected to win. Ratings are also zero sum, so the change in ratings for the winner is the same as the negative change in ratings for the loser.

For an AFL system, we could use a similar simple system like the Chess example above, where we assign a 1 for a win and a 0 for a loss, but I’ve decided to use the Margin of the match as the indicator of performance. To do this, we need to convert the Margin to a comparable scale to our predicted result (i.e. between 0 and 1). For a good example of how this can be done, I refer you once again to MAFL.

In order to get our Actual Result on a scale of 0 to 1, I need to scale it in some way. I could use a simple rescaling along the range of all results in the AFL, however I’ve decided to get a little more complex by using the following equation. It is essentially the Generalised Logistic function, with most parameters set to 0 or 1 2

$Result_{actual} = \frac{1}{1 + e^{-0.05 \times Margin}}$

The mapping of those results is below. Essentially, my model rewards winning by bigger margins, but with diminishing returns. The difference between a 70 and 80 point win is less than the difference between a 5 and a 15 point win. Again, this isn’t based on anything other than some creative license on my behalf based on the shape of the curve and by some sage advice in the previously mentioned Matter of Stats post.
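To make the diminishing returns concrete, here is a minimal Python sketch of the margin mapping. The function and parameter names are mine, chosen for illustration; the growth rate of 0.05 comes from the equation above, and Margin is assumed to be positive for a win and negative for a loss.

```python
import math


def actual_result(margin: float, growth: float = 0.05) -> float:
    """Map a points margin (positive = win) onto a 0-1 scale with a logistic curve."""
    return 1.0 / (1.0 + math.exp(-growth * margin))


# Print the mapped value for a handful of margins
for margin in (-30, 0, 5, 15, 70, 80):
    print(margin, round(actual_result(margin), 3))

# Compare a 10 point improvement in a close game with one in a blowout
small_gap = actual_result(15) - actual_result(5)
big_gap = actual_result(80) - actual_result(70)
print(round(small_gap, 3), round(big_gap, 3))
```

Going from a 5 to a 15 point win moves the actual result by roughly 0.12, while going from 70 to 80 moves it by only about 0.01.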

###### Predicted Result

My formula for the predicted result is below. This is taken from FiveThirtyEight’s NFL model, which is similar to traditional ELO models used in chess. It essentially takes the ELO difference, calculated as the Home team ELO, plus the Home Ground Advantage, minus the Away team ELO, and converts it to a value between 0 and 1. Positive differences in ELO ratings (i.e. for the higher rated team) give us an expected result of greater than 0.5.

$Result_{predicted} = \frac{1}{1 + 10^{\frac{-eloDiff}{M}}}$

$eloDiff = ELO_{home} - ELO_{away} + HGA$

The two main parameters we can set are M and HGA. M is a constant that scales the eloDiff and its influence on expected outcomes. I’ve used a constant of 400, again borrowing from FiveThirtyEight. HGA (Home Ground Advantage) gives the home team’s chances a bump. For my initial ELO rating system, I’m setting this to a constant of 35, which equates to about 8 points in terms of Margin, which is the long term average for home team outcomes. I hope to eventually parameterise this, updating it based on recent experience at a ground or perhaps travel distance, but that’s it for now.

You can see, for a range of rating differences (including HGA), the expected outcome. This expected outcome is also what I currently use as my probability 3, so an expected outcome of 0.75 equates to a 75% chance of winning.
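For anyone wanting to reproduce the expected outcome for a given pair of ratings, a small Python sketch follows. The function name is illustrative only; the defaults of 35 for HGA and 400 for M are the values quoted above.

```python
def predicted_result(elo_home: float, elo_away: float,
                     hga: float = 35.0, m: float = 400.0) -> float:
    """Expected result (0-1) for the home team, given both ratings and home ground advantage."""
    elo_diff = elo_home - elo_away + hga
    return 1.0 / (1.0 + 10 ** (-elo_diff / m))


# Two evenly rated teams: the home side is favoured purely through HGA
print(round(predicted_result(1500, 1500), 3))

# A 100 point ratings edge for the home team on top of HGA
print(round(predicted_result(1600, 1500), 3))
```

With HGA set to 0 and equal ratings the function returns exactly 0.5, which is a handy sanity check.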

###### Special K

Now that I have mapped Expected and Actual results to values between 0 and 1, I just have to decide on k. The k value allows me to scale how much my ratings are affected by new results. Large values of k are really impressed by new information, while low values of k tend to require more information to move the ratings a lot. It is hard to know what to pick for k without performing some kind of optimisation of this parameter 4. For now, I tested a few different values and found that 20 gave me the best results, around the same as FiveThirtyEight’s NFL and NBA models.

###### Iterating the ratings

The only other issue to solve is what to do a) when a new team is introduced and b) at the start of a new season. For the first issue, I simply start any new team at 1500 points in their first season. Because ELO ratings are zero-sum after each match, this means that my league average is always 1500. For continuity, I treat Sydney as the same team as South Melbourne and the Brisbane Lions as the same team as the Brisbane Bears.

After each season, I regress each team’s ELO rating towards the league average of 1500. This helps to account for things that are inherently built into the AFL such as equalisation, whereby good teams aren’t meant to stay good for too long and bad teams aren’t meant to stay bad for too long 5, as well as the myriad of other things that happen during an AFL offseason. I do this at a rate of 60%, in that a team will move 60% of the distance between its current rating and 1500. If a team was rated as 1600, they would regress to 1540, while a team on 1400 would regress to 1460. This seems high but I tried a few different values and this seemed to work the best. I plan to optimise this in the next implementation.

$ELO_{New Seas} = ((1-0.6)\times ELO_{Old Seas}) + (0.6\times 1500)$
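A short Python sketch of the between-season regression, following the worked examples in the text (1600 regresses to 1540, 1400 to 1460). The function and constant names are illustrative, not taken from my actual code.

```python
LEAGUE_MEAN = 1500.0


def regress_to_mean(elo: float, rate: float = 0.6, mean: float = LEAGUE_MEAN) -> float:
    """Move a rating `rate` of the way from its current value towards the league mean."""
    return elo + rate * (mean - elo)


for rating in (1600.0, 1500.0, 1400.0):
    print(rating, "->", round(regress_to_mean(rating), 1))
```

A rate of 0 would carry ratings over unchanged, while a rate of 1 would reset every team to 1500 each season.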

###### Results!

Now that I’ve got that out of the way, I can go through each match in the history of the VFL/AFL and get ELO ratings for each team! Below I’ve plotted each team’s ratings over their history. There will be a few posts later on (and, one day, an interactive page) to explore these ratings, but this gives a bit of an idea. I’ve added a line at 1500 to show where a team is rated compared to the league average.

I also can use the ELO difference before each game to predict the binary outcome (win/loss), the margin and also a probability. I plan to write another piece on that, but for now, I can report that across the entire history of VFL/AFL, the model has tipped 68.6% of games correctly, with a Mean Absolute Error in the Margin of 26.8 points. We can see that the performance is in general getting worse over time – possibly due to expansion of the league (i.e. more games to get wrong).
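For reference, the two performance measures quoted above can be computed along these lines. This is only a sketch: the margins are made-up numbers from the home team's perspective, the function names are mine, and treating a draw as an incorrect tip is an assumption for simplicity.

```python
def tip_accuracy(predicted_margins, actual_margins):
    """Share of games where the predicted winner (sign of the margin) matched the result."""
    # A product above zero means both margins point to the same winner;
    # draws (actual margin of 0) are counted as incorrect here, which is an assumption
    correct = sum(1 for p, a in zip(predicted_margins, actual_margins) if p * a > 0)
    return correct / len(actual_margins)


def mean_absolute_error(predicted_margins, actual_margins):
    """Average absolute gap, in points, between predicted and actual margins."""
    errors = [abs(p - a) for p, a in zip(predicted_margins, actual_margins)]
    return sum(errors) / len(errors)


# Made-up margins purely for illustration (positive = home team win)
predicted = [12, -5, 30, 8]
actual = [20, 10, 25, -3]

accuracy = tip_accuracy(predicted, actual)
mae = mean_absolute_error(predicted, actual)
print(accuracy, mae)
```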

Notes:

1. One aspect of my rating system that does slightly differ from FiveThirtyEight is that teams don’t always gain points for a win. Their model uses the simple 1, 0.5 and 0 point system for actual results and so a team that wins can never lose points. In my system, a team needs to win by a greater margin than we expect to gain points. An issue with FiveThirtyEight’s system that they discuss is autocorrelation, which here means that rating changes become correlated with existing ratings: favourites tend to win, and to win by more, so their ratings would otherwise keep inflating. They account for this using a Margin of Victory (MOV) multiplier, which essentially scales the k value based upon the margin of victory, so that bigger margins count for more but with diminishing returns, and count for less again when the winner was already the heavy favourite.

In my haste to develop my system, I thought this would be cool to have in my model and so implemented it. While writing this up, I’ve realised that it probably isn’t needed since my calculation for mapping Margin onto a scale of 0 to 1 already “squashes” bigger results. Interestingly however, including it in there does slightly improve my historical performance of predicting results (68.6% correct tips, MAE of 26.8) versus taking it out (67.0% correct tips, MAE of 27.6) over the entire history of the AFL. For continuity, given I’ve included it in my predictions so far this season, I’ll leave it in but I’ll likely revisit it after the season. The equation for including the MOV is below.

$MOV = \log (abs(Margin) +1) \times \frac{2.2}{0.001\times eloDiff + 2.2}$

2. For clarity, A = Lower asymptote = 0, K = Upper asymptote = 1, B = Growth Rate = 0.05, v = 1, Q = 1, C = 1
3. I’m looking to change this for next year, likely to actually model historical ELO rating differences on probability of winning
4. which I plan to do for my next implementation
5. as far as I know, not empirically tested?
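As a hedged sketch of the MOV multiplier from note 1, the equation can be written in Python as below. Two things are assumptions on my part rather than something spelled out above: that the log is the natural log, and that eloDiff is taken from the winner's perspective (winner minus loser, as in FiveThirtyEight's version). The function name is illustrative.

```python
import math


def mov_multiplier(margin: float, elo_diff: float) -> float:
    """Margin of Victory multiplier applied to k.

    Assumes a natural log and an elo_diff measured as winner minus loser.
    """
    return math.log(abs(margin) + 1) * 2.2 / (0.001 * elo_diff + 2.2)


k = 20.0

# A 40 point win by a team rated 200 points higher than its opponent...
favourite_k = k * mov_multiplier(40, 200)

# ...moves the ratings less than the same margin in an upset
underdog_k = k * mov_multiplier(40, -200)

print(round(favourite_k, 1), round(underdog_k, 1))
```

The log term grows with the margin but flattens out, while the fraction shrinks the multiplier when the winner was already rated well above the loser.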

## Round 9 predictions

Last weekend we again managed 6 correct tips, with an MAE of 33.9, bringing our season total to 52 (72%) with an MAE of 31.8. I’m hoping from here on in, we can maintain greater than 70% tipping and get the MAE under the 30 point mark 1

Onto this weekend, it looks like there are two standout games in the Hawks v Swans and the GWS v WB matchups. All 4 teams are in our congested top 8, with the Hawks rated slightly higher and at home, while the home ground advantage gives the Giants a slight edge over WB.

Our ELO ratings again show our clear separation of the Top 8 that we observed last week. This is providing more evidence to the idea that the Top 8 is already set by this point in the year. It also seems like we have a decent drop-off after about 13th, with not a lot to separate at least the bottom 4 (or maybe 5) teams. I’ll continue to monitor these groupings as the year goes on.

It is also interesting that the model still doesn’t rate North Melbourne either! Despite this, we give them a very good chance to improve their winning start to the season over Carlton (76% chance) – you can review the analyses of how good that start is here.

Simulating the rest of the season shows us again that we are relatively confident that the top 8 is already set, with only Port Adelaide (32%) and Melbourne (24%) making the 8 from outside with any regularity. While our model is giving Geelong and Hawthorn >70% chance of clinching top 4, it has trouble picking the other teams, with essentially an even spread for the remainder of teams 3rd to 7th.

North Melbourne still sits at a predicted 3rd, despite ranking 8th on our ELO rankings. Those banked early wins are important.

I’ve also shown the distribution of wins for each team, with the columns representing groups of 6. Hopefully this shows a little bit that, rather than predicting Geelong to get exactly 16 wins, that is the mean number of wins in our simulations, with a distribution around that number.

It also, I think, graphically represents the idea that we don’t think we will have two teams on 16 wins, with 6 teams following on 14 wins, but rather that the mean represents the relative confidence we have in that team finishing higher than another team. In fact, if Geelong ends up on 16 wins, it probably won’t even finish 1st, given how much of the distribution for teams other than Geelong sits in the region above 16 wins. I might explore this concept further in a later post but I’ll start to include this plot.

Notes:

1. I already have some thoughts on improvements of the ELO rating system

## The tackle machine

Over the weekend, Tom Liberatore managed to equal the record for the most tackles laid in a game in the history of recording the measure (since 1986 1) with 19. He could have absolutely smashed the record considering he was on 17 tackles at 3/4 time. This is also the second time this year the record has been equalled, after Jack Ziebell’s 19 tackles in Round 3, with the original record set by Jude Bolton in Round 3, 2011.

As we can see, above, most of the top games in terms of tackles have all occurred since the 2009 grand final. In fact, when we plot the average number of tackles per game across each round since 1986, we see that there is a clear upward trend in tackles per game.

We can also see that the tally of the best tackler within a season (essentially the Coleman Medal of tackling, herein referred to as “The Lenny” 2) has been increasing each year.

This is similar to the trend others have shown in stoppages, although not quite as dramatic. One complication of tracking changes in tackles over time is that they are a relatively new measure whose definition has become very specific.

A tackle can be defined as using physical contact to prevent an opponent in possession of the ball from getting an effective disposal. If a player has one, two, or more players hanging off him and executes an effective kick or handball, then a tackle will not be awarded.

While we can’t test the theory, one might speculate that the advancements in resources by the official statistics provider, Champion Data, have allowed for better classification of tackles. While the increase in stoppages certainly points towards more successful tackles, some of the increase in tackles may also be down to us getting better at measuring tackles. Hard to measure but fun to speculate!

Going back to the Lenny Medal race for this year, while Liberatore and Ziebell had big one-off games, they aren’t leading the race for the top spot. So far this year, that goes to Andrew Swallow with 68 (an average of 8.5 per game). Liberatore is in fact all the way down at 16th with 48 (6 per game), while Ziebell is 26th with 45 (5.6 per game). Impressively, Liam Shiels from Hawthorn is 4th with 58 from only 6 matches, at an average of 9.7 per game.

Where that fits in the overall best season is below. Certainly, Swallow is tracking nicely to come close to breaking his own record, which, if the trend for increases in tackles continues, is probably what we would expect!

Notes:

1. at least on afltables.com
2. Named after the loveable Lenny Hayes, current holder of the ‘Most tackles in a career’

## Round 8 Predictions

Last weekend I managed to tip 6 out of 9, with an MAE of 31.2. A bit better than the first week but also not as good as my model’s historical performance 1

Here are my Round 8 predictions, based on my ELO predictions 2. I wrote about how there is a big gap currently in my ratings between the 8th placed North Melbourne and the 9th placed Port Adelaide. Interestingly this week, apart from the Friday night matchup between Adelaide and Geelong, the other top 8 teams in my rankings are all playing against teams below them. My ELO ratings do reward ‘better than expected’ performances, so it doesn’t necessarily mean that if each of those teams wins then the gap will widen, but it is an interesting quirk nonetheless.

And finally, I published my rest of season simulations yesterday, but here are those tables for reference.

Notes:

1. In reviewing the data this week, I found a small bug in my “Upcoming Round Prediction script” whereby I was giving the HGA to the away team, thus overestimating the performance of the Away Team by roughly 16 points. All fixed now, and it wasn’t a bug in my bigger Historical script, or the one I use to update ELO rankings, but it would have improved individual Round tips by 2 in Week six and 2 in Week seven! Annoying.
2. I promise I have started writing the methodology

## Leaping Kangaroos

The Kangaroos, often considered ‘un-sexy’, are putting together a pretty nice season. I’ve discussed previously that early wins in a season are strongly related to wins by season’s end and, although my current ELO rating for North Melbourne only ranks them as the 8th best team 1, their current bank of 7 wins gives them a good base to work off for the rest of the year. Indeed, simulating the season from here on in gives them around a 50% chance of making the top 4.

They are also a good chance to extend their ‘perfect’ start to the season 2 given they play Essendon and Carlton in the next two weeks, who occupy the last and 3rd last spots on my rankings ladder, respectively. Given the strong possibility of a team being on 9 wins and 0 losses, I thought I’d explore where this season fits historically to the ‘best starts to a season’ conversation.

Firstly, I’ve plotted below the distribution of games before we have a team that ‘loses’ its perfect record. In other words, how many rounds did it take until we had no more undefeated teams left in the season. As we can see, for most seasons, this falls within the range of around 4-6 wins.

In fact, across 119 seasons of VFL/AFL, only 26 teams have started the season with a better record than North Melbourne’s current streak of 7. They are shown below.

Clearly, there are three standouts –

• Essendon’s 2000 campaign, where their only loss came against the 6th placed Bulldogs
• St Kilda’s 2009 season, eventually losing to 8th place Essendon and,
• Collingwood’s 1929 season, still the only “perfect” season in VFL/AFL history, albeit from an 18 game season.

Given North Melbourne’s two lowly upcoming opponents, they should certainly be confident of continuing this run. This week we give them a 73% chance of winning. The biggest stumbling block is obviously Sydney in round 10 (currently ranked 3rd on my rankings). You can see the rest of season simulations here.

If they can get over Sydney (and Essendon/Carlton) to get to 10 wins without losing, that will be the equal 8th best start to the season in VFL/AFL history. No-one will be complaining about “unsexy” football then!

Notes:

1. I suspect this is largely due to winning games by less than expected
2. At least in terms of Win/Loss

## Simulating the season

As I’ve promised for a few weeks, my ELO rating system allows me to simulate the season from points in time to assess the chances of various teams finishing positions, based on information we have gathered during the start of the season.

Below, I’ve taken each team’s current ELO rating, with their current record, and simulated the season 20000 times. For each match, I use the expected result estimated from the ELO difference between the two teams to draw from a probability distribution around that expected result 1. These simulations are “hot” in the sense that after each simulated match, I update the ELO rankings. I’ll probably write a separate post 2 on that, but here is FiveThirtyEight’s reasoning on why this is a good idea, which is good enough for me. The nice part about this methodology is that it takes into account a team’s draw, their current rating and their current record.
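To give a feel for what a “hot” simulation involves, here is a heavily simplified Python sketch. It is not my actual code: the three-team league, ratings, records and fixtures are hypothetical, and the choice of a normal draw around the expected result (clipped to 0-1, with a made-up spread of 0.2) is an assumption, since the distribution isn't specified above. The constants K, HGA and M are the ones from my ELO explainer.

```python
import random

K, HGA, M = 20.0, 35.0, 400.0  # values quoted in the ELO explainer


def expected(elo_home, elo_away):
    """Expected result (0-1) for the home team."""
    return 1.0 / (1.0 + 10 ** (-(elo_home - elo_away + HGA) / M))


def simulate_season(ratings, wins, fixtures, rng, spread=0.2):
    """Play out the remaining fixtures once, updating ratings after every game ("hot")."""
    ratings, wins = dict(ratings), dict(wins)  # copy so every run starts from today's state
    for home, away in fixtures:
        exp_home = expected(ratings[home], ratings[away])
        # Assumption: the simulated result is a normal draw around the expectation,
        # clipped to the 0-1 range
        result = min(1.0, max(0.0, rng.gauss(exp_home, spread)))
        if result > 0.5:
            wins[home] += 1
        else:
            wins[away] += 1
        # Zero-sum update using the simulated result
        shift = K * (result - exp_home)
        ratings[home] += shift
        ratings[away] -= shift
    return wins


def mean_wins(ratings, wins, fixtures, n_sims=2000, seed=1):
    """Average final win tally per team across many simulated seasons."""
    rng = random.Random(seed)
    totals = {team: 0 for team in ratings}
    for _ in range(n_sims):
        for team, tally in simulate_season(ratings, wins, fixtures, rng).items():
            totals[team] += tally
    return {team: total / n_sims for team, total in totals.items()}


# Hypothetical three-team league, not real ratings, records or fixtures
ratings = {"A": 1600.0, "B": 1500.0, "C": 1400.0}
wins = {"A": 3, "B": 2, "C": 1}
fixtures = [("A", "B"), ("B", "C"), ("C", "A"), ("B", "A")]

averages = mean_wins(ratings, wins, fixtures)
print(averages)
```

Keeping the full list of simulated win tallies, rather than just the mean, is what allows finishing positions and top 8 percentages to be counted.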

I’m hoping to potentially turn this into an interactive table 3. For now, I’ll update each week with a static image.

One interesting point is the big drop-off in percentages for finishing in the top 8 between 8th placed West Coast and 9th placed Port. It seems, as I wrote about previously, that the final 8 is taking shape already this early on. In fact, this seems quite common amongst those generating rating systems.

If the top 8 stays this tight, then we could be in for a super interesting finals series. That’s if we can get through the dullness of the top eight being set already.

Notes:

1. I believe this is formally known as Monte Carlo Simulation
2. with my ELO system explainer
3. anyone with advice? ShinyApps is something I’ve considered

## Round 7 predictions

After putting tips out for the first time last week, I actually didn’t get to watch a single game due to being on holiday for the (at least in Queensland) long weekend without phone, internet or TV reception! After a nervous wait, I came back to see my model had a tough weekend – tipping 5 out of 9 with a mean absolute error in the margin of a hefty 50 points. Granted there were a few unexpected results but not what I had envisaged stuck in my log cabin all weekend!

Nonetheless, my season results for the ELO model (understanding I haven’t made these public yet) sit at 39 out of 54 (72%) with a mean absolute error in the margin of 33 points. I’m again hoping to turn this into a nice table or chart as the season progresses.

It is with some trepidation then that I release my round 7 predictions, based on my ELO predictions 1.

Here are our current ELO ratings

Notes:

1. I endeavour to write up my methodology over the weekend!

## The round 7 rule?

In an article over the weekend, Rohan Connolly from The Age asked if the finals teams were already set by the end of round 7. He suggested that Round 7 appeared to be some kind of milestone.

Call it the round seven rule if you like, but in a nutshell, if a team wasn’t already in the top eight by then, it was almost certainly not going to be there when the finals began four months later. Just once in that period, in 2005, did the top eight after that arbitrary cut-off point alter by more than one team.

I’ve often heard commentators discuss this and always wondered if the data support it. It makes sense that better teams are higher on the ladder at Round 7 and therefore across a season end up higher on the ladder as well, but as we know, teams have varying difficulties in their draw across a whole season, let alone in the first 7 rounds. Is it as uncommon as discussed?

In my post last week about Freo, I showed the above chart to highlight how wins in the first 6 rounds seemed pretty predictive of total wins later on. Having a bad start certainly didn’t exclude teams from having good seasons, it just made it less likely. The Arc further explored this idea, by estimating the predictive strength of a team’s winning percentage after a certain number of games on their final winning percentage. As we might expect, it was shown that the further we get through the season, the better a team’s record predicts their final record, however early season wins did give a relatively good indication. In fact, the final figure from The Arc article shows that after round 7, a team’s record explains around 60% of the variance in their final record.

To fully explore the “Round 7 rule” as defined by the article in The Age, I decided to focus on team rankings. Exploring the data for changes in rank from 1994-2015 (i.e. when we’ve had a final 8), we find 42 teams that have made the 8 from outside it at round 7, an average of just under 2 per season. The top 20 of those, defined as the biggest change in rankings, are plotted below.

If we take a closer look at the data for each round, rather than just round 7, we can begin to make some inferences about what stage in the season tends to see the ladder “stabilise”. Below, I’ve calculated, for each round, the mean number of teams that moved into the final 8 after being outside it at that round (again between 1994 and 2015).

Again, as we should expect, the number of teams entering the top 8 drops off as we progress through the season. We see that Round 7 is in fact the (arbitrary) point where, on average, we first see fewer than 2 teams move into the final 8. Another similarly arbitrary threshold is seen at Round 16, where we fall below 1 change on average.

The next two plots attempt to quantify the specific relationship between ladder position (i.e. Rank) of each round and the final ladder position. In the first instance, I’ve used Spearman’s Footrule distance, which is essentially the total, absolute difference in ranks, while in the second, I’ve used Spearman’s rank correlation coefficient, which is similar to a normal correlation coefficient but designed for ranks.

In essence, each gives us an estimate of how similar two sets of rankings are. In our case, that is how similar the ladder position at round N is to the final ladder position. For these analyses, I’ve used the whole ladder, rather than the final 8, but it didn’t make a huge difference focusing on just the top 8, other than some increased variability. We can again see that as we move through the season, the ladder position begins to become more similar to the final ladder position.
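For the curious, both measures are simple to compute by hand. The Python sketch below uses a hypothetical four-team ladder and illustrative function names, and the rank correlation uses the standard shortcut formula that assumes no tied ranks.

```python
def footrule_distance(rank_a, rank_b):
    """Spearman's footrule: total absolute difference in rank across all teams."""
    return sum(abs(rank_a[team] - rank_b[team]) for team in rank_a)


def spearman_rho(rank_a, rank_b):
    """Spearman's rank correlation for two rankings of the same teams with no ties."""
    n = len(rank_a)
    d_squared = sum((rank_a[team] - rank_b[team]) ** 2 for team in rank_a)
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical ladder positions after round 7 and at the end of the season
round_7 = {"A": 1, "B": 2, "C": 3, "D": 4}
final = {"A": 2, "B": 1, "C": 3, "D": 4}

distance = footrule_distance(round_7, final)
rho = spearman_rho(round_7, final)
print(distance, round(rho, 2))
```

Identical ladders give a footrule distance of 0 and a rho of 1, while a completely reversed ladder gives a rho of -1.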

So do we see evidence of a threshold point at round 7 to support the idea of the Round 7 rule? Probably not. We see that each subsequent round provides an incremental increase in our ability to predict the final ladder position. Rather than some big “tipping point”, it is more of a slow burn. In saying that, considering how far out we are from the finals, an average of less than 2 changes and a Spearman’s rho of 0.75 is pretty good. Rather than a “round 7 rule”, it is probably more likely that good teams tend to be higher placed after Round 7 than bad teams, and good teams tend to finish higher on the ladder by the end of the season.

EDIT: The Arc also posted a similar analysis with similar conclusions. Have a read!

## Round 6 Predictions

While I was hoping to have this site up and running before the season started, my PhD thesis and then full time work got in the way.

Nonetheless, it’s not too late to start posting predictions! I plan to maybe go back and revisit a priori how my model would have performed in the early rounds but that’s for a later time. You can at least check in on my pre-season rankings.

Here are my ELO predictions for Round 6. At some stage, I plan to turn this into an interactive page, similar to the FiveThirtyEight ones but for now, I’ll just post some predictions.

In future weeks I may do a game by game summary but for now, I’ll make some general observations.

• There appear to be a bunch of close games this weekend, with NM v WB, Rich v PA and Carl v Ess all seemingly tough to pick.
• I’ve written about Freo’s start to the year being so bad – things aren’t looking great for turning that around against an in form Adelaide
• I’m not sure why my model is predicting such a big win for Hawthorn but it will be interesting to see how that one pans out

I should also note that I’m putting both my Margin and Probability predictions into the Monash tipping competition (full disclosure, there are no prizes). I’ll update on how they are going later on.

## Annus horribilis Fremantle

By his own words, Ross Lyon’s team are having an Annus Horribilis 1. It doesn’t take a whole lot of in depth data to know that starting the year with 0 wins and 5 losses is bad, and there is no shortage of stories describing just how bad that is.

The Arc shared a great visualisation looking at the progression of wins by minor premiers in the year following their minor premiership. Only Richmond in 1983 has started 0-5 from minor premiers, while the worst final season record is the Kangaroos in 1984, who won 5 games all year.

Where do we think Freo might get to? I thought I’d further explore the Dockers’ poor start by comparing to some other historical distributions to see just how bad it is. Below I’ve plotted the distribution of games won after 5 rounds, where we see, as we expect, a relatively normal distribution across those 5 games. Approximately 7.9% of teams, regardless of quality, begin a season at 0-5 in the history of the AFL (this drops to around 6% if we just look at 1990 to 2015).

Looking at how these groups of teams finished the season, we can see that early wins (i.e. games won after 5 rounds) seem pretty predictive of absolute final wins by season end. Of course, within those final wins is included the early wins, but removing them and showing ‘relative’ final wins (i.e. the record after round 5) shows a similar pattern.

Focussing just on the top graph, where we have our 0-5 teams, it becomes apparent that very few teams who start 0-5 end up turning their season around. Below, I’ve picked out the best performing teams, in terms of final wins, after starting 0-5 2

The data show that only two teams have won 12 games, often discussed as the finals cutoff 3, after starting 0-5: Richmond in 1924, who happened to win the final game of the Finals that year but didn’t win the premiership, and Collingwood in 1959, interestingly the premiers from the year before.

So, against historical data, Freo has a pretty huge task ahead of them to make finals. This does however fail to take into account many factors. Maybe other teams who start 0-5 are just really really bad teams and Freo isn’t? Perhaps Freo has had a really tough start to the season in terms of opponents (they don’t appear to across the season) compared to the general 0-5 team?

Luckily, we can try to account for such things by using a rating system, such as my ELO ratings, which allows us to simulate expected results based upon the inherent skill and form of a particular team. Below, I’ve simulated a season 10000 times for the 1st 5 rounds and the rest of the season. The 1st 5 rounds are based on our pre-season rankings for Fremantle (we had them 9th) and their draw. We saw them start 0-5 around 3.7% of the time, below the historical average discussed above, and had them finishing with 2.7 wins on average.
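The idea of estimating how often a team starts 0-5 can be illustrated with a simplified Python sketch. This is not the full ELO simulation: the five win probabilities are hypothetical (not Fremantle's actual numbers), the games are treated as independent with no ratings update between them, and the function names are mine.

```python
import random


def prob_winless(win_probs):
    """Exact chance of losing every game, treating the games as independent."""
    p = 1.0
    for w in win_probs:
        p *= 1.0 - w
    return p


def simulate_winless(win_probs, n_sims=10000, seed=1):
    """Monte Carlo estimate of the same quantity."""
    rng = random.Random(seed)
    # A game is lost when the uniform draw is not below the win probability
    winless = sum(all(rng.random() >= w for w in win_probs) for _ in range(n_sims))
    return winless / n_sims


# Hypothetical win probabilities for a team's first five games
win_probs = [0.45, 0.55, 0.40, 0.50, 0.60]

exact = prob_winless(win_probs)
estimate = simulate_winless(win_probs)
print(round(exact, 4), round(estimate, 4))
```

With enough simulations the estimate sits close to the exact product, which is a useful check before moving to a version where the ratings change after each simulated game and no closed-form answer is available.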

From here on in (i.e. simulating their season from R6 onwards), we can see that, like historical teams that start 0-5, we don’t give Freo much chance of moving up the ladder. They do seem to fare slightly better than the average 0-5 team however, and they do get at least 12 wins 8.7% of the time, which is higher than our historical 0-5 teams. Either our model has been too slow to adjust to Freo’s lack of ability or they aren’t your typical 0-5 team!

A lot of this assumes that Freo will continue to try and win as many games as possible, which may not be the case. Nonetheless, if anyone can turn a team around and provide them with an Annus Mirabilis, one might expect it to be a coach like Ross Lyon, known for getting every inch out of his list. At least our current rating of Freo (as the 10th best team) gives them a small chance. That is if Ross doesn’t jump ship early!

Notes:

1. much to my dismay, that word doesn’t have the low brow meaning I had hoped. Instead it means “horrible year”
2. In reviewing this graph, it revealed a slight error in my methodology as some teams such as North Melbourne in 2011 didn’t start at 0-5 but in fact had 0 wins after round 5, with a bye. I may try and fix this later on but I suspect it won’t change things a whole lot
3. I sense a good blog post coming!