How the Parity predictions simulator works

A few caveats right off the bat!

  • This is Parity, the league where everything's made up and the points don't matter. Please don’t take any of this too seriously!
  • I am not a real-life programmer (I only play one on the Parity Slack channel). There are better ways to do what I’m doing and, if you’re someone who knows how, you should definitely fix it for me.
  • Do not actually gamble on recreational ultimate frisbee games.

With that out of the way, let’s answer all of your burning questions about Parity prognostication!

Where did the simulator come from?

A bit of context: after leagues were canceled in March 2020, I built a program using Excel VBA macros to simulate Parity games based on historical data. We used the program to live stream the remainder of the (virtual) season, which you can watch here or read about here.

After the season ended I rebuilt the simulator (still in Excel) to be more efficient and flexible, while adding in some handy new features like team editing and batch processing. I am now repurposing that program to make predictions about upcoming Parity League games. This is a totally different (and much more robust) approach than I’ve used to make predictions in the past.

Just tell me how it works already

Essentially, we input two rosters of real-world teams and the simulator plays out a full, virtual game of ultimate between them where the outcome of each “event” (pass, block, etc.) is randomly determined. However, that randomness is weighted according to historical performance data.

So, for example, if a player has completed 90% of the passes they attempted in the past, then every one of their pass attempts in the simulation has a 90% chance of being completed.
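For the curious, here's roughly what that weighted coin flip looks like in code. This is a minimal Python sketch, not the actual simulator (which is built from Excel VBA macros). The function name is made up for illustration, and the completion rate is assumed to be stored as a fraction between 0 and 1.

```python
import random

def pass_is_completed(completion_rate: float, rng: random.Random) -> bool:
    """Hypothetical helper: True with probability equal to the historical completion rate."""
    # rng.random() is uniform on [0.0, 1.0), so it lands below 0.90 about 90% of the time.
    return rng.random() < completion_rate

# Usage: over many simulated throws, the completion share approaches 90%.
rng = random.Random(42)  # fixed seed so the run is reproducible
throws = [pass_is_completed(0.90, rng) for _ in range(10_000)]
print(f"{sum(throws) / len(throws):.1%} of simulated passes completed")
```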

Specifically, these are the data inputs we need for each player, as well as the question each one allows us to answer:

  • Pickups per point played: after a pull, goal or turnover, how likely is it that this player picks up the disc to start play?
  • Completion rate: if this player attempts a pass, how likely is it to reach its target without being turned over or blocked?
  • Catches per point played: how likely is it that this player is the target of a pass?
  • Catch rate: if a pass reaches this player, how likely are they to catch it rather than drop it?
  • Assist rate: if this player successfully completes a pass, how likely is it to be in the end zone for a goal?
  • Goal rate: if this player successfully receives a pass, how likely is it to be in the end zone for a goal?
  • Defensive impact: how likely is the other team to turn the disc over if this player is on the field?
  • Blocks per point played: if the other team turns the disc over, how likely is it that this player is responsible for a block?
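One way to keep those eight inputs together is a simple record per player. Here's a minimal Python sketch with hypothetical field names (the real simulator keeps its data in Excel), and the example numbers are invented.

```python
from dataclasses import dataclass

@dataclass
class PlayerStats:
    """One player's historical inputs (hypothetical field names, one per question above)."""
    name: str
    pickups_per_point: float  # how often they pick up the disc to start play
    completion_rate: float    # share of attempted passes that reach the target
    catches_per_point: float  # how often they are the target of a pass
    catch_rate: float         # share of passes reaching them that they catch
    assist_rate: float        # share of their completions that are goals
    goal_rate: float          # share of their catches that are goals
    defensive_impact: float   # opponents' turnover tendency while they're on the field
    blocks_per_point: float   # how often they get the block when the opponent turns it over

# Invented numbers, not a real Parity player's stats.
example = PlayerStats(
    name="Example Player", pickups_per_point=0.20, completion_rate=0.90,
    catches_per_point=0.80, catch_rate=0.95, assist_rate=0.10,
    goal_rate=0.12, defensive_impact=0.15, blocks_per_point=0.05,
)
print(example)
```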

For each game event, the simulator rolls some dice to determine the answer to each of those questions before it moves on to the next event. Each event has a random duration (e.g., completed passes take between 1 and 10 seconds). Once “full time” is reached, the game ends and the results are compiled.
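Here's a stripped-down sketch of that event loop in Python, showing only the clock mechanics. It rests on several assumptions: the game length (FULL_TIME_SECONDS) is a made-up parameter, every event is assumed to take 1 to 10 seconds (the text only gives that range for completed passes), and fixed placeholder probabilities stand in for the player-weighted rolls.

```python
import random

FULL_TIME_SECONDS = 60 * 60  # hypothetical game length, not the simulator's real setting

def play_game(rng: random.Random) -> tuple[int, int]:
    """Toy game clock: resolve events one at a time until full time, then return the score."""
    clock = 0.0
    score = [0, 0]
    offence = 0  # index of the team currently holding the disc
    while clock < FULL_TIME_SECONDS:
        # Assumption: every event lasts between 1 and 10 seconds.
        clock += rng.uniform(1, 10)
        roll = rng.random()
        if roll < 0.05:  # placeholder goal probability, not a real Parity stat
            score[offence] += 1
            offence = 1 - offence  # the scoring team pulls, so the other team receives
        elif roll < 0.15:  # placeholder turnover probability
            offence = 1 - offence
        # Otherwise it's a completed pass and the same team keeps the disc.
    return score[0], score[1]

print(play_game(random.Random(7)))
```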

How do you turn simulations into predictions?

For each week of games I run each matchup through the simulator 200 times. You could do more, but from my testing 200 is enough for statistical validity (i.e., you’ll get essentially the same result if you do it twice).
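For a rough sense of how much a 200-game estimate can wobble, the textbook binomial standard error, sqrt(p(1 - p) / n), is a handy yardstick. This is general statistics rather than something the simulator reports. Here's a minimal Python sketch:

```python
import math

def win_pct_standard_error(p: float, n: int) -> float:
    """Standard error of a win share p estimated from n independent simulated games."""
    return math.sqrt(p * (1 - p) / n)

# The worst case is a coin-flip matchup (p = 0.5).
se = win_pct_standard_error(0.5, 200)
print(f"one standard error: {se:.1%}, roughly 95% range: ±{1.96 * se:.1%}")
```

In the worst case (p = 0.5), that comes to about 3.5 percentage points at n = 200. Quadrupling n to 800 would halve it.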

The predicted win percentage is based on the outcomes of those games. For example, if a team wins 140 out of 200 simulations I say they have a 70% chance to win that week. If the win percentage is less than 52.5% I call it a toss-up.

The score prediction is composed of each team’s average score over those 200 simulations.
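Here's a minimal Python sketch of that bookkeeping, assuming each simulation returns a (score_a, score_b) pair. The function and key names are hypothetical. Ties are counted as neither a win nor a loss here, and the 52.5% toss-up cut-off is applied to the favourite's win percentage.

```python
from statistics import mean

TOSS_UP_THRESHOLD = 0.525  # favourite's win share below this -> call it a toss-up

def summarize(results: list[tuple[int, int]]) -> dict:
    """Turn simulated (score_a, score_b) results into a win percentage and score prediction."""
    n = len(results)
    wins_a = sum(1 for a, b in results if a > b)
    wins_b = sum(1 for a, b in results if b > a)  # ties count for neither side
    return {
        "win_pct_a": wins_a / n,
        "win_pct_b": wins_b / n,
        "toss_up": max(wins_a, wins_b) / n < TOSS_UP_THRESHOLD,
        "avg_score": (mean(a for a, _ in results), mean(b for _, b in results)),
    }

# Usage: team A wins 140 of 200 simulations.
results = [(15, 12)] * 140 + [(11, 14)] * 60
summary = summarize(results)
print(summary)
```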

I still don’t get it… give me an example!

Let’s look at a game from week four of the current season between Team Steve and Team Yannick.

First, the simulator randomizes the starting lines and flips for pull. Team Steve receives with Michelle Warren, Melissa Jess, John Haig, Nicholas Aghajanian, Mark Carlson and Ariel Grostern on the field. Who picks it up?

The pickups per point played for each of these players are, in order: 0.33, 0.08, 0.19, 0.03, 0.18 and 0.09. We convert those to percentage chances by dividing each one by their total (0.90), which works out to 37%, 9%, 21%, 3%, 20% and 10% respectively. The sim rolls the dice and determines that Mark picks up the disc in this instance. Over the course of the game, though, we can expect Michelle to pick it up about a third of the time. The more games we simulate, the closer we get to that probability.
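Here's the same arithmetic as a minimal Python sketch (the function name is hypothetical). It normalizes each player's pickups per point played by the line's total, then makes a weighted random pick. Python's random.choices would accept the raw values as weights too. Normalizing first just makes the percentages visible.

```python
import random

def pickup_chances(pickups_per_point: dict[str, float]) -> dict[str, float]:
    """Normalize pickups per point played into each player's share of pickups."""
    total = sum(pickups_per_point.values())  # 0.90 for the line below
    return {name: rate / total for name, rate in pickups_per_point.items()}

line = {"Michelle": 0.33, "Melissa": 0.08, "John": 0.19,
        "Nicholas": 0.03, "Mark": 0.18, "Ariel": 0.09}
chances = pickup_chances(line)
print({name: f"{p:.0%}" for name, p in chances.items()})

# Roll the dice: one weighted random pick.
rng = random.Random()
picker = rng.choices(list(chances), weights=list(chances.values()))[0]
print(picker, "picks up the disc")
```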

So we know who has the disc. Now we follow a similar process to determine the next receiver. Mark has five possible targets: Michelle, Melissa, John, Aggy (Nicholas) and Ariel. Their catches per point played are, in order: 0.87, 0.56, 1.17, 0.73 and 0.91. Dividing each by their total (4.24), we calculate the likelihood of each player being the target of Mark’s throw at 21%, 13%, 28%, 17% and 21%, respectively. The sim rolls the dice and determines that Aggy is the target.
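Here's a minimal sketch of the target roll in Python. The function name is hypothetical. The text doesn't give Mark's own catches per point played, so a placeholder 0.0 is used. It's excluded anyway, because a thrower can't target himself.

```python
import random

def target_chances(thrower: str, catches_per_point: dict[str, float]) -> dict[str, float]:
    """Each teammate's chance of being the target, from catches per point played."""
    teammates = {name: rate for name, rate in catches_per_point.items() if name != thrower}
    total = sum(teammates.values())  # 4.24 for Mark's five teammates below
    return {name: rate / total for name, rate in teammates.items()}

# Mark's own value isn't given in the text; 0.0 is a placeholder and is excluded anyway.
on_field = {"Mark": 0.0, "Michelle": 0.87, "Melissa": 0.56,
            "John": 1.17, "Aggy": 0.73, "Ariel": 0.91}
chances = target_chances("Mark", on_field)
print({name: f"{p:.0%}" for name, p in chances.items()})
target = random.Random().choices(list(chances), weights=list(chances.values()))[0]
print("Mark throws to", target)
```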

Is the pass on target? If not, is it blocked (and by whom)? If it hits the target, is it caught or dropped? If caught, is it in the end zone? In each case, we roll the dice and check against historical probabilities to determine the outcome of the event and, ultimately, the game.
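Here's a rough Python sketch of that chain of rolls for a single pass. Treat it as illustrative only. The ordering follows the questions above, but several pieces are assumptions: block_share (how often an off-target pass counts as a block) is a hypothetical parameter, averaging the thrower's assist rate with the receiver's goal rate is a guess, and defensive impact is left out for brevity. The player numbers are invented.

```python
import random

def resolve_pass(thrower, receiver, defenders, rng, block_share=0.5):
    """Roll the dice for one pass and return a short outcome label.

    thrower and receiver are dicts of historical rates; defenders maps each
    defender's name to their blocks per point played. block_share is a
    hypothetical parameter: the chance an off-target pass counts as a block.
    """
    # 1. Is the pass on target? Uses the thrower's completion rate.
    if rng.random() >= thrower["completion_rate"]:
        # 2. Off target: blocked (by whom, weighted by blocks per point) or thrown away?
        if rng.random() < block_share:
            names = list(defenders)
            blocker = rng.choices(names, weights=[defenders[n] for n in names])[0]
            return f"blocked by {blocker}"
        return "throwaway"
    # 3. On target: caught or dropped? Uses the receiver's catch rate.
    if rng.random() >= receiver["catch_rate"]:
        return "drop"
    # 4. Caught: in the end zone? Assumption: average the thrower's assist rate
    #    with the receiver's goal rate (the real combination isn't described).
    goal_chance = (thrower["assist_rate"] + receiver["goal_rate"]) / 2
    return "goal" if rng.random() < goal_chance else "completion"

# Invented numbers, for illustration only.
thrower_stats = {"completion_rate": 0.90, "assist_rate": 0.10}
receiver_stats = {"catch_rate": 0.95, "goal_rate": 0.15}
defenders = {"Defender 1": 0.05, "Defender 2": 0.02, "Defender 3": 0.08}
print(resolve_pass(thrower_stats, receiver_stats, defenders, random.Random()))
```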

In 200 simulated games between these teams, Team Steve won 108 times, Team Yannick won 71 times, and they tied 21 times (during playoffs we can disable ties). Steve’s average score was 18.7 and Yannick’s was 17.7. Taken together, we can interpret these results to mean Steve has a 59% chance of winning with a predicted score of 19-18. (Although there are other valid interpretations, especially if you account for ties differently.)
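The text doesn't spell out how ties were handled. Counting each tie as half a win reproduces the 59% figure, while ignoring ties entirely would give about 60%. Here's a quick Python check of both readings:

```python
wins, losses, ties, games = 108, 71, 21, 200

# One reading that reproduces 59%: count each tie as half a win.
steve_win_pct = (wins + ties / 2) / games
# Another reading: ignore ties and look only at decided games.
decided_win_pct = wins / (wins + losses)

print(f"ties as half-wins: {steve_win_pct:.1%}")
print(f"decided games only: {decided_win_pct:.1%}")
# Rounding the average scores gives the predicted scoreline.
print(f"predicted score: {round(18.7)}-{round(17.7)}")
```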

In real life, the game ended 24-19 for Steve. Was the prediction “correct”? The margin of victory was larger than predicted, but it was well within the range of simulated outcomes. For example, that 200-game sample included results of 24-16, 24-18 and 25-14 in Steve’s favour. So on the whole I’d argue the simulator did a pretty good job of anticipating the most likely outcome, which Steve’s team overperformed by an amount that was within the margin of error.

The simulator will often predict one- or two-point wins even though the typical margin of victory in a Parity game is bigger than that. That’s because the predicted score is (effectively) the median outcome, and big wins for either side balance each other out. It doesn’t mean a bigger margin is unlikely, just that this score is the middle value among all likely outcomes.
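Here's a toy illustration in Python with made-up results (not real simulator output). When lopsided wins go both ways, the gap between the two average scores is much smaller than the margin of a typical individual game.

```python
from statistics import mean, median

# Hypothetical simulated results as (team_a, team_b) scores.
results = [(24, 16), (15, 21), (20, 18), (17, 22), (23, 17), (19, 20), (21, 15)]

margins = [a - b for a, b in results]
avg_a = mean(a for a, _ in results)
avg_b = mean(b for _, b in results)

print("margin between the average scores:", round(avg_a - avg_b, 1))
print("median margin:", median(margins))
print("typical (mean absolute) margin:", round(mean(abs(m) for m in margins), 1))
```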

Can you predict individual player performances?

Yes! Since we simulate each individual event, we can reconstruct a player-by-player stat line just like in a regular Parity game.
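Here's a minimal Python sketch of that reconstruction, assuming each simulated game produces a log of (player, stat) events. The log format and function names are hypothetical. Averaging over games counts a missing stat as zero, because totals are divided by the number of games rather than by the games where the stat appeared.

```python
from collections import Counter, defaultdict

def stat_lines(event_log):
    """Tally (player, stat) events from one simulated game into per-player stat lines."""
    lines = defaultdict(Counter)
    for player, stat in event_log:
        lines[player][stat] += 1
    return lines

def average_stat_lines(games):
    """Average each player's stats across many simulated games."""
    totals = defaultdict(Counter)
    for event_log in games:
        for player, counts in stat_lines(event_log).items():
            totals[player].update(counts)
    n = len(games)
    return {player: {stat: count / n for stat, count in counts.items()}
            for player, counts in totals.items()}

# Two tiny made-up event logs, not real simulator output.
games = [
    [("Player A", "goal"), ("Player A", "goal"), ("Player A", "throwaway")],
    [("Player A", "goal"), ("Player A", "assist")],
]
print(average_stat_lines(games)["Player A"])
```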

There’s much more variability in player performances than there is in team performances, but the sims still tend to put us in the right ballpark for most players.

For example, in 200 simulated games between Team Yannick and Team Steve, Maggie Musclow scored an average of 2.6 goals, 1.4 assists and 2.5 throwaways for a $63k salary. In the real game she had 4 goals, 1 assist and 1 throwaway for a $70k salary. That means Maggie performed a bit better in week four than we would expect based on the first three weeks of the season, but her performance was still within the range of likely outcomes.

If math is perfect, why do you get predictions wrong?

There are three main reasons why the simulator gets things wrong. The first two are boring cop-outs:

  • Real-world randomness: Life is unpredictable and making predictions about the future in any domain is notoriously fraught. If we can’t predict snow tomorrow, we can’t predict an 8-goal breakout performance next week.
  • Imperfect stats: The numbers we track tell an abstracted story about the players on the field. The simulator can only do so much with the available data, especially early in the season when there’s a lot more variability.

Blah blah blah. The real problem and the source of my endless frustration? Missing players and substitutes. The simulator assumes that all rostered players will actually show up and play an equal number of points, which never happens and is very hard to work around. In a league where teams are designed to be as fair as possible, missing even one player can make a huge difference for balance.

I am satisfied with this explanation and bow down before our simulator overlords

All hail the simulator! (But seriously there are a lot of details I left out for brevity and some contestable assumptions I’m making. Please feel free to inquire and/or debate below.)

If you’re interested in following along and/or heckling my predictions, I’ll be posting them every week in the #parity-league Slack channel. If you’re interested in the full simulator results, I will make them available every week at https://bit.ly/ParityPredictions. And if you’re interested in the simulator itself, please get in touch and I’d be happy to share my unstable abomination of Excel macros.

Hadrian Mertins-Kirkwood

At Al's suggestion, I reran the Yannick vs. Steve game using the rosters that actually played. Fortunately, there was only one difference: Owen Daigeler subbed for Dave Townsend. The other 23 players were the same.

The result? It made a big difference.

Instead of Steve being the favourite with a 59% chance to win and a predicted score of 19-18, the change gave Steve a 72% chance to win with a predicted score of 19-16. I've added the summary to the public folder so you can see the difference.

Although the total number of goals in the sim was too low, the +3 margin was much closer to the +5 margin in the real game. It was a more accurate prediction overall.

What's extra interesting is the fact that Yannick's team performed worse even though (simulated) Owen played better than (simulated) Dave based on their individual stat lines. Something about having (simulated) Owen on the field made the rest of his team score fewer (simulated) goals than when (simulated) Dave was on the roster.

I think this simulator is an absolute work of art! Seeing how I'm barely able to add up numbers in a column using Excel, I'm amazed that all of this was done using a spreadsheet and wizardry. Kudos to you, Hadrian.

One question I'm very curious about is the impact of line composition on the outcome of a game. Assuming you have a full roster and are always playing 1 point on, 1 point off, you end up with two separate lines that will play together the whole game. When thinking about how to divide my team into two lines, I'm often uncertain how to go about it. Specifically, is it better to stack one line that will score, say, 95% of the time while the other line scores 50% of the time when starting on O, or to have two more balanced lines that each convert roughly 70% of the time? In your simulator you run random lines, but in my experience lines are usually far from random and some thought has been put behind them, even if it's as simple as "split the handlers".

I'm sure there is some way to optimize your lines to maximize your chances of winning, but it's not always obvious. I'd be very curious to see the impact on the expected results in a scenario with even lines vs. a scenario with uneven lines (you could use total salary as a metric for how even the two lines are).

Hadrian Mertins-Kirkwood

Thank you for the kind words!

Line composition definitely has an impact on team performance and, in practice, it's one of the least random factors in a Parity game. The simulator's logic is that by randomizing the lines between each game everything will balance out in the end, but this is rarely how games are played (at least in cases where a team has all 12 players).

I don't think it would be too hard to generate pseudo-optimized lines instead of fully random ones. You could use a simple formula to guess which players are handlers based on pickups/completions and then split them up evenly, as you suggest. There could still be a random element but at least you would never be totally unbalanced.
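Here's a minimal Python sketch of that idea. It assumes pickups per point played is a good-enough proxy for "handler-ness", which is a hypothetical choice (completions could be folded in too). Players are ranked and dealt into two lines in snake order, separately for women and open players, so each line keeps 2 women and 4 open players. The names and numbers are invented.

```python
def snake_split(players, score):
    """Split players into two lines, dealing down a ranking in snake order (A, B, B, A, A, B, ...)."""
    ranked = sorted(players, key=score, reverse=True)
    line_a, line_b = [], []
    for i, player in enumerate(ranked):
        # Ranks 0 and 3 (mod 4) go to line A; ranks 1 and 2 go to line B.
        (line_a if i % 4 in (0, 3) else line_b).append(player)
    return line_a, line_b

# Hypothetical pickups per point played, used as a handler score.
women = {"W1": 0.33, "W2": 0.08, "W3": 0.19, "W4": 0.09}
open_players = {"O1": 0.40, "O2": 0.03, "O3": 0.18, "O4": 0.25,
                "O5": 0.05, "O6": 0.12, "O7": 0.30, "O8": 0.07}

women_a, women_b = snake_split(women, women.get)
open_a, open_b = snake_split(open_players, open_players.get)
print("Line A:", women_a + open_a)
print("Line B:", women_b + open_b)
```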

A more interesting application of the simulator (and maybe this is what you're getting at) would be to derive optimal lines based on simulated results. Basically, you'd simulate a certain number of games for each possible combination of lineups and then see which performs best. Technically speaking, I don't think it would be that difficult! Unfortunately, by my math there are 210 possible line combinations with a full roster, so you'd need to run a lot of sims to get a statistically valid answer.

Edit: math. There are 6 possible combinations of 2 women from a roster of 4. There are 70 possible combinations of 4 open players from a roster of 8. That makes 6*70=420 ways to pick one line, but we divide that in half because picking one line also determines the other, so each split gets counted twice.
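The same count can be checked in Python, both with the formula and by brute force. This assumes the full-roster setup described in the comment: 4 women and 8 open players, with each line taking 2 and 4 respectively.

```python
from itertools import combinations
from math import comb

WOMEN, OPEN = 4, 8
ordered = comb(WOMEN, 2) * comb(OPEN, 4)  # 6 * 70 = 420 ways to choose one line
print(ordered, ordered // 2)

# Brute force: build every possible line, pair it with the leftover players,
# and store the pair as an unordered set so {A, B} and {B, A} count once.
women = [f"W{i}" for i in range(WOMEN)]
open_players = [f"O{i}" for i in range(OPEN)]
roster = frozenset(women + open_players)
splits = set()
for w in combinations(women, 2):
    for o in combinations(open_players, 4):
        line = frozenset(w + o)
        splits.add(frozenset([line, roster - line]))
print(len(splits))
```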

Hadrian Mertins-Kirkwood

For anyone following along in my quixotic quest to build the perfect Parity simulator, I have added a line-by-line breakdown to each sim which you can check out in the most recent PDFs.

It's sort of cool but not especially useful yet because (as noted above) there are so many possible line combinations that there aren't enough data for each line. Despite my best efforts to optimize the program, it still takes about 1 second to run a sim, so doing 200 takes ~3 minutes. Adding in the line analysis more than doubles the typical runtime to around 7-8 minutes. If I wanted to get a good sample for each possible line-up I would need to run each matchup through the simulator for literally hours.

That being said, I'm working on implementing Yannick's suggestion to compare two line combinations head-to-head, which will be much less processor-intensive and may actually provide some interesting insights!