Messi or Ronaldo? A mathematical argument to settle the football G.O.A.T. debate
Who is the greatest footballer of all time has long been debated passionately. In this post, we try to answer this question rigorously and without bias through the lens of mathematics. In particular, we will analyze the intriguing patterns in Messi’s and Cristiano Ronaldo’s goal scoring.
If you want to learn about the coupon collector’s problem or the Kolmogorov-Smirnov test in an entertaining manner, then read on!
The definition of greatness
In mathematics, we often start with axioms and/or definitions. How shall we define being a great striker in football? Well, first of all, an attacker must be unpredictable and equally dangerous to the opponent's goal in every single minute. Imagine that an attacker were known to score solely between minutes 30 and 35. Then it is straightforward to defend against such a predictable striker: double-team or even triple-team them for those 5 minutes of the game. Before the 30th minute and after the 35th minute, you can forget about that striker altogether, since they are incapable of scoring then. Now, we can define more formally what makes a striker excellent.
Definition (perfect striker): a striker is said to be perfect if they can score in every single minute of the game with equal (non-zero) probability. More precisely, the minute in which they score a goal follows a uniform distribution over the minutes of the game.
The definition implies that, for a 90-minute game, a perfect striker’s probability of scoring a given goal in any particular minute is 1/90. In contrast, the predictable striker from the example above has a scoring probability of 0 during the first 30 minutes and after the 35th minute, while each of the five minutes between the 30th and the 35th carries a probability of 1/5 = 18/90 (20%).
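The two scoring profiles can be written down as probability vectors. This is a minimal sketch, assuming a 90-minute game split into one-minute slots and reading "between minutes 30 and 35" as the five slots for minutes 31 to 35; the indexing choice and the names `perfect` and `predictable` are ours, not part of the original analysis.

```python
import numpy as np

MINUTES = 90  # one slot per minute of regular time (assumption)

# Perfect striker: every minute is equally likely to carry a goal.
perfect = np.full(MINUTES, 1 / MINUTES)

# Predictable striker: all probability mass sits in a 5-minute window.
# Indices 30..34 stand for minutes 31..35 (an illustrative indexing choice).
predictable = np.zeros(MINUTES)
predictable[30:35] = 1 / 5

print("perfect striker, per minute:", perfect[0])
print("predictable striker, inside the window:", predictable[30])
print("predictable striker, outside the window:", predictable[0])
```

Both vectors sum to 1, so each is a valid probability distribution over the minutes of the game.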
Let’s explore how close Messi and Ronaldo are to being perfect strikers!
Goal at every minute!
First, we collected the number of goals that Messi and Ronaldo have scored in each minute of the game throughout their careers. Interestingly, at the time of writing, Ronaldo has scored at least one goal in every minute of the game, while Messi has not yet scored in the very first minute.
## The number of goals the players scored in each minute of the game.
messiGoalsPerMinute = [0,1,5,4,9,3,4,6,5,4,9,9,10,5,11,11,8,9,5,12,9,6,11,7,12,9,7,10,4,9,12,6,12,7,6,9,11,13,8,9,7,12,15,7,22,2,5,3,6,8,10,5,5,5,15,8,7,13,10,10,12,8,11,12,11,6,9,8,10,5,6,9,12,11,14,10,9,14,8,11,5,12,8,6,12,13,14,12,12,45]
ronaldoGoalsPerMinute = [1,6,10,8,2,6,3,5,5,8,5,5,7,11,3,3,5,7,5,6,9,5,15,7,8,11,7,11,5,9,4,6,7,6,7,7,6,7,6,3,7,4,4,11,15,1,7,4,8,11,10,5,7,7,8,4,7,9,6,8,6,6,8,10,10,2,4,13,9,9,6,3,11,5,6,7,9,7,10,9,10,11,7,7,7,5,9,10,12,35]
The goal-per-minute data above is more up to date for Messi than for Ronaldo, which is why Messi has more total goals (817 vs. 663). One can observe in the following figure that there are two outlier data points: one at the 45th minute and one at the 90th minute. This is because all goals scored in added time after the 45th or the 90th minute are counted in the single bin for that minute. Interestingly, both players tend to score more goals in the second half. More precisely, the mean goal times (the expected minute at which a goal is scored) are 50.21 for Messi and 48.95 for Ronaldo.
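The totals and mean goal times can be computed from a per-minute list with a small helper. This is a sketch under assumptions: the function name `total_and_mean_minute` and the `first_minute` parameter are hypothetical, and the post does not state how minutes were numbered. By our own arithmetic on the Messi list, the quoted 50.21 appears to correspond to numbering the minutes from 0, so that is the default here.

```python
import numpy as np

def total_and_mean_minute(goals_per_minute, first_minute=0):
    """Return (total goals, mean minute of a goal) for per-minute goal counts.

    first_minute is the label of the first bin (0 or 1); which one the
    original analysis used is an assumption.
    """
    goals = np.asarray(goals_per_minute, dtype=float)
    minutes = first_minute + np.arange(len(goals))
    total = goals.sum()
    return int(total), float((minutes * goals).sum() / total)

# Toy counts for a 4-minute "game" (not real data): goals cluster late,
# so the mean goal time lies in the second half.
toy_total, toy_mean = total_and_mean_minute([1, 0, 2, 3])
print(toy_total, toy_mean)
```

Passing `messiGoalsPerMinute` or `ronaldoGoalsPerMinute` from above instead of the toy list gives the corresponding career figures.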
Let’s consider the following intriguing question: assuming that a striker scores a goal in each and every minute with equal probability, how many goals do they need to score in order to have at least one goal scored every minute? This question is closely related to the coupon collector’s problem.
The coupon collector’s problem is as follows. One buys items, each of which contains a coupon. Altogether there are n different coupons to collect. The problem asks how many items one needs to buy on average to collect all n coupons, assuming that each item contains each coupon with equal probability. A simple argument shows that the expected number of purchases is n·H_n, where H_n = 1 + 1/2 + … + 1/n is the n-th harmonic number; to leading order this is n log(n), with log the natural logarithm.
For a 90-minute football game, the leading-order estimate n log(n) gives 90 log(90) ≈ 404.98 goals (the exact expectation of the coupon collector’s problem, n times the n-th harmonic number, is somewhat higher at about 457). Guess what! It was Ronaldo’s 405th career goal that completed his set of at least one goal in every minute. This is remarkably close to what the simple n log(n) estimate of the coupon collector’s problem predicts.
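The coupon collector numbers for n = 90 can be checked directly. The sketch below computes the n log(n) estimate, the exact expectation n·H_n, and a small Monte Carlo simulation of a perfect striker; the function names, the seed, and the number of simulated careers are our own illustrative choices.

```python
import math
import random

def coupon_estimates(n):
    """Return (n*ln(n), n*H_n): leading-order estimate and exact expectation."""
    harmonic = sum(1 / k for k in range(1, n + 1))
    return n * math.log(n), n * harmonic

def simulate_goals_needed(n, rng):
    """Draw uniformly random minutes until each of the n minutes has a goal."""
    seen, goals = set(), 0
    while len(seen) < n:
        seen.add(rng.randrange(n))
        goals += 1
    return goals

approx, exact = coupon_estimates(90)
print(f"n ln n = {approx:.2f}, n H_n = {exact:.2f}")

rng = random.Random(0)  # fixed seed so the run is repeatable
trials = [simulate_goals_needed(90, rng) for _ in range(2000)]
print(f"simulated mean over {len(trials)} careers: {sum(trials) / len(trials):.1f}")
```

The spread across simulated careers is large, so a single player completing the set around goal 405 is well within what a uniform scorer would produce.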
This simple statistic hints that these footballers’ goals per minute might indeed follow a uniform distribution. Now, let’s get back to our original question. According to our definition, how close are Messi and Ronaldo to being perfect strikers?
How close to being perfect?
First, we observe that in our small database, we have different total numbers of goals scored for each player. To make them comparable, we normalize these lists to obtain a probability distribution. The next figure shows Messi’s and Ronaldo’s scoring probability distributions. One can see that both of them are “pretty close” to the black horizontal line representing the perfect scoring distribution: the uniform random distribution. But which of them is closer to the uniform random distribution? Put differently, which of them is closer to being a perfectly unpredictable and dangerous striker throughout the whole game?
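The normalization step amounts to dividing each count by the player's total. A minimal sketch, with a hypothetical helper name `to_probabilities` and toy counts standing in for the real lists:

```python
import numpy as np

def to_probabilities(goals_per_minute):
    """Normalize per-minute goal counts so that they sum to 1."""
    goals = np.asarray(goals_per_minute, dtype=float)
    return goals / goals.sum()

# Toy counts for a 4-minute "game"; the real inputs would be the two lists above.
toy_pmf = to_probabilities([1, 0, 2, 3])

# The "perfect striker" reference line: equal probability in every minute.
uniform_pmf = np.full(len(toy_pmf), 1 / len(toy_pmf))

print(toy_pmf)
print(uniform_pmf)
```

After this step, players with different career totals can be compared on the same scale.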
Since we want to settle the football G.O.A.T. debate, we need to answer the following mathematical question: how close is each player’s goal distribution to the uniform distribution? As the figure above shows, we cannot hope that either player’s goal distribution matches the uniform distribution perfectly. Rather, it only makes sense to ask how close each goal-scoring distribution is to it.
A valuable statistical tool called the Kolmogorov-Smirnov test (KS test) measures just that. More precisely, the KS test measures the distance between two probability distributions. We take the cumulative distribution functions (CDFs) of the two distributions and measure the absolute difference between them at each point. The KS statistic is the maximum of these differences. (In our example, the distribution is finite and discrete, so we can take the maximum; the usual definition uses the supremum.)
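import numpy as np
from scipy import stats
# Assumption: each CDF is the cumulative share of goals scored up to each minute.
messiCDF = np.cumsum(messiGoalsPerMinute) / sum(messiGoalsPerMinute)
ronaldoCDF = np.cumsum(ronaldoGoalsPerMinute) / sum(ronaldoGoalsPerMinute)
# Two-sample KS test against an unseeded uniform sample, so results vary from run to run.
rng = np.random.default_rng()
print(stats.kstest(messiCDF, stats.uniform.rvs(size=sum(messiGoalsPerMinute), random_state=rng)))
print(stats.kstest(ronaldoCDF, stats.uniform.rvs(size=sum(ronaldoGoalsPerMinute), random_state=rng)))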
## The result
KstestResult(statistic=0.09948320413436693, pvalue=0.3718889719826961)
KstestResult(statistic=0.07003519356460533, pvalue=0.8030016140819126)
The verdict of the KS test is that Ronaldo’s goal-scoring distribution is closer to the uniform distribution (KS statistic 0.070) than Messi’s (KS statistic 0.099). Ronaldo’s p-value is also higher (0.80 vs. 0.37), meaning his data give even less evidence against uniformity; for neither player does the test reject uniformity at the usual 5% level. Hence, we have shown that, by our definition, Ronaldo’s goal-scoring distribution is closer to being perfect than Messi’s.
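Because the comparison above is made against a random uniform sample, the statistic changes slightly from run to run. As a deterministic cross-check, one can compute the maximum gap between a player's empirical goal CDF and the exact uniform CDF. This is our own variant, not the computation used for the results above; the function name is hypothetical, it returns only the distance (no p-value), and the inputs below are toy profiles rather than real data.

```python
import numpy as np

def ks_distance_to_uniform(goals_per_minute):
    """Max absolute gap between the empirical goal CDF and the uniform CDF."""
    goals = np.asarray(goals_per_minute, dtype=float)
    n = len(goals)
    empirical_cdf = np.cumsum(goals) / goals.sum()
    uniform_cdf = np.arange(1, n + 1) / n  # perfect striker: k/n after minute k
    return float(np.max(np.abs(empirical_cdf - uniform_cdf)))

# Toy 90-minute profiles (not real data): a perfect striker, and the
# predictable striker who only scores in minutes 31..35.
perfect = [1] * 90
predictable = [0] * 30 + [1] * 5 + [0] * 55

print("perfect striker:", ks_distance_to_uniform(perfect))
print("predictable striker:", ks_distance_to_uniform(predictable))
```

The perfect striker sits at distance zero and the predictable striker is far from it; applying the function to `messiGoalsPerMinute` and `ronaldoGoalsPerMinute` places the two players on the same scale.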
Epilogue
Obviously, deciding who is the football G.O.A.T. is much more a matter of social construction than of mathematics. One would need to factor in many more aspects of greatness that are not necessarily measurable, for instance, cultural impact or impact on the game. Nevertheless, in this post, we analyzed Messi’s and Ronaldo’s goal-scoring distributions from a mathematical point of view and answered a few intriguing questions.
If you want to see another mathematician’s view on settling the football G.O.A.T. debate, watch this video by Tom Crawford.