I play golf in Northern Ireland and also in Spain. In both places there are various season-long competitions designed to find the “best golfer”, usually known as an Order of Merit (OoM). During the Covid pandemic I even ran a simple Order of Merit for friends in Northern Ireland as a distraction during those difficult times. It seemed to work ok! However, in one place where I play we are about to start the third different system in three years. Why is it so hard to find a system that pleases players and organisers? What’s actually meant by “best golfer”?
So I began to think about the topic in a bit more detail and dug into the literature. The mathematics is quite interesting and the subject surprisingly complicated!
Here are some musings! I’ve tried to avoid getting too mathematical and endeavoured to focus on the general issues and themes.
In no way is this a critique of how various groups have run their OoMs but rather a consideration of just how difficult it is to run such things!
It’s really not as simple as you might think!
If I have a hope it’s that this essay might provoke further thought and discussion.
Order of Merit: A Simple Idea, A Complex Reality
A golf order of merit sounds, at first hearing, like a simple administrative device: a season-long table that sorts players from best to worst. A way of finding the best golfer.
In practice, it is something much richer and more complicated. It sits at the intersection of sport, statistics, and human psychology. It tries to compress a season’s worth of variable performances, played on different days, in different conditions, by people with different levels of commitment and skill, into a single, robust and authoritative ranking.
The moment one begins to design such a system, it becomes clear that the question is not merely how to total scores, but what one means by “best,” and how fairness is to be interpreted in a setting where participation is uneven and performance, the basic data, is inherently very noisy.
It’s really not easy.
Comparability and the Limits of Stableford Scoring
Golf is a particularly interesting sport in this respect because it offers, through formats such as Stableford scores, a numerical score that appears directly comparable across rounds. Unlike match play, where outcomes are binary, Stableford produces a points total that seems to quantify performance in a continuous way. Yet even here, comparability is more fragile than it appears.
A round of 36 Stableford points may represent a steady, very competent performance on a calm summer day against a modest field, or it may be an outstanding effort in difficult conditions against strong competition. By contrast, 30 points might be quite exceptional on a cold day with strong, blustery winds and squally showers. It’s not just the quality of the field that matters but also its size and the conditions.
Do we look at the scores themselves or at the ranking that the scores create within the field? Golf is inherently about competition, so looking at who won, who came second, third and so on seems to be at the heart of any valid system.
Yet the raw number (whether it be Stableford points or position ranking) conceals context. Any order of merit that relies purely on these numbers must therefore confront the tension between simplicity and realism: the more one tries to adjust for context, the more complex the system becomes, and the harder it is for participants to understand and trust.
What Is Being Rewarded?
At the heart of the problem lies a fundamental ambiguity about what the order of merit is intended to reward. One interpretation is that it should identify the most skilful golfer, the player who, on average, performs at the highest level. Another is that it should reward competitive success, the player who most often beats their peers. A third interpretation, perhaps the most intuitive in a club setting, is that it should recognise the best “season,” a blend of performance, consistency, and engagement.
These are not the same thing.
A player who produces a few brilliant rounds and many mediocre ones may have a high peak ability but poor consistency.
Another who reliably scores in the mid-30s may rarely win but will often place highly.
A third who plays frequently may accumulate many good results simply by being present.
Another option is to determine the most improved player over the season.
Any system must decide, explicitly or implicitly, which of these qualities it values. The organisers need to be clear on the goal. What are you actually trying to achieve? Unfortunately experience tells me it’s rare for organisers to have absolute clarity on this crucial point.
Average-Based Systems: Consistency as Merit
One common approach is to base the order of merit on average performance.
This might involve taking the mean Stableford score across the season, or more typically the mean of a player’s best subset of rounds. The attraction of this method is that it speaks directly to consistency. By focusing on the best subset (say the best sixteen rounds in a thirty-six-week season) it mitigates the effect of poor performances and of missed weeks, while still requiring a substantial body of evidence.
The resulting figure has a clear interpretation: it is an estimate of the level at which a player performs when they are at or near their best. Such a system tends to favour steady, reliable golfers who rarely have disastrous rounds and who can repeatedly produce solid scores.
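The “mean of a player’s best subset” idea above can be sketched in a few lines of code. This is a minimal illustration, not any club’s actual system; the player names and scores are invented, and a best-four subset is used only to keep the example small.

```python
# Sketch of an average-based Order of Merit: rank players by the mean of
# their best N Stableford rounds. All names and scores are hypothetical.

def best_n_average(scores, n=16):
    """Mean of the best n Stableford scores (all scores if fewer than n)."""
    best = sorted(scores, reverse=True)[:n]
    return sum(best) / len(best)

# Hypothetical season data: player -> Stableford points per round played.
season = {
    "Alice": [36, 35, 34, 36, 33, 35],   # steady scorer
    "Bob":   [41, 28, 25, 40, 27, 26],   # occasional brilliance
}

# Build the table, best average first (best four rounds counted here).
table = sorted(season.items(),
               key=lambda kv: best_n_average(kv[1], n=4),
               reverse=True)
for player, scores in table:
    print(f"{player}: {best_n_average(scores, n=4):.2f}")
```

Note how the steady player tops this table even though the streaky player owns the two best individual rounds of the season: exactly the design choice discussed above.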
Yet this apparent fairness conceals certain limitations. Averaging, even when restricted to a player’s best rounds, does not reward winning per se. A player who finishes second or third every week with scores in the mid-30s may outrank someone who occasionally produces exceptional scores in the high 30s or low 40s but also has off days. In a competitive environment, this can feel counterintuitive. Sport, after all, is often about winning, not merely performing consistently. Or is it? Clarity is needed.
Moreover, average-based systems treat all rounds as if they were equally meaningful, ignoring differences in field strength. A 36-point round in a weak field is not the same achievement as a 36-point round that wins a strong competition, yet the arithmetic mean cannot distinguish between them. The size of the field has a similar impact.
Another source of complexity is the impact of varying weather and course conditions. A spectacular round in poor weather may have a relatively low Stableford score. Similarly, a tough course set-up can have the same effect. While in theory the WHS (World Handicap System) has the PCC (Playing Conditions Calculation) to mitigate the effect of conditions, it’s well recognised that this adjustment is arbitrary and, despite protestations to the contrary from the governing bodies, is neither robust nor trusted. Actually this observer thinks it’s junk!
Position-Based Systems: Rewarding Outcomes
An alternative philosophy is to base the order of merit on finishing positions.
Each week’s competition is treated as a ranking, and points are awarded according to position. The simplest version assigns a descending sequence of points, first place receives the most, second slightly fewer, and so on. More refined versions weight the points by field size, so that winning a large competition is more valuable than winning a small one.
This approach aligns more closely with the competitive nature of sport. It rewards players for beating others, not merely for posting good scores in isolation. It also introduces a narrative element to the season, as players accumulate points and move up or down the table in a way that mirrors professional tours (see Addendum 1).
However, position-based systems introduce their own complexities. They are sensitive to the shape of the points scale. A steeply declining scale, where first place is heavily rewarded relative to lower positions, emphasises winning and can produce a leaderboard dominated by players with just a few victories. A flatter scale rewards consistency and may produce a champion who rarely wins but is consistently near the top. Neither outcome is inherently right or wrong, but each reflects a different conception of merit.
Furthermore, position-based systems can inadvertently reward mere attendance. A player who competes in many events may accumulate a large total of moderate points, potentially outscoring a more skilful player who participates (for whatever reason) less frequently but performs better when they do. This raises the question of how to balance participation and performance.
Participation and Qualification
The issue of uneven participation is central to any club-level order of merit. In a typical weekly competition, not all players attend every event. Some may play nearly every week; others may appear sporadically. If total points or total scores are used without adjustment, frequent participants clearly gain a structural advantage.
One way to address this is to impose a minimum number of qualifying rounds and to consider only a fixed number of a player’s best results. For example, the idea of “best sixteen” in a forty-week season is a practical embodiment of this principle. It ensures that players must engage to a meaningful extent while preventing those who play more often from gaining an undue advantage simply through volume. At the same time, it introduces strategic considerations: once a player has accumulated sixteen strong results, additional rounds may not improve their standing unless they displace a weaker score. But the numbers used need justification!
Variability, Reliability, and Performance
Even with such mechanisms, the question of fairness persists. Consider the role of variability. Golf scores are subject to fluctuations arising from weather, course conditions, and the inherent variability of human performance. A player’s observed scores are therefore samples from an underlying distribution of ability.
An order of merit attempts to infer that underlying ability from a finite sample. Statistical thinking suggests that both the mean and the variance of a player’s scores are relevant. A player with a slightly lower average but very low variability may be more “reliable” than one with a higher average but large swings between excellent and poor rounds. Whether the system should reward reliability or peak performance is a matter of design philosophy. This is rarely made explicit in the design of an OoM.
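The mean-versus-variance point can be made concrete with two invented score lists. The players below are constructed so that their season averages are identical while their spreads are very different, which is precisely the situation a mean-only system cannot distinguish.

```python
# Two hypothetical players over six rounds: identical means, very
# different variability. A mean-only table cannot tell them apart.
import statistics

steady = [34, 35, 34, 36, 35, 34]    # never far from 35
streaky = [42, 27, 41, 26, 43, 29]   # brilliant or poor, rarely between

for name, scores in [("steady", steady), ("streaky", streaky)]:
    print(name,
          round(statistics.mean(scores), 2),   # same for both
          round(statistics.stdev(scores), 2))  # wildly different
```

Whether the standard deviation should then count for or against a player is exactly the design-philosophy question raised above.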
Adjusting for Context: Relative Performance
A more sophisticated approach attempts to adjust for context by normalising scores relative to the field. Instead of using raw Stableford points, one can consider how a player performed compared to the average and spread of scores in that particular competition. A round that is significantly above the field average is treated as a strong performance, regardless of the absolute score.
This has the appealing property of accounting for differences in difficulty between events. A blustery winter day, when scores are generally low, does not penalise players simply because the absolute numbers are smaller. Similarly, a day with unusually high scoring does not inflate the perceived quality of performance. Such methods bring the analysis closer to statistical models used in professional sports, where performance is often measured relative to contemporaries. In addition, it raises the question of whether post-game adjustment by the PCC should be applied.
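One common way to normalise against the field is the z-score: express each round as the number of standard deviations above that day’s field mean. The sketch below uses invented score lists for a calm day and a wild day; it is one possible normalisation, not the only one.

```python
# Sketch of field-relative scoring via z-scores: a round is judged by
# how far it sits above that day's field, not by its absolute points.
import statistics

def field_z_score(player_score, field_scores):
    """Standard deviations above the field mean for that competition."""
    mean = statistics.mean(field_scores)
    sd = statistics.stdev(field_scores)
    return (player_score - mean) / sd

calm_day = [36, 35, 34, 33, 32, 31]   # hypothetical high-scoring day
wild_day = [30, 26, 25, 24, 23, 22]   # hypothetical low-scoring day

# 36 points on the calm day vs 30 points on the wild day:
print(round(field_z_score(36, calm_day), 2))
print(round(field_z_score(30, wild_day), 2))
```

On these invented numbers the 30-point round in foul weather earns the larger z-score, capturing the intuition that context matters more than the raw total.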
The drawback of these methods lies in their opacity.
While the mathematics may be straightforward to those familiar with statistical concepts, it can be difficult to communicate to a broader audience.
In a club environment, transparency and trust are paramount. Players want to understand how their ranking is determined and to feel that the system is fair. A method that relies on abstract quantities may be viewed with suspicion, even if it is objectively more accurate. There is therefore a trade-off between statistical sophistication and practical acceptability.
Beyond Mathematics: Social and Behavioural Effects
Beyond the technical aspects, there is a social dimension to the order of merit. It shapes the culture of the competition. A system that heavily rewards attendance may encourage regular participation and foster a sense of community, but it may also disadvantage those with less availability. A system that focuses on peak performance may create excitement and highlight outstanding rounds, but it may reduce the incentive for consistent engagement.
The choice of system can influence how players approach each round. Do they play conservatively to secure a solid score, or do they take risks in pursuit of a winning performance? The scoring structure subtly guides behaviour.
Narrative, Engagement, and Enjoyment
There is also the question of narrative and engagement. Sport is not only about fairness; it is also about enjoyment. A well-designed order of merit creates a sense of progression and suspense. Players should be able to see how they can improve their standing and what is at stake in each round.
Systems that are too static, where rankings change little from week to week, can feel dull. Conversely, systems that are too volatile may appear arbitrary. Striking the right balance is as much an art as a science.
Thresholds and Eligibility
The idea of qualification thresholds, such as requiring a minimum number of rounds to be eligible for the order of merit, is another important design choice. It ensures that the ranking is based on a sufficient sample of performances, reducing the influence of outliers.
However, it also creates a boundary that can feel arbitrary. A player who has played fifteen rounds and performed exceptionally well may be excluded, while another who has played sixteen rounds with slightly weaker results is included. Such thresholds are necessary but must be communicated clearly and justified as part of the overall structure.
Hybrid Systems: A Pragmatic Compromise
In practice, many order of merit systems adopt a hybrid approach that combines elements of the methods described above. For example, they may rank players by the average of their best sixteen rounds but use the number of wins or top-three finishes as tiebreakers. This recognises both consistency and competitive success.
Alternatively, they may use a points-based system but consider only a fixed number of the best results, thereby limiting the influence of attendance. These hybrid systems reflect a pragmatic compromise. They acknowledge that no single metric captures all aspects of performance and that a combination of measures may provide a more balanced assessment.
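A hybrid of the kind just described, ranking by best-N average with wins as the tiebreaker, reduces to a two-level sort. The averages and win counts below are invented purely to show the tiebreak firing.

```python
# Sketch of a hybrid table: rank by best-N average, break ties by wins.
# All player names and numbers are hypothetical.

def standings(averages, wins):
    """Sort players by (best-N average, win count), both descending."""
    return sorted(averages,
                  key=lambda p: (averages[p], wins.get(p, 0)),
                  reverse=True)

averages = {"Gail": 35.0, "Hugh": 35.0, "Iris": 33.5}  # tied at the top
wins = {"Gail": 1, "Hugh": 3}                           # tiebreaker data

print(standings(averages, wins))
```

Python’s tuple comparison makes the secondary criterion explicit in one place, which helps the transparency that the essay argues club systems need.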
The Number of Counting Events
Perhaps the most important structural choice is the number of counting events. This is not a minor technical detail but a defining feature of the system.
The number of counting events in an order of merit shapes what the competition ultimately rewards. It’s absolutely crucial!
Selecting 12, 16, or 20 scores from a season of perhaps 40 events determines how much weight is given to peak performance, consistency, and participation.
With fewer counting rounds, the system focuses on a player’s very best golf, largely ignoring poorer performances and allowing a relatively small number of appearances to define a season. This tends to favour players capable of occasional outstanding rounds and produces a more volatile, dynamic leaderboard.
As the number of counting rounds increases, the emphasis shifts. At around 16 events, the system begins to balance selectivity with representativeness. A player’s ranking reflects a substantial portion of their season while still allowing weaker rounds to be discarded. Increasing the count further, to 20 events, brings most of a player’s active season into consideration. Poorer rounds begin to matter, and the ranking increasingly reflects consistency and commitment.
In effect, the choice of counting events defines the character of the competition. Lower numbers reward brilliance; higher numbers reward reliability. Changing from 12 to 16 to 20 does not simply adjust the system—it changes what “merit” actually means.
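The point that changing the count changes the meaning of “merit” can be demonstrated with one invented data set ranked twice. The players below are deliberately extreme so that the leader flips when the count changes.

```python
# The same hypothetical season ranked under two "best N" counts: a small
# N crowns the streaky player, a larger N the steady one.

def best_n_average(scores, n):
    best = sorted(scores, reverse=True)[:n]
    return sum(best) / len(best)

season = {
    "Streaky": [42, 41, 40, 39, 20, 20, 20, 20],
    "Steady":  [34, 34, 34, 34, 34, 34, 34, 34],
}

for n in (4, 8):
    leader = max(season, key=lambda p: best_n_average(season[p], n))
    print(f"best {n} counting rounds -> {leader}")
```

Nothing about the golf changed between the two printouts; only the definition of merit did.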
The Impact of Field Size
In a points-based order of merit, field size introduces an important question of fairness. Beating forty players is not the same achievement as beating ten, yet a simple points table treats both victories equally.
To address this, some systems scale points according to the size of the field. The simplest approach is to apply a multiplier based on the number of competitors, so that larger fields yield proportionally more points. A more moderated version uses a function such as the square root of the field size, which increases rewards for bigger competitions but avoids allowing very large fields to dominate the entire season. In both cases, the intention is the same: to reflect the greater competitive challenge of outperforming a larger group.
However, this introduces a further design choice about how strongly field size should influence outcomes. A direct linear multiplier can overweight occasional large events, making them disproportionately decisive, while gentler scaling (such as square root) preserves the distinction without overwhelming the rest of the season. There is also a question of transparency. While the idea is intuitive, the exact calculation may feel less obvious to participants, and clarity remains essential for trust. As with many aspects of an order of merit, the issue is not whether to adjust for field size, but how far to go, balancing fairness, simplicity, and the overall shape of the competition.
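The linear-versus-square-root choice is easy to compare side by side. The base points value and the normalisation to a ten-player field below are assumptions made for the illustration, not part of any real system.

```python
# Sketch comparing linear and square-root field-size scaling of win
# points. BASE_WIN_POINTS and the 10-player baseline are illustrative.
import math

BASE_WIN_POINTS = 20

def win_points(field_size, scaling="sqrt"):
    """Points for a win, scaled by field size relative to 10 players."""
    if scaling == "linear":
        return BASE_WIN_POINTS * field_size / 10
    return BASE_WIN_POINTS * math.sqrt(field_size / 10)

for field in (10, 40):
    print(field,
          round(win_points(field, "linear"), 1),
          round(win_points(field, "sqrt"), 1))
```

Quadrupling the field quadruples the reward under linear scaling but only doubles it under square-root scaling, which is the moderation the text describes.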
Modelling is the only way!
Having read widely and considered the issues, it strikes me that the only way to decide if a particular system meets the needs of a particular group is to model the system with real-world data. Ask the question “does the system you’re going to use give a sensible, and defensible, result?” and also “how might that result be perceived by the whole group?”
A lot of trouble you might say! But the confidence of the players in the group is obviously important so ironing out issues before implementation seems sensible. Moreover if historical data is available then modelling using a suitable spreadsheet shouldn’t be onerous.
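A spreadsheet works well for this, but the same replay-and-compare exercise can be sketched in code. The mini-season below is invented, and it is constructed so that the two candidate systems crown different champions, exactly the kind of discrepancy that modelling is meant to surface before implementation.

```python
# A minimal "model it first" sketch: replay one invented season under two
# candidate systems and compare the champions each would produce.

def best_n_average(scores, n=4):
    best = sorted(scores, reverse=True)[:n]
    return sum(best) / len(best)

def position_points(weekly_results, scale=(10, 6, 4)):
    totals = {}
    for order in weekly_results:            # players in finishing order
        for pos, player in enumerate(order):
            pts = scale[pos] if pos < len(scale) else 1
            totals[player] = totals.get(player, 0) + pts
    return totals

# Hypothetical history: weekly finishing orders and per-round scores.
weeks = [["Jo", "Kim", "Lee"], ["Jo", "Kim", "Lee"], ["Kim", "Jo", "Lee"],
         ["Kim", "Lee", "Jo"], ["Kim", "Lee", "Jo"]]
scores = {"Jo":  [42, 41, 30, 29, 28],   # two huge weeks, then fades
          "Kim": [35, 35, 36, 34, 35],   # relentlessly solid
          "Lee": [30, 33, 29, 32, 33]}

avg_champ = max(scores, key=lambda p: best_n_average(scores[p]))
pts = position_points(weeks)
pts_champ = max(pts, key=pts.get)
print("average system:", avg_champ, "| points system:", pts_champ)
```

On this invented data the averaging system rewards Jo’s brilliance while the points system rewards Kim’s week-in, week-out winning; seeing such a split on your own historical data is precisely the “sensible and defensible?” test.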
If you’re interested in the key issues in setting up a golf Order of Merit this PDF has a summary.
Conclusion: Defining Excellence Over Time
Ultimately, the design of an order of merit is an exercise in making values explicit. It forces a club to decide what it wishes to reward and to accept the consequences of that choice. There is no universally correct solution, only solutions that are more or less aligned with the goals and culture of the group.
The key is to be clear about the objectives, to choose a method that is consistent with those objectives, and to ensure that it is transparent and robust.
What begins as a simple desire to rank golfers thus becomes a discussion of broader issues in measurement and evaluation. The order of merit is not merely a table of numbers; it is a reflection of how a group defines excellence over time.
In the end, perhaps the most important quality of any such system is that it commands the confidence of those who participate in it. If players feel that their efforts are recognised appropriately, the precise details become less critical. The system will have fulfilled its purpose, not only by identifying a champion, but by enhancing the experience of competition and the shared endeavour that makes club golf so rewarding.
I’ve not discussed the separation of ties, where players produce equal performances. How do you do that fairly? It’s an enormous subject to which I shall return in the future!
What I’ve learned in researching this topic is that three different systems in three years is testament to the difficulty of setting up a robust Order of Merit system.
I once naively thought that a simple average-based system was adequate. That was very simplistic. Unbelievably so!
It’s really hard to find a fair and robust method.
It needs a lot of thought!
Modelling the options before implementation seems sensible!

Addendum 1: what about the professional golfing arena?
Any discussion of an order of merit is illuminated by the systems used in professional golf, notably the FedEx Cup and the Race to Dubai. Both are points-based rankings, but each reflects different priorities in balancing fairness, consistency, and drama. Both have had their fair share of criticism!
The FedEx Cup is deliberately designed to create a compelling season-long narrative. Players accumulate points through the year, but the introduction of playoffs—with points resets and a staggered final—means the ultimate winner is determined as much by late-season performance as by overall consistency. This sacrifices some statistical purity but ensures engagement and a decisive climax. Nevertheless, the “climax” involves a play-off series outwith the season long narrative.
The Race to Dubai places greater emphasis on cumulative performance, though it still weights key end-of-season events heavily to preserve tension. The result is a system that more closely reflects season-long excellence while still allowing movement at the top late in the year.
Both systems also recognise that not all events are equal, assigning more points to stronger fields and more prestigious tournaments. In doing so, they address a problem often overlooked at club level: the context in which performances occur.
Over and above those ranking systems there is the Official World Golf Ranking system (and similar systems for elite amateurs). This is a complex points-based ranking system and, just like the FedEx Cup and Race to Dubai, has had a lot of criticism! The criticism that the professional systems have received underscores the difficulty of finding robust methods of ranking golfers.
The broader lesson is that no order of merit system is without issues!
A club system need not replicate the complexity of the professional game’s systems, but it faces the same underlying question: whether to reward consistency, peak performance, or competitive success, and how to balance those aims in a way that remains transparent and credible to all the players.
Tricky!
Very tricky!
Addendum 2: Further Reading
The design of ranking systems has been widely studied across statistics, economics, and sports analytics. The following selected works provide useful perspectives on the issues discussed in this essay and its addenda.
Arrow, K.J. (1951) Social Choice and Individual Values. New York: Wiley. A foundational text demonstrating that no ranking system can satisfy all fairness criteria simultaneously, underscoring the inevitability of trade-offs in any order of merit.
Barrow, D., Drayer, J., Elliott, P. and Gaut, G. (2013) Ranking rankings: an empirical comparison of the predictive power of sports ranking methods, Journal of Quantitative Analysis in Sports, 9(2), pp. 187–202.
Bradley, R.A. and Terry, M.E. (1952) Rank analysis of incomplete block designs: I. The method of paired comparisons, Biometrika, 39(3/4), pp. 324–345. Introduces a model for inferring relative strength from pairwise comparisons, offering a conceptual bridge to interpreting golf competitions as networks of indirect head-to-head results.
Elo, A.E. (1978) The Rating of Chessplayers, Past and Present. New York: Arco Publishing. Presents a dynamic rating system for estimating underlying ability, providing a useful contrast to cumulative or position-based approaches in golf.
Kondratev, A.Y., Ianovski, E. and Nesterov, Y. (2019) How should we score athletes and candidates? Geometric scoring rules, Management Science, 65(5), pp. 2327–2340. Explores how different points structures reward different competitive behaviours, directly informing the design of position-based scoring systems.
Massey, K. (1997) Statistical models applied to the rating of sports teams. PhD thesis, Bluefield College. Develops performance-based ranking methods that consider outcomes relative to opponents, relevant to golf when comparing scores against the field.
Ochieng, P.J., Kirui, C. and Nassiuma, D. (2022) A forward-looking approach to compare ranking methods for sports, Information, 13(5), 232.
Saari, D.G. (1998) Connecting and resolving Sen’s and Arrow’s theorems, Social Choice and Welfare, 15(2), pp. 239–261. Provides insight into positional aggregation methods, clarifying how different scoring schemes can lead to different overall rankings.
Scheffer, M., van de Leemput, I.A., Weinans, E. and Bollen, J. (2016) The rise and fall of rationality in sports rankings, EPJ Data Science, 5(1). Examines the temporal dynamics of rankings, highlighting the balance between stability and responsiveness over a season.
If you’ve found this post of interest please share it with others and subscribe to Craigavad miscellany.
Please add a comment.