Why I like STAR voting: winner selection

This blog post is the fourth in a series of posts about STAR voting. If you haven’t read the previous entries, I recommend you start with the first entry before reading this one.

Previously I explained the importance of pre-election polls and the advantages that STAR polls have over other types of polls. I mentioned at the end that many of the properties that make STAR a good method for polling also make it a good method for choosing winners, and in this post I will justify the claim that STAR voting picks good winners. But first, we’ll have to discuss what it even means for a winner to be good in the context of single-winner elections. There are two major schools of thought on this, majoritarianism and utilitarianism, and I will start with the former.

Majoritarianism says that a candidate is good if they are preferred by over half of the voters, i.e. if they have majority support. This sounds pretty straightforward until you consider just what it means to be supported by a voter. Does being supported mean being that voter’s first choice? If so, then it’s impossible to guarantee a majority winner, and most competitive elections with more than two candidates—the elections where the choice of voting method matters most—won’t have a majority winner. Thus, most majoritarians turn to the notion of a pairwise majority instead.

A pairwise majority occurs when a majority of voters would prefer one candidate over another in a head-to-head election. This means that you can guarantee a pairwise majority by first eliminating all but two candidates, then holding an election just between those two remaining candidates. Well, almost. See, it’s also possible for a voter to be indifferent between the two candidates, which means you could have an election where, say, 49% of voters prefer the first candidate, 48% prefer the second candidate, and the remaining 3% are indifferent between the two. Thus, the type of majority that can be guaranteed is a pairwise majority among the voters who express a preference between the two candidates.¹

Since this is the case, you may be unsurprised to learn that this is the type of majority people mean when they say that a voting method “guarantees a majority winner”. For example, a top two runoff system first eliminates all but two candidates, then picks the candidate supported by a majority of the voters who showed up to express a preference between the two candidates in the second round. Voters who show up to the first round but not the second aren’t counted as part of the group from which majority support must be earned, and neither are voters who do show up but choose to vote for a write-in candidate (which is still allowed in many runoffs).

Single-winner ranked choice voting has the same type of guarantee. This method elects a candidate once they have majority support, but that majority is found via candidate elimination rounds, and it only considers the ballots that expressed a preference between the non-eliminated candidates. If all the candidates that were ranked on a given ballot have been eliminated, that ballot is exhausted and no longer counts as part of the group the majority must come from.
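To make the role of exhausted ballots concrete, here is a minimal sketch of a ranked choice count. The function name `irv_winner`, the example ballots, and the alphabetical tie-breaking are all assumptions made for illustration; real implementations have their own tie-breaking rules.

```python
from collections import Counter

def irv_winner(ballots):
    """ballots: non-empty list of rankings (most preferred first, possibly partial).
    Returns (winner, winner_votes, active_ballots, exhausted_ballots)."""
    remaining = {c for ballot in ballots for c in ballot}
    while True:
        tally = Counter({c: 0 for c in remaining})
        exhausted = 0
        for ballot in ballots:
            # A ballot counts for its highest-ranked candidate still in the race.
            top = next((c for c in ballot if c in remaining), None)
            if top is None:
                exhausted += 1  # every ranked candidate has been eliminated
            else:
                tally[top] += 1
        active = len(ballots) - exhausted
        # Ties are broken alphabetically here (an assumption, not a real rule).
        leader = max(sorted(remaining), key=lambda c: tally[c])
        # The majority is measured against active ballots only.
        if 2 * tally[leader] > active or len(remaining) == 1:
            return leader, tally[leader], active, exhausted
        loser = min(sorted(remaining), key=lambda c: tally[c])
        remaining.remove(loser)

ballots = (
    [["A"]] * 4
    + [["B", "C"]] * 3
    + [["C", "B"]] * 2
    + [["D"]] * 2
)
print(irv_winner(ballots))
```

In this made-up example, B wins with 5 of the 9 ballots still active in the final round, even though 5 is less than half of the 11 ballots cast. The 2 ballots that only ranked D are exhausted and drop out of the denominator.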

Thanks to the automatic runoff, STAR voting can also ensure that the winner has this type of majority support. In fact, it does this explicitly. STAR counts every ballot as either a vote for the first finalist, a vote for the second finalist, or a vote of no preference; the winner is the finalist with a majority of the ballots that were not votes of no preference, which is exactly the type of majority that can be guaranteed.
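The two-step count described above can be sketched in a few lines. This is only an illustration: the function name `star_winner`, the sample ballots, and the tie handling (alphabetical among scores, higher scorer wins a tied runoff) are assumptions, not an official specification.

```python
from collections import Counter

def star_winner(ballots):
    """ballots: list of dicts mapping candidate -> score from 0 to 5.
    Returns (winner, runoff_counts)."""
    # Scoring round: add up the stars each candidate received.
    totals = Counter()
    for ballot in ballots:
        for candidate, score in ballot.items():
            totals[candidate] += score
    # The two highest-scoring candidates become the finalists.
    first, second = sorted(totals, key=lambda c: (-totals[c], c))[:2]
    # Automatic runoff: each ballot is a vote for whichever finalist it
    # scored higher, or a vote of no preference if it scored them equally.
    runoff = {first: 0, second: 0, "no preference": 0}
    for ballot in ballots:
        s1, s2 = ballot.get(first, 0), ballot.get(second, 0)
        if s1 > s2:
            runoff[first] += 1
        elif s2 > s1:
            runoff[second] += 1
        else:
            runoff["no preference"] += 1
    # Assumption: a tied runoff goes to the higher-scoring finalist.
    winner = second if runoff[second] > runoff[first] else first
    return winner, runoff

ballots = (
    [{"A": 5, "B": 4, "C": 0}] * 4
    + [{"A": 0, "B": 5, "C": 4}] * 3
    + [{"A": 3, "B": 3, "C": 5}] * 2
)
winner, runoff = star_winner(ballots)
print(winner, runoff)
```

With these sample ballots, B has the highest total score (37 to A's 26), but A wins the runoff 4 to 3, with 2 ballots expressing no preference. A's majority is 4 of the 7 ballots that expressed a preference, not 4 of all 9.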

Personally, I am not much of a majoritarian. I find utilitarianism to be much more compelling. Utilitarianism essentially states that the best candidate is the one with the most total support, or equivalently, the greatest average support. Unlike majoritarianism, this takes into account the strength of voters’ preferences. A classic demonstration of this difference is the pizza scenario.

Suppose you and a pair of friends are looking to order a pizza. You, and one friend, really like mushrooms, and prefer them over all other vegetable options, but you both also really, really like pepperoni. Your other friend also really likes mushrooms, and prefers them over all other options, but they’re also vegetarian. What one topping should you get?

Majoritarianism answers with pepperoni and justifies this decision by pointing out that 2 out of 3 people prefer it to mushrooms. On the other hand, utilitarianism notices that those 2 people only have a weak preference for pepperoni while the 3rd has a very strong preference for mushrooms, and so it answers with mushrooms. Multiple surveys have shown that under this scenario, most people will choose to side with utilitarianism.² The main argument in favor of ignoring the intuition this scenario reveals is that it occurs in a context that is very different from politics. The counterargument is that creating a scenario in a political context biases people who have been taught that “democracy” and “majority rule” are synonymous but would otherwise favor utilitarianism.
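Here is the pizza scenario with numbers attached. The utility values below are invented purely for illustration (10 for a topping someone really, really likes, 9 for one they really like, 0 for one they can't eat); only the ordering and the rough strength of the preferences come from the story.

```python
# Hypothetical utilities on a 0-10 scale; the exact values are assumptions.
utilities = {
    "you":        {"pepperoni": 10, "mushroom": 9},
    "friend":     {"pepperoni": 10, "mushroom": 9},
    "vegetarian": {"pepperoni": 0,  "mushroom": 9},
}

def pairwise_majority_winner(utilities, a, b):
    """Return whichever of a or b is preferred by more people (None on a tie)."""
    prefer_a = sum(1 for u in utilities.values() if u[a] > u[b])
    prefer_b = sum(1 for u in utilities.values() if u[b] > u[a])
    if prefer_a == prefer_b:
        return None
    return a if prefer_a > prefer_b else b

def utilitarian_winner(utilities, options):
    """Return the option with the greatest total utility."""
    return max(options, key=lambda o: sum(u[o] for u in utilities.values()))

print(pairwise_majority_winner(utilities, "pepperoni", "mushroom"))
print(utilitarian_winner(utilities, ["pepperoni", "mushroom"]))
```

Pepperoni wins the head-to-head count 2 to 1, while mushroom wins on total utility, 27 to 20.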

The debate between majoritarianism and utilitarianism goes much deeper than this, but I won’t dive into it any further. If you’re interested in reading in-depth arguments for favoring utilitarianism over majoritarianism, you can start with this paper or this web page (you may need to reload the page a few times to get some of the larger gifs to load).

As an added bonus, a utilitarian winner will always exist, unlike a first-choice majority winner. Furthermore, multiple utilitarian winners can only ever exist if there is an exact tie for total support, which becomes incredibly rare as voters’ preferences are measured more precisely. This means that unlike with pairwise majority winners, we are nearly guaranteed to have exactly one utilitarian winner.

Alright, so that’s nice and all, but how do you tell whether a voting method is utilitarian or not? Well, you don’t. Or at least, you don’t divide voting methods into a binary set of categories like utilitarian and non-utilitarian. Instead, you measure how well voting methods perform on expected utility metrics. In other words, you figure out on average how much satisfaction each voter receives from the winners elected by each voting method over a bunch of elections. This can be accomplished through the use of computer simulations.

An election simulation starts by generating the candidates and electorate. Each voter in the electorate will have some set of preferences over the candidates. In most simulations, these preferences are generated by placing the candidates and voters in a multi-dimensional political space and assigning each voter more favorable preferences for the candidates located closest to them. These voters will then cast ballots, and the voting method being assessed is used to pick the winning candidate. Finally, the average satisfaction of the electorate is calculated and recorded.
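The steps above can be sketched as a toy simulation. To be clear, this is not the code behind any of the simulations discussed in this post. It assumes a 2-dimensional space with normally distributed voters and candidates, utility equal to negative distance, honest plurality ballots as the method under test, and function names that are made up for this example.

```python
import math
import random

def generate_election(rng, n_voters=99, n_candidates=4, dims=2):
    """Place voters and candidates in a political space.
    Returns utilities[v][c], where closer candidates get higher utility."""
    voters = [[rng.gauss(0, 1) for _ in range(dims)] for _ in range(n_voters)]
    candidates = [[rng.gauss(0, 1) for _ in range(dims)] for _ in range(n_candidates)]
    return [[-math.dist(v, c) for c in candidates] for v in voters]

def plurality_winner(utilities):
    """Each voter honestly votes for their favorite; most votes wins."""
    votes = [0] * len(utilities[0])
    for row in utilities:
        votes[row.index(max(row))] += 1
    return votes.index(max(votes))

def average_satisfaction(utilities, winner):
    """Mean utility the electorate receives from a given winner."""
    return sum(row[winner] for row in utilities) / len(utilities)

def run(n_elections=200, seed=0):
    """Average satisfaction from the method's winners and from the best
    possible winners over many simulated elections."""
    rng = random.Random(seed)
    chosen_total = 0.0
    best_total = 0.0
    for _ in range(n_elections):
        u = generate_election(rng)
        chosen_total += average_satisfaction(u, plurality_winner(u))
        best_total += max(average_satisfaction(u, c) for c in range(len(u[0])))
    return chosen_total / n_elections, best_total / n_elections

chosen, best = run()
print(f"plurality: {chosen:.3f}   best possible: {best:.3f}")
```

Swapping `plurality_winner` for another tallying function is all it takes to assess a different voting method under the same model.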

Ok, why use simulations? By far the biggest reason for this is the lack of real-world data for almost all voting methods. Unfortunately, in order to acquire a sufficient amount of real-world data, you need to already have the voting method widely implemented. This means that the initial decision to implement a voting method can only be made based on simulated data. In fact, these decisions have historically been made without any data at all, which means that simply by running simulations we put ourselves in a position to make much better decisions than have been made in the past. The other reason to favor simulations is that unlike in real life, you can “read the minds” of the electorate, so you have perfect knowledge of their true preferences. In real elections, you have to resort to collecting polling data, which inevitably introduces some noise into the results, though it may not be much.

Now this is great and all, but there is a significant downside to relying on simulations: they may just completely fail to correspond to reality. Reality is complex, and voter behavior is especially complex. Does a simple spatial model in which voters have perfectly coherent preferences over all candidates and perfect knowledge of those preferences really capture the behavior of human voters? Well, probably not. But simulations can still be useful if their results are treated correctly. The simplest option is to treat a voting method’s performance in simulations as an upper bound on its real-world performance. After all, it’s pretty unlikely that the complications introduced by reality will just so happen to line up in the exact way needed to improve a voting method’s accuracy. In this way, simulations can be used to rule out voting methods with low accuracy, but they won’t confirm that voting methods with high simulated accuracy will perform well if implemented in our elections.

However, we can actually do a bit better than the upper bound approach here. See, many different people have run many different simulations that were implemented in many different ways using many different assumptions. This gives us an opportunity to learn something about how well a voting method’s good performance will generalize across different models of elections. If a voting method is robust to many changes made across simulation models, it has a much greater chance of performing well under a model that actually does correspond to reality. Because of this, we’ll want to look at multiple sets of simulations.

The first set of voting method simulations I’ll cover is Warren D. Smith’s Bayesian Regret simulations. This is one of the most well-known sets of simulations out there, and it’s especially useful for our purposes because Smith varied many of the assumptions made in his models in order to ensure that the results weren’t too sensitive to any of them. There is, however, one small problem. These simulations are from 2000, but STAR voting wasn’t invented until 2014, so it isn’t included in the results. Luckily, there is a workaround. Warren Smith may not have been able to simulate STAR, but he was able to simulate score voting with a manual top-two runoff. While these two methods have different strategies associated with them, they behave essentially identically under honesty. Thus, while we can’t really make use of the results from simulations with strategy, we can learn something about how well STAR performs with honest voters.

So how well does STAR perform according to these simulations? The answer is very well. With 13 honest voters, score voting with a top-two runoff (labelled Range2Runoff on the page) has a Bayesian Regret value of 0.121. In contrast, approval voting had a value of 0.190, ranked choice voting (labelled IRV) had a value of 0.217, and plurality/FPTP had a value of 0.331. (Do note that since this is a measure of regret, lower numbers are better. A value of 0 means the best winner was picked every time, and if the worst winner was picked every time that would, in this case, correspond to a value just over 2.) It also beats many other sophisticated voting methods like ranked pairs and Schulze. Overall, the regret from this method is the 8th lowest out of the roughly 50 voting methods that were tested in these simulations.

Next, we’ll take a look at Jameson Quinn’s Voter Satisfaction Efficiency simulations.³ This is another well-known set of voting method simulations, and similar to what Warren Smith did, Quinn made sure to vary many of the assumptions that his model made. Unlike the previous set of simulations, however, these make use of a more realistic hierarchical clusters model. They were also run in 2017, which means they were able to include STAR voting explicitly. This means that this set of simulations gives results for both honest voters and strategic voters.

The results for this simulation are reported using Voter Satisfaction Efficiency (VSE), a metric which I’ve covered before. A VSE of 100% corresponds to always picking the best winner, and a VSE of 0% corresponds to picking a winner uniformly at random. Looking at the results for STAR voting with a 5-star ballot, under honest voting STAR has a VSE of 97.7%. When half the electorate votes strategically it only drops to 97.3%, and when the entire electorate votes strategically it drops to 94.1%. The worst-case scenario occurs when the dominant faction votes honestly while the underdog faction votes strategically, and even then VSE only falls to 90.9%.

In comparison, approval voting’s VSEs fall in the range 84.1%-95.5%, ranked choice voting’s in the range 79.7%-91.3%, and plurality/FPTP’s in the range 71.8%-86.0%. The only scenario in which approval voting outperforms STAR voting is when all voters are strategic. There is no scenario in which ranked choice voting outperforms STAR voting, and plurality voting actually performs worse in every scenario than STAR’s worst-case performance. Once again, STAR voting proves itself to be more accurate than a variety of alternatives. The only voting method tested that was able to more-or-less match STAR’s performance was 3-2-1 voting, which like STAR voting is a rated runoff method. Compared to STAR’s VSE range of 90.9%-97.7%, 3-2-1’s range was 91.9%-95.1%.
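For reference, the VSE figures above can be read through the following small sketch. It follows the description given earlier (100% for always picking the best winner, 0% for picking uniformly at random); the function and parameter names are my own, and the sample numbers are invented rather than taken from any of the simulations.

```python
def vse(avg_chosen, avg_best, avg_random):
    """Voter Satisfaction Efficiency as a fraction between the random baseline and the ideal.
    avg_chosen: average utility of the winners the voting method picked
    avg_best:   average utility of the best possible winners
    avg_random: average utility of winners picked uniformly at random"""
    return (avg_chosen - avg_random) / (avg_best - avg_random)

def regret(avg_chosen, avg_best):
    """Regret is the utility lost relative to the best winner (lower is better)."""
    return avg_best - avg_chosen

# Invented averages, for illustration only.
print(f"VSE: {vse(0.9, 1.0, 0.5):.0%}")
print(f"regret: {regret(0.9, 1.0):.2f}")
```

Both metrics are built from the same ingredients. Regret reports the raw shortfall, while VSE rescales it so that results are comparable across models with different utility scales.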

The last set of simulations I’ll discuss is John Huang’s Voter Regret and VSE simulations. These simulations are from 2020 and 2021, making them more recent than most. They are divided into 3 sets, which all make use of various spatial models. Both the first set and second set only consider honest voters, while the third set introduces 2 kinds of strategy. Rather than going through each of these individually, I’ll give an overview of the summary report, which primarily relies on the results from the third set. That being said, I do recommend taking the time to check out all 3 sets of results.

As with Quinn’s simulations, the results are reported in terms of VSE. STAR voting has an average VSE of 85.8%, which is second only to Smith//Score’s average VSE of 87.0%.⁴ Approval voting’s average VSE is 77.2%, ranked choice voting’s is 72.5%, and plurality/FPTP’s is a measly 51.5%. Based on these results, the executive summary states that:

“This report ultimately finds that several methods have excellent performance in their ability to choose candidates which satisfy a greater number of voters than other methods. These best methods include Condorcet-compliant methods, STAR voting, and Smith-Score. Among the highest performing methods, STAR voting is the simplest to implement and compute.”

The results section reiterates this recommendation:

“Based on these results, I recommend the replacement of plurality voting with any of the above tested voting methods, all of which are superior to plurality in the scenarios tested. However for optimal results, I recommend STAR voting or Condorcet methods.”

All of these simulations indicate that STAR voting is a highly accurate method on par with sophisticated methods such as Condorcet methods and 3-2-1 voting. While none of them are able to capture reality in all its complexity, their results indicate that STAR voting’s strong performance should generalize to most situations, including most that occur in the real world. STAR voting also does an excellent job of balancing this utilitarian style of accuracy with a majoritarian runoff to ensure that the winner satisfies both camps. When you combine these properties with its simplicity, excellent ballot format, and highly informative pre-election polls, you end up with an amazing voting method that is absolutely worth implementing.

¹ Ok, technically if there’s an even number of voters with a preference, then it’s possible for both candidates to be supported by exactly 50% of those voters. However, the possibility that this happens is usually just ignored since it becomes incredibly rare for sufficiently large electorates.
² It may have occurred to you that because of the automatic runoff, STAR would actually choose pepperoni in this scenario. While this is a downside, it ends up being a surprisingly small one. One reason for this is that if voters choose to vote strategically or even just normalize their ballots (giving the highest rating to their favorite and the lowest to their least favorite), methods like score voting that would otherwise pick mushrooms will end up picking pepperoni anyway.
³ A new set of these simulations was recently published in the peer-reviewed journal Constitutional Political Economy. I won’t be covering them here, but I encourage interested readers to check them out.
⁴ Smith//Score is a Condorcet method that sort of functions like STAR voting in reverse; it uses pairwise comparisons in its first round of tallying and scores in its second round.

Technically Exists