Archive

Democracy

The following applies to elections for Parliament, say for the US House of Representatives or the UK House of Commons, and it may also apply for the election of a city council. When the principle is one man, one vote then we would want that the shares of “seats won” would be equal to the shares of “votes received”. When there are differences then we would call this inequality or disproportionality.

Such imbalance is not uncommon. At the US election of November 8 2016, the Republicans got 49.1% of the votes and 55.4% of the seats, while the Democrats got 48% of the votes and 44.6% of the seats. At the UK general election of June 8 2017, the Conservatives got 42.2% of the votes and 48.8% of the seats while Labour got 39.9% of the votes and 40.3% of the seats (the Wikipedia data of October 16 2017 are inaccurate).

This article clarifies a new and better way to measure this inequality or disproportionality of votes and seats. The new measure is called Sine-Diagonal Inequality / Disproportionality (SDID). The new measure falls under descriptive statistics. Potentially it might be used in any area where one matches shares or proportions, like the proportions of minerals in different samples. SDID is related to statistical concepts like R-squared and the regression slope. This article looks at some history, as Karl Pearson (1857-1936) created the R-Squared and Ronald A. Fisher (1890-1962) in 1915 determined its sample distribution. The new measure would also be relevant for Big Data. William Gosset (1876-1937) a.k.a. “Student” was famously unimpressed by Fisher’s notion of “statistical significance” and now is vindicated by descriptive statistics and Big Data.

The statistical triad

Statistics has the triad of Design, Description and Decision.

  • Design is especially relevant for the experimental sciences, in which plants, lab rats or psychology students are subjected to alternate treatments. Design is informative but less applicable for observational sciences, like macro-economics and national elections when the researcher cannot experiment with nations.
  • Descriptive statistics has measures for the center of location – like mean or median – and measures of dispersion – like range or standard deviation. Important are also the graphical methods like the histogram or the frequency polygon.
  • Statistical decision making involves the formulation of hypotheses and the use of loss functions to evaluate those hypotheses. A hypothesis on the distribution of the population provides an indication for choosing the sample size. A typical example is the definition of decision error (of the first kind), in which a hypothesis is true but still rejected. One might accept a decision error in say 5% of the cases, called the level of statistical significance.

Historically, statisticians have been working on all these areas of design, description and decision, but the most difficult was the formulation of decision methods, since this involved both the calculus of reasoning and the more complex mathematics on normal, t, chi-square, and other frequency distributions. In practical work, the divide between the experimental and the non-experimental (observational) sciences appeared insurmountable. The experimental sciences have the advantages of design and decisions based upon samples, and the observational sciences basically rely on descriptive statistics. When the observational sciences do regressions, there is an ephemeral application of statistical significance that invokes the Central Limit Theorem, by which the aggregate of many small errors approximates the normal distribution.

This traditional setup of statistics has been challenged in recent decades by Big Data – see also this discussion by Rand Wilcox in Significance May 2017. When all data are available, and when you actually have the population data, then the idea of using a sample evaporates, and you don’t need to develop hypotheses on the distributions anymore. In that case descriptive statistics becomes the most important aspect of statistics. For statistics as a whole, the emphasis shifts from statistical decision making to decisions on content. While descriptive statistics had been applied mostly to samples, Big Data now adds the step of asking how these descriptions relate to decisions on content. In fact, such questions already existed for the observational sciences like macro-economics and national elections, in which the researcher only had descriptive statistics, and lacked the opportunity to experiment and base decisions upon samples. The disadvantaged areas now provide insights for the earlier advantaged areas of research.

The key insight is to transform the loss function into a descriptive statistic itself. An example is the Richter scale for the magnitude of earthquakes. It is both a descriptive statistic and a factor in the loss function. A nation or regional community has on the one hand the cost of building and construction and on the other hand the risk of losing the entire investments and human lives. In the evaluation of cost and benefit, the descriptive statistic helps to clarify the content of the issue itself. The key issue is no longer a decision within statistical hypothesis testing, but the adequate description of the data so that we arrive at a better cost-benefit analysis.

Existing measures on votes versus seats

Let us return to the election for the House of Representatives (USA) or the House of Commons (UK). The criterion of One man, one vote translates into the criterion that the shares of seats equal the shares of votes. We are comparing two vectors here.

The reason why the shares of seats and votes do not match is because the USA and UK use a particular setup. The setup is called an “electoral system”, but since it does not satisfy the criterion of One man, one vote, it does not really deserve that name. The USA and UK use both (single member) districts and the criterion of Plurality per district, meaning that the district seat is given to the candidate with the most votes – also called “first past the post” (FPTP). This system made some sense in 1800 when the concern was district representation. However, when candidates stand for parties then the argument for district representation loses relevance. The current setup does not qualify for the word “election” though it curiously continues to be called so. It is true that voters mark ballots but that is not enough for a real election. When you pay for something in a shop then this is an essential part of the process, but you also expect to receive what you ordered. In the “electoral systems” in the USA and UK, this economic logic does not apply. Only votes for the winner elect someone but the other votes are obliterated. For such reasons Holland switched to equal / proportional representation in 1917.

For descriptive statistics, the question is how to measure the deviation between the shares of votes and seats. For statistical decision making we might want to test whether the US and UK election outcomes are statistically significantly different from equality / proportionality. This approach requires not only a proper descriptive measure, but also some assumptions on the distribution of votes, which might be rather dubious to start with. For this reason the emphasis falls on descriptive statistics, and the use of a proper measure for inequality / disproportionality (ID).

A measure proposed by, and called after, Loosemore & Hanby in 1971 (LHID) uses the sum of the absolute deviations of the shares (in percentages), divided by 2 to correct for double counting. The LHID for the UK election of 2017 is 10.5 on a scale of 100, which means that 10.5% of the 650 seats (68 seats) in the UK House of Commons are relocated from what would be an equal allocation. When the UK government claims to have a “mandate from the people” then this is only because the UK “election system” is so rigged that many votes have been obliterated. The LHID gives the percentage of relocated seats but is insensitive to how these actually are relocated, say to a larger or smaller party.

The Euclid / Gallagher measure proposed in 1991 (EGID) uses the Euclidean distance, again corrected for double counting. For an election with only two parties EGID = LHID. The EGID has become something like the standard in political science. For the UK 2017 the EGID is 6.8 on a scale of 100, which cannot be interpreted as a percentage of seats like LHID, but which indicates that the 10.5% of relocated seats are not concentrated in the Conservative party only.
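For readers who want to check such numbers themselves, here is a minimal sketch of the two existing measures in Python. It uses the US 2016 shares quoted above; the third "Other" entry is my own residual (100 minus the two quoted vote shares, and zero seats), so it is an assumption and not official data. The function names `lhid` and `egid` are only labels for this illustration.

```python
import math

def lhid(votes, seats):
    """Loosemore-Hanby: half the sum of the absolute differences of the shares."""
    return 0.5 * sum(abs(v - s) for v, s in zip(votes, seats))

def egid(votes, seats):
    """Euclid / Gallagher: square root of half the sum of squared differences."""
    return math.sqrt(0.5 * sum((v - s) ** 2 for v, s in zip(votes, seats)))

# Shares in percent: Republicans, Democrats, Other (Other is an assumed residual).
votes = [49.1, 48.0, 2.9]
seats = [55.4, 44.6, 0.0]

print("LHID:", round(lhid(votes, seats), 1))
print("EGID:", round(egid(votes, seats), 1))

# With only two parties the two measures coincide.
two_votes, two_seats = [49, 51], [51, 49]
print(lhid(two_votes, two_seats), egid(two_votes, two_seats))
```

With three or more parties the EGID falls below the LHID as soon as the deviations are spread over several parties, which is the effect that Renwick comments on.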

Alan Renwick in 2015 tends to see more value in LHID than EGID: “As the fragmentation of the UK party system has increased over recent years, therefore, the standard measure of disproportionality [thus EGID] has, it would appear, increasingly understated the true level of disproportionality.”

The new SDID measure

The new Sine-Diagonal Inequality / Disproportionality (SDID) measure – presented in this paper – looks at the angle between the vectors of the shares of votes and seats.

  • When the vectors overlap, the angle is zero, and then there is perfect equality / proportionality.
  • When the vectors are perpendicular then there is full inequality / disproportionality.
  • While this angle varies from 0 to 90 degrees, it is more useful to transform it into sine and cosine that are in the [0, 1] range.
  • The SDID takes the sine for inequality / disproportionality and the cosine of the angle for equality / proportionality.
  • With Sin[0] = 0 and Cos[0] = 1, the cosine thus gives a scale that is 0 for full inequality / disproportionality and 1 for full equality / proportionality, and the sine gives the reverse scale.
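The points above can be sketched in a few lines of Python. This is only an illustration of the angle between two share vectors, not the full SDID; the helper names `cosine` and `sine` are mine.

```python
import math

def cosine(v, s):
    """Cosine of the angle between the vote vector v and the seat vector s."""
    dot = sum(a * b for a, b in zip(v, s))
    norm_v = math.sqrt(sum(a * a for a in v))
    norm_s = math.sqrt(sum(b * b for b in s))
    return dot / (norm_v * norm_s)

def sine(v, s):
    """Sine of the same angle; the max() guards against rounding just above 1."""
    c = cosine(v, s)
    return math.sqrt(max(0.0, 1.0 - c * c))

# Overlapping vectors: angle 0, full equality / proportionality.
print(cosine([40, 60], [40, 60]), sine([40, 60], [40, 60]))

# Perpendicular vectors: angle 90 degrees, full inequality / disproportionality.
print(cosine([100, 0], [0, 100]), sine([100, 0], [0, 100]))
```

Since shares are nonnegative, the angle cannot exceed 90 degrees, which is why both values stay in the [0, 1] range.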

It appears that the sine is more sensitive than either the absolute value (LHID) or the Euclidean distance (EGID). It is closer to the absolute value for small angles, and closer to the Euclidean distance for larger angles. See said paper, Figure 1 on page 10. SDID is something like a compromise between LHID and EGID but also better than both.

The role of the diagonal

When we regress the shares of the seats on the shares of the votes without using a constant – i.e. using Regression Through the Origin (RTO) – then this gives a single regression coefficient. When there is equality / proportionality then this regression coefficient is 1. This has the easy interpretation that this is the diagonal in the votes & seats space. This explains the name of SDID: when the regression coefficient generates the diagonal, then the sine is zero, and there is no inequality / disproportionality.
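As a small illustration, Regression Through the Origin has a closed-form slope, namely the sum of products divided by the sum of squares of the explanatory variable. The sketch below applies this to share vectors; the function name `rto_slope` is mine, and the "Other" entry in the US 2016 shares is an assumed residual.

```python
def rto_slope(x, y):
    """Least-squares slope of y on x without a constant (through the origin)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

# Equal shares of votes and seats: the slope is 1, i.e. the diagonal.
shares = [42.2, 39.9, 17.9]
print(rto_slope(shares, shares))

# US 2016 shares quoted earlier; "Other" is an assumed residual.
votes = [49.1, 48.0, 2.9]
seats = [55.4, 44.6, 0.0]
print(round(rto_slope(votes, seats), 3))
```

A slope of 1 alone does not prove equality, since the points may still scatter around the diagonal; that scatter is what the sine picks up.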

Said paper – see page 38 – recovers a key relationship between on the one hand the sine and on the other hand the Euclidean distance and this regression coefficient. On the diagonal, the sine and Euclidean distance are both zero. Off-diagonal, the sine differs from the Euclidean distance in nonlinear manner by means of a factor given by the regression coefficient. This relationship determines the effect that we indicated above, how SDID compromises between and improves upon LHID and EGID.

Double interpretation as slope and similarity measure

There appears to be a relationship between said regression coefficient and the cosine itself. This allows for a double interpretation as both slope and similarity measure. This weblog text is intended to avoid formulas as much as possible and thus I refer to said paper for the details. Suffice it to say here that, at first, it may seem to be a drawback that such a double interpretation is possible, yet on closer inspection the relationship makes sense and it is an advantage to be able to switch perspective.
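One standard identity that fits this remark is that the cosine equals the geometric mean of the two through-the-origin slopes (seats on votes, and votes on seats). Whether this is exactly the form used in the paper is an assumption on my part; the sketch below only checks the identity numerically, with helper names of my own and the "Other" residual again assumed.

```python
import math

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

votes = [49.1, 48.0, 2.9]   # "Other" is an assumed residual
seats = [55.4, 44.6, 0.0]

slope_s_on_v = dot(votes, seats) / dot(votes, votes)   # regress seats on votes
slope_v_on_s = dot(votes, seats) / dot(seats, seats)   # regress votes on seats

cos_angle = dot(votes, seats) / math.sqrt(dot(votes, votes) * dot(seats, seats))
geometric_mean = math.sqrt(slope_s_on_v * slope_v_on_s)

print(round(cos_angle, 6), round(geometric_mean, 6))
```

The two printed values agree, which shows how one number can be read both as a (symmetrised) slope and as a similarity measure.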

Weber – Fechner sensitivity, factor 10, sign

In human psychology there appears to be a distinction between actual differences and perceived differences. This is called the Weber – Fechner law. When a frog is put into a pan with cool water and slowly boiled to death, it will not jump out. When a frog is put into a pan with hot water it will jump out immediately. People may notice differences between low vote shares and high seat shares, but they may be less sensitive to small differences, while these differences actually can still be quite relevant. For this reason, the SDID uses a sensitivity transform. It uses the square root of the sine.

(PM. A hypothesis why the USA and UK still call their national “balloting events” “elections” is that the old system of districts has changed so gradually into the method of obliterating votes that many people did not notice. It is more likely though that some parties recognised the effect, but have an advantage under the present system, and then do not want to change to equal / proportional representation.)

Subsequently, the sine and its square root have values in the range [0, 1]. In itself this is an advantage, but it comes with leading zeros. We might multiply with 100 but this might cause the confusion as if it would be percentages. The second digit might give a false sense of accuracy. It is more useful to multiply this by 10. This gives values like on a report card. We can compare here to Bart Simpson, who appreciates low values on his report card.

Finally, when we compare, say, votes {49, 51} and seats {51, 49}, then we see a dramatic change of majority, even though there is only a slight inequality / disproportionality. It is useful to have an indicator for this too. It appears that this can be done by using a negative sign when such majority reversal occurs. This method of indicating majority reversals is not so sophisticated yet, and at this stage consists of using the sign of the covariance of the vectors of votes and seats.

In sum: the full formula

This present text avoids formulas but it is useful to give the formula for the new measure of SDID, so that the reader may link up more easily with the paper in which the new measure is actually developed. For the vectors of votes and seats we use the symbols v and s, and the angle between the two vectors gives the cosine and then the sine:

SDID[v, s] = sign 10 √ Sin[v, s]
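The formula can be sketched in Python as follows. This is a minimal reading of the description in this text (sine of the angle, square root, factor 10, sign of the covariance) and not a copy of the routines in the paper; the name `sdid` and the choice of a positive sign when the covariance is exactly zero are my assumptions.

```python
import math

def sdid(votes, seats):
    """Sketch of SDID: sign * 10 * sqrt(sine of the angle between v and s)."""
    dot = sum(v * s for v, s in zip(votes, seats))
    vv = sum(v * v for v in votes)
    ss = sum(s * s for s in seats)
    cos = dot / math.sqrt(vv * ss)
    sin = math.sqrt(max(0.0, 1.0 - cos * cos))  # guard against rounding

    # Sign of the covariance flags a majority reversal (assumed + when zero).
    n = len(votes)
    mean_v, mean_s = sum(votes) / n, sum(seats) / n
    cov = sum((v - mean_v) * (s - mean_s) for v, s in zip(votes, seats))
    sign = -1.0 if cov < 0 else 1.0

    return sign * 10.0 * math.sqrt(sin)

# The majority reversal example from above: a small angle, but a negative sign.
print(round(sdid([49, 51], [51, 49]), 2))

# Perfect equality / proportionality gives zero.
print(sdid([40, 60], [40, 60]))
```

For the reversal example the sine is only about 0.04, but the square root lifts this to about 0.2, so that the value on the report-card scale is about 2, with a minus sign for the reversed majority.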

For the UK 2017, the SDID value is 3.7. For comparison the values of Holland with equal / proportional representation are: LHID 3, EGID 1.7, SDID 2.5. It appears that Holland is not yet as equal / proportional as can be. Holland uses the Jefferson / D’Hondt method, which favours larger parties in the allocation of remainder seats. At elections there is also the wasted vote, when people vote for fringe parties that do not succeed in getting seats. In a truly equal or proportional system, the wasted vote can be respected by leaving seats empty or by having a qualified majority rule.

Cosine and R-squared

Remarkably, Karl Pearson (1857-1936) also used the cosine when he created R-squared, also known as the “coefficient of determination”. Namely:

  • R-squared is the cosine-squared applied to centered data. Such centered data arise when one subtracts the mean value from the original data. For such data it is advisable to use a regression with a constant, which constant captures the mean effect.
  • Above we have been using the original (non-centered) data. Alternatively put, when we do above Regression Through the Origin (RTO) and then look for the proper coefficient of determination, then we get the cosine-squared.
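Both statements can be checked numerically. The sketch below uses a small made-up data set (the numbers in `x` and `y` are hypothetical and only serve the illustration) and compares the usual "1 minus residual sum of squares over total sum of squares" with the cosine-squared, once for centered data with a constant and once for the original data through the origin.

```python
def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def cos_squared(a, b):
    return dot(a, b) ** 2 / (dot(a, a) * dot(b, b))

def center(a):
    mean = sum(a) / len(a)
    return [p - mean for p in a]

# Hypothetical data, only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

# Regression with a constant: R-squared uses centered sums of squares.
xc, yc = center(x), center(y)
slope = dot(xc, yc) / dot(xc, xc)
intercept = sum(y) / len(y) - slope * sum(x) / len(x)
sse = sum((q - intercept - slope * p) ** 2 for p, q in zip(x, y))
r2_centered = 1.0 - sse / dot(yc, yc)

# Regression Through the Origin: uncentered sums of squares.
b = dot(x, y) / dot(x, x)
sse_rto = sum((q - b * p) ** 2 for p, q in zip(x, y))
r2_rto = 1.0 - sse_rto / dot(y, y)

print(r2_centered, cos_squared(xc, yc))
print(r2_rto, cos_squared(x, y))
```

In both cases the remaining share, 1 minus the cosine-squared, is the sine-squared, which is the part that SDID works with.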

The SDID measure thus provides a “missing link” in statistics between centered and non-centered data, and also provides a new perspective on R-squared itself.

Apparently till now statistics found little use for original (non-centered) data and RTO. A possible explanation is that statistics fairly soon neglected descriptive statistics as less challenging, and focused on statistical decision making. Textbooks prefer the inclusion of a constant in the regression, so that one can test whether it differs from zero with statistical significance. The constant is essentially used as an indicator for possible errors in modeling. The use of RTO or the imposition of a zero constant would block that kind of application. However, this (traditional, academic) focus on statistical decision making apparently caused the neglect of a relevant part of the analysis, that now comes to the surface.

R-squared has relatively little use

R-squared is often mentioned in statistical reports about regressions, but actually it is not much used for other purposes than reporting only. Cosma Shalizi (2015:19) states:

“At this point, you might be wondering just what R-squared is good for — what job it does that isn’t better done by other tools. The only honest answer I can give you is that I have never found a situation where it helped at all. If I could design the regression curriculum from scratch, I would never mention it. Unfortunately, it lives on as a historical relic, so you need to know what it is, and what misunderstandings about it people suffer from.”

At the U. of Virginia Library, Clay Ford summarizes Shalizi’s points on the uselessness of R-squared, with a reference to his lecture notes.

Since the cosine is symmetric, the R-squared is the same for regressing y given x, or x given y. Shalizi (2015, p18) infers from the symmetry: “This in itself should be enough to show that a high R² says nothing about explaining one variable by another.” This is too quick. When theory shows that x is a causal factor for y then it makes little sense to argue that y explains x conversely. Thus, for research the percentage of explained variation can be informative. Obviously it matters how one actually uses this information.

When it is reported that a regression has an R-squared of 70% then this means that 70% of the variation of the explained variable is explained by the model, i.e. by variation in the explanatory variables and the estimated coefficients. In itself such a report does not say much, for it is not clear whether 70% is a little or a lot for the particular explanation. For evaluation we obviously also look at the regression coefficients.

One can always increase R-squared by including other and even nonsensical variables. For a proper use of R-squared, we would use the adjusted R-squared. R-adj finds its use in model specification searches – see Dave Giles 2013. For R-adj to increase, the added coefficient must have an absolute t-value larger than 1. A proper report would show how R-adj increases by the inclusion of particular variables, e.g. also compared to studies by others on the same topic. Comparison on other topics obviously would be rather meaningless. Shalizi also rejects R-adj and suggests working directly with the mean squared error (MSE, also corrected for the degrees of freedom). Since R-squared is the cosine-squared, the MSE relates to the sine-squared, and these are basically different sides of the same coin, so that this discussion is much ado about little. For standardised variables (difference from mean, divided by standard deviation), the correlation R (the signed square root of R-squared) is also the coefficient of a simple regression, and then it is relevant for the effect size.

R-squared is a sample statistic. Thus it depends upon the particular sample. A hypothesis is that the population has a ρ-squared. For this reason it is important to distinguish between a regression on fixed data and a regression in which the explanatory variables also have a (normal) distribution (errors in variables). In his 1915 article on the sample distribution of R-squared, R.A. Fisher (digital library) assumed the latter. With fixed data, say X, the outcome is conditional on X, so that it is better to write ρ[X], lest one forgets about the situation. See my earlier paper on the sample distribution of R-adj. Dave Giles has a fine discussion about R-squared and adjusted R-squared. A search gives more pages. He confirms the “uselessness” of R-squared: “My students are often horrified when I tell them, truthfully, that one of the last pieces of information that I look at when evaluating the results of an OLS regression, is the coefficient of determination (R²), or its “adjusted” counterpart. Fortunately, it doesn’t take long to change their perspective!” Such a statement should not be read as the uselessness of cosine or sine in general.

A part of history of statistics that is unknown to me

I am not familiar with the history of statistics, and it is unknown to me what else Pearson, Fisher, Gosset and other founding and early authors wrote about the application of the cosine or sine. The choice to apply the cosine to centered data to create R-squared is deliberate, and Pearson would have been aware that it might also be applied to original (non-centered) data. It is also likely that he would not have had the full perspective above, because then it would have been in the statistical textbooks already. It would be interesting to know what the considerations at the time were. Quite likely the theoretical focus was on statistical decision making rather than on description, yet this history, unknown to me, would put matters more into perspective.

Statistical significance

Part of the history is that R.A. Fisher with his attention for mathematics emphasized precision while W.S. Gosset with his attention to practical application emphasized the effect size of the coefficients found by regression. Somehow, statistical significance in terms of precision became more important than content significance, and empirical research has rather followed Fisher than the practical relevance of Gosset. This history and its meaning are discussed by Stephen Ziliak & Deirdre McCloskey 2007, see also this discussion by Andrew Gelman. As said, for standardised variables, the regression coefficient is the correlation R, the signed square root of R-squared, and this is best understood with attention for the effect size. For some applications a low R-squared would still be relevant for the particular field.

Conclusion

The new measure SDID provides a better description of the inequality or disproportionality of votes and seats compared to existing measures. The new measure has been tailored to votes and seats, by means of greater sensitivity to small inequalities, and because a small change in inequality may have a crucial impact on the (political) majority. For different fields, one could tailor measures in similar manner.

That the cosine could be used as a measure of similarity has been well-known in the statistics literature since the start, when Pearson used the cosine for centered data to create R-squared. For the use of the sine I have not found direct applications, but its use is straightforward when we look at the opposite of similarity.

The proposed measure provides an enlightening bridge between descriptive statistics and statistical decision making. This comes with a better understanding of what kind of information the cosine or R-squared provides, in relation to regressions with and without a constant. Statistics textbooks would do well by providing their students with this new topic for both theory and practical application.

Jacob Rees-Mogg gave a talk for the Oxford Union, published on YouTube on 2013-11-11. The Oxford Union is a debating society. A debater’s aim is to win the audience over and not necessarily to discuss truth. Rees-Mogg gave an entertaining talk, but indeed it is not aimed at discerning truth. His presentation comes across as modest and forceful, with the charm of perhaps some old-fashioned style. Whoever closely considers his words may however be shocked by the unreasonableness and closed-mindedness.

Boris Johnson and Nigel Farage have been criticised for spreading false arguments for the June 23 2016 Brexit referendum. Obviously, these two individuals cannot be held accountable for swinging the views of some 45 million voters. I wondered since the referendum whether they had had some help. Apparently Jacob Rees-Mogg had been giving a helping hand.

To clarify Rees-Mogg’s departure from truth, we first must mention some properties of the European Parliament.

Seat-to-vote ratios in the EU parliament

The EU Parliament has 751 seats, distributed over 28 member states with 500 million people. The distribution over countries is not proportional to the populations, since countries are units by themselves, and it is felt that this should have some effect. Thus Germany with its population of 82 million has 96 seats (1.17 seats per million), the UK with its population of 65 million has 73 seats (1.12 seats per million), and Malta with its population of 0.5 million has 6 seats (12 seats per million). There is relatively little tension about this apportionment, since the countries fall in comparable classes (large, medium, very small), and the major political differences translate into political parties. The divisions between Christian Democrats, Social Democrats, Liberals, and what have you, apparently are dispersed over countries in similar manner, or, the political parties are able to create alliances over nations. It is part of the wonder of the EU that nationalism is being channeled and that there is more scope for civil democracy. A recent paper of mine on proportional representation is here.
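The seats per million can be recomputed from the figures in this paragraph. The sketch below only redoes this arithmetic, with the rounded populations as given above; the dictionary layout is just my choice for the illustration.

```python
# (seats in the EU Parliament, population in millions), as quoted above.
members = {
    "Germany": (96, 82.0),
    "UK": (73, 65.0),
    "Malta": (6, 0.5),
}

seats_per_million = {
    name: seats / population for name, (seats, population) in members.items()
}

for name, ratio in seats_per_million.items():
    print(name, round(ratio, 2))

# How many times higher is the Maltese ratio compared to the UK ratio?
malta_vs_uk = seats_per_million["Malta"] / seats_per_million["UK"]
print(round(malta_vs_uk, 1))
```

With these rounded populations the Maltese ratio is some 10 to 11 times the UK ratio, while Malta still has only 6 of the 751 seats.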

Jacob Rees-Mogg’s quote on Malta

Jacob Rees-Mogg does not explain the above democratic solution for dealing with Member States of different sizes. He criticises the EU because Malta is over-represented compared to the UK. It is a fact that Malta has a higher seat-to-vote ratio, but only pointing to this fact obscures the other considerations. He mentions a perhaps older figure of 15 instead of the current 11, but that is irrelevant here. The demagoguery is that many in his audience apparently are not aware of the key notions in this apportionment, and he apparently takes advantage of their lack of knowledge to win them over to his own closed-mindedness. The demagoguery is that he creates a suggestion as if Malta has 15 times more influence than the UK, as if 6 is 15 times larger than 73 (as, indeed, 6 = 15 * 73).

The quote at the final minute starting at about 11.30 is, with the abusive “proportionally outvote” and the threat of “spectres”:

“So what is this great experiment doing ? It is helping once again the rise of the extreme right, and in some cases the extreme left. That is the threat to democracy that is there, that is coming, that is deeply destructive. But the fundamental problem, the real issue at hand tonight is that there is less democracy in this country, because of the European Union. Because, Ladies and Gentlemen, however you vote the next general election, 60% of our laws, and some say higher, is made on the basis of European agreements, where the Maltese proportionally outvote us 15 to 1. Whoever you vote for, matters less than somebody in Malta votes for, about the laws of our country. And if you are unsatisfied with that, and you want it changed, I cannot give you any redress, because the United Kingdom Parliament, the most ancient democratic Parliament in the world, has been made powerless. That is the threat to democracy. It is here, but it is on the continent as well. It is a frightening spectre. The best way to deal with it, is to deal with our relationship with the European Union, to put our own democracy first and foremost, and hope that others follow.”

Does it really require a protest ?

It is almost silly to protest against this demagoguery:

  • The situation w.r.t. the UK and Malta in the EU Parliament has been explained.
  • The UK has District Representation (DR) instead of Proportional Representation (PR), which makes the UK much less democratic than most countries in the EU or the EU Parliament itself. The PR Gini for the UK of 2017 is 15.6%, but there has been a lot of strategic voting, so that we don’t really know what the first preferences of UK voters are. By comparison, Holland has a PR Gini of only 3.6%, and people in Holland could vote for the party of their first choice. See this weblog text and this paper.
  • I tend to think that Rees-Mogg really worries about the state of democracy, while A.C. Grayling rather sees an elitist or even pecuniary motive, see this article, as in “follow the money”. Yet Rees-Mogg doesn’t study the topic, and thus he is condemned to repeat an ideology. He studied history but not science. His voting track record apparently shows that he consistently voted against Proportional Representation. Old-fashioned hypocrisy apparently is also part of his old-fashioned style.

Malta enlarged some 30 x UK, dotted with 15 x UK. Spot the Real Malta

October 18: In memoriam Daphne Caruana Galizia (1964 – 2017), journalist, killed by a car bomb.

This weblog entry copies the earlier entry that used an estimate.
Now we use the actual YouGov data, below.
Again we can thank YouGov and Anthony Wells for making these data available.
The conclusions do not change, since the estimate apparently was fairly good.
It concerns a very relevant poll, and it is useful to have the uncertainty of the estimate removed.

The earlier discussion on Proportional Representation versus District Representation has resulted in these two papers:

Brexit stands out as a disaster of the UK First Past The Post (FPTP) system and the illusion that one can use referenda to repair disproportionalities caused by FPTP. This information about the real cause of Brexit is missing in the otherwise high quality overview at the BBC.

The former weblog text gave an overview of the YouGov polling data of June 12-13 2017 on the Great Britain (UK minus Northern Ireland) preference orderings on Brexit. The uncertainty of the estimate is removed now, and we are left with the uncertainty because of having polling data. The next step is to use these orderings for the various voting philosophies. I will be using the website of Rob LeGrand since this makes for easy communication. See his description of the voting philosophies. Robert Loring has a website that referred to LeGrand, and Loring is critical about FPTP too. However, I will use the general framework of my book “Voting theory for democracy” (VTFD), because there are some general principles that many people tend to overlook.

Input format

See the former entry for the problem and the Excel sheet with the polling data of the preferences and their weights. LeGrand’s website requires us to present the data in a particular format. It seems best to transform the percentages into per-millions, since that website seems to require integers and we want some accuracy even though polling data come with uncertainty. There are no preferences with zero weights. Thus we get 24 nonzero weighted options. We enter those and then click on the various schemes. See the YouGov factsheet for the definition of the Brexit options, but for short we have R = Remain, S = Soft / Single Market, T = Tariffs / Hard, N = No Deal / WTO. Observe that the Remain options are missing, though these are important too.

248485:R>S>T>N
38182:R>S>N>T
24242:R>T>S>N
19394:R>T>N>S
12727:R>N>S>T
10909:R>N>T>S
50303:S>R>T>N
9091:S>R>N>T
22424:S>T>R>N
66667:S>T>N>R
9091:S>N>R>T
36364:S>N>T>R
6667:T>R>S>N
3636:T>R>N>S
12121:T>S>R>N
46667:T>S>N>R
15758:T>N>R>S
135152:T>N>S>R
9697:N>R>S>T
9091:N>R>T>S
8485:N>S>R>T
37576:N>S>T>R
16970:N>T>R>S
150303:N>T>S>R
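For readers who prefer to process these lines themselves rather than via the website, here is a minimal sketch in Python of how the "weight:ranking" format above can be read, and how a percentage becomes a per-million integer. The function names are mine, and the sample holds only the first three lines of the list.

```python
def to_per_million(percentage):
    """Turn a percentage such as 24.8485 into an integer weight per million."""
    return round(percentage * 10000)

def parse_ballots(text):
    """Read lines like '248485:R>S>T>N' into (weight, ranking) pairs."""
    ballots = []
    for line in text.strip().splitlines():
        weight, ranking = line.strip().split(":")
        ballots.append((int(weight), ranking.split(">")))
    return ballots

sample = """248485:R>S>T>N
38182:R>S>N>T
24242:R>T>S>N"""

ballots = parse_ballots(sample)
for weight, ranking in ballots:
    print(weight, ranking)

print(to_per_million(24.8485))
```

Each ranking is read from most preferred to least preferred, so that the first letter is the first preference.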

Philosophy 1. Pareto optimality

The basic situation in voting has a Status Quo. The issue on the table is that we consider alternatives to the Status Quo. Only those options are relevant that are Pareto Improving, i.e. that some advance while none lose. Commonly there are more Pareto options, whence there is a deadlock that Pareto itself cannot resolve, and then majority voting might be used to break the deadlock. Many people tend to forget that majority voting is mainly a deadlock breaking rule. For it would not be acceptable when a majority would plunder a minority. The Pareto condition thus gives the minority veto rights against being plundered.

(When voting for a new Parliament then it is generally considered no option to leave the seats empty, whence there would be no status quo. A situation without a status quo tends to be rather exceptional.)

In this case the status quo is that the UK is a member of the EU. The voters for R block a change. The options S, T and N do not compensate the R. Thus the outcome remains R.

This is the fundamental result. The philosophies in the following neglect the status quo and thus should not really be considered.

PM 1. Potentially though, the S, T and N options must be read such that the R will be compensated for their loss.

PM 2. Potentially though, Leavers might reason that the status quo concerns national sovereignty, that the EU breaches upon. The BBC documentary “Europe: ‘Them’ or ‘Us’” remarkably explains that it was Margaret Thatcher who helped abolish the UK veto rights and who accepted EU majority rule, and who ran this through UK Parliament without proper discussion. There seems to be good reason to return to unanimity rule in the EU, yet it is not necessarily a proper method to neglect the rights of R. (And it was Thatcher who encouraged the neoliberal economic policies that many UK voters complain about as if these would come from the EU.)

Philosophy 2. Plurality

On LeGrand’s site we get Plurality as the first step in the Hare method. R gets 35% while the other options are divided with each less than 35%. Thus the outcome is R.
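The plurality count can also be reproduced offline. The following is a minimal sketch in Python and not LeGrand’s code; the helper names `parse` and `plurality` are hypothetical, and the ballots are the 24 weighted orderings listed under Input format.

```python
from collections import Counter

# Ballots in the "weight:ranking" format listed above (per-million weights).
BALLOTS = """
248485:R>S>T>N 38182:R>S>N>T 24242:R>T>S>N 19394:R>T>N>S
12727:R>N>S>T 10909:R>N>T>S 50303:S>R>T>N 9091:S>R>N>T
22424:S>T>R>N 66667:S>T>N>R 9091:S>N>R>T 36364:S>N>T>R
6667:T>R>S>N 3636:T>R>N>S 12121:T>S>R>N 46667:T>S>N>R
15758:T>N>R>S 135152:T>N>S>R 9697:N>R>S>T 9091:N>R>T>S
8485:N>S>R>T 37576:N>S>T>R 16970:N>T>R>S 150303:N>T>S>R
"""


def parse(text):
    """Return (weight, ranking) pairs, e.g. (248485, ['R', 'S', 'T', 'N'])."""
    ballots = []
    for item in text.split():
        weight, ranking = item.split(":")
        ballots.append((int(weight), ranking.split(">")))
    return ballots


def plurality(ballots):
    """Sum the weights by first preference only."""
    tally = Counter()
    for weight, ranking in ballots:
        tally[ranking[0]] += weight
    return tally


ballots = parse(BALLOTS)
tally = plurality(ballots)
total = sum(tally.values())
for option, votes in tally.most_common():
    print(f"{option}: {100 * votes / total:.1f}%")
```

The weights are per-millions, so dividing by the total gives the shares. The total differs slightly from one million because of rounding.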

(The Brexit referendum question in 2016 was flawed in design e.g. since it hid the underlying disagreements, and collected all dissent into a single Leave, also sandwiching R between various options for Leave.)

Philosophy 3. Hare, or Instant Run-off, a form of Single Transferable Vote (STV)

When we continue with Hare, then R remains strong and it collects votes when S and N drop off (as it is curiously sandwiched between options for Leave). Eventually R gets 45.0% and T gets 55.0%. Observe that this poll was on June 12-13 2017, and that some 25% of the voters “respect” the 2016 referendum outcome that however was flawed in design. I haven’t found information about preference orderings at the time of the referendum.
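The elimination rounds can be sketched as follows. This is an illustration under assumptions: the function name `hare` is hypothetical, ties for last place are not handled (LeGrand’s site may use a rule of its own for those), and the ballots are those listed under Input format.

```python
from collections import Counter

# Ballots in the "weight:ranking" format listed above (per-million weights).
BALLOTS = """
248485:R>S>T>N 38182:R>S>N>T 24242:R>T>S>N 19394:R>T>N>S
12727:R>N>S>T 10909:R>N>T>S 50303:S>R>T>N 9091:S>R>N>T
22424:S>T>R>N 66667:S>T>N>R 9091:S>N>R>T 36364:S>N>T>R
6667:T>R>S>N 3636:T>R>N>S 12121:T>S>R>N 46667:T>S>N>R
15758:T>N>R>S 135152:T>N>S>R 9697:N>R>S>T 9091:N>R>T>S
8485:N>S>R>T 37576:N>S>T>R 16970:N>T>R>S 150303:N>T>S>R
"""


def parse(text):
    """Return (weight, ranking) pairs, e.g. (248485, ['R', 'S', 'T', 'N'])."""
    return [(int(w), r.split(">")) for w, r in
            (item.split(":") for item in text.split())]


def hare(ballots):
    """Instant run-off: drop the option with the fewest top votes until
    one option holds a majority. Returns the winner and all round tallies."""
    remaining = set(ballots[0][1])
    rounds = []
    while True:
        tally = Counter({option: 0 for option in remaining})
        for weight, ranking in ballots:
            # Each ballot counts for its highest-ranked remaining option.
            top = next(o for o in ranking if o in remaining)
            tally[top] += weight
        rounds.append(dict(tally))
        leader, votes = tally.most_common(1)[0]
        if 2 * votes > sum(tally.values()) or len(remaining) == 1:
            return leader, rounds
        remaining.remove(min(tally, key=tally.get))  # no tie handling


ballots = parse(BALLOTS)
winner, rounds = hare(ballots)
for number, tally in enumerate(rounds, start=1):
    total = sum(tally.values())
    shares = {o: round(100 * v / total, 1) for o, v in sorted(tally.items())}
    print(f"round {number}: {shares}")
print("winner:", winner)
```

On these ballots S is dropped first and N second, after which the final round is between R and T.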

Philosophy 4. Borda

Borda generates the collective ranking S > T > R > N. This is Case 9 in the original list, and fortunately this is single-peaked.
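The Borda count can be sketched as below. The assumption here is the scoring 4, 3, 2, 1 for ranks 1 to 4, i.e. the utility levels used elsewhere in this text; any other evenly spaced scoring such as 3, 2, 1, 0 gives the same collective ranking because every ballot ranks all four options. The function name `borda` is hypothetical.

```python
from collections import Counter

# Ballots in the "weight:ranking" format listed above (per-million weights).
BALLOTS = """
248485:R>S>T>N 38182:R>S>N>T 24242:R>T>S>N 19394:R>T>N>S
12727:R>N>S>T 10909:R>N>T>S 50303:S>R>T>N 9091:S>R>N>T
22424:S>T>R>N 66667:S>T>N>R 9091:S>N>R>T 36364:S>N>T>R
6667:T>R>S>N 3636:T>R>N>S 12121:T>S>R>N 46667:T>S>N>R
15758:T>N>R>S 135152:T>N>S>R 9697:N>R>S>T 9091:N>R>T>S
8485:N>S>R>T 37576:N>S>T>R 16970:N>T>R>S 150303:N>T>S>R
"""


def parse(text):
    """Return (weight, ranking) pairs, e.g. (248485, ['R', 'S', 'T', 'N'])."""
    return [(int(w), r.split(">")) for w, r in
            (item.split(":") for item in text.split())]


def borda(ballots):
    """Weighted Borda scores: rank 1 earns 4 points, rank 4 earns 1 point."""
    scores = Counter()
    for weight, ranking in ballots:
        n = len(ranking)
        for position, option in enumerate(ranking):
            scores[option] += weight * (n - position)
    return scores


scores = borda(parse(BALLOTS))
collective = [option for option, _ in scores.most_common()]
print(" > ".join(collective))
```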

Philosophy 5. Condorcet (Copeland)

Using Copeland, we find that S is also the Condorcet winner, i.e. wins from each other option in pairwise contests. This means that S is also the Borda Fixed Point winner.
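The pairwise contests behind this result can be sketched as follows. The names `pairwise` and `copeland` are hypothetical, and the Copeland score is taken here simply as the number of pairwise contests won; the Borda Fixed Point is not implemented in this sketch.

```python
from itertools import combinations

# Ballots in the "weight:ranking" format listed above (per-million weights).
BALLOTS = """
248485:R>S>T>N 38182:R>S>N>T 24242:R>T>S>N 19394:R>T>N>S
12727:R>N>S>T 10909:R>N>T>S 50303:S>R>T>N 9091:S>R>N>T
22424:S>T>R>N 66667:S>T>N>R 9091:S>N>R>T 36364:S>N>T>R
6667:T>R>S>N 3636:T>R>N>S 12121:T>S>R>N 46667:T>S>N>R
15758:T>N>R>S 135152:T>N>S>R 9697:N>R>S>T 9091:N>R>T>S
8485:N>S>R>T 37576:N>S>T>R 16970:N>T>R>S 150303:N>T>S>R
"""


def parse(text):
    """Return (weight, ranking) pairs, e.g. (248485, ['R', 'S', 'T', 'N'])."""
    return [(int(w), r.split(">")) for w, r in
            (item.split(":") for item in text.split())]


def pairwise(ballots):
    """votes[(a, b)] is the total weight of ballots ranking a above b."""
    options = ballots[0][1]
    votes = {(a, b): 0 for a in options for b in options if a != b}
    for weight, ranking in ballots:
        position = {option: i for i, option in enumerate(ranking)}
        for a, b in votes:
            if position[a] < position[b]:
                votes[(a, b)] += weight
    return votes


def copeland(ballots):
    """Number of pairwise contests that each option wins."""
    votes = pairwise(ballots)
    score = {option: 0 for option in ballots[0][1]}
    for a, b in combinations(score, 2):
        if votes[(a, b)] > votes[(b, a)]:
            score[a] += 1
        elif votes[(b, a)] > votes[(a, b)]:
            score[b] += 1
    return score


ballots = parse(BALLOTS)
score = copeland(ballots)
# A Condorcet winner beats every other option in pairwise contests.
condorcet_winner = next(
    (o for o, s in score.items() if s == len(score) - 1), None)
print(score, "Condorcet winner:", condorcet_winner)
```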

Conclusions

The major point of this discussion is that the status quo consists of the UK membership of the EU. Part of the status quo is that the UK may leave by invoking article 50. However, the internal process that caused the invoking of article 50 leaves much to be desired. Potentially many voters got the suggestion as if they might vote about membership afresh without the need to compensate those who benefit from Remain.

Jonathan Portes suggested in 2016 that the Brexit referendum question was flawed in design because there might be a hidden Condorcet cycle. The YouGov poll didn’t contain questions that allow us to check this, also because much has happened in 2016-2017, including the misplaced “respect” by 25% of the voters for the outcome of a flawed referendum. A key point is that options for Remain are not included, even though they would be relevant. My impression is that the break-up of the UK would be a serious issue, even though, curiously, many Scots apparently prefer the certainty of the closeness to a larger economy of the UK rather than the uncertainties of continued membership of the EU when the UK is Leaving.

It would make sense for the EU to encourage a reconsideration within the UK about what people really want. The Large Hadron Collider is expensive, but comparatively it might be less expensive when the UK switches to PR, splits up its confused parties (see this discussion by Anthony Wells), and has a new vote for the House of Commons. The UK already has experience with PR namely for the EU Parliament, and it should not be too complex to use this approach also for the nation.

Such a change might make it also more acceptable for other EU member states if the UK would Bregret. Nigel Farage much benefited from Proportional Representation (PR) in the EU Parliament, and it would be welcome if he would lobby for PR in the UK too.

Nevertheless, given the observable tendency in the UK to prefer a soft Brexit, the EU would likely be advised to agree with such an outcome, or face a future with a UK that rightly or wrongly feels quite maltreated. As confused as the British have been on Brexit, they might also be sensitive to a “stab-in-the-back myth”.

In a July weblog entry, I reported on a rather important YouGov poll. YouGov.com and Anthony Wells were so kind to provide the underlying poll data. Earlier I estimated some rankings, but thanks to this kindness we now have certainty about the poll data, so that only the uncertainty remains due to polling itself. It also appeared that what I had categorized as a hard (H) Brexit better be rephrased as the No Deal (N) case. I will maintain the label on the Tariff (T) option, that some would call hard.

The UK general election was on June 8 and the poll was taken on June 12-13 so that the persons polled will have had vivid recollections. For this reason, these polling data can be considered quite important.

The poll generated data about confusions in the British electorate. It is useful to belabour the point, for Brexit is a key event and would have quite some impact for the coming decades. I would respect the UK decision to leave the EU but have my doubts when it is not based upon Proportional Representation (PR). A referendum gives proportions but referenda tend to be silly and dangerous, as they are an instrument of populism rather than of representative democracy. Indeed, it appears that the Brexit referendum question was flawed in design. The YouGov poll helps us to observe how confused a major section of the UK electorate is. Let us dig a bit deeper.

The following copies my weblog text of July 11, but now replacing the estimate by the real data.

Representation of preferences via a ranking matrix

Let voters consider the options R = Remain, S = European Economic Area (EEA) a.k.a. Single Market a.k.a. Soft, T = Tariffs a.k.a. Hard, N = No Deal, World Trade Organisation (WTO). A consistent Remainer would tend to have the ranking R > S > T > N, and a consistent Leaver would tend to have this in reverse.

The YouGov poll presents the data in a ranking matrix, with the first preferences in the first row, then the second preferences, and so on. For the Brexit referendum outcome of 48% Remain and 52% Leave, for example, we might have the following setup. It is a guess, since the particular ways of Leaving were not included in the referendum question (and neither for Remaining). This example however is the result that you would expect if Remainers and Leavers would have the mentioned consistent orderings.

Observe that each voting weight (take e.g. 48) for a preference order list is put in precisely one place per row and per column, i.e. that it doesn’t occur more times in a single row or column. This explains why the border sums add up to 100.
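How a ranking matrix follows from preference orderings can be sketched in a few lines. This is a minimal illustration for the guessed 48 / 52 example with only the two consistent orderings; the function name `ranking_matrix` and the dictionary layout are hypothetical.

```python
def ranking_matrix(orderings, options="RSTN"):
    """orderings maps an ordering such as 'RSTN' (best first) to its weight.
    Row i of the result gives, per option, the weight of the voters who
    put that option at rank i + 1."""
    matrix = [{option: 0 for option in options} for _ in options]
    for ordering, weight in orderings.items():
        for rank, option in enumerate(ordering):
            matrix[rank][option] += weight
    return matrix


# Consistent Remainers (48%) and consistent Leavers (52%).
referendum = {"RSTN": 48, "NTSR": 52}
matrix = ranking_matrix(referendum)
for rank, row in enumerate(matrix, start=1):
    print(f"rank {rank}: {row}  row sum = {sum(row.values())}")
column_sums = {o: sum(row[o] for row in matrix) for o in "RSTN"}
print("column sums:", column_sums)
```

Each ordering contributes its weight exactly once to every row and every column, which is why all border sums are 100.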

The YouGov poll of June 12-13 2017

The YouGov data, that I have been referring to, contain the results of a poll of 1651 adults in Great Britain, i.e. the UK excluding Northern Ireland. From pages 13-16 we can collect these data for the whole of Great Britain for 2017. YouGov states that the sample has been weighted for socio-economic and political indicators. It is not clear to me how the “Don’t know”s are being handled for this particular issue. See also this discussion by Anthony Wells.

We can observe:

  • These are percentages, and both the row sums and the column sums should be 100, except for rounding errors.
  • 35% has Remain in the first position, 47% has it in the last position, so that 9 + 8 ≈ 17% (a 1% missing due to rounding) has a confused position, in which Remain is sandwiched between some options for Leaving. We would wonder how such people would vote in a referendum when they are presented with only two options R or L. One cannot say that the referendum was only about the first positions in the rankings, for voters would tend to develop an expectation about what would be the likely kind of Brexit and vote accordingly. Some of these 17% might have voted Remain because they disliked the otherwise expected version for Leave. This might indicate that the outcome for Remain was overstated. Yet we have no information on subdivisions of Remain, that might cause an opposite effect. Some might be okay with Remain as it is but vote for Leave because they fear that the UK otherwise might also join up on the Eurozone or some United States of Europe. The reason why the Brexit referendum question was flawed in design is that it left too much to guess here.
  • Remarkably, the split between R and L now in June 2017 would be 35% versus 65% instead of 48% versus 52% in 2016. In one single year Great Britain switched from fairly divided to a seemingly clear preference for Brexit (though divided upon how)? I very much doubt this distribution, see this discussion on populism and DR. The electoral data still suggest more than 50% for Remain. In the July weblog entry it is discussed that some 26% of the electorate say that they voted for Remain but accept the loss at the referendum, so that they “play along” with the winning side, focusing on what would be the best option for Leave. This seems loyal to some notion of democracy, but it would also be a misplaced loyalty to the flawed Brexit referendum question. (One can respect such loyalty, but it still makes sense to discuss it.)

Using techniques of apportionment we estimated the number of people per cell in the poll. However, we now have the actual data (rounded to one digit from multi-digit percentages times 1651):

Possible permutations of rankings

With 4 options there are 4 possibilities for a first place, 3 remaining for the second place, 2 remaining for the third place, and then the final one follows. Thus there are 4 x 3 x 2 x 1 = 24 permutations for possible rankings. We already saw two of these: R > S > T > N and its reverse. Above ranking matrix is actually based upon these 24 possibilities.
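The 24 permutations can be enumerated directly. In this sketch the case numbers are an assumption: they follow the order in which `itertools.permutations` generates the rankings from R, S, T, N, which agrees with the cases cited in this text (Case 5, Case 7 and Case 9).

```python
from itertools import permutations

# All possible strict rankings of the four options, best first.
cases = [" > ".join(p) for p in permutations("RSTN")]

print(len(cases))
for number, case in enumerate(cases, start=1):
    print(f"Case {number:2d}: {case}")
```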

Some of these 24 possibilities will be rather curious. It is not clear what to think about R > N > S > T for example (Case 5 below). This would be a Remainer who would rather prefer No Deal to the EEA or some agreement not to have a trade war on tariffs. A tentative explanation is that this voter has a somewhat binary position, as Remain versus No Deal At All, while the other options are neglected.

Policy options can also be sorted in logical order. This gives rise to the theory of Single Peakedness. For the topics of R, S, T and N there is a logical scale from left to right. An example of single-peakedness is Case 7 below, with a ranking S > R > T > N. See the graph below. The 1st rank gets utility level 4, the 2nd rank gets utility level 3, the 3rd rank gets utility level 2, and the 4th rank gets utility level 1. The utility levels are just the reverse of the ranks, but then the case must be reordered to the logical order.
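The reordering to the logical scale and the test on a single peak can be sketched as follows. The function names `utilities` and `is_single_peaked` are hypothetical; the scale R, S, T, N and the utility levels 4 to 1 are the ones described above.

```python
AXIS = "RSTN"  # the logical left-to-right order of the options


def utilities(ordering, axis=AXIS):
    """Utility per option in axis order: rank 1 gets 4, rank 4 gets 1."""
    n = len(ordering)
    return [n - ordering.index(option) for option in axis]


def is_single_peaked(ordering, axis=AXIS):
    """True if utility only rises up to the top option and only falls after it."""
    u = utilities(ordering, axis)
    peak = u.index(max(u))
    rising = all(u[i] < u[i + 1] for i in range(peak))
    falling = all(u[i] > u[i + 1] for i in range(peak, len(u) - 1))
    return rising and falling


for ordering in ("SRTN", "RNST"):
    print(ordering, utilities(ordering), is_single_peaked(ordering))
```

For S > R > T > N the utilities along the scale are 3, 4, 2, 1, with one peak at S. For R > N > S > T they are 4, 2, 1, 3, with a second peak at N.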

Voting theory has a core that assumes that voters are both autonomous and rational, so that any preference would have some logic. The logical order R, S, T and N might seem arbitrary to some voters who may think otherwise. We do not impose that order but invite voters who think otherwise to explain why they choose a different order. Potentially each voter has his or her own criteria so that the best is on top, and all other options follow in proper order. Voters with multiple peaks in their preferences would have more to explain to us to understand them than voters with a single peak. Without a good explanation, we cannot reject the possibility that there is some confusion.

Presentation of preferences via preference orderings

The following are the YouGov data for the preference orderings that underlie above YouGov results on percentages. See the excel sheet in the Appendix. This table shows only the percentages and not the numbers of people in the poll (that add up to above table), since the percentages are the main finding. Single dots are zeros. The ConR / L and LabR / L subdivisions concern the voters in the poll who voted R or L in the 2016 Brexit referendum and who voted Con or Lab in 2017. They form only a part of the sample, so their sum doesn’t add up to the total on the left.

Discussion on GB

Some observations are:

  • The YouGov summary ranking matrix already showed a rather even split on S, T and N, but the data give a landscape with even more diversity in opinions.
  • Only 24.8% has the preference R > S > T > N and only 15.0% its reverse, so that 60.1% has some mixture.

The above results for GB can be split up by peaks and sandwich. The combinations give the following percentages:

  • The mentioned 60.1% split up again in 33.3% who are single peaked, and 26.8% who have multiple peaks.
  • The sandwich of 17.3% splits up into 8.5% with a single peak and 8.8% with multiple peaks.
  • Of the 26.8% with multiple peaks there are 10.5% who can join the Remainers with a first preference and there are 7.4% who can join the Leavers with Remain in the last position (but various ways how to Leave).

The 8.8% would be a relevant section of the vote. They all voted Leave, but divided on S, T and N. Potentially the outcome of the 2016 Brexit referendum has been decided by the 8.8% GB voters who have Remain neither in the first nor in the last position, and who do not follow the standard logical order on the options.
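The split by peaks and sandwich can be recomputed from the weighted orderings. This is a sketch under assumptions: it uses the per-million weights from the later weblog entry on voting philosophies (so the shares are rounded), the scale R, S, T, N for single-peakedness, and hypothetical helper names.

```python
from collections import Counter

# Per-million weights of the 24 orderings (best option first).
BALLOTS = """
248485:R>S>T>N 38182:R>S>N>T 24242:R>T>S>N 19394:R>T>N>S
12727:R>N>S>T 10909:R>N>T>S 50303:S>R>T>N 9091:S>R>N>T
22424:S>T>R>N 66667:S>T>N>R 9091:S>N>R>T 36364:S>N>T>R
6667:T>R>S>N 3636:T>R>N>S 12121:T>S>R>N 46667:T>S>N>R
15758:T>N>R>S 135152:T>N>S>R 9697:N>R>S>T 9091:N>R>T>S
8485:N>S>R>T 37576:N>S>T>R 16970:N>T>R>S 150303:N>T>S>R
"""
AXIS = "RSTN"


def is_single_peaked(ordering, axis=AXIS):
    """True if utility along the axis rises to one peak and then falls."""
    u = [len(ordering) - ordering.index(option) for option in axis]
    peak = u.index(max(u))
    return (all(u[i] < u[i + 1] for i in range(peak)) and
            all(u[i] > u[i + 1] for i in range(peak, len(u) - 1)))


shares = Counter()
for item in BALLOTS.split():
    weight, ranking = item.split(":")
    ordering = ranking.replace(">", "")
    # Position of Remain: first, last, or sandwiched in between.
    if ordering[0] == "R":
        position = "first"
    elif ordering[-1] == "R":
        position = "last"
    else:
        position = "sandwich"
    peaks = "single" if is_single_peaked(ordering) else "multiple"
    shares[(position, peaks)] += int(weight)

total = sum(shares.values())
for key in sorted(shares):
    print(key, f"{100 * shares[key] / total:.1f}%")
```

The cells (first, single) and (last, single) include the two consistent orderings R > S > T > N and N > T > S > R, so these must be subtracted to arrive at the 33.3% mentioned in the list.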

Discussion on ConR / L and LabR / L

The division of ConR / L and LabR / L is losing its relevance because these are dwindling groups, they are changing loyalties, and their 2016 votes are becoming history while there are new issues. Yet, the 2016 referendum question was flawed, and it is relevant to see how sizeable parts of the UK electorate deal with the logical conundrum that they took part in.

  • The 17.3% of the votes with Remain sandwiched can be found in the subdivisions in similar proportions.
  • 28.6% of ConR voters and 55.2% of LabR voters are united on the preference R > S > T > N. Presumably this was also the case in 2016, or there must be factors that increased or reduced consistency or confusion.
  • 30.8% of ConL and 22.4% of LabL are united on the preference N > T > S > R. Presumably this was also the case in 2016, or there must be factors that increased or reduced consistency or confusion.
  • One might expect that ConR / L and LabR / L voters of 2016 would have the benefit of a party preference and thus show more consistency, yet the distribution of views is quite as much, and the sandwich with multiple peaks is quite present.
  • The 2016 Conservative Remainers are loyal for 45.2% to the old point of view, but still vote for a Conservative party that is set on Leave. Part will be the misplaced loyalty for the flawed referendum. Alternatively, they voted for a minority in this party that still tries to bring balance? (A good poll requires a focus group.) (And there is more in the world than just Brexit.)
  • The 2016 Labour Remainers are 76.1% loyal to the old point of view. Yet Labour leader Corbyn also prefers a Brexit. It might be the peculiarities of the British system of District Representation (DR) that caused these voters not to switch to LibDem. (But the LibDem also have a liberal policy that many voters for Labour dislike. The system of DR doesn’t favour the entry of new political competitors.)
  • The 2016 Leavers have a high loyalty to the old view, ConL 88.2% and LabL 73.3%. Yet this doesn’t diminish the diversity of opinion about how to Leave.

Conclusions

  • The ranking matrix is a fine way to summarize results, yet the preference orderings are more accurate on the underlying and relevant orders. The ranking matrix is merely a matter of presentation by the statistical reporter. A person in a poll who can answer on a ranking matrix in fact gives the personal preference ordering. The statistician can compound these data while not losing information on the permutations. From the permutations it is always possible to create a ranking matrix, yet the reverse requires estimation techniques which generate needless uncertainty.
  • Asking for voter preference orderings in a poll is a useful exercise. It is not intended to propose this for general elections. For general elections it suffices that voters exercise a single vote for a party of choice. The condition however is Proportional Representation, otherwise there are serious distortions, see the earlier discussion on this weblog.
  • The information on the rankings and implied preference orderings suggest a rather large state of confusion in the electorate of Great Britain. The notion of single-peakedness appears to be quite useful in highlighting the issue of the preference order. Perhaps we cannot quite call this “confusion” since voters might have their own logic to order the four options. Until there is more clarity on what strikes one as illogical, the term “confusion” seems apt though.
  • It must be greatly appreciated that YouGov and Anthony Wells made these data available, since they provide a key insight in the state of opinion in Great Britain close to the general election of June 8 2017.

Appendix September 18 2017

The excel workbook with the full YouGov data and the earlier estimate is: 2017-09-18-YouGov-Rankings-full-data

The earlier discussion on Proportional Representation versus District Representation has resulted in this paper: Two conditions for the application of Lorenz curve and Gini coefficient to voting and allocated seats, MPRA 80297.

Brexit stands out as a disaster of the UK First Past The Post (FPTP) system and the illusion that one can use referenda to repair disproportionalities caused by FPTP. This information is missing in the otherwise high quality overview at the BBC.

In the earlier Puzzle on the YouGov poll I estimated Brexit preference orderings from a summary statistic published by YouGov. The next step is to use these orderings for the various voting philosophies. I will be using the website of Rob LeGrand since this makes for easy communication. See his description of the voting philosophies. Robert Loring has a website that referred to LeGrand, and Loring is critical about FPTP too. However, I will use the general framework of my book “Voting theory for democracy” (VTFD), because there are some general principles that many people tend to overlook.

Input format

See the Puzzle weblog text for the problem and the excel sheet with the estimate of the preferences and their weights. LeGrand’s website now requires us to present the data in a particular format. It seems best to transform the percentages into per-millions, since that website seems to require integers and we want some accuracy even though the estimate is tentative. We can also drop the preference rankings with zero weights. Thus we get 14 nonzero weighted options. We enter those and then click on the various schemes. See the YouGov factsheet for the definition of the Brexit options, but for short we have R = Remain, S = Soft / Single Market, T = Tariffs, H = Hard / WTO. Observe that the Remain options are missing, though these are important too.

261841:R>S>T>H
53499:R>S>H>T
38386:R>T>H>S
60161:S>R>T>H
30087:S>R>H>T
44443:S>T>R>H
34960:S>T>H>R
22354:S>H>T>R
24777:T>S>H>R
15640:T>H>R>S
181873:T>H>S>R
49951:H>S>T>R
20475:H>T>R>S
161553:H>T>S>R

Philosophy 1. Pareto optimality

The basic situation in voting has a Status Quo. The issue on the table is that we consider alternatives to the Status Quo. Only those options are relevant that are Pareto Improving, i.e. that some advance while none lose. Commonly there are more Pareto options, whence there is a deadlock that Pareto itself cannot resolve, and then majority voting might be used to break the deadlock. Many people tend to forget that majority voting is mainly a deadlock breaking rule. For it would not be acceptable when a majority would plunder a minority. The Pareto condition thus gives the minority veto rights against being plundered. (When voting for a new Parliament then it is generally considered no option to leave the seats empty, whence there would be no status quo. A situation without a status quo tends to be rather exceptional.)

In this case the status quo is that the UK is a member of the EU. The voters for R block a change. The options S, T and H do not compensate the R. Thus the outcome remains R.

This is the fundamental result. The philosophies in the following neglect the status quo and thus should not really be considered.

PM 1. Potentially though, the S, T and H options must be read such that the R will be compensated for their loss.

PM 2. Potentially though, Leavers might reason that the status quo concerns national sovereignty, that the EU breaches upon. The BBC documentary “Europe: ‘Them’ or ‘Us’” remarkably explains that it was Margaret Thatcher who helped abolish the UK veto rights and who accepted EU majority rule, and who ran this through UK Parliament without proper discussion. There seems to be good reason to return to unanimity rule in the EU, yet it is not necessarily a proper method to neglect the rights of R. (And it was Thatcher who encouraged the neoliberal economic policies that many UK voters complain about as if these would come from the EU.)

Philosophy 2. Plurality

On LeGrand’s site we get Plurality as the first step in the Hare method. R gets 35% while the other options are divided with each less than 35%. Thus the outcome is R.

(The Brexit referendum question in 2016 was flawed in design e.g. since it hid the underlying disagreements, and collected all dissent into a single Leave, also sandwiching R between various options for Leave.)

Philosophy 3. Hare, or Instant Run-off, a form of Single Transferable Vote (STV)

When we continue with Hare, then R remains strong and it collects votes when S and H drop off (as it is curiously sandwiched between options for Leave). Eventually R gets 44.4% and T gets 55.6%. Observe that this poll was on June 12-13 2017, and that some 25% of the voters “respect” the 2016 referendum outcome that however was flawed in design. I haven’t found information about preference orderings at the time of the referendum.

Philosophy 4. Borda

Borda generates the collective ranking S > T > R > H. This is Case 9 in the original list (including zero weights), and fortunately this is single-peaked.

Philosophy 5. Condorcet (Copeland)

Using Copeland, we find that S is also the Condorcet winner, i.e. wins from each other option in pairwise contests. This means that S is also the Borda Fixed Point winner.

Conclusions

The major point of this discussion is that the status quo consists of the UK membership of the EU. Part of the status quo is that the UK may leave by invoking article 50. However, the internal process that caused the invoking of article 50 leaves much to be desired. Potentially many voters got the suggestion as if they might vote about membership afresh without the need to compensate those who benefit from Remain.

Jonathan Portes suggested in 2016 that the Brexit referendum question was flawed in design because there might be a hidden Condorcet cycle. The YouGov poll didn’t contain questions that allow us to check this, also because much has happened in 2016-2017, including the misplaced “respect” for the outcome of a flawed referendum. A key point is that options for Remain are not included, even though they would be relevant. My impression is that the break-up of the UK would be a serious issue, even though, curiously, many Scots apparently prefer the certainty of the closeness to a larger economy of the UK rather than the uncertainties of continued membership of the EU when the UK is Leaving.

It would make sense for the EU to encourage a reconsideration within the UK about what people really want. The Large Hadron Collider is expensive, but comparatively it might be less expensive when the UK switches to PR, splits up its confused parties, and has a new vote for the House of Commons. The UK already has experience with PR namely for the EU Parliament, and it should not be too complex to use this approach also for the nation. Such a change might make it also more acceptable for other EU member states if the UK would Bregret. Nigel Farage much benefited from Proportional Representation (PR) in the EU Parliament, and it would be welcome if he would lobby for PR in the UK too.

Nevertheless, given the observable tendency in the UK to prefer a soft Brexit, the EU would likely be advised to agree with such an outcome, or face a future with a UK that rightly or wrongly feels quite maltreated. As confused as the British have been on Brexit, they might also be sensitive to a “stab-in-the-back myth”.

The French general elections for the Legislative were held on June 11 and 18 2017. The results provided by the French government are presented more accessibly on Wikipedia (a portal and no source), and have been used in this 2017-France-Lorenz-Gini excel sheet to determine the Lorenz curve and Gini coefficient.

The earlier discussion on Lorenz curve and Gini was about the Dutch and UK general elections.

Both UK and France have district representation (DR) with a First Past the Post rule. In the UK this causes strategic voting, in which a voter may not vote for the candidate of first choice, but tries to block a candidate who might win but would be worst. France has elections in two rounds so that there is less need for such a strategy. The second round is between the two top candidates in the district, and thus one might try to get at least one good candidate in that position.

Proportional representation (PR) may allow a larger (but fairer) share of the seats for the more extreme parties, like the party of Geert Wilders in Holland, yet PR also allows more stability for the center. Thus PR tends to avoid the swings between extremes that might happen in systems of district representation (DR).

Two rounds mean two sets of data

The French system seems to make it more difficult to determine the Lorenz curve and Gini coefficient. There are two rounds, and thus there is the question what data to take. However, the following choice suggests itself:

  • The data of the first round provide the first preferences, and thus provide the votes.
  • The data of both rounds provide the seats.

This choice finds support in the data. The first round has a turnout of 48.7% and 0.5 million invalid or blank votes. In the second round, more people remain at home, with a turnout of 42.6%, while those who vote produce almost 2 million invalid or blank votes, apparently from voters who disapprove of the available candidates or the system itself. Thus the higher turnout and lower blanks in the first round suggest that these indeed present the first preferences (with some limited level of strategy).

The Lorenz curve and Gini

The Lorenz curve shows a rather surprising level of inequality, with a Gini of 41.6%. Compare the value of Holland with a Gini of 3.6%. If the blue line would cover the pink diagonal then there would be full proportionality.
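For readers who want to redo such a calculation, the following is a minimal sketch of a Lorenz curve and Gini coefficient for votes and seats. It assumes the common construction: parties are sorted by their ratio of seats to votes, cumulative vote shares go on the horizontal axis, cumulative seat shares on the vertical axis, and the Gini is 1 minus twice the area under the curve. Whether the excel sheet follows exactly these steps is an assumption. The function name `lorenz_gini` is hypothetical and the numbers in the example are invented toy data, not the French results.

```python
def lorenz_gini(votes, seats):
    """Return the Lorenz points and the Gini coefficient (0 = proportional).
    Assumes every party in the list has a positive number of votes."""
    total_votes, total_seats = sum(votes), sum(seats)
    # Most under-represented parties first, so the curve lies below the diagonal.
    parties = sorted(zip(votes, seats), key=lambda vs: vs[1] / vs[0])
    x = y = area = 0.0
    points = [(0.0, 0.0)]
    for v, s in parties:
        dx, dy = v / total_votes, s / total_seats
        area += dx * (2 * y + dy) / 2  # trapezoid under this segment
        x, y = x + dx, y + dy
        points.append((x, y))
    return points, 1 - 2 * area


# Invented toy example: three parties, vote counts and seat counts.
toy_votes = [45, 35, 20]
toy_seats = [60, 30, 10]
points, gini = lorenz_gini(toy_votes, toy_seats)
print(points)
print(f"Gini = {100 * gini:.1f}%")
```

When seat shares equal vote shares for every party, all segments lie on the diagonal and the Gini is zero.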

Data on turnout

The following table gives the data on turnout for the first round. The votes for “Elected in the House” are for parties that eventually got elected in the Legislative. The votes for “Not in the House” are for a radical leftist party that got votes in the first round but got no seat in either round.

The wasted vote consists of the invalid and blank votes and the latter “Not in the House”, to a total of almost 3%. A standard majority would be 289 seats of a House of 577 seats. If one would keep account of the wasted vote, then one might leave seats empty, or use a qualified majority of 298 seats, thus 9 more than usual.

When we divide the electorate by the number of votes per seat, then the Legislative would require 1222 rather than 577 seats. A majority would require 611 seats, which is more than the actual number of seats used. If one would want to keep account of the voters who did not turn out, then 51.3% or 296 of the 577 seats would be empty, or one would use the 611 seats as a qualified majority.
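The seat arithmetic can be retraced roughly as follows. This is one reading that reproduces the figures above, not necessarily the exact calculation in the excel sheet: the wasted vote is taken as a rounded 3% of the votes cast and the turnout as a rounded 48.7%, so the last figure only approximates the 1222 seats, which presumably rest on the unrounded counts.

```python
import math

SEATS = 577
TURNOUT = 0.487  # first-round turnout, rounded
WASTED = 0.03    # "almost 3%" of the votes cast, rounded (assumption)

# Standard majority: more than half of the seats.
standard_majority = SEATS // 2 + 1

# Qualified majority when the wasted vote is kept in the account.
qualified_majority = math.ceil(standard_majority / (1 - WASTED))

# Seats left empty when voters who did not turn out are kept in the account.
empty_seats = round((1 - TURNOUT) * SEATS)

# Size of a House in which the whole electorate gets seats at the same
# number of votes per seat as the parties that were elected.
full_house = SEATS / (TURNOUT * (1 - WASTED))

print(standard_majority, qualified_majority, empty_seats, round(full_house, 1))
```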

An example of the inequality

The new French President Emmanuel Macron had the highest score of 24% of the vote in the first round of the Presidential elections of 2017, with runner-up Marine Le Pen with 21.3%. Macron then won the second round with 66.1% (20.7 million) against Marine Le Pen with 33.9% (10.6 million) of the vote.

For the Legislative, Macron’s party REM got 27.6% while the Front National (FN) got 12.9% in the first round. For the Legislative Le Pen managed to get only 3 million votes, compared to the potential of 10.6 million at the presidential elections. With both rounds REM got 308 seats and FN got 8 seats.

These ratios would turn, if Le Pen would manage to motivate the voters of the presidential race to also support her for the Legislative. If the other parties would have a divided vote then Le Pen would benefit from First Past The Post.

Conclusions

For the UK in 2017 we calculated a Gini of 15.6% but this was a very tentative number since we had no estimate about the amount of strategic voting involved. For France we have an indication of the first preferences, namely from the first round.

France appears to have a surprisingly high Gini of 41.6%, which can be compared to the system of proportional representation (PR) in Holland that generates 3.6%.

This political inequality doesn’t bode well for the feelings amongst the French electorate about whether they are represented. The low turnout seems to reflect dissatisfaction rather than satisfaction. Such dissatisfaction might also translate into a protest vote over 4 years, especially when Macron doesn’t deliver.

Many observers in Europe seem to be happy with the election of Macron and his party REM, but the outcome is quite disproportional. If this disproportionality can happen for one party then it might also happen for another party – that one doesn’t like as much.

In the former weblog entry, I reported on a rather important YouGov poll. The UK general election was on June 8 and the poll was taken on June 12-13 so that we may assume that persons polled still had vivid recollections. The poll generated data about confusions in the British electorate. It is useful to belabour the point, for Brexit is a key event and would have quite some impact for the coming decades. I would respect the UK decision to leave the EU but have my doubts when it is not based upon Proportional Representation (PR). A referendum gives proportions but referenda tend to be silly and dangerous, as they are an instrument of populism rather than of representative democracy. Indeed, it appears that the Brexit referendum question was flawed in design. The YouGov poll helps us to observe how confused a major section of the UK electorate is. Let us dig a bit deeper.

Representation of preferences via a ranking matrix

Let voters consider the options R = Remain, S = European Economic Area (EEA) a.k.a. Single Market a.k.a. Soft, T = Tariffs, H = World Trade Organisation (WTO) a.k.a. Hard. A consistent Remainer would tend to have the ranking R > S > T > H, and a consistent Leaver would tend to have this in reverse.

The YouGov poll presents the data in a ranking matrix, with the first preferences in the first row, then the second preferences, and so on. For the Brexit referendum outcome of 48% Remain and 52% Leave, for example, we might have the following setup. It is a guess, since the particular ways of Leaving were not included in the referendum question. This example however is the result that you would expect if Remainers and Leavers would have the mentioned consistent orderings.

Observe that each voting weight (take e.g. 48) for a preference order list is put in precisely one place per row and per column, i.e. that it doesn’t occur more times in a single row or column. This explains why the border sums add up to 100.

The YouGov poll of June 12-13 2017

The YouGov data that I have been referring to contain the results of a poll of 1651 adults in Great Britain, i.e. the UK excluding Northern Ireland. From pages 13-16 we can collect these data for the whole of Great Britain for 2017. YouGov states that the sample has been weighted for socio-economic and political indicators. It is not clear to me how the “Don’t know”s are being handled for this particular issue. See also this discussion by Anthony Wells.

We can observe:

  • These are percentages, and both the row sums and the column sums should be 100, except for rounding errors.
  • 35% have Remain in the first position and 47% have it in the last position, so that 9 + 8 ≈ 17% (with 1% missing due to rounding) have a confused position, in which Remain is sandwiched between some options for Leaving. We would wonder how such people would vote in a referendum when they are presented with only two options R or L. One cannot say that the referendum was only about the first positions in the rankings, for voters would tend to develop an expectation about what would be the likely kind of Brexit and vote accordingly. Some of these 17% might have voted Remain because they disliked the otherwise expected version of Leave. This might indicate that the outcome for Remain was overstated. Yet we have no information on subdivisions of Remain, which might cause an opposite effect. Some might be okay with Remain as it is but vote for Leave because they fear that the UK otherwise might also join the Eurozone or some United States of Europe. The reason why the Brexit referendum question was flawed in design is that it left too much to guess here.
  • Remarkably, the split between R and L now in June 2017 would be 35% versus 65% instead of 48% versus 52% in 2016. Did Great Britain switch in one single year from fairly divided to a seemingly clear preference for Brexit (though divided upon how)? I very much doubt this distribution, see the weblog entry before the former one. The electoral data still suggest more than 50% for Remain. In the former weblog entry it is discussed that some 26% of the electorate say that they voted for Remain but accept the loss at the referendum, so that they “play along” with the winning side, focusing on what would be the best option for Leave. This seems loyal to some notion of democracy, but it would also be a misplaced loyalty to the flawed Brexit referendum question. (One can respect such loyalty, but it still makes sense to discuss it.)

Using techniques of apportionment we can estimate the actual number of people per cell in the poll. My estimate is (and YouGov would have the true numbers):
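The text does not say which apportionment technique was used. As a hedged illustration, the sketch below applies the largest-remainder (Hamilton) method, which is an assumption, to the only column quoted in the text: Remain at 35%, 9%, 8% and 47% (the order of the 9 and the 8 is also assumed). It scales this single column to the 1651 respondents; the actual estimate would have to respect all rows and columns of the matrix at once.

```python
def largest_remainder(shares, total):
    """Apportion `total` people over cells in proportion to integer `shares`.

    Hypothetical helper: floors each quota, then hands the leftover
    persons to the cells with the largest remainders.
    """
    denominator = sum(shares)
    counts, remainders = [], []
    for share in shares:
        # Integer arithmetic avoids floating-point rounding surprises.
        quotient, remainder = divmod(share * total, denominator)
        counts.append(quotient)
        remainders.append(remainder)
    leftover = total - sum(counts)
    by_remainder = sorted(range(len(shares)),
                          key=lambda i: remainders[i], reverse=True)
    for i in by_remainder[:leftover]:
        counts[i] += 1
    return counts


# Remain column as quoted in the text; the published percentages sum to 99.
remain_column = [35, 9, 8, 47]
people = largest_remainder(remain_column, 1651)
print(people, sum(people))
```

The point of the exercise is only that rounded percentages summing to 99 can be turned into whole numbers of people that do sum to the sample size.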

Possible permutations of rankings

With 4 options there are 4 possibilities for a first place, 3 remaining for the second place, 2 remaining for the third place, and then the final one follows. Thus there are 4 × 3 × 2 × 1 = 24 permutations for possible rankings. We already saw two of these: R > S > T > H and its reverse. The ranking matrix above is actually based upon these 24 possibilities.
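The count can be checked by enumerating the rankings with Python's standard library. This is merely an illustrative sketch; the variable names are chosen for this example.

```python
from itertools import permutations

OPTIONS = "RSTH"  # Remain, Soft (EEA), Tariffs, Hard (WTO)

# Every permutation of the four options is one possible ranking, best first.
rankings = [" > ".join(p) for p in permutations(OPTIONS)]

print(len(rankings))
print(rankings[0], "|", rankings[-1])
```

The first and last entries of the enumeration are the consistent Remainer ordering and its reverse.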

Some of these 24 possibilities will be rather curious. It is not clear what to think about R > H > S > T, for example (Case 5 below). This would be a Remainer who would rather prefer a Hard Brexit to the EEA or some agreement not to have a trade war on tariffs. A tentative explanation is that this voter has a somewhat binary position, as Remain versus Hard Brexit, while the other options are neglected.

Voting theory may assume voters that are both autonomous and rational, so that any preference would have some logic. This gives rise to the theory of Single Peakedness. Potentially each voter has his or her own criteria so that the best is on top, and all others follow in proper order. However, for the topics of R, S, T and H there is a logical scale from left to right. Voters with multiple peaks in their preferences have more to explain than voters with a single peak. An example of single-peakedness is Case 7 below, with a ranking S > R > T > H. See the graph below. The 1st rank gets utility level 4, the 2nd rank gets utility level 3, the 3rd rank gets utility level 2, and the 4th rank gets utility level 1. The utility levels are just the reverse of the ranks, but then the case must be reordered to the logical order R, S, T, H.
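A small Python sketch may clarify the test for single-peakedness. It assumes the logical scale R, S, T, H and the utility levels 4, 3, 2, 1 described above; the function names are made up for this illustration.

```python
from itertools import permutations

LOGICAL_ORDER = "RSTH"  # assumed left-to-right scale of the options


def utilities(ranking):
    """Utility per option in logical order: 1st rank -> 4, ..., 4th rank -> 1."""
    n = len(ranking)
    return [n - ranking.index(option) for option in LOGICAL_ORDER]


def is_single_peaked(ranking):
    """True if utility rises to one peak and then falls along the logical order."""
    levels = utilities(ranking)
    peak = levels.index(max(levels))
    rising = all(a < b for a, b in zip(levels[:peak], levels[1:peak + 1]))
    falling = all(a > b for a, b in zip(levels[peak:], levels[peak + 1:]))
    return rising and falling


print(utilities("SRTH"), is_single_peaked("SRTH"))  # Case 7: S > R > T > H
print(utilities("RHST"), is_single_peaked("RHST"))  # R > H > S > T

single = ["".join(p) for p in permutations(LOGICAL_ORDER)
          if is_single_peaked("".join(p))]
print(len(single), single)
```

Under these assumptions only 8 of the 24 orderings are single-peaked, so the remaining 16 have multiple peaks on this scale.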

Presentation of preferences via preference orderings

The following are estimates for the preference orderings that would underlie the YouGov results above. The estimate minimises the sum of squared errors on that ranking matrix, with a weight of 10 for the error on the first preferences. See the Excel sheet in the Appendix. This table shows only the percentages and not the numbers of people in the poll (that add up to the table above), since the percentages are the main estimation result. Single dots are zeros. Some have been caused by explicitly setting the possibility of such a preference ordering to zero, see the “comment” keyword for the reason. (The degrees of freedom are also a technical reason.) The ConR / L and LabR / L subdivisions concern the voters in the poll who voted R or L in the 2016 Brexit referendum and who voted Con or Lab in 2017. They form only a part of the sample, so their sum doesn’t add up to the total on the left. The percentages have a decimal to allow easier identification, not for claimed accuracy.
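The estimation criterion can be sketched in Python. This is not the Excel workbook: the function names are hypothetical, the weight of 10 is assumed to multiply the squared errors of the first-preference row, and the target matrix is the guessed 48 / 52 setup rather than the YouGov data. The sketch only evaluates the criterion for a candidate set of weights; the minimisation itself is left out.

```python
OPTIONS = "RSTH"


def ranking_matrix(weights):
    """Ranking matrix (rows = ranks, columns = options) implied by ordering weights."""
    matrix = [[0.0] * 4 for _ in range(4)]
    for ordering, weight in weights.items():
        for rank, option in enumerate(ordering):
            matrix[rank][OPTIONS.index(option)] += weight
    return matrix


def weighted_sse(weights, observed, first_row_weight=10.0):
    """Sum of squared errors, with the first-preference row weighted more heavily."""
    fitted = ranking_matrix(weights)
    total = 0.0
    for rank in range(4):
        factor = first_row_weight if rank == 0 else 1.0
        total += factor * sum((f - o) ** 2
                              for f, o in zip(fitted[rank], observed[rank]))
    return total


# Hypothetical target: the matrix of consistent Remainers and Leavers only.
observed = ranking_matrix({"RSTH": 48, "HTSR": 52})
exact = weighted_sse({"RSTH": 48, "HTSR": 52}, observed)
off = weighted_sse({"RSTH": 47, "HTSR": 52, "SRTH": 1}, observed)
print(exact, off)
```

Shifting one percentage point from R > S > T > H to S > R > T > H causes two unit errors in the first row and two in the second row, so the criterion penalises the first-row errors ten times as heavily.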

Discussion on GB

Some observations are:

  • The YouGov summary ranking matrix already showed a rather even split on S, T and H, but the estimate generates a landscape with even more diversity in opinions.
  • Only 26.2% has the preference R > S > T > H and only 16.2% its reverse, so that 57.7% (addition effect) have some mixture.

The above results for GB can be split up by peaks and by sandwich. The combinations give the following percentages:

  • The mentioned 57.7% split up again in 34.6% who are single peaked, and 23.1% who have multiple peaks.
  • The sandwich of 17.1% splits up into 10.4% with a single peak and 6.7% with multiple peaks.
  • Of the 23.1% with multiple peaks there are 9.1% who can join the Remainers with a first preference and there are 7.3% who can join the leavers with Remain in the last position (but unclear how to Leave).
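The categories used in this breakdown can be illustrated by classifying the 24 possible orderings. The sketch below counts orderings and not voters, so it does not reproduce the percentages above. The labels "first", "last" and "sandwich" and the function names are assumptions for this example, with the logical scale R, S, T, H as before.

```python
from collections import Counter
from itertools import permutations

LOGICAL_ORDER = "RSTH"


def is_single_peaked(ranking):
    """True if utility rises to one peak and then falls along the logical order."""
    levels = [len(ranking) - ranking.index(o) for o in LOGICAL_ORDER]
    peak = levels.index(max(levels))
    return (all(a < b for a, b in zip(levels[:peak], levels[1:peak + 1]))
            and all(a > b for a, b in zip(levels[peak:], levels[peak + 1:])))


def remain_position(ranking):
    """'first', 'last', or 'sandwich' depending on where Remain is ranked."""
    place = ranking.index("R")
    if place == 0:
        return "first"
    if place == len(ranking) - 1:
        return "last"
    return "sandwich"


counts = Counter(
    (remain_position(r), "single" if is_single_peaked(r) else "multiple")
    for r in ("".join(p) for p in permutations(LOGICAL_ORDER))
)
for key in sorted(counts):
    print(key, counts[key])
```

Of the 12 sandwich orderings only 3 are single-peaked, so a sandwiched Remain position goes together with multiple peaks in most of the possible cases.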

The 6.7% would be a relevant section of the vote. They all voted Leave, but divided on S, T and H. Potentially the outcome of the 2016 Brexit referendum has been decided by the 6.7% of GB voters who have Remain neither in the first nor in the last position, and who do not follow the standard logical order on the options.

Discussion on ConR / L and LabR / L

The division of ConR / L and LabR / L is losing its relevance because these are dwindling groups, they are changing loyalties, and their 2016 votes are becoming history while there are new issues. Yet, the 2016 referendum question was flawed, and it is relevant to see how sizeable parts of the UK electorate deal with the logical conundrum that they took part in.

  • The 17% of confused votes on the first preference can be found in the subdivisions in similar proportions.
  • 33.5% of ConR voters and 61.1% of LabR voters are united on the preference R > S > T > H. Presumably this was also the case in 2016, or there must be factors that increased or reduced consistency or confusion.
  • 28.9% of ConL and 19.4% of LabL are united on the preference H > T > S > R. Presumably this was also the case in 2016, or there must be factors that increased or reduced consistency or confusion.
  • One might expect that ConR / L and LabR / L voters of 2016 would have the benefit of a party preference and thus show more consistency, yet the distribution of views is just as wide, and the sandwich with multiple peaks is quite present.
  • The 2016 Conservative Remainers are 45.5% loyal to the old point of view, but still vote for a Conservative party that is set on Leave. Part will be the misplaced loyalty to the flawed referendum. Alternatively, did they vote for a minority in this party that still tries to bring balance? (A good poll requires a focus group.) (And there is more in the world than just Brexit.)
  • The 2016 Labour Remainers are 76.1% loyal to the old point of view. Yet Labour leader Corbyn also prefers a Brexit. It might be the peculiarities of the British system of District Representation (DR) that caused these voters not to switch to LibDem. (But the LibDems also have a liberal policy that many voters for Labour dislike. The system of DR doesn’t favour the entry of new political competitors.)
  • The 2016 Leavers have a high loyalty to the old view, ConL 86.8% and LabL 74.4%. Yet this doesn’t diminish the diversity of opinion about how to Leave (though T gets more votes than H).

Comment on uncertainty in this estimate

For n = 4, there are n! = 24 variables, (n-1)^2 = 9 independent equations within the matrix, and the adding-up constraint counts for 1, so that the degrees of freedom are 24 - 9 - 1 = 14. Yet we cannot randomly set weights to zero. If there were nonzero weights for single-peaked preferences only, then the YouGov ranking matrix would show zeros, which it doesn’t. Thus it takes some arbitration to decide which weights to exclude. There are quite a lot of possibilities, and I can only hope that my choice was the wisest. As said, the percentages provided by YouGov have been scaled up to the table given above, and this allows us to determine the error in the estimate. Owing to the degrees of freedom the calculated error is quite low. The use of an error measure is limited to comparing estimates and is not something that is useful to mention here. As said, YouGov have the proper data, and it must be hoped that they will look into this.
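The count of the degrees of freedom can be verified numerically. The sketch below is an illustration under my own setup, not part of the workbook: it builds the linear map from the 24 ordering weights to the 16 cells of the ranking matrix and computes its rank with exact fractions. The rank of 10 corresponds to the 9 independent cells plus the adding-up constraint, which is implied once all cells are matched.

```python
from fractions import Fraction
from itertools import permutations


def rank_of(rows):
    """Matrix rank by Gaussian elimination over exact fractions."""
    rows = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    for col in range(len(rows[0])):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(len(rows)):
            if r != rank and rows[r][col] != 0:
                factor = rows[r][col] / rows[rank][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank


n = 4
orderings = list(permutations(range(n)))
# One row per matrix cell (rank, option); one column per preference ordering.
design = [[1 if ordering[rank] == option else 0 for ordering in orderings]
          for rank in range(n) for option in range(n)]
constraints = rank_of(design)
print(len(orderings), constraints, len(orderings) - constraints)
```

With 24 unknown weights and only 10 independent restrictions, 14 weights remain free, which is why additional choices are needed about which orderings to exclude.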

Conclusions

  • The ranking matrix is a fine way to summarize results, yet the preference orderings are more accurate on the underlying and relevant orders. This is merely a matter of presentation by the statistical reporter. A person in a poll who can answer on a ranking matrix in fact gives the personal preference ordering. The statistician can compound these data while not losing information on the permutations. From the permutations it is always possible to create a ranking matrix, yet the reverse requires estimation techniques which generate needless uncertainty.
  • Asking for voter preference orderings in a poll is a useful exercise. It is not intended to propose this for general elections. For general elections it suffices that voters exercise a single vote for a party of choice. The condition however is Proportional Representation, otherwise there are serious distortions, see the earlier discussion on this weblog.
  • The information on the rankings and implied preference orderings suggests a rather large state of confusion in the electorate of Great Britain. The notion of single-peakedness appears to be quite useful in highlighting the issue of the preference order.

Appendix July 15 2017

I slightly revised the manner of rounding and included case 16 for all columns. The polished-up Excel workbook is: 2017-07-15-YouGov-Rankings