Archive

Role of mathematics

Given Kenneth Arrow’s impossibility theorem, it is a fair question to ask what voting system he himself would advise. There is a 2012 interview with him, with a phone recording and transcript, by Aaron Hamlin of the Center for Election Science. Arrow’s advice is:

  • Not plurality and no US Electoral College, with its winner-take-all selection of the US President
  • Not approval voting, since this uses too little information
  • A system that uses more information:

“Dr. Arrow: Well, I’m a little inclined to think that score systems [range voting] where you categorize in maybe three or four classes probably (in spite of what I said about manipulation) is probably the best. (…) In France, [Michel] Balinski has done some studies of this kind which seem to give some support to these scoring methods.”

His statement about strategic voting – or manipulation:

“Dr. Arrow: There’s only one problem that bothers me about that. And that’s something my theorem really doesn’t cover. In my theorem I was assuming people vote sincerely. The trouble with methods where you have three or four classes, I think if people vote sincerely they may well be very satisfactory. The problem is the incentive to misrepresent your vote may be high. In other words, a classic view is that there’s a candidate I really like, but I know is hopeless. I may put him down at the bottom and vote for the next candidate simply because I feel there’s a chance. Now, if you have a very large electorate you might say no individual has much of an incentive to misrepresent. But I’m not sure. You probably need experience rather than theory.”

Observe that Arrow cautiously states “a little inclined to think (…) probably the best”. His advice to gather more empirical evidence deserves support. The interview touches on some points that call for a closer discussion, also in the light of this earlier weblog text.

Definitions

In plurality, voters can only vote for their best candidate. In a district, often the one with the highest score wins, which is the “first past the post” (FPTP) system. If there are only two candidates, then the winner will also have more than 50%. If there are more candidates, the winner may have less than 50%. There may be ways to assure that a final vote only concerns two candidates. A Putin hack that eliminates a particular candidate will not quickly be accepted, yet voting theorists still wonder what method would be reasonable. A current example is that Donald Trump got elected with 46% of the popular vote, while Hillary Clinton got 48%. With a turnout of 60%, Trump has the support of only 28% of the electorate, while the US House of Representatives depends upon district results too. A prime minister who is elected by a coalition in a parliament that has proportional representation (PR) generally has more than 50% support in parliament, and by representation also in the electorate.

In approval voting, voters mention which candidates they approve. The candidate with the highest total approval is selected.

  • In economics this links up with satisficing (Herbert Simon).
  • Strategic voters will tend not to approve of candidates that might harm their best candidate (even the second best), so that this system devolves into plurality. Steven Brams claims that such fears are overrated, but are they ? Brams declines to look into non-satisficing alternatives like the Borda Fixed Point method.

In Borda ranking, each voter puts the candidates in order of preference, and assigns rank numbers.

  • In economics this reflects the notion of ordinal utility.
  • Strategic voters will give a low score to candidates that harm their best candidate (even the second best), which means that “dark horses” (of mediocre approval) might win. See the discussion below.

In range voting, the voters grade the candidates like on a report card, and the candidate with the highest grade point average (GPA) wins. There is the tantalizing but empirically perhaps small complexity of the distinction between a 0 grade (included in the GPA) and a blank vote (not included in the GPA).

  • In economics this reflects the notion of cardinal utility (with voters restricted to the same range).
  • Strategic voters will give a low score to candidates that harm their best candidate (even the second best), which means that the system devolves into plurality. (The use of ordinal preferences, as in Borda, is explicitly intended to remedy this.)

(See also the distinction in levels of measurement.)
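
To make these definitions concrete, here is a minimal sketch in Python with an invented profile of five voters grading three candidates on the four-class scale that Arrow mentions; all names and numbers are illustrations, not data from the interview.

```python
candidates = ["A", "B", "C"]

# Five voters grade each candidate in four classes 0..3. Candidate A is
# polarizing (loved by three voters, hated by two), B is the consensus choice.
grades = [
    {"A": 3, "B": 2, "C": 0},
    {"A": 3, "B": 2, "C": 0},
    {"A": 3, "B": 2, "C": 1},
    {"A": 0, "B": 3, "C": 2},
    {"A": 0, "B": 3, "C": 2},
]

# Plurality: a voter can only vote for his or her best candidate.
plurality = {c: 0 for c in candidates}
for g in grades:
    plurality[max(g, key=g.get)] += 1

# Approval: approve every candidate graded at least 2 (a satisficing cutoff).
approval = {c: sum(g[c] >= 2 for g in grades) for c in candidates}

# Borda: rank numbers derived from the grades (1 point for last, 3 for first).
borda = {c: 0 for c in candidates}
for g in grades:
    for points, c in enumerate(sorted(candidates, key=g.get), start=1):
        borda[c] += points

# Range voting: the grade point average (GPA) decides.
gpa = {c: sum(g[c] for g in grades) / len(grades) for c in candidates}

for name, scores in [("plurality", plurality), ("approval", approval),
                     ("Borda", borda), ("range GPA", gpa)]:
    print(f"{name:9}  {scores}  winner: {max(scores, key=scores.get)}")
```

Plurality elects the polarizing A (three first places), while approval, Borda and range voting all elect the consensus candidate B: an illustration of plurality using too little information.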

Beware of the distinction between cardinal and ordinal preferences

Arrow’s impossibility theorem is about aggregating individual rank orders into a collective rank order. The theorem uses rank orders, or ordinal preferences. Arrow does more than only use rankings. He also defends the “axiom of pairwise decision making” (APDM) a.k.a. the “axiom of independence of irrelevant alternatives” (AIIA) as reasonable and morally desirable (Palgrave Dictionary of Economics).

Range voting allows more information than just ordinal preferences, and it is similar to cardinal preferences (but limiting people to the same range). Cardinal preferences imply ordinal preferences. Yet range voting doesn’t satisfy the requirements of Arrow’s impossibility theorem, for cardinality violates APDM or AIIA.

One might say that Arrow’s theorem is not about voting systems in general, since it only looks at ordinal and not at cardinal preferences. Instead, Arrow’s position is that he looks at voting theory in general and only proposes axioms that are “reasonable” and “morally desirable”. When cardinality and range voting are excluded by his axioms, then it is because they would be unreasonable or morally undesirable.

These distinctions are discussed – and Arrow’s notions are debunked – in my “Voting theory for democracy” (VTFD). (See especially chapter 9.2 on page 239.)

Arrow’s theorem is only about those voting systems that satisfy his axioms. Since his axioms cause an inconsistency, there is actually no system that matches his conditions. Something that doesn’t exist cannot be reasonable and morally desirable. Arrow’s theorem confuses voting results with decisions, see this earlier weblog discussion.

However, there still remains an issue for voting theory. Range voting allows more scope for strategic voting or manipulation. The reason to restrict votes to rank orders is to reduce the scope for strategic voting.

Gerry Mackie’s “Democracy Defended”

A reader alerted me to Gerry Mackie’s thesis with Jon Elster, now commercially available as “Democracy Defended”. I haven’t read it, but the blurb seems to confirm what I have been arguing since 1990 on Arrow (though not on Riker).

“Is there a public good? A prevalent view in political science is that democracy is unavoidably chaotic, arbitrary, meaningless, and impossible. Such scepticism began with Condorcet in the eighteenth century, and continued most notably with Arrow and Riker in the twentieth century. In this powerful book, Gerry Mackie confronts and subdues these long-standing doubts about democratic governance. Problems of cycling, agenda control, strategic voting, and dimensional manipulation are not sufficiently harmful, frequent, or irremediable, he argues, to be of normative concern. Mackie also examines every serious empirical illustration of cycling and instability, including Riker’s famous argument that the US Civil War was due to arbitrary dimensional manipulation. Almost every empirical claim is erroneous, and none is normatively troubling, Mackie says. This spirited defence of democratic institutions should prove both provocative and influential.” (Cover text of “Democracy Defended“)

My point however would be that issues of cycling are of concern, as we see with the Brexit referendum question. This concern generates support for representative democracy with proportional representation, rather than for populism with referenda.

The key context is switching to parliaments with PR

Discussions about voting theory are best seen in the context of the switch towards parliaments that are elected with PR and that select the prime minister. The president may have a ceremonial role and be elected by parliament too (as in Germany).

It is most democratic when there is proportional representation (PR) of the electorate in the elected body. The more complex voting methods can then be used only by the professionals in the elected body itself. A prime minister is best elected by a parliament with PR, instead of a president by direct elections.

The interview with Arrow contains a criticism of plurality and FPTP as compared to PR.

“Dr. Arrow: Yes. I think definitely. I think there’s no question about that. The Plurality system chokes off free entry. In other words, in the economic world we’re accustomed to the virtues of free entry. We don’t want a small number of corporations to be dominate. We favor the idea of new firms entering in order to compete to bring in new ideas, to bring in new products. Well, the same way in the political field. We should be encouraging free entry, I think, in order to have new political ideas come in. And they may flourish. They may fade. That’s what you want, them to be available. So I’m inclined that the Plurality system will choke off by encouraging, the two-party system will choke off new entry. So I’m really inclined to feel that we don’t want Plurality as a voting system. It’s likely to be very stifling.”

“(…) proportional representation [PR] plays very little role in The United States, but they do play a role in a number of countries. And the question of whether single-member districts are appropriate or not. The Germans, for example, have some kind of compromise between single-member and broader districts. (…)”

See my comparison between the Dutch PR and the UK district system.

Proposals that assume that the voters themselves would use the more complex voting systems – perhaps an enlightened form of populism – complicate election reform, because these methods put too high demands upon the voters and the electoral process.

In the interview, Arrow referred to proportional systems, but still expressed the idea that voters themselves would use the three or four categories. In this manner Arrow contributed to this confusion about context.

“CES: If you could, just sort of dictatorially, change something about the way that we do voting in the US, something that would make the biggest impact in your mind, what do you think you would do?

Dr. Arrow: The first thing that I’d certainly do is go to a system where people ranked all the candidates, or as many as they wish, and not just two. And that these data are used in some form or another to choose the candidate, say by eliminating the lowest, or some method of that kind. I’d be interested in experimenting with the idea of categorization and creating interpersonal comparisons by that. And those are the things that I would argue for, and certainly the abolition of the Electoral College. It goes without saying.”

In my experience Arrow is often more confused than one would expect. (1) His original theorem confused voting outcomes and decisions. (2) If he really assumed that people would vote sincerely, then he might as well have assumed cardinality, but he didn’t, for then he wouldn’t have had a theorem. (3) He made a theorem on ordinal preferences but is now inclined towards cardinality, even though his defence of his theorem implies that cardinality would be unreasonable and morally undesirable since it doesn’t satisfy APDM a.k.a. AIIA. (4) He now mentions PR but doesn’t draw the conclusion of the selection of the prime minister by parliament, and apparently still thinks in terms of a direct election of the president.

Arrow’s contributions to economics derive from the application of mathematics to economics in the 1950s, and not because he was exceptionally smart in economics itself. Paul Samuelson expressed this idea about himself once too, as a physicist entering into economics. If Arrow had been really smart then he would also have had the common sense to see that his theorem confuses voting results and decisions, and that it amounts to intellectual fraud to pretend that it is more than that.

A major issue is that abstractly thinking mathematicians can get lost about reality. In VTFD I show that Amartya Sen is confused about his theorem on the Paretian liberal. Sen’s article with Eric Maskin in the NY Review of Books about electoral reform also neglects the switch to a parliamentarian system with PR. A major problem in society is that many intellectuals have insufficient background in mathematics and follow such lost mathematicians without sufficient criticism, even when common sense would warn them.

Warren Smith’s parable of the bees

Warren Smith suggests that bees also use range voting to select the next location for their hive. My problem is that bees aren’t known for strategic voting. My VTFD already suggested – like Jan Tinbergen – that aggregation of cardinal utility would indeed be best. Thus I don’t feel the need to check how the bees are doing it.

The problem in voting theory is that humans can vote strategically, shielded by the secrecy of the ballot box. Potentially this strategic vote is less of a problem when the votes for the prime minister in parliament are made public, so that people can wonder why a party votes in a particular way. But transparency of the vote might not be the key issue.

Smith on Bayesian regret

Smith has a notion of Bayesian regret, as a more objective criterion to judge voting systems. I am amazed by the existence of such a notion for social optimality and haven’t looked into this yet.

Smith is too enthusiastic about Arrow’s support

Smith interprets Arrow’s “a little inclined to think” as an endorsement of range voting. Smith provides full quotes properly – and I must thank him for directing me to this interview with Arrow. But I would advise Smith to be more critical. Arrow mainly indicates an inclination; he is also confused and doesn’t retract his interpretation of his theorem. Smith is also advised to become aware, and to alert the readers of his website, that the real improvement in democracy lies not in range voting but in a switch to a prime minister selected by a PR parliament. It is another issue how voting mechanisms operate in other situations, like the Eurovision Song Contest.

Smith’s discussion of the dark horse and the war of the clones

To reduce the options for strategic voting, the voters can be restricted to the use of rankings, and then we get systems like Borda, Condorcet, or my suggestion of the Borda Fixed Point method (BordaFP). The latter wasn’t designed to be a compromise between Borda and Condorcet but can still be seen as one. For example, in the 2010 general elections in the UK, with David Cameron, Gordon Brown and Nick Clegg, it appears that Clegg would be the Borda choice, but Cameron would still be the BordaFP choice because he would beat Clegg in a pairwise contest.

The reader would enjoy Smith’s discussion of the dark horse and the war of the clones, in his criticism of the Borda method. There is no need for me to repeat his short statement, and I simply refer to it here. While you are reading, there is also a picture of Frisian horse Fokke of 2013, and we continue the discussion below it. This discussion is not in VTFD, since there I mainly pointed to strategic voting but didn’t develop the argument, and thus I thank Smith for his succinct criticism.

 

Frisian Fokke 2013

War of the clones

This assumes the Borda system. Smith (point 8) compares the election between Mush (51%) and Bore (49%) with the election between Mush and some clones Bore1, Bore2, Bore3 (leaving unclear who the real Bore is). Supposedly it is publicly known that Mush selects Bore1 in second place, so that the Bores can concentrate all their votes on Bore1 too. Now Mush loses. This criticism is accurate. With Condorcet’s rule, Mush would beat all Bores, but the idea of Borda is to mitigate Condorcet. With enough Bores, the BordaFP method is not immune to this either.
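
As a check on Smith’s arithmetic, here is a minimal sketch in Python, assuming the standard Borda point scheme (n points for first place down to 1 for last); the 51-49 split is Smith’s, the rest is illustration.

```python
def borda(ballots):
    """ballots: list of (count, ranking), ranking from best to worst."""
    scores = {}
    for count, ranking in ballots:
        n = len(ranking)
        for points, cand in zip(range(n, 0, -1), ranking):
            scores[cand] = scores.get(cand, 0) + count * points
    return scores

# Two candidates: Mush wins 51 against 49.
print(borda([(51, ["Mush", "Bore"]),
             (49, ["Bore", "Mush"])]))

# Bore is cloned. The Bore camp coordinates on Bore1, and it is publicly
# known that Mush's voters rank Bore1 second: now Bore1 beats Mush.
print(borda([(51, ["Mush", "Bore1", "Bore2", "Bore3"]),
             (49, ["Bore1", "Bore2", "Bore3", "Mush"])]))
```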

In the above key context, the method would not be applied by the whole electorate but only by parliament. The number of parties would be limited, and each party would mention only one candidate. In the current Dutch parliament there are 13 parties; see Bloomberg for a graphical display of the political spectrum and my analysis of an application of BordaFP. Here the problem doesn’t really arise.

In general people might feel that parties and their candidates differ. If not, then this would require attention. For applications of Borda or BordaFP to smaller committees, it would be sensible to be aware of this. Committees might devise rules about when candidates are too much alike, bunch their votes as if they were one (and rerank), and only call for a decision vote between the clones when they would actually be chosen.

The dark horse

Smith (point 2) considers candidates A, B, C and various nonentities. Kenneth Arrow used the more polite term “irrelevant alternatives”. Let me settle for Dark Horse D. Let me also distinguish truthful voting and strategic voting. In a truthful vote there is no difference between the true preference and the ranking submitted to the ballot box. In a strategic vote there is the strategy provided by the truth and the tactical vote submitted to the box. (Potentially one might design a voting system in which a voter submits those two rank orders simultaneously, but then we must relabel between the truth and those two submissions.)

A member of parliament (MP) faces a dilemma. If the MP prefers A > B > C > D, then giving the ranks 4, 3, 2, 1 will give 3 points to B, which might cause B to be chosen instead of A. This MP has an incentive to shift points to the Dark Horse, as in 4, 1, 2, 3, hoping that nobody else will vote for this dark horse anyway. If all MPs think in this manner, then the Dark Horse will be elected with an impressive score.
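
A minimal sketch in Python, with three invented factions of MPs, shows the mechanism; the faction sizes and rankings are illustrations only.

```python
def borda(ballots):
    """ballots: list of (count, ranking), ranking from best to worst."""
    scores = {}
    for count, ranking in ballots:
        n = len(ranking)
        for points, cand in zip(range(n, 0, -1), ranking):
            scores[cand] = scores.get(cand, 0) + count * points
    return scores

# Truthful votes: B is the moderate winner, dark horse D is last everywhere.
truthful = [(12, ["A", "B", "C", "D"]),
            (10, ["B", "A", "C", "D"]),
            (8,  ["C", "B", "A", "D"])]
print("truthful :", borda(truthful))    # B wins

# Strategic votes: each faction keeps its favourite on top but shifts the
# points of the rivals to the "hopeless" dark horse D, as in 4, 1, 2, 3.
strategic = [(12, ["A", "D", "C", "B"]),
             (10, ["B", "D", "C", "A"]),
             (8,  ["C", "D", "A", "B"])]
print("strategic:", borda(strategic))   # D wins with an impressive score
```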

Smith provides an anecdote about how such an event happened in a job selection, where there was disagreement about an excellent macro-economist and an excellent micro-economist, whereupon a mediocre candidate got the job.

This is the prisoners’ dilemma. (1) If everyone votes truthfully then they all benefit from the true selection. (2) If everyone votes strategically then they all suffer the worst outcome. (3) Each has an incentive to defect from the true vote.

The BordaFP method is sturdier than Borda but is not immune to this situation.

A prime answer to Smith is that in parliament the rankings for the selection of the prime minister might be public, so that voters and the press can question party tactics. A party that gives so many points to a Dark Horse might be criticised for not appreciating a better candidate.

Looking for balance

For now, I find Smith’s discussion a bit unbalanced. He emphasizes the disadvantages of Borda, but these have the answers above, for the proper context, while the disadvantages of range voting don’t get as much attention. Range voting stimulates the strategy of giving zero points to alternative candidates, whence it reduces to plurality with all its drawbacks. A candidate with 51% of the vote in plurality might not be better – being more extremist – than a more moderate candidate with a higher Borda score. The main point remains that countries with district voting like the USA, UK and France had better switch to PR.

By way of conclusion

It remains true that Borda has the risk of a Dark Horse, and that the search for better algorithms is open. How can we elicit information from voters about their true preferences ? In the ballot box we might numb their brains so that they vote like bees (perhaps also with the dance) ?

An idea that I already mentioned at another place: MPs might submit two inputs, one with the strategy (supposed to be true) and one with the intended tactic. (One would design a test of whether these had better be rankings or ranges.) The intermediate result would be based upon the tactics. A random selection of the true preferences is then used to revise the tactics, to improve the results for those MPs who have the luck of being selected. This prospect encourages MPs to be truthful about the strategy.

Another possibility for such double submissions: one might first determine the outcome according to the submitted strategies (supposedly true), and then use a random selection to apply the allowed tactics, and only use these if they indeed cause an improvement in the eyes of the MP. This sanctions a moderate degree of unavoidable strategic voting, but reduces the chaos when all do it without information about others.
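
The following is a highly speculative sketch in Python of this second variant, assuming Borda scoring; every name and rule in it is my own illustration of the idea, not a worked-out design.

```python
import random

def borda_winner(ballots):
    """Borda count over full rankings; alphabetical tie-break for determinism."""
    scores = {}
    for ranking in ballots:
        n = len(ranking)
        for points, cand in zip(range(n, 0, -1), ranking):
            scores[cand] = scores.get(cand, 0) + points
    return max(sorted(scores), key=lambda c: scores[c])

def double_submission(strategies, tactics, sample_size, seed=0):
    """Each MP i submits strategies[i] (supposedly the true ranking) and
    tactics[i]. Start from the outcome under the strategies; a random sample
    of MPs may switch to the tactical ballot, but only when that improves
    the outcome in their own eyes, judged by their strategy ranking."""
    rng = random.Random(seed)
    ballots = list(strategies)
    current = borda_winner(ballots)
    for i in rng.sample(range(len(ballots)), sample_size):
        trial = ballots[:i] + [tactics[i]] + ballots[i + 1:]
        new = borda_winner(trial)
        if strategies[i].index(new) < strategies[i].index(current):
            ballots, current = trial, new   # accept the improving tactic
    return current
```

Whether such a rule actually encourages truthful strategies would have to be tested, which is precisely the kind of experience rather than theory that Arrow asks for.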

Such calculations are simple for a partial outcome for a single MP. The problem lies in the aggregation of all MPs. Perhaps money helps in solving this too. Voters in the electorate aren’t allowed to sell their vote directly, with the obvious horror stories, also involving the distribution of income. But in parliament there is coalition bargaining which involves money, i.e. budget allocations. Potentially this helps in designing better algorithms. Perhaps the Bayesian Regret comes into play here, but I haven’t checked this. In Holland there is professor Frans Stokman who studies coalition bargaining with his “Decide” model.

Thus the search for better voting schemes hasn’t ended. Yet the main step for the USA, UK and France would be to accept the choice of a prime minister by parliament selected by PR.

The Theresa May government has adopted Brexit as its policy aim and has received support from the Commons. Yet economic theory assumes rational agents, and even governments might be open to rational reconsideration, even at the last moment.

Scientifically unwarranted referendum question

Based upon voting theory, the Brexit referendum question can be rejected as scientifically unwarranted. My suggestion is that the UK government annuls the outcome based upon this insight from science, and upon this insight alone. Let me invite (economic) scientists to study the argument and voting theory itself, so that the scientific community can confirm this analysis. This study best be done all over Europe, so that also the EU Commission might adopt it. Britons might be wary when their government or the EU Commission would listen to science, but then they might check the finding themselves too. A major worry is why the UK procedures didn’t produce a sound referendum choice in the first place.

Renwick et al. (2016) in an opinion in The Telegraph June 14 protested:

“A referendum result is democratically legitimate only if voters can make an informed decision. Yet the level of misinformation in the current campaign is so great that democratic legitimacy is called into question.”

Curiously, however, their letter doesn’t make the point that the referendum neglects voting theory, since the very question itself is misleading w.r.t. the complexity of the issue under decision. Quite unsettling is the Grassegger & Krogerus (2017) report about voter manipulation by Big Data, originally on Brexit and later for the election of Donald Trump. But the key point here concerns the referendum question itself.

The problem with the question

The question assumes a binary choice – Remain or Leave the EU – while voting theory warns that allowing only two options can be a misleading representation. When the true situation is more complex, then it may be political manipulation to reduce this to a binary one. As a result of the present process, we actually don’t know how people would have voted when they had been offered the true options.

Compare the question:

“Do you still beat your mother ?”

When you are allowed only a Yes or No answer, then you are blocked from answering:

“I will not answer that question because if I say No then it suggests that I agree that I have beaten her in the past.”

In the case of Brexit, the hidden complexity concerned:

  • Leave as EFTA or WTO ?
  • Leave, while the UK remains intact or while it splits up ?
  • Remain, in what manner ?

Voting theory generally suggests that representative democracy – Parliament – is better than relying on referenda, since the representatives can bargain about the complex choices involved.

Deadlocks can lurk in hiding

When there are only two options then everyone knows about the possibility of a stalemate. This means a collective indifference. There are various ways to break the deadlock: voting again, letting the chairperson decide, flipping a coin, using the alphabet, and so on. There is a crucial distinction between voting (vote results) and deciding. When there are three options or more there can be a deadlock as well. It is less known that there can also be cycles. It is even less known that such cycles actually are a disguised form of deadlock.

Take for example three candidates A, B and C and a particular distribution of preferences. When the vote is between A and B then A wins. We denote this as A > B. When the vote is between B and C then B wins, or B > C. When the vote is between C and A then C wins, or C > A. Collectively A > B > C > A. Collectively, there is indifference. It is a key notion in voting theory that there can be distributions of preferences such that a collective binary choice seems to result in a clear decision, while in reality there is a deadlock in hiding.
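
A minimal sketch in Python, with the classic three-voter profile, shows such a cycle; the profile is the textbook example, not data.

```python
from itertools import combinations

# Three voters (or three equal groups) with cyclical preferences.
ballots = [["A", "B", "C"], ["B", "C", "A"], ["C", "A", "B"]]

for x, y in combinations("ABC", 2):
    x_wins = sum(1 for b in ballots if b.index(x) < b.index(y))
    y_wins = len(ballots) - x_wins
    print(f"{x} vs {y}: {x_wins}-{y_wins} ->",
          f"{x if x_wins > y_wins else y} wins")
# Output: A > B and B > C, yet C > A -- a cycle, i.e. a deadlock in disguise.
```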

Kenneth Arrow (1921-2017), who passed away on February 21, used these cycles to create his 1951 “impossibility theorem”. Indeed, if you interpret a cycle as a decision then this causes an inconsistency or an “impossibility” w.r.t. the required transitivity of a (collective) preference ordering. However, reality is consistent and people really do make choices collectively, and thus the proper interpretation is an “indifference” or deadlock. It was and is a major confusion in voting theory that Arrow’s mathematics are correct but that his own verbal interpretation was incorrect, see my VTFD Ch. 9.2.

Representative government is better than referenda

Obviously a deadlock must be broken. Again, it may be manipulation to reduce the choice from three options A, B and C to only two. Who selects those two might take the pair that fits his or her interests. A selection in rounds like in France is no solution. There are ample horror scenarios when bad election designs cause minority winners. Decisions are made preferably via discussion in Parliament. Parliamentarian choice of the Prime Minister is better than direct election like for the US President.

Voting theory is not well understood in general. The UK referendum in 2011 on Proportional Representation (PR) presented a design that was far too complex. Best is that Parliament is chosen in proportional manner as in Holland, rather than in districts as in the UK or the USA. It suffices when people can vote for the party of their choice (with the national threshold of a seat), and when the professionals in Parliament use the more complex voting mechanisms (like bargaining or the Borda Fixed Point method). It is also crucial to be aware that the Trias Politica model for democracy fails and that more checks and balances are required, notably with an Economic Supreme Court.

The UK Electoral Commission goofed too

The UK Electoral Commission might be abstractly aware of this issue in voting theory, but they didn’t protest, and they only checked that the Brexit referendum question could be “understood”. The latter is an ambiguous notion. People might “understand” quite a lot, but they might not truly understand the hidden complexity and the pitfalls of voting theory. Even Nobel Prize winner Kenneth Arrow gave a problematic interpretation of his theorem. The Electoral Commission is to be praised for the effort to remove bias, where the chosen words “Remain” and “Leave” are neutral, and where both statements were included and not only one. (Some people don’t want to say No. Some don’t want to say Yes.) Still, the Commission gives an interpretation of the “intelligibility” of the question that doesn’t square with voting theory and that doesn’t protect the electorate from a voting disaster.

A test on this issue is to ask yourself: given the referendum outcome, do you really think that the UK population is clear in its position, whatever the issues of how to Leave or the risk of a UK breakup ? If you have doubts on the latter, then you agree that something is amiss. The outcome of the referendum really doesn’t give me a clue as to what UK voters really want. Scotland wants to remain in the EU and then break up ? This is okay for the others who want to Leave ? (And how ?) The issue can be seen as a statistical enquiry into what views people have, and the referendum question is biased and cannot be used for sound statistics.

In an email to me 2016-07-11:

“The Electoral Commission’s role is to evaluate the intelligibility of referendum questions in line with the intent of Parliament; it is not to re-evaluate the premise of the question. Other than that, I don’t believe there is anything I can usefully add to our previously published statements on this matter.”

Apparently the Commission knows the “intent of Parliament”, while Parliament itself might not. Is the Commission only a facilitator of deception, without a mission to put voters first ? At best the Commission holds that Whitehall and Parliament fully understood voting theory and therefore deliberately presented the UK population with a biased choice, so that voters would be seduced into neglecting the complexities of how to Leave or the risks of a UK breakup. Obviously the assumption that Whitehall and Parliament fully grasp voting theory is dubious. The better response by the Commission would have been to explain the pitfalls of voting theory and the misleading character of the referendum question, rather than to facilitate the voting disaster.

Any recognition that something is (very) wrong here, should also imply the annulment of the Brexit referendum outcome. Subsequently, to protect voters from such manipulation by Whitehall, one may think of a law that gives the Commission the right to veto a biased Yes / No selection, which veto might be overruled by a 2/3 majority in Parliament. Best is not to have referenda at all, unless you are really sure that a coin can only fall either way, and not land on its side.

Addendum March 31

  • The UK might repeal the letter on article 50 – see this BBC reality check. Thus science might have this time window to clarify to the general public how the referendum question doesn’t comply with voting theory.
  • The recent general elections in Holland provide another nice example for the importance of voting theory and for the meaning of Arrow’s Impossibility Theorem, see here.
Literature

BBC (2017), “Article 50: May signs letter that will trigger Brexit“, March 29

Carrell, S. (2017), “Scottish parliament votes for second independence referendum“, The Guardian, March 28

Colignatus (2001, 2004, 2011, 2014), “Voting theory for democracy” (VTFD), pdf online, https://zenodo.org/record/291985

Colignatus (2010), “Single vote multiple seats elections. Didactics of district versus proportional representation, using the examples of the United Kingdom and The Netherlands”, May 19 2010, MPRA 22782, http://mpra.ub.uni-muenchen.de/22782

Colignatus (2011a), “The referendum on PR“, Mathematics Teaching 222, January 5 2011, also on my website

Colignatus (2011b), “Arrow’s Impossibility Theorem and the distinction between Voting and Deciding”, https://mpra.ub.uni-muenchen.de/34919

Colignatus (2014), “An Economic Supreme Court”, RES Newsletter issue no. 167, October 2014, pp.20-21, http://www.res.org.uk/view/art7Oct14Features.html

Colignatus (2016), “Brexit: advice for young UK (age < 50 years), and scientific outrage for neglect of voting theory“, weblog text June 29

Colignatus (2017), “The performance of four possible rules for selecting the Prime Minister after the Dutch Parliamentary elections of March 2017”, March 17, MPRA 77616

Grassegger, H. and M. Krogerus (2017), “The Data That Turned the World Upside Down”, https://motherboard.vice.com/en_us/article/how-our-likes-helped-trump-win

Renwick, A. e.a. (2016), “Letters: Both Remain and Leave are propagating falsehoods at public expense“, The Telegraph, Opinion, June 14

From the BBC website

[ This is the same text as the former weblog (here), but now we follow Van Hiele’s argument for the abolition of fractions. The key property is that there are numbers x^H such that x x^H = 1 when x ≠ 0, and the rest follows from there. Thus we replace (y / x) with y x^H with H = -1. ]

Robert Siegler participates in the “Center for Improved Learning of Fractions” (CILF) and was chair of the IES 2010 research group “Developing Effective Fractions Instruction for Kindergarten Through 8th Grade” (report) (video).

IES 2010 key advice number 3 is:

“Help students understand why procedures for computations with fractions make sense.”

The first example of this helping to understand is:

“A common mistake students make when faced with fractions that have unlike denominators is to add both numerators and denominators. [ref 88] Certain representations can provide visual cues to help students see the need for common denominators.” (Siegler et al. (2010:32), referring to Cramer, K., & Wyberg, T. (2009))

For a b^H “and” c d^H kids are supposed to find (a d + b c) (b d)^H instead of (a + c) (b + d)^H.

Obviously this is a matter of definition. For “plus” we define: a b^H + c d^H = (a d + b c) (b d)^H.

But we can also define “superplus”: a b^H ⊕ c d^H = (a + c) (b + d)^H.

The crux lies in “and” that might not always be “plus”.

When (a + c) (b + d)^H makes sense

There are cases where (a + c) (b + d)^H makes eminent sense. For example, when a b^H is the batting average in the Fall-Winter season and c d^H the batting average in the Spring-Summer season, then the annual (weighted) batting average is exactly (a + c) (b + d)^H. Kids would calculate correctly, and Siegler et al. (2010) are suggesting that the kids would make a wrong calculation ?

The “superplus” outcome is called the “mediant”. See a Wolfram Demonstrations project case with batting scores.

Adding up fractions of the same pizza thus differs from averaging over more pizzas.

We thus observe:

  • Kids live in a world in which (a + c) (b + d)^H makes eminent sense.
  • Telling them that this is “a mistaken calculation” is actually quite confusing for them.
  • Thus it is better teaching practice to explain to them when it makes sense.

There is no alternative but to explain Simpson’s paradox also in elementary school. See the discussion about the paradox in the former weblog entry. The issue for today is how to translate this to elementary school.

[ Some readers may not be at home in statistics. Let the weight of b be w = b (b + d)^H. Then the weight of d is 1 – w. The weighted average is (a b^H) w + (c d^H) (1 – w) = (a + c) (b + d)^H. ]
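
A minimal sketch in Python makes the same point with invented batting numbers; since x^H = 1 / x, plain division is used.

```python
from fractions import Fraction

def plus(a, b, c, d):
    # adding fractions of the same whole: a/b + c/d = (ad + bc)/(bd)
    return Fraction(a * d + b * c, b * d)

def superplus(a, b, c, d):
    # the mediant: pooling a hits out of b at-bats with c hits out of d
    return Fraction(a + c, b + d)

# 30 hits in 100 at-bats in Fall-Winter, 60 hits in 150 in Spring-Summer.
print(plus(30, 100, 60, 150))        # 7/10, not a sensible batting average
print(superplus(30, 100, 60, 150))   # 9/25 = 0.36, the true season average

# The mediant is exactly the weighted average with weight w = b/(b + d).
w = Fraction(100, 100 + 150)
print(Fraction(30, 100) * w + Fraction(60, 150) * (1 - w))   # again 9/25
```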

Cats and Dogs

Many examples of Simpson’s paradox have larger numbers, but the Kleinbaum et al. (2003:277) “ActivEpi” example has small numbers (see also here). I add one more to make the case less symmetrical. Kady Schneiter rightly remarked that an example with cats and dogs will be more appealing to students. She uses animal size (small or large pets) as a factor, but let me stick to the idea of gender as a confounder. Thus the kids in class can be presented with the following case.

  • There are 17 cats and 16 dogs.
  • There are 17 pets kept in the house and 16 kept outside.
  • There are 17 female pets and 16 male pets (perhaps “helped”).

There is the following phenomenon – though kids might be oblivious as to why this might be “paradoxical”:

  1. For the female pets, the proportion of cats in the house is larger than the proportion for dogs.
  2. For the male pets, the proportion of cats in the house is larger than the proportion for dogs.
  3. For all pets combined, the proportion of cats in the house is smaller than the proportion for dogs.
The paradoxical data

The paradoxical data are given as follows. Observe that kids must calculate:

  • For the cats: 6 7^H = 0.86, 2 10^H = 0.20 and (6 + 2) (7 + 10)^H = 0.47.
  • For the dogs: 8 10^H = 0.80, 1 6^H = 0.17 and (8 + 1) (10 + 6)^H = 0.56.
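
A minimal sketch in Python reproduces these rates, with the counts written as (in house, total) per gender; the data are those of the adapted example above.

```python
cats = {"F": (6, 7),  "M": (2, 10)}
dogs = {"F": (8, 10), "M": (1, 6)}

for name, pets in [("cats", cats), ("dogs", dogs)]:
    (hF, tF), (hM, tM) = pets["F"], pets["M"]
    print(name,
          "F:", round(hF / tF, 2),
          "M:", round(hM / tM, 2),
          "combined:", round((hF + hM) / (tF + tM), 2))
# The cats lead within each gender (0.86 > 0.80 and 0.20 > 0.17),
# yet the dogs lead in the combined table (0.47 < 0.56): the paradox.
```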

A discussion about what this means

Perhaps the major didactic challenge is to explain to kids that the outcome must be seen as “paradoxical”. When kids might not have developed “quantitative intuitions” then those might not be challenged. It might be wise to keep it that way. When data are seen as statistics only, then there might be less scope for false interpretations.

Obviously, though, one would discuss the various views that kids generate, so that they are actively engaged in trying to understand the situation.

The next step is to call attention to the sum totals that haven’t been shown above.

It is straightforward to observe that the F and M are distributed in an unbalanced manner.

The correction

One can argue that there should be equal numbers of F and M. This leads to the following calculations about which pets would be kept at the house. We keep the observed proportions intact and raise the numbers proportionally.

  • For the cats: 0.86 * 10 ∼ 9, and (9 + 2) (10 + 10)^H = 0.55.
  • For the dogs: 0.17 * 10 ∼ 2, and (8 + 2) (10 + 10)^H = 0.50.

And now we find: Also for all pets combined, the proportion of cats in the house is larger than the proportion for dogs. Adding up the subtables into the grand total doesn’t generate a different conclusion on the proportions.

Closure on causality

Perhaps kids at elementary school should not be bothered with discussions on causality, certainly not on a flimsy case like this. But perhaps some kids require closure on this, or perhaps the teacher does. In that case the story might be that the kind of pet is the cause, and that the location where the pet is kept is the effect. When people have a cat then they tend to keep it at home. When people have a dog then they are a bit more inclined to keep it outside. The location has no effect on gender. The gender of the pet doesn’t change by keeping it inside or outside of the house.

Vectors in elementary school

Pierre van Hiele (1909-2010) explained for most of his professional life that kids at elementary school can understand vectors. Thus, they should be able to enjoy this vector graphic by Alexander Bogomolny.

Van Hiele also proposed to abolish fractions as we know them, by replacing y / x by y x^(-1). The latter might be confusing because kids might think that they have to subtract something. But the mathematical constant H = -1 makes perfect sense, namely, check the unit circle and the complex number i. Thus we get y / x = y x^H. The latter would be the better format. See “A child wants nice and no mean numbers” (2015).

Conclusions

Some conclusions are:

  • What Siegler & IES 2010 call a “common mistake” is the proper approach in serious statistics.
  • Teaching can improve by explaining to kids what method applies when. Adding fractions of the same pizza is different from calculating a statistical average. (PM. Don’t use round pizzas, since these make for less insightful parts.)
  • Kids live in a world in which statistics are relevant too.
  • Simpson’s paradox can be adapted such that it may be tested whether it can be discussed in elementary school too.
  • The discussion corroborates Van Hiele’s arguments for vectors in elementary school and the abolition of fractions as we know them (y / x), and the use of y x^H with H = -1. The key thing to learn is that there are numbers x^H such that x x^H = 1 when x ≠ 0, and the rest follows from there.

PM. The Excel sheet for this case is: 2017-03-03-data-from-kleinbaum-2003-adapted

Hans Rosling (1948-2017) was a professor of public health and a member of the Swedish Academy of Sciences. I hadn’t heard about him, but his death caused the news media to report about his mission to better inform people by the innovative presentation of statistics. I looked at some of his presentations, and found them both informative and innovative indeed.

I applaud this chart in which he tabulates not only causes and effects but rather means and goals. (Clicking on the picture will bring you to the TED talk 2007, and at the end the audience may applaud for another reason, namely when he swallows a sword to illustrate that the “impossible is possible”.)

Hans Rosling (1948-2017)

Continue the discussion

My impression is that we best honour Rosling by continuing the discussion about his work. Thus, my comments are as follows.

First of all, my book Definition & Reality in the General Theory of Political Economy shows that the Trias Politica model of democracy fails, because it still allows politicians too much room to manipulate information and to meddle in scientific advice on policy making. Thus, governance is much more important than Rosling suggested. Based upon his analysis, Rosling in some of his simulations used only economic growth as the decisive causal factor to explain the development of countries. However, the key causal factor is governance. The statistical reporting on this is not well developed yet. Thus, I move one + from economic growth to governance.

Secondly, my draft book The Tinbergen & Hueting Approach in the Economics of Ecological Survival discusses that the environment has become a dominant risk for the world as we know it. It is not a mathematical certainty that there will be ecological collapse, but the very nature of ecological collapse is that it comes suddenly, when you don’t expect it. The ecology is so complex and we simply don’t have enough information to manage it properly. It is like standing at the edge of a ravine. With superb control you might risk edging one millimeter closer, but if you are not certain that the ground will hold and that there will not be a sudden rush of wind, then you had better back up. The table given by Rosling doesn’t reflect this key point. Thus, I move one + from economic growth to the environment.

In sum, we get the following adapted table.

Adapted from Hans Rosling

I have contemplated for the means whether I would want to shift another + from economic growth to either human rights (property rights) or education (I am also a teacher). However, my current objective is to highlight the main analytical difference only.

In the continued discussion we should take care of proper definitions.

What does “economic growth” mean ?

The term “economic growth” is confusing. There is a distinction between the level and the annual growth of income, and there is a distinction w.r.t. the categories within. Economic welfare consists of both material products (production and services) and immaterial elements (conditions and services). If the term “economic growth” includes both, then this would be okay. In that case, however, the whole table would already be included in the notion of welfare and economic growth. Apparently, Hans Rosling intended the term “economic growth” for the material products. I would suggest replacing his “economic growth” by “income level”, and thus focus on both income and level rather than the annual change of a confusingly named statistic. Obviously, it is a policy target that all people have a decent standard of living, but it is useful to remain aware that income is only a means to a higher purpose, namely to live a good life.

PM. This causes a discussion about the income distribution, and how the poor and the rich refer to each other, so that the notion of poverty is relative to the general standard of society. In the 1980s the computer was a luxury item and nowadays a cell-phone with larger capacity is a necessity. These are relevant aspects but a discussion would lead too far here now.

What does “environment” mean ?

In the adapted table, the environment gets ++ as both means and goal. There is a slight change of meaning for these separate angles.

  • The environment as a goal means that we want to preserve nature for our descendants. Our kids and grandchildren should also have tigers and whales in their natural habitat, and not as photographs only.
  • The environment as means causes some flip-flop thinking.
    (1) In economic thought, everything that exists either already existed or mankind has crafted it from what was given. Thus we only have (i) the environment, (ii) human labour. There are no other means available. From this perspective the environment deserves +++.
    (2) For most of its existence (some 60,000 years), mankind took the environment for granted. Clean air and water were available, and if some got polluted it was easy to move to the next clean spot. The economic price of the environment was zero. (Or close to it: the cost of moving was not quite a burden or seen as an economic cost.) Thus, as a means, the environment didn’t figure, and from this viewpoint it deserves a 0. There are still many people who think in this manner. It might be an engrained cultural habit, but a rather dangerous one.
    (3) Perhaps around the middle of the past century, the 1950s, the environment has become scarce. As Lionel Robbins explained: the environment has become an economic good. The environment provides functions for human existence and survival, and those functions now get a price. Even more, the Tinbergen & Hueting approach acknowledges that the ecology has become risky for human survival. The USA and Europe might think that they can outsource most environmental pollution to the poorer regions of the world, but when the rain forests turn into deserts and when the CO2 turns the oceans into an acid soup that eats away the bones of fish, then the USA and Europe will suffer the consequences too. In that perspective, the environment deserves +++.
    (4) How can we make sure that the environment gets a proper place in the framework of all issues ? Eventually, nature is stronger than mankind, and there might arise some natural correction. However, there is also governance. If we get our act together, then mankind might manage the world economy, save the environment at some cost, and still achieve the other goals. Thus governance is at +++ and the environment, relative to it, at ++. Thus we arrive at the above adapted table.
Dynamic simulation

As a teacher of mathematics I emphasize the combined presentation of text, formula, numeric table, and graph. By looking at these different angles, there is greater scope for integrated understanding. Some students are better at single aspects, but by presenting the four angles you cover the various types of students, and all students get an opportunity to develop the aspects that they are weaker in.

Obviously, dynamic simulation is a fifth aspect. See for example the Wolfram Demonstrations project. Many have been making applets in Java and embedding these in HTML5, yet the use of Mathematica would allow for more exchangeable and editable code, and embedding within educational contexts in which the manipulation of text, formula, numeric table, and graph would also be standard.

Obviously, role playing and simulation games are a sixth aspect. This adds human interaction and social psychology to the learning experience. Dennis Meadows has been using this to allow people to grow aware of the risk on the environment, see e.g. “Stratagem” or MIT-Sloan.

The economic crisis of 2007+

What I particularly like about Rosling’s table is his emphasis on culture as a goal. Artists and other people in the world of culture will already be convinced of this – see also Roefie Hueting on the jazz stage – yet others may not be aware that mankind exists by culture.

There is also an important economic angle on culture as a means. In recessions and depressions, the government can stimulate cultural activity, such that money starts flowing again with much less risk for competitive conditions. That is, if the government would support the automobile industry or steel and do specific investments, then this might favour some industries or services at the cost of others, and it might affect competitive conditions overall, and even insert imbalances into the economy in some structural manner. Yet stimulating cultural activity might be much more neutral and still generate an economic stimulus.

For example, Germany around 1920 got into economic problems and the government responded by printing more money, which caused the hyperinflation. This experience became ingrained in the German attitude towards monetary issues. In the Eurozone Germany follows the hard line that inflation should be prevented at all costs. Thus the eurozone now has fiat money that still functions like a gold standard because of the strict rules. (See my paper on this.) By comparison, when the USA got into economic problems around 1930, the central bank was hesitant to print money (no doubt looking at the German example), and this eventually caused the Great Depression. Thus monetary policy has a Scylla and Charybdis character, with the risks of either too little or too much. Potentially, the option to organise cultural activity would be a welcome addition to the instruments to avoid such risks and smooth the path towards recovery.

I am not quite suggesting that the ECB should print money to pay the unemployed in Greece, Italy, Spain and Portugal to make music and dance in the streets. Yet, if the EU would invest in museums and restorations and other cultural services, so that Northern Europe can better enjoy its vacations in Southern Europe, then this would likely be more acceptable than when such funds would be invested directly in factories that start to compete with the North. The current situation, in which Southern Europe has both unemployment and less funds to maintain its cultural heritage, is obviously less optimal.

The point is also made in my book Common Sense: Boycott Holland. Just to be sure: this notion w.r.t. culture is not the main point of CSBH. It is just a notion that is worthy of mentioning.

PM. Imagine a dynamic simulation of restoring the Colosseum. Or is it culturally more valuable as a ruin than fully restored ?

By Jaakko Luttinen – Own work, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=22495158

Robert Siegler participates in the “Center for Improved Learning of Fractions” (CILF) and was chair of the IES 2010 research group “Developing Effective Fractions Instruction for Kindergarten Through 8th Grade” (report) (video).

IES 2010 key advice number 3 is:

“Help students understand why procedures for computations with fractions make sense.”

The first example of this helping to understand is:

“A common mistake students make when faced with fractions that have unlike denominators is to add both numerators and denominators. [ref 88] Certain representations can provide visual cues to help students see the need for common denominators.” (Siegler et al. (2010:32), referring to Cramer, K., & Wyberg, T. (2009))

For a / b “and” c / d kids are supposed to find (ad + bc) / (bd) instead of (a + c) / (b + d).

Obviously this is a matter of definition. For “plus” we define: a / b + c / d = (ad + bc) / (bd).

But we can also define “superplus”: a / b ⊕ c / d = (a + c) / (b + d).

The crux lies in “and” that might not always be “plus”.

When (a + c) / (b + d) makes sense

There are cases where (a + c) / (b + d) makes eminent sense. For example, when a / b is the batting average in the Fall-Winter season and c / d the batting average in the Spring-Summer season, then the annual (weighted) batting average is exactly (a + c) / (b + d). Kids would calculate correctly, and Siegler et al. (2010) are suggesting that the kids would make a wrong calculation ?

The “superplus” outcome is called the “mediant”. See a Wolfram Demonstrations project case with batting scores.

Adding up fractions of the same pizza thus differs from averaging over more pizzas.

We thus observe:

  • Kids live in a world in which (a + c) / (b + d) makes eminent sense.
  • Telling them that this is “a mistaken calculation” is actually quite confusing for them.
  • Thus it is better teaching practice to explain to them when it makes sense.

There is no alternative but to explain Simpson’s paradox also in elementary school. See the discussion about the paradox in the former weblog entry. The issue for today is how to translate this to elementary school.

Cats and Dogs

Many examples of Simpson’s paradox have larger numbers, but the Kleinbaum et al. (2003:277) “ActivEpi” example has small numbers (see also here). I add one more to make the case less symmetrical. Kady Schneiter rightly remarked that an example with cats and dogs will be more appealing to students. She uses size (small or large pets) as a factor, but let me stick to the idea of gender as a confounder. Thus the kids in class can be presented with the following case.

  • There are 17 cats and 16 dogs.
  • There are 17 pets kept in the house and 16 kept outside.
  • There are 17 female pets and 16 male pets (perhaps “helped”).

There is the following phenomenon – though kids might be oblivious as to why this might be “paradoxical”:

  1. For the female pets, the proportion of cats in the house is larger than the proportion for dogs.
  2. For the male pets, the proportion of cats in the house is larger than the proportion for dogs.
  3. For all pets combined, the proportion of cats in the house is smaller than the proportion for dogs.
The paradoxical data

The paradoxical data are given as follows. Observe that kids must calculate:

  • For the cats: 6 / 7 = 0.86, 2 / 10 = 0.20 and (6 + 2) / (7 + 10) = 0.47.
  • For the dogs: 8 / 10 = 0.80, 1 / 6 = 0.17 and (8 + 1) / (10 + 6) = 0.56.

A discussion about what this means

Perhaps the major didactic challenge is to explain to kids that the outcome must be seen as “paradoxical”. When kids might not have developed “quantitative intuitions” then those might not be challenged. It might be wise to keep it that way. When data are seen as statistics only, then there might be less scope for false interpretations.

Obviously, though, one would discuss the various views that kids generate, so that they are actively engaged in trying to understand the situation.

The next step is to call attention to the sum totals that haven’t been shown above.

It is straightforward to observe that the F and M are distributed in an unbalanced manner.

The correction

One can argue that there should be equal numbers of F and M. This leads to the following calculations about which pets would be kept at the house. We keep the observed proportions intact and raise the numbers proportionally.

  • For the cats: 0.86 * 10 ∼ 9, and (9 + 2) / (10 + 10) = 0.55.
  • For the dogs: 0.17 * 10 ∼ 2, and (8 + 2) / (10 + 10) = 0.50.

And now we find: Also for all pets combined, the proportion of cats in the house is larger than the proportion for dogs. Adding up the subtables into the grand total doesn’t generate a different conclusion on the proportions.

Closure on causality

Perhaps kids at elementary school should not be bothered with discussions on causality, certainly not on a flimsy case like this. But perhaps some kids require closure on this, or perhaps the teacher does. In that case the story might be that the kind of pet is the cause, and that the location where the pet is kept is the effect. When people have a cat then they tend to keep it at home. When people have a dog then they are a bit more inclined to keep it outside. The location has no effect on gender. The gender of the pet doesn’t change by keeping it inside or outside of the house.

Vectors in elementary school

Pierre van Hiele (1909-2010) explained for most of his professional life that kids at elementary school can understand vectors. Thus, they should be able to enjoy this vector graphic by Alexander Bogomolny.

Van Hiele also proposed to abolish fractions as we know them, by replacing y / x by y x^(-1). The latter might be confusing because kids might think that they have to subtract something. But the mathematical constant H = -1 makes perfect sense, namely, check the unit circle and the complex number i. Thus we get y / x = y x^H. The latter would be the better format. See “A child wants nice and no mean numbers” (2015).

Conclusions

Some conclusions are:

  • What Siegler & IES 2010 call a “common mistake” is the proper approach in serious statistics.
  • Teaching can improve by explaining to kids what method applies when. Adding fractions of the same pizza is different from calculating a statistical average. (PM. Don’t use round pizzas, since these make for less insightful parts.)
  • Kids live in a world in which statistics are relevant too.
  • Simpson’s paradox can be adapted such that it may be tested whether it can be discussed in elementary school too.
  • The discussion corroborates Van Hiele’s arguments for vectors in elementary school and for the abolition of fractions as we know them (y / x) in favour of y x^H with H = -1. The key thing to learn is that there are numbers x^H such that x x^H = 1 when x ≠ 0, and the rest follows from there.

PM. The excel sheet for this case is: 2017-01-30-data-from-kleinbaum-2003-adapted.

Econometrics researches the economy, using mathematical models and statistical data. For me as an econometrician the important relations are given by the causality in economics. The observed causality is put into the model: the model expresses what we think the causal chains are. Statistics can only give correlation. Thus there is a tension between what is required for economic analysis and what statistics can provide. Different models may fit the same data, which means that they would be observationally equivalent, yet they would still be different models with different assumptions on causality.

Judea Pearl, in his wonderful book “Causality” (1st edition 2000, my copy 2007), of which there now is a 2nd edition, took issue with statistics, and looked for a way to get from correlation to causality. His suggestion is the “do”-operator. I am still pondering this. For now I tend to regard it as manipulating models with endogenous and exogenous variables. Please allow me my pondering: some issues require time. See here for an earlier suggestion on causality, one on the counterfactual, and one on confounding. Some earlier papers on the 2 x 2 x 2 case are here. Today I want to look a bit at Simpson’s paradox with an eye on education.

The order of presentation in tables

In graphs, the horizontal x axis gives the cause and the vertical y axis gives the effect. For the derivative we look at dy / dx. Thus in numerical tables we better put the y in the top row and the x in the bottom row.

For 2 x 2 tables the lowest row is the sum of the rows above. Since this lowest row better be the cause, we thus better put the cause in vertical columns and the effect in horizontal rows. This seems a bit of a paradox, but see the presentation below.

(This is similar to the table in which we have the true state (disease, the gold standard) vertically and the test statistic in the rows, when we determine the sensitivity and specificity of a test. Check the wikipedia “worked example”, since the main theory there is transposed.)

Pearl (2013) “Understanding Simpson’s Paradox” (technical report R-414) has a transposed table. It is better to transpose back. He also mentions the combined group first but it seems better to put this at the end. (PM. A recent discussion by Pearl on Simpson’s paradox is here.)

Pearl’s data example (transposed)

The following are the data from Pearl (2013), the appendix, figure 4, page 10. The data are the count of the individuals involved. Both men and women are treated (cause) or not, and they recover (effect) or not. Since this is a controlled trial, we do not need to look at prevalence and such.

When we divide the effect (row 1) by the total (row 3) then we get the recovery rates (row 4). We do this for the men, women and joint (combined, pooled) data. We find the paradoxical situation:

  • For the men, the treatment causes reduced recovery (0.6 < 0.7).
  • For the women, the treatment causes reduced recovery (0.2 < 0.3).
  • For all combined, the treatment causes improved recovery (0.5 > 0.4).
Judea Pearl (2013) figure 4
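
Since the table itself is an image, here is a Python sketch of the same calculation. The absolute counts are inferred from the stated rates and group shares, so treat them as an assumption:

    # (recovered, total) per subgroup, consistent with rates 0.6 / 0.7 / 0.2 / 0.3.
    data = {("men", "treated"): (18, 30), ("men", "untreated"): (7, 10),
            ("women", "treated"): (2, 10), ("women", "untreated"): (9, 30)}

    def rate(groups, arm):
        recovered = sum(data[(g, arm)][0] for g in groups)
        total = sum(data[(g, arm)][1] for g in groups)
        return recovered / total

    for groups in (("men",), ("women",), ("men", "women")):
        print(groups, rate(groups, "treated"), rate(groups, "untreated"))
    # men 0.6 0.7 ; women 0.2 0.3 ; pooled 0.5 0.4 (the reversal)
    # Note the skew: women are 10/40 = 25% of the treated patients,
    # but 30/40 = 75% of the untreated ones.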

More models that are statistically equivalent

We may arrange issues into “cause” and “effect”, but the real relations are determined by reality. Data like these might fit various models. Pearl (2013) figure 1 mentions more models, but let us consider cases (a) and (b). In the above we have been assuming model (a) on the left, with a path from the cause to the effect Y, in which variable Z (gender) is causally independent. The above data table, however, would also fit the format of model (b), in which variable Z (blood pressure) would not be independent, and might be confounding the issue.

Perhaps gender is actually confounding the situation in the above table too ? The result of the table is so strange that we perhaps must revise our ideas about the causal relations that we have been assuming.

Pearl (2013), part of figure 1

Pearl’s condition for causality

Pearl’s condition for causality is that “the drug has no effect on gender”, see p10 and his formula (7) (with F there rather than Z here). The above data show that there is such an effect, or, when we look e.g. at the women, that Pr[Female | Cause] and Pr[Female | No cause] are different, and thus differ from the marginal probability Pr[Female].

In the table above, we compare line (7) of all women with line (11) of all patients. The women are only 25% of all treated patients and 75% of all untreated ones. Perhaps the treatment has no effect on gender, but the data would suggest otherwise.

pearl-analysis-1

It would be sufficient (not necessary) to adjust the subgroup sizes such that there is “equal representation”. NB. Pearl here refers to the “sure thing principle”, apparently formulated by Savage 1954, under the condition that the action doesn’t modify the distribution. For us, the condition and proof of equal representation now have another relevance.

Application of the condition gives a correction

Since this is a controlled trial, we can adapt by including more patients, such that the numbers in the different subgroups (rows (3) and (7), below in red) are equal. This involves 40 more patients, namely 20 men in the non-treatment group and 20 women in the treatment group. This generates the following table.

For ease, it is assumed that the conditional probabilities of the subgroups – thus rows (4) and (8) – remain the same, and that the new patients are distributed accordingly. Of course, they might deviate from this, but then we have better data anyway.

pearl-analysis-2
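
A sketch of the same correction in Python (my code; rows (4) and (8) are kept fixed, as stated):

    # Raise the two small subgroups (untreated men, treated women) from 10 to 30
    # patients, keeping the conditional recovery rates fixed.
    rates = {("men", "treated"): 0.6, ("men", "untreated"): 0.7,
             ("women", "treated"): 0.2, ("women", "untreated"): 0.3}
    n = 30                                  # equal size for every subgroup
    for arm in ("treated", "untreated"):
        recovered = sum(rates[(g, arm)] * n for g in ("men", "women"))
        print(arm, recovered / (2 * n))
    # treated 0.4, untreated 0.5 -> harmful for the pooled group as well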

The consequence of including adequate numbers of patients in the subgroups is:

  • Row (13) now shows that Pr[Z | C] = Pr[Z | Not-C] = Pr[Z], for Z = M or F.
  • As the treatment is harmful in both subgroups, it also is harmful for the pooled group.
Intermediate conclusion

Obviously, when the original data already allow an estimate of the harmful effect, it would not be ethical to subject 20 more women to the treatment – while it might be easy to find 20 more men who don’t have the treatment. Thus, it suffices to use the above as a statistical correction only. If we assume the same conditional probabilities w.r.t. the cause-effect relation in the subgroups, then the second table gives the counterfactual as if the subgroups had the same number of patients. There would be no occurrence of the Simpson paradox.

This counterfactual would also hold in cases when we cannot simply adjust the group sizes, like the classic case of admissions of students to Berkeley.

While the causality that the drug has no effect on gender is quite clear, the situation is less obvious w.r.t. the issue on blood pressure. In this case it might not be possible to get equal numbers in the subgroups. Not for ethical reasons, but because people react differently to the treatment. This case would require a separate discussion, for the causality clearly is different.

Educational software on Simpson’s paradox

There are some sites for a first encounter with Simpson’s paradox.

A common plot is labelled Baker – Kramer 2001, but Jeon – Chung – Bae 1987 were earlier. This plot keeps the numbers of men and women and the conditional probabilities the same, and allows only variation in the enrollments in the subgroups. This nicely shows the composition effect. The condition of equal percentages per subgroup works, but there are also other combinations that avoid Simpson’s paradox. But of course, Pearl is interested in causality, and not in the mere statistical effect of composition.

The most insightful plot seems to be from vudlab. It has upward sloping lines rather than downward sloping ones, which seems somewhat easier to follow. There is a (seemingly) continuous slider, it rounds the person counts, and it has a graphic for the percentages that makes it easier to focus on those.

Kady Schneiter has various applets on statistics, of which this one on Simpson’s paradox. I agree with her discussion (Journal of Statistics Education 2013) that an example with pets (cats and dogs) lowers the barrier for understanding. Perhaps we should not use the size of the pet (small or large) but rather gender. The plot uses downward sloping lines and has an unfortunate lag in the display of the light blue dot. (This might be the dogs, but we can also compare with the Berkeley case in vudlab.)

The Wolfram Demonstrations by (1) Heiner & Wagon and (2) Brodie provide different formats that may come into use too. The advantage of the latter is that you can put in your own numbers.

This discussion by Andrew Gelman caused me to google on these displays.

Alexander Bogomolny has a fine vector display but there is no link to causality (yet).

Robert Banis has some data from the original Berkeley study, and excel sheets using them.

Some ten years ago there would have been more references to excel sheets indeed, with the need for students to do some editing themselves. The educational attention apparently shifts to applets with sliders. For those who still have an interest in excel, the sheet with the above tables is here: 2017-01-28-data-from-pearl-2000.

And of course there is wikipedia (a portal, no source). (Students from MIT are copying their textbooks into wikipedia, whence the portal becomes unreadable for the common reader. It definitely cannot be used as an educational source.)

Conclusion

This sets the stage for another kind of discussion in the next weblog entry.

Exponential functions have the form b^x, where b > 0 is the base and x the exponent.

Exponential functions are easily introduced as growth processes. The comparison of x² and 2^x is an eye-opener, with the stories of duckweed or the grain on the chess board. The introduction of the exponential number e is a next step. What intuitions can we use for smooth didactics on e ?
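
A throwaway Python snippet illustrates the eye-opener:

    # Polynomial versus exponential growth: x^2 keeps up only briefly.
    for x in range(1, 11):
        print(x, x**2, 2**x)
    # equal at x = 2 and at x = 4; from x = 5 onward 2^x runs away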

The “discover-e” plot

There is the following “intuitive graph” for the exponential number e = 2.71828…. The line y = e is found by requiring that the inclines (tangents) to b^x all run through the origin at {0, 0}. The (dashed) value at x = 1 helps to identify the function e^x itself. (Check that the red curve indicates 2^x.)

2^x, e^x and 4^x, and inclines through {0, 0}

Remarkably, Michael Range (2016:xxix) also looks at such an outcome e = 2^(1 / c), where c is the derivative of 2^x at x = 0, or c = ln[2]. NB. Instead of the opaque term “logarithm” let us use “recovered exponent”, denoted as rex[y].

Perhaps above plot captures a good intuition of the exponential number ? I am not convinced yet but find that it deserves a fair chance.

NB. Dutch mathematics didactician Hessel Pot, in an email to me of April 7 2013, suggested above plot. There appears to be a Wolfram Demonstrations Project item on this too. Their reference is to Helen Skala, “A discover-e,” The College Mathematics Journal, 28(2), 1997 pp. 128–129 (Jstor), and it has been included in the “Calculus Collection” (2010).

Deductions

The point-slope version of the incline (tangent) of function f[x] at x = a is:

y – f[a] = s (x – a)

The function b^x has derivative rex[b] b^x. Thus at arbitrary a:

y – b^a = rex[b] b^a (x – a)

This line runs through the origin {x, y} = {0, 0} iff

0 – b^a = rex[b] b^a (0 – a)

1 = rex[b] a

Thus with H = -1, a = rex[b]^H = 1 / rex[b]. Then also:

y = f[a] = b^a = b^(rex[b]^H) = e^(rex[b] rex[b]^H) = e^1 = e

The inclines running through {0, 0} also run through {rex[b]^H, e}. Alternatively put, inclines can thus run through the origin and then cut the line y = e.

For example, in the above plot, with 2^x as the red curve, rex[2] ≈ 0.70 and a ≈ 1.44, and there we find the intersection with the line y = e.

Subsequently, for b = e itself the point of tangency is {1, e} at a = 1, consistent with rex[e] = 1.
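
A numerical check of this deduction (a sketch; the three bases are those of the plot):

    # The incline to b^x through the origin touches at a = 1 / rex[b] = 1 / ln b,
    # and the height of the point of tangency is always e.
    from math import e, log

    for b in (2.0, e, 4.0):
        a = 1 / log(b)             # a = rex[b]^H with H = -1
        print(b, round(a, 2), b ** a)
    # b = 2: a = 1.44; b = e: a = 1; b = 4: a = 0.72; b^a = 2.71828... each time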

The drawback of this exposition is that it presupposes some algebra on e and the recovered exponents. Without this deduction, it is not guaranteed that the above plot is correct. It might be a delusion. Yet since the plot is correct, we may present it to students, and it generates a sense of wonder about what this special number e is. Thus it is still possible to make the plot first and then begin to develop the required math.

Another drawback of this plot is that it compares different exponential functions and doesn’t focus on the key property of e^x, namely that it is its own derivative. A comparison of different exponential functions is useful, yet for what purpose exactly ?

Descartes

Our recent weblog text discussed how Cartesius used Euclid’s criterion of tangency of circle and line to determine inclines to curves. The following plots use this idea for e^x at point x = a, for a = 0 and a = 1.

Incline to e^x at x = 0 (left) and x = 1 (right)

Let us now define the number e such that the derivative of e^x is given by e^x itself. At point x = a we have s = e^a. Using the point-slope equation for the incline:

y – f[a] = s (x – a)

y – e^a = e^a (x – a)

y = e^a (x – (a – 1))

Thus the inclines cut the horizontal axis at {x, y} = {a – 1, 0}, and the slope indeed is given by the tangent s = (f[a] – 0) / (a – (a – 1)) = f[a] / 1 = e^a.

The center {u, 0} and radius r of the circle can be found from the formulas of the mentioned weblog entry (or Pythagoras), and check e.g. a = 0:

u = a + s f[a] = a + (e^a)²

r = f[a] √ (1 + s²) = e^a √ (1 + (e^a)²)
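
A sketch that checks the construction numerically (my helper code, not from the weblog entry):

    # Circle with center {u, 0} and radius r through {a, e^a}: the radius to the
    # point of tangency must be perpendicular to the incline with slope s.
    from math import exp, sqrt

    for a in (0.0, 1.0):
        s = exp(a)                        # slope of the incline at x = a
        u = a + s * exp(a)                # center on the horizontal axis
        r = exp(a) * sqrt(1 + s * s)      # radius
        radius_slope = exp(a) / (a - u)   # slope of the radius to {a, e^a}
        print(a, u, r, s * radius_slope)  # product -1: perpendicular indeed
    # a = 0: u = 1, r = sqrt(2); a = 1: u = 1 + e^2, r = e sqrt(1 + e^2)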

A key problem with this approach is that the notion of “derivative” is not defined yet. We might plug in any number for the slope, say pretend that e^2 = 10 and e^3 = 11. For any location the Pythagorean Theorem allows us to create a circle. The notion of a circle is not essential here (yet). But it is nice to see how Cartesius might have done it, if he had had e = 2.71828….

Conquest of the Plane (COTP) (2011)

Conquest of the Plane (2011:167+), pdf online, has the following approach:

  • §12.1.1 has the intuition of the “fixed point” that the derivative of e^x is given by e^x itself. For didactics it is important to have this property firmly established in the minds of the students, since they tend to forget this. This might perhaps be achieved in other ways too, but COTP has opted for the notion of a fixed point. The discussion is “hand waving” and not intended as a real development of fixed points or the theory of function spaces.
  • §12.1.2 defines e with some key properties. It holds by definition that the derivative of e^x is given by e^x itself, but there are also some direct implications, like the slope of 1 at x = 0. Observe that COTP handles integral and derivative consistently as interdependent notions. (Shen & Lin (2014) use this approach too.)
  • §12.1.3 gives the existence proof. With the mentioned properties, such a number and function appears to exist. This compares e^x with other exponential functions b^x and the recovered exponents rex[y] – i.e. logarithm ln[y].
  • §12.1.4 uses the chain rule to find the derivatives of b^x in general. The plot suggested by Hessel Pot above would be a welcome addition to confirm this deduction and extension of the existence proof.
  • §12.1.5-7 have some relevant aspects that need not concern us here.
  • §12.1.8.1 shows that the definition is consistent with the earlier formal definition of a derivative. Application of that definition doesn’t generate an inconsistency. No limits are required.
  • §12.1.8.2 gives the numerical development of e = 2.71828… There is a clear distinction between the deduction that such a number exists and the calculation of its value. (The approach with limits might confuse these aspects.)
  • §12.1.8.3 shows that the notion of the dynamic quotient (COTP p57) is also consistent with the above approach to e. Thus, the above hasn’t used the dynamic quotient. Using it, we can derive that 1 = {(e^h – 1) // h, set h = 0}. The latter expression cannot be simplified further, but we don’t need to do so, since we can determine that its value is 1. If we wished, we could use this (deduced) property to define e as well (“the formal approach”). See the sketch after this list.
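
One computer-algebra way to mimic this “simplify first, then set h = 0” step is a sympy sketch of my own (not part of COTP):

    # Rewrite (e^h - 1) // h via the power series, so that the division by h is
    # carried out algebraically, and only then substitute h = 0.
    import sympy as sp

    h = sp.symbols('h')
    simplified = sp.expand((sp.exp(h) - 1).series(h, 0, 4).removeO() / h)
    print(simplified)              # h**2/6 + h/2 + 1
    print(simplified.subs(h, 0))   # 1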

The key difference between COTP and above “approach of Cartesius” is that COTP shows how the (common) numerical development of e can be found. This method relies on the formula of the derivative, which Cartesius didn’t have (or didn’t want to adopt from Fermat).

Difference of COTP and a textbook introduction of e

In my email of March 27 2013 to Hessel Pot I explained how COTP differed from a particular Dutch textbook on the introduction of e.

  • The textbook suggests that f’[0] = 1 would be an intuitive criterion. This is only partly true.
  • It proceeds by reworking f’[0] = 1 into a more general formula. (I didn’t mention unstated assumptions in 2013.)
  • It eventually boils down to indeed positing that e^x has itself as its derivative, but this definition thus is not explicitly presented as a definition. The clarity of positing this is obscured by the path leading there. Thus, I feel that the approach in COTP is a small but actually key innovation to explicitly define e^x as being equal to its derivative.
  • It presents e only with three decimals.
Conclusion

There are more ways to address the intuition for the exponential number, like the growth process or the surface area under 1 / x. Yet the approaches above fit the algebraic development better. Of these, COTP has a development that is strong and appealing. The plots by Cartesius and Pot are useful and supportive but not alternatives.

The Appendix contains a deduction that was done in the course of writing this weblog entry. It seems useful to include it, but it is not key to above argument.

Appendix. Using the general formula on the factor x – a

The earlier weblog entry on Cartesius and Fermat used a circle and generated a “general formula” on a factor x – a. This is not really factoring, since the factor only holds when the curve lies on a circle.

Using the two relations:

f[x] – f[a] = (x – a) (2u – x – a) / (f[x] + f[a])    … (* general)

u = a + s f[a]       … (for a tangent to a circle)

we can restate the earlier theorem that s defined in this manner generates the slope that is tangent to a circle:

f[x] – f[a] = (x – a) (2 s f[a] – (x – a)) / (f[x] + f[a])

It will be useful to switch to x – a = h:

f[a + h] – f[a]  = h (2 s f[a] – h) / (f[a + h] + f[a]) 

Thus with the definition of the derivative via the dynamic quotient we have:

df / dx = {Δf // Δx, set Δx = 0}

= {(f[a + h] – f[a]) // h, set h = 0}

= {(2 s f[a] – h) / (f[a + h] + f[a]), set h = 0}

= s

This merely shows that the dynamic quotient restates the earlier theorem on the tangency of a line and circle for a curve.

This holds for any function and thus also for the exponential function. Now we have s = e^a by definition. For e^x this gives:

e^(a + h) – e^a = h (2 s e^a – h) / (e^(a + h) + e^a)

For COTP §12.1.8.3 we get, with Δx = h:

df / dx = {Δf // Δx, set Δx = 0}

= {(e^(a + h) – e^a) // h, set h = 0}

= {(2 s e^a – h) / (e^(a + h) + e^a), set h = 0}

= s

This replaces Δf // Δx by the expression from the general formula, while the general formula was found by assuming a tangent circle, with s as the slope of the incline. There is the tricky aspect that we might choose any value of s as long as it satisfies u = a + s f[a]. However, we can refer to the earlier discussion in §12.1.8.2 on the actual calculation.
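
A small sympy check of this substitution step (my sketch, not from COTP):

    # After the algebraic simplification, setting h = 0 indeed returns s.
    import sympy as sp

    a, h, s = sp.symbols('a h s')
    expr = (2 * s * sp.exp(a) - h) / (sp.exp(a + h) + sp.exp(a))
    print(sp.simplify(expr.subs(h, 0)))   # s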

The basic conclusion is that this “general formula” enhances the consistency of §12.1.8.3. The deduction however is not needed, since we have §12.1.8.1, but it is useful to see that this new elaboration doesn’t generate an inconsistency. In a way this new elaboration is a distraction, since the conclusion that 1 = {(e^h – 1) // h, set h = 0} is much stronger.