My book Elegance with Substance (EWS) (2009, 2015) (pdf online) has the theme that mathematics education requires fundamental re-engineering. Mathematicians are trained to think abstractly and are not trained for the empirical science of didactics. When they meet real-life students in class, mathematicians suffer from cognitive dissonance, and they resolve this by sticking to traditional ways of teaching, a tradition that was never designed for optimal didactics. In this manner, mathematics education has been in shambles for some 5000 years.
One of the reasons why complex numbers are considered difficult – or difficult to teach – is that the traditional presentation of the square root is crummy. For this, see EWS p26 & p30.
- An author may presume that readers of some chapter in a book (EWS) have read the preceding chapters. In a weblog, such an assumption is rather risky.
- Please read EWS, for this already contains subtle discussion. When you have read EWS you better understand my intentions and what steps have been taken already. This will allow me to build upon what has been said before.
- In this weblog I am forced to repeat issues nevertheless, for I cannot assume that you have read EWS or earlier weblog entries.
- Let us make the best of this. For now, at least check out the earlier discussion of the quadratic function.
Metaphor
Math teacher John goes to school by vehicle, either car or bike. Since he uses his car more often he calls it “the” vehicle, and his bike “the other” vehicle. His class knows that he does so. One day, John goes to school by bike and hits a rock that bends the front wheel. Arriving late in class he explains: “Today I came by bike and had an accident. The vehicle was severely damaged and I had to walk.” Most students accept this explanation but some are inclined to mathematics or litigation, and they ask: “How can it be that you came by bike and that your car was severely damaged?”
The idea that the word “the” could be reserved for only one vehicle conflicts with reality. The word “the” is used far more flexibly in human discourse than any such convention can fix.
Correct in wikipedia
Wikipedia is not quite an encyclopedia but rather a portal to proper sources. In this case however it provides a proper definition of square root. I only edit the text to get the common {x, y} notation.
“(…) a square root of a number y is a number x such that x² = y, in other words, a number x whose square (…) is y. (…) For example, 4 and −4 are square roots of 16 because 4² = (−4)² = 16.” (wikipedia, edited)
Note two aspects:
- This is solving an equation. In Wolfram Alpha: Solve[y = x^2, x].
- This can be done algebraically. If you want a numerical result, use N[Solve[y = x^2, x]].
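For readers without Wolfram Alpha, the same point can be sketched numerically in Python. This is only an illustration: the helper name `solve_square` is hypothetical and it covers real solutions only.

```python
import math

def solve_square(y):
    """Return every real x with x**2 == y (hypothetical helper, real y only)."""
    if y < 0:
        return []          # no real solution
    if y == 0:
        return [0.0]       # a single solution
    r = math.sqrt(y)       # numerical value of the nonnegative solution
    return [-r, r]         # solving the equation yields both solutions

print(solve_square(16))    # [-4.0, 4.0]
```

The outcome is a list of solutions rather than a single number, which is what "solving an equation" means here.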
Confusing in wikipedia
Subsequently, wikipedia correctly restates what mathematicians are doing. Both of the following statements generate confusion.
- “Every non-negative real number y has a unique non-negative square root, called the principal square root, which is denoted by √y, where √ is called the radical sign or radix. For example, the principal square root of 9 is 3, denoted √9 = 3, because 3² = 3 × 3 = 9 and 3 is non-negative.”
- “Every positive number y has two square roots: √y, which is positive, and −√y, which is negative. Together, these two roots are denoted ± √y (see ± shorthand). Although the principal square root of a positive number is only one of its two square roots, the designation “the square root” is often used to refer to the principal square root.” (wikipedia, edited)
When teachers wonder why students have problems with square roots, then teachers should look at themselves. It is they themselves who have introduced ambiguity by distinguishing the square root from any square root. This distinction is okay for students who like litigation or mathematics, but it is too subtle for students who are still struggling to understand what the discussion is about.
Note two aspects:
- √y is a function from the set of nonnegative real numbers to itself.
- This can remain algebraic, while a numerical result is N[√y]. For example √2 ≈ 1.414…
Function or correspondence
Students in high school learn only about functions. For each input in a function there is only one outcome. There are also correspondences, in which an input can have more than one outcome. Solving an equation x² = y can have more than one outcome and thus it is a correspondence. Since textbooks do not want to discuss correspondences, the distinction is hushed up.
It is better to be explicit about it. Thus, let the correspondence be defined as follows:
Do√y = ± √y since (± √y)² = y. This merely rewrites Solve[y = x^2, x].
The inverse of function f is function g such that subsequent application returns the original input, thus g[f[x]] = x. The inverse can be found by mirroring in the line y = x. The distinction between function and correspondence also explains why the inverse of y = x² gives the correspondence y = Do√x = ± √x (check the switch in the labels of y and x).
y = x^2 and inverse correspondence y = DoSqrt[x]
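The difference between the function √ and the correspondence Do√ can be sketched in Python. The names `root_sign` and `do_sqrt` are labels chosen for this illustration only, and the sketch assumes real y ≥ 0.

```python
import math

def root_sign(y):
    """The function √y: exactly one nonnegative outcome for y >= 0."""
    return math.sqrt(y)

def do_sqrt(y):
    """The correspondence Do√y = ±√y: the set of all x with x**2 == y."""
    r = math.sqrt(y)
    return {r, -r}         # for y == 0 the set collapses to one element

def square(x):
    return x * x

# The function applied after squaring loses the sign of a negative input ...
print(root_sign(square(-3)))       # 3.0, not -3
# ... while the correspondence keeps the original input among its outcomes.
print(-3 in do_sqrt(square(-3)))   # True
```

So g[f[x]] = x holds for the function √ only on the nonnegative numbers, while the correspondence always contains the original input.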
Verb and noun
EWS links the above notions to the linguistic distinction between verb and noun.
- The correspondence Do√y is like a verb and the function √y is like a noun.
- The algebraic solution √2 is a noun and using the square root sign as an instruction for the calculator is like a verb (outcome 1.414…).
- See Gray & Tall on the notion of the “procept” (process & concept) (verb and noun). The idea of the procept is that mathematics deliberately uses few symbols but with different meanings depending upon context. Teachers tend to treat those contexts implicitly, leaving students to guess what is happening. It is better to introduce more symbols that are explicit about the meaning, like the distinction between function √ and correspondence Do√.
Solution of the abuse of “the”
We diagnose that mathematicians introduce the confusion themselves. In high school they explain what a function is and what solving an equation is, but they do not dwell on the difference between a function and a correspondence (even though solving an equation employs the latter). They encode the hidden intention by introducing an artificial distinction between “the” and “any” root. This doesn’t work because language doesn’t co-operate. For some 3000 years students have been suffering because math teachers have been behaving like Humpty Dumpty in Through the Looking-Glass, holding that language must adapt to their needs rather than the reverse. Source: wikipedia:
“When I use a word,” Humpty Dumpty said, in rather a scornful tone, “it means just what I choose it to mean—neither more nor less.”
“The question is,” said Alice, “whether you can make words mean so many different things.”
“The question is,” said Humpty Dumpty, “which is to be master—that’s all.”
The obvious solution is to ditch this artificial distinction between “the” and “any” root.
Instead of saying that √4 = 2 is “the root of 4”, it is better to say that it is “the root function value of 4”. Since square roots may also be taught in elementary school, another expression is “rootsign 4”.
Subsequent drama of complex numbers
Complex numbers are not difficult, hardly different from the system of co-ordinates itself, and they are actually quite enlightening since they allow working in the plane directly. Teaching the square root for real numbers however is bodged up in traditional didactics, and hence there is the subsequent drama for complex numbers. The square root of -1 becomes incomprehensible, and the train runs into a mountain.
Merely giving the following definitions is begging for problems.
- Let H = -1, see here. The negative numbers can be created by turning the positive numbers a half turn.
- Let H have two roots: i^2 = H and (-i)^2 = H.
- Let √ now be a function from all real numbers y to the complex numbers z = x + y’ i.
- Let i = √H.
Our earlier discussion of the complex numbers mentioned the danger. Consider:
-1 = H = i² = (√H) (√H) = √(H H) = √1 = 1 = –H (*)
Obviously, -1 = 1 doesn’t work for real numbers. Common discussions of this deduction are fraught with problems. They explain that it doesn’t work but not why, or give a curious reason why.
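The individual steps of (*) can be evaluated numerically to see where the chain breaks. This sketch relies on the fact that Python's `cmath.sqrt` returns the principal root; it locates the failing step but does not by itself explain why it fails.

```python
import cmath

H = -1

left = cmath.sqrt(H) * cmath.sqrt(H)   # (√H)(√H) with √H = i
right = cmath.sqrt(H * H)              # √(H H) = √1

print(left)            # (-1+0j)
print(right)           # (1+0j)
print(left == right)   # False: the step (√H)(√H) = √(H H) is where (*) fails
```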
Developing better didactics
When students better understand square roots, the above definitions are easier to build upon.
The reference to William Rowan Hamilton (1805-1865) and the unit circle is standard, but I have not yet seen the following explanation for above conundrum (*). Rowan Hamilton provided a proper development for complex numbers, namely with the model that z = {x, y} and i = {0, 1}. The crux lies in a particular interpretation of multiplication which neatly fits the theory of angles (trigonometry). A multiplication by i can be interpreted as a quarter turn.
Multiplication of points {a, b} = a + b i and {c, d} = c + d i, gives:
{a, b} {c, d} = (a + b i) (c + d i) = a c + b d i^2 + (a d + b c) i
- Choosing a = d = 1 and b = c = 0 gives {1, 0} {0, 1} = i, which is a quarter turn.
- Choosing a = c = 0 and b = d = 1 gives {0, 1} {0, 1} = i^2, which are two quarter turns = half turn. Thus i^2 = -1.
Hence the general formula becomes:
{a, b} {c, d} = (a + b i) (c + d i) = a c – b d + (a d + b c) i = {a c – b d, a d + b c}
- Starting from {1, 0} and turning counterclockwise along the unit circle in steps of i, we find the outcomes of i, H, -i, and return to 1.
- Clockwise in steps of –i, we find the outcomes of –i, (-i)² = H, (–i)³ = i and back to 1.
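The general formula and the two walks around the unit circle can be checked with a small Python sketch. Points {a, b} are represented as tuples, and the function name `mul` is a label chosen for this illustration.

```python
def mul(p, q):
    """Multiply points {a, b} and {c, d} as {a c - b d, a d + b c}."""
    a, b = p
    c, d = q
    return (a * c - b * d, a * d + b * c)

def walk(step, start=(1, 0), count=4):
    """Multiply repeatedly by `step`, collecting each outcome."""
    outcomes = []
    point = start
    for _ in range(count):
        point = mul(point, step)
        outcomes.append(point)
    return outcomes

i = (0, 1)
minus_i = (0, -1)

print(walk(i))         # [(0, 1), (-1, 0), (0, -1), (1, 0)]  i.e. i, H, -i, 1
print(walk(minus_i))   # [(0, -1), (-1, 0), (0, 1), (1, 0)]  i.e. -i, H, i, 1
```

Both walks pass through {-1, 0} after two steps, which is the half turn H reached from either direction.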
The conundrum (*) can now be explained:
- In terms of reals, H = –H only works when H = 0.
- When H means a half turn counterclockwise then –H means a half turn clockwise, and both generate the same result. Thus H = -H makes sense in terms of turning. (Creating the negative numbers from positive numbers x by using –x = H x can be done, but we may also allow for -x = –H x. To make this consistent, we require a more developed theory on rotations.)
Now that we understand why the H = –H outcome arises, we still find the situation paradoxical (without such a theory of rotations), and we decide to obliterate this outcome, so that we do not have to worry about it. This can be done with these definitions:
- When y < 0 then √y = i √ |y| where |y| gives the absolute value.
- When y < 0 and w < 0 then y w > 0 and √(y w) ≠ √y √w. PM. The latter is i√|y| i√|w| = – √ |y w|
- When y < 0 then Do√y = ± i √ |y|
Now deduction (*) is blocked: the step in the middle violates the rule for y < 0 and w < 0. The blockage is no mystery: we decided to choose it in this manner.
The rule for y < 0 and w < 0 causes the main conceptual issue for complex numbers. For example, what to do with √(-2) √(-3)? Does it resolve into √6? No, we have chosen to block this solution. The general rule now is to first substitute for i before doing anything else. Then √(-2) √(-3) resolves as i√2 i√3 = -√6. (There are various expositions online that show the proper calculation steps, e.g. this video. But showing how it is done is not explaining why it is done so.)
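The rule "first substitute for i" can be sketched in Python. The function `root_sign` below is a hypothetical helper that implements the definition √y = i √|y| for y < 0 directly, rather than relying on a library routine.

```python
import math

def root_sign(y):
    """√y for real y; for y < 0 substitute i first: √y = i √|y|."""
    if y < 0:
        return complex(0, math.sqrt(abs(y)))
    return complex(math.sqrt(y), 0)

product = root_sign(-2) * root_sign(-3)    # i√2 times i√3
blocked = root_sign((-2) * (-3))           # √6, the outcome that is blocked

print(product.real)    # close to -2.449, that is -√6
print(blocked.real)    # close to  2.449, that is  √6
```

The two outcomes differ only in sign, which is exactly the difference that the rule for y < 0 and w < 0 is chosen to settle.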
With this explanation of (1) what H = –H means and (2) that we block it deliberately, I wonder whether Rowan Hamilton only gave a different model, so that there is no true difference between x + i y and {x, y}. The models may be fully equivalent, and choosing i = {0, 1} only turns equivalence into identity.
Some PM points
PM 1. This discussion extends on the discussion of complex numbers in EWS p43. (That page uses {1, 2} + {3, 4} = {4, 6}, and the subsequent discussion has an awkward typing error.) A proof of concept is in Conquest of the Plane, p115.
PM 2. Observe that wikipedia has a long lemma on complex numbers and that the rule on y w only appears in notes to another lemma on square roots. Wikipedia is a portal and no textbook, but the rule still is an elementary part of the definition of i = √H. Perhaps the rule seems less relevant when you start from i = {0, 1} but you are bound to need it.
PM 3. A discussion here confuses √y with Do√y, and it argues that we have only:
- Do√H Do√H = (± i) (± i) = ± (-1) = ±1
- Do√(H H) = Do√1 = ±1.
There is this equality for Do√ indeed, but this doesn’t explain why it fails for √. The subsequent discussion at that website is a bit clearer: “Of course, you can choose to define the square root of a negative number to be a positive multiple of i; but if your book hasn’t done that, it doesn’t have a right to ask this question.” It is okay to establish the freedom for such a definition. The crux however is why one would make it. That website leaves this open. (Our own explanation above is that it avoids confusion on H = –H, which is true for rotations but not for real values.)
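The equality for Do√ can be checked by treating each correspondence as a set of outcomes and multiplying the sets element by element. This is only a sketch; `set_product` is a name invented for this illustration.

```python
import itertools

def set_product(A, B):
    """All products a * b with a taken from A and b taken from B."""
    return {a * b for a, b in itertools.product(A, B)}

do_sqrt_H = {1j, -1j}    # Do√H = ± i
do_sqrt_1 = {1, -1}      # Do√(H H) = Do√1 = ± 1

print(set_product(do_sqrt_H, do_sqrt_H) == do_sqrt_1)   # True
```

The sets agree, but that says nothing about which single element the function √ picks, and that is where the question about √ arises.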
PM 4. A reader alerted me that the common discussion of the complex solution to the quadratic function uses a projection from the Argand plane onto the Cartesian plane. I was aware of this, but let me pass on the links, provided in this alert, to these two papers by Ansie Harding and Johann Engelbrecht: one and two.
PM 5. Dutch readers can benefit from a discussion by J.H. Wansink, “De complexe getallen. Een algebraisch onderwerp op dood spoor”, Euclides 51e jaargang no 4, 1975/1976, p127-149. Wansink is very positive about the way of presenting complex numbers as analytic geometry as had been done by H.J.E. Beth (father of E.W. Beth). I employed this same method in “Conquest of the Plane” (2011). A more traditional and in my view less transparent presentation as mostly algebra is by J. van de Craats, here, who also doesn’t fully explain (*), see question 1.7 on p4 and the answer on p71.
PM 6. Rotation in 2D can be modeled by a 2 x 2 matrix, see a bit of theory: step1 and step2. A complex number is equivalent to a matrix, and the multiplication of two complex numbers can be represented as matrix times vector. Hence there is a relation between the number -1 and the matrix H for a half turn.
Let Q be a quarter turn counterclockwise, then Q 1 = i. A half turn H = Q² = –I, where I is the unit matrix. Then H 1 = -1. A clockwise turn is Qᵀ, the transpose of Q, and we find Q³ = Qᵀ. We can find H also as the square of this transpose.
The discussion above about H = –H thus can be resolved by the distinction between matrix H and number -1, and we find that actually H = Hᵀ. There remains a point of doubt though. There is an isomorphism between complex numbers and the matrix approach, and thus the explanation of rotation in the complex plane by the use of matrices is rather like begging the question.
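The matrix relations in PM 6 can be verified with a few lines of Python. The sketch assumes the standard quarter-turn matrix acting on column vectors, with the number 1 represented as the point (1, 0); all helper names are chosen for this illustration.

```python
def matmul(A, B):
    """Product of two 2 x 2 matrices given as nested tuples."""
    return tuple(
        tuple(sum(A[r][k] * B[k][c] for k in range(2)) for c in range(2))
        for r in range(2)
    )

def transpose(A):
    """Transpose of a 2 x 2 matrix."""
    return tuple(tuple(A[c][r] for c in range(2)) for r in range(2))

def apply(A, v):
    """Matrix times column vector."""
    return tuple(sum(A[r][k] * v[k] for k in range(2)) for r in range(2))

Q = ((0, -1), (1, 0))      # assumed: quarter turn counterclockwise
H = matmul(Q, Q)           # half turn
QT = transpose(Q)          # quarter turn clockwise

print(apply(Q, (1, 0)))        # (0, 1), the point i
print(H)                       # ((-1, 0), (0, -1)), that is -I
print(apply(H, (1, 0)))        # (-1, 0), the number -1
print(matmul(H, Q) == QT)      # True: Q^3 equals the transpose of Q
print(matmul(QT, QT) == H)     # True: H is also the square of the transpose
print(transpose(H) == H)       # True: H equals its own transpose
```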