Tag Archives: Continuity

I am looking for a story on continuity and limits that can be told in junior highschool and still makes sense. We would like an isomorphism between space and numbers. For some aspects, mathematical theory sends us to number theory, and for the isomorphic aspects, it sends us to topology. It is awkward to have to translate similar notions, and to eliminate the overload of notions that are not directly relevant to the search for this junior highschool story.

For example, topology has rephrased results into statements on open and closed sets and boundaries, but I am wondering whether that is an effective manner of communication, when the relevant distinction is whether you are assuming a well-ordering or not. But I am not at home in number theory or topology. These comments on continuity and limits arise precisely because I am testing the waters.

Basically, I already designed such a story on continuity and limits (pdf, weblog), but now I am noticing that I can include a question mark on infinitesimals.

Addendum December 16

This weblog text is a rewrite of yesterday’s text.


The framework contains both the handling of real numbers on the calculator and a development of theory.

Example 1

Also in junior highschool, we want students to be aware that 0.999…. = 1.000…. so that these are the same number. You can see this by checking 3 * 1/3 = 1: since 1/3 = 0.333…., three times this gives 0.999…., which thus must equal 1.
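A minimal sketch of this check in Python, using exact fractions. The function name nines is only chosen here for illustration. The truncation 0.99…9 with n nines differs from 1 by exactly 10^(-n), so the gap drops below any positive bound as n grows.

```python
from fractions import Fraction

def nines(n):
    """Return 0.99...9 with n nines as an exact fraction."""
    return sum(Fraction(9, 10**k) for k in range(1, n + 1))

# 3 * (1/3) is exactly 1 in exact arithmetic.
assert 3 * Fraction(1, 3) == 1

# The gap between 1 and the n-digit truncation is exactly 10^(-n).
for n in (1, 5, 20):
    assert 1 - nines(n) == Fraction(1, 10**n)
```

The code only handles finite n. That the infinite decimal equals 1 is the step that students still have to accept from the argument in the text.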

Example 2

When we approximate numbers with n the number of decimals, then these basically are like the natural numbers, and there remains a well ordering. Numbers are δ[n] = 10^(-n) apart.
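The finite approximation can be sketched as follows, as an illustration only. The names delta, successor and on_grid are assumptions of this sketch. With n decimals every number has a next number at distance δ[n], and the midpoint between a number and its successor falls off the grid.

```python
from fractions import Fraction

def delta(n):
    """Spacing between neighbouring n-decimal numbers: 10^(-n)."""
    return Fraction(1, 10**n)

def successor(x, n):
    """Next number after x on the grid of n-decimal numbers."""
    return x + delta(n)

def on_grid(x, n):
    """True if x can be written with at most n decimals."""
    return (x * 10**n).denominator == 1

n = 2
x = Fraction(31, 100)            # 0.31
y = successor(x, n)              # 0.32
midpoint = (x + y) / 2           # 0.315 needs three decimals
assert on_grid(x, n) and on_grid(y, n)
assert not on_grid(midpoint, n)  # the midpoint is not an n-decimal number
```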

When we shift to the use of the infinite number of decimals then we lose this “infinitesimal”. At issue is now whether the infinitesimal can be retained in some manner.

Standard definition of density causes contradiction

Discussing the continuum and the set of real numbers R recently, I suggested (here, property (a)) that R would be a dense set, according to the standard definition of density. This definition is that for any two elements x < y there would be at least one z between, as x < z < y. This would allow you to make cuts everywhere.

Oops. I retract.

Wikipedia (no source but a portal) has:

“From the ZFC axioms of set theory (including the axiom of choice) one can show that there is a well order of the reals.”

I don’t know quite what to think about this. Elsewhere I deduced that ZFC is inconsistent. But perhaps in a revised set theory, the well order can be retained.

We would like R to have a well-order for finite intervals too. Thus every number x has a next number x’. When you select y = x’ then you cannot find anything between x and y. This contradicts the above statement on density.

Thus, the standard definition of density doesn’t fit a well-ordered R.

Designing a new definition for density of the reals R

We can design a new definition of density.

  • The standard definition is useful for the rationals Q. If we restrict your freedom to making cuts along Q, then we are safe again. In this manner, the distinction between rational and irrational numbers is useful to explain a property of R.
  • R is defined as “more dense” because Q is dense w.r.t. that original definition ((a)).
  • This proposal is quite similar to the Dedekind cut, with the distinction that we now allow that R might retain a well-ordering. That is, this issue on the ordering is no longer forced by the standard definition of density.
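For the rationals, the standard definition of density can be checked directly. A minimal sketch, with the helper name between as an assumption of this example: the midpoint of two rationals is again a rational and lies strictly between them, so the cutting can be repeated as often as you like.

```python
from fractions import Fraction

def between(x, y):
    """Return a rational z with x < z < y (the midpoint), for rationals x < y."""
    if not x < y:
        raise ValueError("need x < y")
    return (x + y) / 2

x, y = Fraction(1, 3), Fraction(1, 2)
for _ in range(5):
    z = between(x, y)
    assert x < z < y   # there is always room for another rational
    y = z              # repeat with the narrower interval
```

This only illustrates density on Q. It says nothing about whether R itself has a next number.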

Surprise consequence as a bonus

Switching to another notion of density generates the bonus that we have more scope to introduce the infinitesimal.

When every number x has a next x’, then we can define the infinitesimal as the difference:

δ = x’ – x

It also means that an open set (a, b) can also be seen as a closed set [a + δ, b – δ].

Wikipedia (no source but a portal) claims today:

“The standard ordering ≤ of any real interval is not a well ordering, since, for example, the open interval (0, 1) ⊆ [0,1] does not contain a least element.”

Yet now we have (0, 1) = [δ, 1 – δ] and the least element is δ. Only the intervals with negative infinity might be excluded, check (-∞, ∞).
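For the finite approximation δ[n] this rewriting of an open interval as a closed one can be verified. The sketch below is an illustration under that assumption only, with open_interval_points as a hypothetical helper name. Whether the same holds for an infinite number of decimals is the open question of this text.

```python
from fractions import Fraction
from math import ceil, floor

def open_interval_points(a, b, n):
    """Grid points x with n decimals and a < x < b, in increasing order."""
    scale = 10**n
    first = floor(a * scale) + 1   # smallest k with k / scale > a
    last = ceil(b * scale) - 1     # largest k with k / scale < b
    return [Fraction(k, scale) for k in range(first, last + 1)]

n = 2
delta = Fraction(1, 10**n)
points = open_interval_points(Fraction(0), Fraction(1), n)
assert points[0] == delta          # least element of (0, 1) on the grid
assert points[-1] == 1 - delta     # greatest element of (0, 1) on the grid
```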

Properties of these infinitesimals

Some properties are:

(1) We still have 0.999…. = 1.000…..

(2) A current statement is that a line consists of points, and each point is a co-ordinate without length. We now can better express that length consists of a sum of short lengths. A sum of these infinitesimals Σδ makes sense if we regard it as the sum from x = 0 to x = 1 for x‘ – x. The trick is that the length is determined by the statement on x and not by the coefficient of δ.

(3) Using H = -1, then δ δ^H = 1. That is, δ ≠ 0, and thus there is no problem with division. The discussion about differentials is quite different from the discussion about these (new) infinitesimals. Much time has been spent in history in looking whether there might be a connection, but there isn’t.
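Properties (2) and (3) can be illustrated for the approximation δ[n]. This is a sketch only, and the function name length_by_steps is an assumption of this example. The sum of the steps x’ – x from 0 to 1 is 1 for every n, so the length comes from the range of x and not from the size of δ. Also, δ[n] times its inverse is 1.

```python
from fractions import Fraction

def length_by_steps(a, b, n):
    """Sum the steps x' - x over the n-decimal grid from a up to b.

    Assumes that a and b are themselves n-decimal numbers.
    """
    d = Fraction(1, 10**n)
    total = Fraction(0)
    x = a
    while x < b:
        total += (x + d) - x   # one step x' - x
        x += d
    return total

# Property (2): the length of [0, 1] is 1, whatever the number of decimals.
for n in (1, 2, 3):
    assert length_by_steps(Fraction(0), Fraction(1), n) == 1

# Property (3): delta is not zero, so it has an inverse.
d = Fraction(1, 10**3)
assert d * d**-1 == 1
```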

Separate arithmetic for infinity and infinitesimal

Students already know that they cannot apply the rules of arithmetic to infinity. E.g. ∞ + ∞ = ∞. The same now holds for the above hypothetical notion of the infinitesimal.

Property (2) carries over from δ[n] with n the number of decimals. Property (1) arises when n → ∞ . Potentially, these notions cannot be combined without some conflict.

We are accustomed to think that any real number can be divided. But e.g. δ / 2 is nonsense because it gives the distance between two numbers, and there is nothing smaller. Thus, the normal rules for arithmetic only hold for reals that are not these infinitesimals.

With δ = x’ – x we also want to consider y = (x / n). When the numbers are halved for n = 2, is the distance halved or isn’t it halved ? In the approximation δ[n] the distance can become smaller when more digits are included. For an infinite number of digits, presumably, the distance cannot be halved. Thus δ = y’ – y. Multiplication by n gives nδ = n (x / n)’ – x. Thus x’ = n (x / n)’ – nδ + δ for any n. This would make (most) sense by the choice δ = 0 and x’ / n = (x / n)’. But then we are back in the classical approach again, without the well ordering. (The next number is the number itself.)

Presumably, we can argue that n * δ is as problematic as δ / n, though. The notion of Σδ namely has been solved by putting the consideration of length into the Σ sign.

I don’t know yet whether it is sufficient (consistent) to state δ = x’ – x and that the rules for arithmetic don’t apply to δ like they don’t apply to ∞. Potentially, we might write δ^H → ∞ (and this doesn’t mean that δ ∞ → 1).

All this depends upon whether we can develop a consistent set of definitions. Students at junior highschool might agree that they aren’t much interested in that.

Thus, it might only be in senior highschool that we discuss the “classical” approach to the reals, which has (a, b) as an open interval only. We would be forced to this not because of the definition of density but because of the rules of arithmetic.


In itself, notions like these are not world shocking but they would tend to fit the intuitions of space and number for junior highschool.

At some point in history, the main stream in mathematics has opted for an approach to the reals so that they have no well ordering. The obstacle of the standard definition of density can be removed, as shown above. A problem still resides in arithmetic with δ = x’ – x. It is not clear to me whether this can be resolved. It is not clear to me either whether it is okay to have benign neglect till senior highschool, and face the consequences of losing the well ordering only there.


Isaac Newton (1642-1727) invented the differentials, calling them evanescent quantities. Since then, the world has been wondering what these are. Just to be sure, Newton wrote his Principia (1687) by using the methods of Euclidean geometry, so that his results could be accepted in the standard of his day (context of reconstruction and presentation), and so that his results were not lost in a discussion about the new method of these differentials (context of discovery). However, this only increased the enigma. What can these quantities be, that are so efficient for science, and that actually disappear when mathematically interesting ?

Gottfried Leibniz (1646-1716) gave these infinitesimals their common labels dy and dx, and thus they became familiar as household names in academic circles, but this didn’t reduce their mystery.

Charles Dodgson (1832-1898) as Lewis Carroll had great fun with the Cheshire Cat, who disappears but leaves its grin.

Abraham Robinson (1918-1974) presented an interpretation called “non-standard analysis“. Many people think that he clinched it, but when I start reading then my intuition warns me that this is making things more difficult. (Perhaps I should read more though.)

In 2007, I developed an algebraic approach to the derivative. This was in the book “A Logic of Exceptions” (ALOE), later also included in “Elegance with Substance” (EWS) (2009, 2015), and a bit later there was a “proof of concept” in “Conquest of the Plane” (COTP) (2011). The pdfs are online, and a recent overview article is here. A recent supplement is the discussion on continuity.

In this new algebraic approach there wasn’t a role for differentials, yet. The notation dy / dx = f ‘[x] for y = f [x] can be used to link up to the literature, but up to now there was no meaning attached to the symbolism. In my perception this was (a bit of) a pity since the notation with differentials can be useful on occasion, see the example below.

Last month, reading Joop van Dormolen (1970) on the didactics of derivatives and the differential calculus – in a book for teachers Wansink (1970) volume III – I was struck by his admonition (p213) that dy / dx really is a quotient of two differentials, and that a teacher should avoid identifying it as a single symbol and as the definition of the derivative. However, when he proceeded, I was disappointed, since his treatment didn’t give the clarity that I looked for. In fact, his treatment is quite in line with that of Murray Spiegel (1962), “Advanced calculus (Metric edition)”, Schaum’s outline series, see below. (But Van Dormolen very usefully discusses the didactic questions, that Spiegel doesn’t look into.)

Thus, I developed an interpretation of my own. In my impression this finally gives the clarity that people have been looking for starting with Newton. At least: I am satisfied, and you may check whether you are too.

I don’t want to repeat myself too much, and thus I assume that you read up on the algebraic approach to the derivative in case of questions. (A good place to start is the recent overview.)

Ray through an origin

Let us first consider a ray through the origin, with horizontal axis x and vertical axis y. The ray makes an angle α with the horizontal axis. The ray can be represented by a function as y =  f [x] = s x, with the slope s = tan[α]. Observe that there is no constant term (c = 0).


The quotient y / x is defined everywhere, with the outcome s, except at the point x = 0, where we get an expression 0 / 0. This is quite curious. We tend to regard y / x as the slope (there is no constant term), and at x = 0 the line has that slope too, but we seem unable to say so.

There are at least three responses:

(i) Standard mathematics then takes off, with limits and continuity.

(ii) A quick fix might be to try to define a separate function to find the slope of a ray, but we can wonder whether this is all nice and proper, since we can only state the value s at 0 when we have solved the value elsewhere. If we substitute y when it isn’t a ray, for example x², then we get a curious construction, and thus the definition isn’t quite complete since there ought to be a test on being a ray.




(iii) The algebraic approach uses the following definition of the dynamic quotient:

y // x ≡ { y / x, unless x is a variable and then: assume x ≠ 0, simplify the expression y / x, declare the result valid also for the domain extension x = 0 }

Thus in this case we can use y // x = s x // x = s, and this slope also holds for the value x = 0, since this has now been included in the domain too.
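For polynomials the dynamic quotient by x can be sketched in a few lines. This is an illustration under my own assumptions, not the implementation of a computer algebra package. A polynomial is stored as its list of coefficients, and the names dynamic_quotient_by_x and evaluate are hypothetical.

```python
from fractions import Fraction

def dynamic_quotient_by_x(coeffs):
    """Simplify p(x) // x for p(x) = c0 + c1*x + c2*x^2 + ...

    Assumes x != 0 while cancelling the common factor x, and returns the
    simplified polynomial, which is then declared valid at x = 0 as well.
    """
    if coeffs[0] != 0:
        raise ValueError("x does not cancel: p(0) != 0")
    return coeffs[1:]

def evaluate(coeffs, x):
    """Evaluate a polynomial given by its coefficient list at x (Horner)."""
    result = 0
    for c in reversed(coeffs):
        result = result * x + c
    return result

s = Fraction(3, 4)
ray = [0, s]                          # y = s x
slope = dynamic_quotient_by_x(ray)    # the constant polynomial s
assert evaluate(slope, 0) == s        # the slope is available at x = 0 too
```

The plain quotient evaluate(ray, 0) / 0 would raise a ZeroDivisionError, which is the 0 / 0 problem that the definition is meant to avoid.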

In a nutshell for dy / dx

In a nutshell, we get the following situation for dy / dx:


Properties are exactly as Van Dormolen explained:

  • “dy” and “dx” are names for variables, and thus they have their own realm with their own axes.
  • The definition of their relationship is dy = f ‘[x] dx.

The news is:

  • The mistake in history was to write dy / dx instead of dy // dx.

The latter “mistake” can be understood, since the algebraic approach uses notions of set theory, domain and range, and dynamics as in computer algebra, and thus we can forgive Newton for not getting there yet.

To link up with history, we might define that the “symbol dy / dx as a whole” is a shortcut for dy // dx. This requires extra effort to develop the notion of “symbol as a whole” however. My impression is that it is better to use dy // dx until it is so accepted that it might become pedantic. (You need only explain that the Earth isn’t flat as long as people don’t know that yet.)

Application to Spiegel 1962 gives clarity

Let us look at Spiegel (1962) p58-59, and see how above discussion can bring clarity. The key points can all be discussed with reference to his figure 4-1.


Looking at this with a critical eye, we find:

  • At the point P, there is actually the creation of two new sets of axes, namely, both the {Δx, Δy} plane and the {dx, dy} plane.
  • These two new planes both have rays through the origin, one with angle θ and one with angle α.
  • The two planes help to define the error. An error is commonly defined from the relation “true value = estimate + error”. The true value of the angle is θ and our estimate is α.
  • Thus we get absolute error Δf = s Δx + ε where s = dy / dx. This error is a function of Δx, or ε = ε[Δx]. It solves as ε = Δf – s Δx.
  • The relative error is Δf / Δx = dy / dx + r which solves as r = Δf / Δx – dy / dx. This is still a function r[Δx]. We use the quotient of the differentials instead of the true quotient of the differences.
  • We better re-consider the error in terms of the dynamic quotient, replacing / by // in the above, because at P we like the error to be zero. Thus in above figure we have ε = Δf – s Δx, where s = dy // dx.
  • A source of confusion is that Spiegel suggests that dx ≈ Δx or even dx = Δx but this is numerically true only sometimes and conceptually there surely is no identity since these are different axes.
  • In the algebraic approach, Δx is set to zero to create the derivative, in particular the value of f ‘[x] = tan[α] at point P.  In this situation, Δx = 0 thus clearly differs from the values of dx that are still available on dx ‘s whole own axis. This explains why the creation of the differentials is useful. For, while Δx is set to 0, then the differentials can take any value, including 0.
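The absolute and relative error in the list above can be made concrete with a small numerical sketch. The choice of f[x] = x² at the point x = 1 with slope s = 2 is my own example and not Spiegel’s, and the function name errors is hypothetical.

```python
from fractions import Fraction

def errors(f, x, s, delta_x):
    """Return (epsilon, r) for the linear estimate with slope s at x.

    epsilon = Δf - s Δx      (absolute error)
    r       = Δf / Δx - s    (relative error, with the plain quotient)
    """
    delta_f = f(x + delta_x) - f(x)
    epsilon = delta_f - s * delta_x
    r = delta_f / delta_x - s
    return epsilon, r

def square(x):
    return x * x

# For f[x] = x² at x = 1 with s = 2: Δf = 2 Δx + Δx², so ε = Δx² and r = Δx.
for delta_x in (Fraction(1, 10), Fraction(1, 100), Fraction(-1, 4)):
    epsilon, r = errors(square, Fraction(1), 2, delta_x)
    assert epsilon == delta_x**2
    assert r == delta_x
```

With the plain quotient this function fails for Δx = 0. That is the case where the dynamic quotient is needed to have a zero error at P.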

Just to be sure, the algebraic approach uses this definition:

f ’[x] = {Δf // Δx, then set Δx = 0}

Subsequently, we define dy = f ‘[x] dx, so that we can discuss the relative error r = Δf // Δx – dy // dx.
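For polynomials the recipe can be written out as a sketch, under the assumption that f is given by a coefficient list. The names delta_f_coeffs, derivative_at and differential are hypothetical. First Δf is expanded in powers of Δx, then the factor Δx is cancelled as in the dynamic quotient, and only then is Δx set to 0.

```python
from fractions import Fraction
from math import comb

def delta_f_coeffs(coeffs, x):
    """Coefficients in powers of Δx of Δf = f(x + Δx) - f(x).

    f is the polynomial c0 + c1*x + c2*x^2 + ..., expanded with the
    binomial theorem at the given value of x.
    """
    out = [Fraction(0)] * len(coeffs)
    for j, c in enumerate(coeffs):
        for k in range(j + 1):
            out[k] += c * comb(j, k) * x ** (j - k)
    out[0] -= sum(c * x**j for j, c in enumerate(coeffs))  # subtract f(x)
    return out

def derivative_at(coeffs, x):
    """f'[x] = {Δf // Δx, then set Δx = 0} for a polynomial f."""
    d = delta_f_coeffs(coeffs, x)
    quotient = d[1:]   # Δf has no constant term, so Δx cancels: Δf // Δx
    return quotient[0] if quotient else Fraction(0)   # now set Δx = 0

def differential(coeffs, x, dx):
    """dy = f'[x] dx, with dx a free variable on its own axis."""
    return derivative_at(coeffs, x) * dx

parabola = [0, 0, 1]                           # f[x] = x²
assert derivative_at(parabola, Fraction(3)) == 6
assert differential(parabola, Fraction(3), Fraction(1, 2)) == 3
```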

PM. Check COTP p224 for the discussion of (relative) error, with the same notation. This present discussion still replaces the statement on differentials in COTP p155, step number 10.

A subsequent point w.r.t. the standard approach

Our main point thus is that the mistake in history was to write dy / dx instead of dy // dx. There arises a subsequent point of didactics. When you have real variables y and z, then these have their own axes, and you don’t put them on the same axis just because they are both reals.

See Appendix A for a quote from Spiegel (1962), and check that it is convoluted at times.

Appendix B contains a quote from p236 from Adams & Essex (2013). We can see the same confusions as in Spiegel (1962). It really is a standard approach, and convoluted.

The standard approach takes Δx = dx and joins the axis for the variable Δy with the axis for the variable dy, with the common idea of “a change from y“. The idea of this setup is that it shows the error for values of Δx = dx.


It remains an awkward setup. It may well be true that John from Los Angeles is called Harry in New York, but when John calls his mother back home and introduces himself as “Mom, this is Harry”, then she will be confused. Eventually she can get used to this kind of phonecalls, but it remains awkward didactics to introduce students to these new concepts in this manner. (Especially when John adds: “Mom, actually I don’t exist anymore because I have been set to zero.”)

Thus, in good didactics we should drop this Δx = dx.

Alternatively put: We might define dy = f ’[x] Δx = {Δf // Δx, then set Δx = 0} Δx. In the latter expression Δx occurs twice: both as a local and bound variable within { … } and as a global free variable outside of { … }. This is okay. In the past, mathematicians apparently thought that it might make things clearer to write dx for the free global variable: dy = f ’[x] dx. In a way this is okay too. But for didactics it doesn’t work. We should rather avoid an expression in which the same variable (name) is used both locally bound and globally free.

Clear improvement

Remarkably, we are using 99% of the same apparatus as the standard approach, but there are clear improvements:

  • There is no use of limits. All information is contained in the algebra of both the function f and the dynamic quotient. See here for continuity.
  • There is a clear distinction between the three realms {x, y}, {Δx, Δy} and {dx, dy}.
  • There is the new tool of the {dx, dy} space that can be used for analysis of variations.
  • Didactically, it is better to first define the derivative in chapter 1, and then introduce the differentials in chapter 2, since the differentials aren’t needed to understand chapter 1.
  • There is clarity about the error, that one doesn’t take dx ≈ Δx but considers ε = Δf – s Δx, where s has been found from the recipe s = f ’[x] = {Δf // Δx, then set Δx = 0}.

Example by Van Dormolen (1970:219)

This example assumes the total differential of the function f[x, y]:

df = (∂f // ∂x) dx + (∂f // ∂y) dy

Question. Give the slope of the tangent in the point {3, 4} of the circle x² + y²  = 25.

Answer. The point is on the circle indeed. We write the equation as f[x, y] = x² + y²  = 25. The total differential gives 2x dx + 2y dy = 0. Thus dy // dx = – x // y. Evaluation at the point {3, 4} gives the slope – 3/4.  □

PM. We might develop y algebraically as a function of x and then use the +√ rather than the -√. However, more abstractly, we can use y = g[x], and use dy = g ‘[x] dx, so that the slope of the tangent is g ‘[x] at the point {3, 4}. Subsequently we use g ‘[x] = dy // dx.
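The answer can be checked with a small sketch. The function name slope_on_circle is an assumption of this example. The cross-check uses a geometric fact that is independent of the differentials: the tangent to a circle is perpendicular to the radius, so the product of the two slopes is –1.

```python
from fractions import Fraction

def slope_on_circle(x, y):
    """Slope dy // dx from 2x dx + 2y dy = 0, i.e. -x / y (needs y != 0)."""
    fx = 2 * x      # ∂f // ∂x for f = x² + y²
    fy = 2 * y      # ∂f // ∂y
    return Fraction(-fx, fy)

x, y = 3, 4
assert x**2 + y**2 == 25                   # the point is on the circle
tangent = slope_on_circle(x, y)
assert tangent == Fraction(-3, 4)

radius = Fraction(y, x)                    # slope of the radius to {3, 4}
assert tangent * radius == -1              # tangent is perpendicular to radius
```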

PM. In the Dutch highschool programme, partial derivatives aren’t included, but when we can save time by a clear presentation, then they surely should be introduced.


The conclusion is that the algebraic approach to the derivative also settles the age-old question about the meaning of the differentials.

For texts in the past the interpretation of the differential is a mess. For the future, textbooks now have the option of above clarity.

Again, a discussion about didactics is an inspiration for better mathematics. Perhaps research mathematicians have abandoned this topic for ages, and it is only looked at by researchers on didactics.

Appendix A. Spiegel (1962)

Quote from Murray Spiegel (1962), “Advanced calculus (Metric edition)”, Schaum’s outline series, p58-59.


Appendix B. Adams & Essex (2013)

The following quote is from Robert A. Adams & Christopher Essex (2013), “Calculus. A Complete Course”, Pearson, p236.

  • It is a pity that they use c as a value of x rather than as a universal name for a constant (value on the y axis).
  • For them, the differential cannot be zero, while Spiegel conversely states that it is “not necessarily zero”.
  • They clearly show that you can take f ‘[x] Δx in {Δx, Δy} space, and that you then need a new symbol for the outcome, since Δy already has been defined differently. However, it is awkward to say: “For such an approximation, the quantity Δx is traditionally denoted as dx (…)”. It may well be true that John from Los Angeles is called Harry in New York, … etcetera, see above.



The standard treatment of continuity in mathematics textbooks in schools tends to be a bit crooked.

  • The continuum is first assumed, but it is not stated what is assumed.
  • For the real line, the lack of holes is a key property of continuity, but it is called by a word that students might have no affinity with (“completeness” rather than “wholeness”).
  • When continuity is actually discussed in analysis (if at all), then this concerns the continuity of functions, which is rather a different subject.
  • A discussion of the continuum brings us to topology, but do we really need to start with topology before we can do analysis ? Do you want to start your junior highschool class by stating: “In the mathematical field of point-set topology, a continuum (plural: “continua”) is a nonempty compact connected metric space, or, less frequently, a compact connected Hausdorff space.” ?

Our research question for today is: What might be a more logical exposition ? (Didactics would be second phase.)

I will not be telling anything new here, but some students might benefit from the more explicit and straightforward discussion.

Continuity as a primitive notion that cannot be defined

The basic notion of continuity is the real line. One might also think about 3D space or time. L.E.J. Brouwer wouldn’t trust space (Euclidean or non-Euclidean ?) and took time as his intuition, and hence spoke about “intuitionism”. My impression is that space is easier to communicate about (measuring rods are easier to make than clocks), whence I adopt the real line.

Definition w.r.t. human experience: Continuity is a primitive notion, that you might grasp by considering a line (section) in space.

Definition for mathematics: The set of real numbers R can be defined in a particular way. Personally I prefer the method by Timothy Gowers to develop the real numbers as infinite decimals.

Once the real numbers have been defined, then we can say that they also satisfy the notion of continuity. Thus, continuity is either a human experience or defined as the real numbers.

Once we have done this, then we can find the “Cantor-Dedekind Axiom“:

“In mathematical logic, the phrase Cantor–Dedekind axiom has been used to describe the thesis that the real numbers are order-isomorphic to the linear continuum of geometry. In other words, the axiom states that there is a one-to-one correspondence between real numbers and points on a line.” (PM. This wikipedia page should link directly to Tarski’s axioms for geometry.)

I find the term “axiom” a bit problematic.

  • Given the two properties mentioned above, this isomorphism rather expresses human experience from modeling practice. The real numbers would be a model for the line (section) in actual space.
  • Likely though, the identification of R with space is best seen as a definition of what we regard as Euclidean space. It is a question whether actual space would be Euclidean. It is a question whether we can actually imagine space being non-Euclidean, since we imagine e.g. a sphere still in 3D Euclidean space.

But the isomorphism is explained rather easily – and for didactics we would likely begin with this. For the numbers we can look at a development of binary decimals between 0 and 1. The next decimal is 0 or 1, and again, and so on. For space we can make a cut and have left and right parts, make a cut again with new left and right parts, and so on. Thus this is the same structure. But these also are different realms: numbers and space. Thus it is not quite an identity but an isomorphism. Interestingly: when cutting in this manner, we will never meet a hole.
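The parallel between the two realms can be sketched as follows. This is an illustration only, with binary_digits as a hypothetical name. Each cut of the interval into a left and a right part corresponds to one binary digit of the number, and after n cuts the remaining piece has length 2^(-n).

```python
from fractions import Fraction

def binary_digits(x, n):
    """First n binary digits of x in [0, 1), found by repeated cutting.

    Returns the digits and the remaining interval (left, right).
    """
    left, right = Fraction(0), Fraction(1)
    digits = []
    for _ in range(n):
        cut = (left + right) / 2
        if x < cut:
            digits.append(0)   # x is in the left part
            right = cut
        else:
            digits.append(1)   # x is in the right part
            left = cut
    return digits, (left, right)

digits, (left, right) = binary_digits(Fraction(5, 8), 3)
assert digits == [1, 0, 1]                 # 5/8 = 0.101 in binary
assert right - left == Fraction(1, 2**3)   # each cut halves the piece
assert left <= Fraction(5, 8) < right
```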

Continuity can be explained only for subsets

Subsequently, the key properties of continuity can be formulated w.r.t. subsets S of R, rather than w.r.t. R itself.

Definition: A subset S of R is called continuous, if between two elements (values) in S there is always another one (i.e. it is dense), and when there are no holes. Or in formulas:

(a) For each x, y in S ⊆ R with x < y, there exists z in S such that x < z < y

(b) For each x, y in S ⊆ R with x < y, there exists no z in R \ S such that x < z < y

The last property shows the difficulty for R. If one would want to specify that R has a hole, then one would have to specify what that hole belongs to. To some X ? What is X ? In the past, people had a problem imagining what a vacuum was: the horror vacui. For them, space could only exist if something occupied that space. Nowadays, mathematical space is understood merely as a set of co-ordinates, and the issue what physical space would be is left to physics.

Also observe that this definition essentially depends upon the fact that the real numbers have been given, i.e. the earlier section. Thus, continuity is a basic notion given for R and there is only a “proper” (explicit) definition for subsets: which definition relies on the use of R.

If you don’t assume R, you get into problems. For example, if you were to take the set of rational numbers Q rather than R, then (a-Q) could be satisfied for some S, say S = [1/2, 3/2], and (b) would become:

(b-Q) For each x, y in S ⊆ Q with x < y, there exists no z in Q \ S such that x < z < y

In that case, one might say that Q becomes Q-continuous, but this is not the continuity that we want, since there are elements in R still in that interval. (Contiguity comes to mind as a label, but already has some use.)
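A familiar element of R in that interval that is missing from Q is √2. The sketch below uses a hypothetical function name, bracket_sqrt2. It closes in on this point with rationals from both sides. It relies on the known fact that no rational squares to 2, which the code does not prove.

```python
from fractions import Fraction

def bracket_sqrt2(steps):
    """Shrink a rational interval [lo, hi] around the point z with z*z = 2."""
    lo, hi = Fraction(1), Fraction(3, 2)   # both lie in S = [1/2, 3/2]
    for _ in range(steps):
        mid = (lo + hi) / 2
        if mid * mid < 2:
            lo = mid
        else:
            hi = mid
    return lo, hi

lo, hi = bracket_sqrt2(30)
assert lo * lo < 2 < hi * hi               # the endpoints never square to 2
assert hi - lo == Fraction(1, 2**31)       # yet they get arbitrarily close
```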

Further developments

Property (b) is called “(Dedekind) completeness“. It is true that “complete” is a proper translation of German “vollständig” (German wiki), but I would rather prefer “(Dedekind) wholeness”, since this better indicates the lack of holes. But let me admit that I am used to the phrase “completeness” as well, for the chapter of ordering, and thus my preference is weak. Perhaps it is best to speak about “completeness (wholeness)”.

Subsequently, when we forget about the reliance on R, and try for a more abstract formulation, then the notion of supremum (least upper bound) comes into play. We can look at some S independent from R, as the “linear continuum“. This is not intuitive and not feasible in junior highschool. Potentially this approach actually captures continuity in a definition, so that it isn’t just primitive, and can be defined, but (for me, yet) there is no clear connection between the notion of continuity and the property of having suprema. The switch to topology comes into play, see G.H. Moore “The emergence of open sets, closed sets, and limit points in analysis and topology” (and thanks to Dag Oskar Madsen for the reference, in a discussion about open sets that is closed now).

Continuous function

Obviously it helps to have clarity on continuity in the reals first before speaking about continuous functions.

The notion of a continuous function f uses both domain S and range f[S]. At stackexchange, many readers liked the view by Qiaochu Yuan (answer 17):

“One abstract way to think about continuity (…) is that it is about error. A function f : X → Y is continuous at x precisely when f(x) can be “effectively measured” in the sense that, by measuring x closely enough, we can measure f(x) to any desired precision. (…) This is an abstract formulation of one of the most basic assumptions of science: that (most of) the quantities we try to measure (f(x)) depend continuously on the parameters of our experiments (…). If they didn’t, science would be effectively impossible.”
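This view on measurement error can be made concrete for a simple case. The sketch below takes f(x) = x² as my own example, with the hypothetical name input_precision. Given a desired precision eps for f(x), it returns how closely x must be measured. The bound follows from |x² – x0²| = |x – x0| |x + x0| < delta (2 |x0| + 1) when delta ≤ 1.

```python
from fractions import Fraction

def f(x):
    return x * x

def input_precision(x0, eps):
    """A delta with: |x - x0| < delta implies |f(x) - f(x0)| < eps, for f(x) = x*x."""
    return min(Fraction(1), Fraction(eps) / (2 * abs(x0) + 1))

x0, eps = Fraction(3), Fraction(1, 100)
delta = input_precision(x0, eps)
for k in range(-9, 10):
    x = x0 + delta * Fraction(k, 10)      # measurements within delta of x0
    assert abs(f(x) - f(x0)) < eps        # the outcome is within eps of f(x0)
```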

For Dutch readers, Vredenduin has a nice exposition of the notion in Euclides 1969, partly containing this intuition on the error as well, but not so explicitly. He speaks about a small change in the domain and no dramatic change in the range, but it is more enlightening to explicitly speak about (measurement) error. (And I would have a question on continuity of “f / g“, p14.)


The main argument is that this storyline is more straightforward for understanding continuity. All this suggests that school would benefit from a discussion of the reals. This would include issues like 0.9999…. = 1.0000…

I am supposing that junior highschool could manage the expressions of mathematical logic. The New Math tried and failed, but there should be more clarity why it failed.

PM. For completeness: there is always philosophy (and nonstandard analysis).

Horror vacui: no space without something (Wikimedia commons)