# Tag Archives: Calculus

Our protagonists are Cartesius (1596-1650) and Fermat (1607-1665). As Judith Grabiner states, in a recommendable text:

“One could claim that, just as the history of Western philosophy has been viewed as a series of footnotes to Plato, so the past 350 years of mathematics can be viewed as a series of footnotes to Descartes’ Geometry.” (Grabiner) (But remember Michel Onfray’s observation that followers of Plato have been destroying texts by opponents. (Dutch readers check here.))

Both Cartesius and Fermat were involved in the early development of calculus. Both worked on the algebraic approach without limits. Cartesius developed the method of normals and Fermat the method of adequality.

##### Fermat and Δf / Δx

Fermat’s method was itself algebraic, but was later developed into the method of limits anyhow. When asked what the slope of a ray y = s x is at the point x = 0, the answer y / x = s runs into problems, since we cannot use 0 / 0. The conventional answer is to use limits. This problem is more striking when one considers the special ray that is defined everywhere except at the origin itself. The crux of the problem lies in the notion of slope Δf / Δx, which obviously has a problematic division. With set theory we can now define the “dynamic quotient”, so that we can use Δf // Δx = s even when Δx = 0, so that Fermat’s problem is resolved, and his algebraic approach can be maintained. This originated in 2007, see Conquest of the Plane (2011).

##### Cartesius and Euclid’s notion of tangency

Cartesius followed Euclid’s notion of tangency. Scholars tend to assign this notion to Cartesius as well, since he embedded the approach within his new idea of analytic geometry.

I thank Roy Smith for this eye-opening question:

“Who first defined a tangent to a circle as a line meeting it only once? From googling, it seems commonly believed that Euclid did this, but it seems nowhere in Euclid does he even state this property of a tangent line explicitly. Rather Euclid gives 4 other equivalent properties, that the line does not cross the circle, that it is perpendicular to the radius, that is a limit of secant lines, and that it makes an angle of zero with the circle, the first of which is his definition, the others being in Proposition III.16. I am wondering where the “meets only once” definition got started. I presume once it got going, and people stopped reading Euclid, (which seems to have occurred over 100 years ago), the currently popular definition took over. Perhaps I should consult Legendre or Hadamard? Thank you for any leads.” (Roy Smith, at StackExchange)

In this notion of tangency there is no problematic division, whence there is no urgency to use limits.

The reasoning is:

• (Circle & Line) A line is tangent to a circle when there is only one common point (or the two intersecting points overlap).
• (Circle & Curve) A smooth curve is tangent to a circle when the two intersecting points overlap (but the curve might cross the circle at that point so that the notion of “two points” is even more abstract).
• (Curve & Line) A curve is tangent to a line when the above two properties hold (but the line might cross the curve, whence we had better speak of incline rather than tangent).

##### Example of line and circle

Consider the line y = f[x] = c + s x and the point {a, f[a]}. The line can also be written with c = f[a] – s a:

y – f[a] = s (x – a)

The normal has slope –s^H, where we use H = –1 (so that s^H = 1 / s). The formula for the normal is the line y – f[a] = –s^H (x – a). We can choose the center of the circle anywhere on this line. A handy choice is {u, 0}, so that we choose the center on the horizontal axis. (If we looked at a ray and point {0, 0}, then the issue would be similar for {0, c} for nonzero c and thus the approach remains general.) Substituting the point into the normal gives

0 – f[a] = –s^H (u – a)

s = (u – a) / f[a]

u = a + s f[a]

The circle has the formula (x – u)² + y² = r². Substituting {a, f[a]} generates the value for the radius r² = (a – (a + s f[a]))² + f[a]² = (1 + s²) f[a]². The following diagram has {c, s, a} = {0, 2, 3} and thus u = 15 and r = 6√5.
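The numbers in this example can be checked with a few lines of Python. This is only a sketch: the function names are made up for the illustration, and exact fractions are used so that the double root shows up as a discriminant of exactly zero.

```python
from fractions import Fraction

def tangent_circle_for_line(c, s, a):
    """Center {u, 0} and squared radius of the circle touching y = c + s x at x = a."""
    fa = c + s * a                  # f[a]
    u = a + s * fa                  # center on the horizontal axis, from the normal
    r2 = (1 + s * s) * fa * fa      # r² = (1 + s²) f[a]²
    return u, r2

def intersection_discriminant(c, s, u, r2):
    """Discriminant of (x - u)² + (c + s x)² - r² = 0, read as a quadratic in x."""
    A = 1 + s * s
    B = -2 * (u - c * s)
    C = u * u + c * c - r2
    return B * B - 4 * A * C

c, s, a = Fraction(0), Fraction(2), Fraction(3)
u, r2 = tangent_circle_for_line(c, s, a)
disc = intersection_discriminant(c, s, u, r2)
print(u, r2, disc)   # 15 180 0
```

A zero discriminant means that line and circle share exactly one point, and r² = 180 agrees with r = 6√5.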

##### Method of normals

For the method of normals and arbitrary function f[x], Cartesius’s trick is to substitute y = f[x] into the formula for the circle, and then solve for the unknown center of the circle.

(x – u)² + (y – 0)² = r²

(x – u)² + f[x]² – r² = 0         … (* circle)

This expression is only true for x = a, but we treat it as if it were more general. The key property is:

Since {a, f[a]} satisfies the circle, this equation has a solution for x = a with a double root.

Thus there might be some g such that the root can be isolated:

(x – a)² g[x, u] = 0         … (* roots)

Thus, if we succeed in rewriting the formula for the circle into the form of the formula with the two roots, then we can use information about the structure of the latter to say something about u.

The method works for polynomials, that obviously have roots, but not necessarily for trigonometry and the exponential function.

##### Algorithm

The algorithm thus is: (1) Substitute f[x] in the formula for the circle. (2) Compare with the expression with the double root. (3) Derive u. (4) Then the line through {a, f[a]} and {u, 0} will give slope –s^H. Thus s = (u – a) / f[a] gives the slope of the incline (tangent) of the curve. (5) If f[a] = 0, add a constant or choose center {u, v}.

##### Application to the line itself

Consider the line y = f[x] = c + s x again. Let us apply the algorithm. The formula for the circle gives:

(x – u)² + (c + s x)² – r² = 0

x² – 2ux + u² + c² + 2csx + s²x² – r² = 0

(1 + s²) x² – 2 (u – cs) x + u² + c² – r² = 0

This is a polynomial. It suffices to choose g[x, u] = 1 + s² so that the coefficients of x² are the same. Also the coefficient of x must be the same. Thus expanding (x – a)²:

(1 + s²) (x² – 2ax + a²) = 0

–2 (u – cs) = –2 a (1 + s²)

u = a (1 + s²) + cs = a + s (c + sa) = a + s f[a]

which is the same result as above.
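The comparison of coefficients can be mimicked numerically. The sketch below (the function name and the sample values are assumptions for the illustration) matches the coefficient of x to find u, uses the constant term to find r², and then checks both against the formulas from the example of line and circle.

```python
from fractions import Fraction

def center_by_coefficients(c, s, a):
    """Match (1+s²)x² - 2(u - cs)x + u² + c² - r² against (1+s²)(x - a)²."""
    lead = 1 + s * s
    # coefficient of x:  -2 (u - c s) = -2 a (1 + s²)
    u = a * lead + c * s
    # constant term:  u² + c² - r² = (1 + s²) a²
    r2 = u * u + c * c - lead * a * a
    return u, r2

c, s, a = Fraction(1), Fraction(-1, 2), Fraction(4)
u, r2 = center_by_coefficients(c, s, a)
fa = c + s * a                      # f[a]

print(u == a + s * fa)              # True
print(r2 == (1 + s * s) * fa * fa)  # True
```

No division is used anywhere: the center and the radius follow from equating coefficients only.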

##### A general formula with root x – a

We can deduce a general form that may be useful on occasion. When we substitute the point {a, f[a]} into the formula for the circle, then we can find r, and actually eliminate it.

(x – u)² + f[x]² = r² = (a – u)² + f[a]²

f[x]² – f[a]² = (a – u)² – (x – u)²

(f[x] – f[a]) (f[x] + f[a]) = ((a – u) – (x – u)) ((a – u) + (x – u))

(f[x] – f[a]) (f[x] + f[a]) = (a – x) (a + x – 2u)

f[x] – f[a] = (a – x) (a + x – 2u) / (f[x] + f[a])

f[x] – f[a] = (x – a) (2u – x – a) / (f[x] + f[a])       … (* general)

f[x] – f[a] = (x – a) q[x, a, u]

We cannot do much with this, since this is basically only true for x = a and f[x] – f[a] = 0. Yet we have this “branch cut”:

(1)      q[x, a, u] = (f[x] – f[a]) / (x – a)        if x ≠ a

(2)      q[a, a, u]      potentially found by other means

If it is possible to “simplify” (1) into another expression Simplify[q[x, a, u]] without the division, then the tantalising question becomes whether we can “simply” substitute x = a. Or, if we were to find q[a, a, u] via other means in (2), whether it links up with (1). These are questions of continuity, and those are traditionally studied by means of limits.

##### Theorem on the slope

We can still use the general formula to state a theorem.

Theorem. If we can eliminate factors without division, then there is an expression q[x, a, u] such that evaluation at x = a gives the slope s of the line, or q[a, a, u] = s, such that at this point both curve and line are touching the same circle.

Proof. Eliminating factors without division in above general formula gives:

q[x, a, u] = (2u – x – a) / (f[x] + f[a])

Setting x = a gives:

q[a, a, u] = (u – a) / f[a]

And the above s = (u – a) / f[a] implies that q[a, a, u] = s. QED

This theorem gives us the general form of the incline (tangent).

y[x, a, u] = (x – a) q[a, a, u] + f[a]       …  (* incline)

y[x, a, u] = (x – a) (u – a) / f[a] + f[a]

PM. Dynamic division satisfies the condition “without division” in the theorem. For, the term “division” in the theorem concerns the standard notion of static division.

##### Corollary. Polynomials as the showcase

Polynomials are the showcase. For polynomials p[x], there is the polynomial remainder theorem:

When a polynomial p[x] is divided by (x – a) then the remainder is p[a].
(Also, x – a is called a “divisor” of the polynomial if and only if p[a] = 0.)

Using this property we now have a dedicated proof for the particular case of polynomials.

Corollary. For polynomials q[a] = s, with no need for u.

Proof. Now, p[x] – p[a] = 0 at x = a implies that x = a is a root, and then there is a “quotient” polynomial q[x] such that:

p[x] – p[a] = (x – a) q[x]

From the general theorem we also have:

p[x] – p[a] = (x – a) q[x, a, u]

Eliminating the common factor (x – a) without division and then setting x = a gives q[a] = q[a, a, u] = s. QED

We now have a sound explanation why this polynomial property gives us the slope of the polynomial at that point. The slope is given by the incline (tangent), and it must also be slope of the polynomial because of the mutual touching of the same circle.

See the earlier discussion about techniques to eliminate factors of polynomials without division. We have seen a new technique here: comparing the coefficients of factors.
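As an illustration of the corollary, Ruffini’s rule can be written as a short multiply-and-add loop that never divides by (x – a). The following is a sketch under my own conventions: coefficients are listed from the highest power down, and the sample polynomial is chosen only for the example.

```python
def ruffini(coeffs, a):
    """Split off (x - a) from p[x] with Ruffini's rule.

    coeffs lists the coefficients from the highest power down.
    Returns (quotient coefficients, remainder); the remainder equals p[a].
    """
    row, carry = [], 0
    for coef in coeffs:
        carry = coef + carry * a    # multiply and add only, no division
        row.append(carry)
    return row[:-1], row[-1]

def evaluate(coeffs, x):
    """Horner evaluation, coefficients from the highest power down."""
    value = 0
    for coef in coeffs:
        value = value * x + coef
    return value

p = [1, -2, 0, 5]            # p[x] = x³ - 2x² + 5
a = 3
q, p_at_a = ruffini(p, a)    # p[x] - p[a] = (x - a) q[x]
slope = evaluate(q, a)       # q[a]

print(q, p_at_a, slope)      # [1, 1, 3] 14 15
```

Here q[x] = x² + x + 3 and q[3] = 15, which is the slope of the incline of p at x = 3.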

##### Second corollary

Since q[x] is a polynomial too, we can apply the polynomial remainder theorem again, and thus we have q[x] = (x – a) w[x] + q[a] for some w[x]. Thus we can write:

p[x] = (x – a) q[x] + p[a]

p[x] = (x – a) ( (x – a) w[x] + q[a] ) + p[a]       … (* Ruffini’s Rule twice)

p[x] = (x – a)² w[x] + (x – a) q[a] + p[a]           … (* Range’s proof)

p[x] = (x – a)² w[x] + y[x, a]                             … (* with incline)

We see two properties:

• The repeated application of Ruffini’s Rule uses the indicated relation to find both s = q[a] and constant f[a], as we have seen in last discussion.
• Evaluating f[x] / (x – a)² gives the remainder y[x, a], which is the formula for the incline.

##### Range’s proof method

Michael Range proves q[a] = s as follows (in this article (p406) or book (p32)). Take above (*) and determine the error by subtracting the line y = s (x – a) + p[a] :

error = p[x] – y = (x – a)² w[x] + (x – a) q[a] – s (x – a)

= (x – a)² w[x] + (x – a) (q[a] – s)

The error = 0 has a root x = a with multiplicity greater than one if and only if s = q[a].
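The double-root criterion can be tested on a concrete polynomial. This sketch applies Ruffini’s rule twice to obtain p[a], q[a] and w[x], builds the error polynomial for a chosen slope s, and counts how often (x – a) can be split off. The helper names and the sample polynomial are assumptions for the illustration.

```python
def ruffini(coeffs, a):
    """Ruffini's rule: returns (quotient coefficients, remainder), highest power first."""
    row, carry = [], 0
    for coef in coeffs:
        carry = coef + carry * a
        row.append(carry)
    return row[:-1], row[-1]

def error_polynomial(p, a, s):
    """Coefficients of p[x] - (s (x - a) + p[a]); p must have degree >= 1."""
    _, p_at_a = ruffini(p, a)
    error = list(p)
    error[-2] -= s                  # subtract the term s x
    error[-1] -= p_at_a - s * a     # subtract the constant p[a] - s a
    return error

def multiplicity_at(coeffs, a):
    """How many times (x - a) can be split off without remainder."""
    count = 0
    while coeffs:
        quotient, remainder = ruffini(coeffs, a)
        if remainder != 0:
            break
        count += 1
        coeffs = quotient
    return count

p = [1, -2, 0, 5]                   # p[x] = x³ - 2x² + 5
a = 3
q, p_at_a = ruffini(p, a)           # first pass gives p[a]
w, q_at_a = ruffini(q, a)           # second pass gives q[a] and w[x]

print(q_at_a, w)                                            # 15 [1, 4]
print(multiplicity_at(error_polynomial(p, a, q_at_a), a))   # 2
print(multiplicity_at(error_polynomial(p, a, 10), a))       # 1
```

Only the slope s = q[a] = 15 gives the error a root of multiplicity two at x = 3; any other slope, such as 10, leaves a single root.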

##### Direct application to the incline itself

Now that we have established this theory, there may be no need to refer to the circle explicitly. It can suffice to use the property of the double root. Michael Range (2014) gives the example of the incline (tangent) at x² at {a, a²}. The formula for the incline is:

f[x] – f[a]  = s (x – a)

x² – a² – s (x – a) = 0

(x – a) (x + a – s) = 0

There is only a double root or (x – a)² when s = 2a.

Working directly on the line allows us to focus on s, and we don’t need to determine q[x] and plug in x = a.

Michael Range (2011) clarifies – with thanks to a referee – that the “point-slope” form of a line was introduced by Gaspard Monge (1746-1818), and that Descartes apparently did not think of this himself, and thus also not of plugging in y = f[x] here. However, observe that we can maintain that there must be a double root in this line form too only because {a, f[a]} still lies on a tangent circle.

[Addendum 2017-01-10: The later argument in a subsequent weblog entry becomes: If the function can be factored twice, then there is no need to refer to the circle. But when this would be equivalent to the circle then such a distinction is immaterial.]

##### Addendum. Example of function crossing a circle

When a circle touches a curve, it still remains possible that the curve crosses the circle. The original idea of two points merging together into an overlapping point then doesn’t apply anymore, since there is only one intersecting point on either side if the circle were smaller or bigger.

An example is the spline function g[x] = {If x < 0 then 4 – x² / 4 else 4 + x² / 4}. This function is C1 continuous at 0, meaning that the sections meet and that the slopes of the two sections are equal at 0, while the second and higher derivatives differ. The circle with center at {0, 0} and radius 4 still fits the point {0, 4}, and the incline is the line y = 4.
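That this spline really crosses the circle at {0, 4} can be checked by evaluating x² + g[x]² – 16 on either side of 0. The sketch below uses exact fractions; the name `circle_excess` is made up for the illustration.

```python
from fractions import Fraction

def g(x):
    """The spline: 4 - x²/4 for x < 0, else 4 + x²/4."""
    return 4 - x * x / 4 if x < 0 else 4 + x * x / 4

def circle_excess(x):
    """x² + g[x]² - 16: negative inside the circle of radius 4, positive outside."""
    return x * x + g(x) ** 2 - 16

left = circle_excess(Fraction(-1, 2))
at_zero = circle_excess(Fraction(0))
right = circle_excess(Fraction(1, 2))
print(left, at_zero, right)   # -63/256 0 193/256
```

Just left of 0 the curve runs inside the circle, just right of 0 it runs outside, and at 0 it lies on the circle: the curve touches and crosses at the same point.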

An application of above algorithm would look at the sections separately and paste the results together. Thus this might not be the most useful example of crossing.

In this example there might be no clear two “overlapping” points. However, observe:

• Lines through {0, 4} might have three points with the curve, so that the incline might be seen as having three overlapping points.
• Points on the circle can always be seen as the double root solutions for tangency at that point.

There is still quite a conceptual distance between (i) the story about the two overlapping points on the circle and (ii) the condition of double roots in the error between line and polynomial.

The proof given by Range uses the double root to infer the slope of the incline. This is mathematically fine, but this deduction doesn’t contain a direct concept that identifies q[a] as the slope of an incline (tangent): it might be any line.

We see this distinction between concept and algorithm also in the direct application to Monge’s point-slope formulation of the line. Requiring a double root works, but we can only do so because we know about the theory about the tangent circle.

The combination of circle and line remains the fundamental reason why there are two roots. Thus the more general proof given above, that reasons from the circle and unpacks f[x]² – f[a]² into the conditions for incline and its normal, is conceptually more attractive. I am new to this topic and don’t know whether there are references for this general proof.

##### Conclusions

(1) We now understand where the double root comes from. See the earlier discussion on polynomials, Ruffini’s rule and the meaning of division (see the section on “method 2”).

(2) There, we referred to polynomial division, with the comment: “Remarkably, the method presumes x ≠ a, and still derives q[a]. I cannot avoid the impression that this method still has a conceptual hole.” However, we now observe that we can compare the values of the coefficients of the powers of x, whence we can avoid also polynomial division.

(3) There, we had a problem that developing p[x] = (x – a)² w[x] + y[x, a] didn’t have a notion of tangency, in terms of Δf / Δx. However, we actually have a much older definition of tangency.

(4) The above states an algorithm and a general theorem with the requirements that must be satisfied.

(5) Cartesius beats Fermat on this issue of the incline (tangent), and actually also on providing an exact method for polynomials, where Fermat introduced the problem of error.

(6) For trigonometry and exponentials we know that these can be written as power series, and thus the Cartesian method would also apply. However, the power series are based upon derivatives, and this would introduce circularity. However, the method of the dynamic quotient from 2007 still allows an algebraic result. The further development from Fermat into the approach with limits would become relevant for more complex functions.

PM. The earlier discussion referred to Peter Harremoës (2016) and John Suzuki (2005) on this approach. New to me (and the book unread) are: Michael Range (2011), the recommendable Notices, or the book (2015) – review Ruane (2016) – and Shen & Lin (2014).

Cartesius, Portrait by Frans Hals 1648

Isaac Newton (1642-1727) invented the differentials, calling them evanescent quantities. Since then, the world has been wondering what these are. Just to be sure, Newton wrote his Principia (1687) by using the methods of Euclidean geometry, so that his results could be accepted in the standard of his day (context of reconstruction and presentation), and so that his results were not lost in a discussion about the new method of these differentials (context of discovery). However, this only increased the enigma. What can these quantities be, that are so efficient for science, and that actually disappear when mathematically interesting ?

Gottfried Leibniz (1646-1716) gave these infinitesimals their common labels dy and dx, and thus they became familiar as household names in academic circles, but this didn’t reduce their mystery.

Charles Dodgson (1832-1898) as Lewis Carroll had great fun with the Cheshire Cat, who disappears but leaves its grin.

Abraham Robinson (1918-1974) presented an interpretation called “non-standard analysis”. Many people think that he clinched it, but when I start reading, my intuition warns me that this is making things more difficult. (Perhaps I should read more though.)

In 2007, I developed an algebraic approach to the derivative. This was in the book “A Logic of Exceptions” (ALOE), later also included in “Elegance with Substance” (EWS) (2009, 2015), and a bit later there was a “proof of concept” in “Conquest of the Plane” (COTP) (2011). The pdfs are online, and a recent overview article is here. A recent supplement is the discussion on continuity.

In this new algebraic approach there wasn’t a role for differentials, yet. The notation dy / dx = f ‘[x] for y = f[x] can be used to link up to the literature, but up to now there was no meaning attached to the symbolism. In my perception this was (a bit of) a pity since the notation with differentials can be useful on occasion, see the example below.

Last month, reading Joop van Dormolen (1970) on the didactics of derivatives and the differential calculus – in a book for teachers Wansink (1970) volume III – I was struck by his admonition (p213) that dy / dx really is a quotient of two differentials, and that a teacher should avoid identifying it as a single symbol and as the definition of the derivative. However, when he proceeded, I was disappointed, since his treatment didn’t give the clarity that I looked for. In fact, his treatment is quite in line with that of Murray Spiegel (1962), “Advanced calculus (Metric edition)”, Schaum’s outline series, see below. (But Van Dormolen very usefully discusses the didactic questions, that Spiegel doesn’t look into.)

Thus, I developed an interpretation of my own. In my impression this finally gives the clarity that people have been looking for starting with Newton. At least: I am satisfied, and you may check whether you are too.

I don’t want to repeat myself too much, and thus I assume that you read up on the algebraic approach to the derivative in case of questions. (A good place to start is the recent overview.)

##### Ray through an origin

Let us first consider a ray through the origin, with horizontal axis x and vertical axis y. The ray makes an angle α with the horizontal axis. The ray can be represented by a function as y =  f [x] = s x, with the slope s = tan[α]. Observe that there is no constant term (c = 0).

The quotient y / x is defined everywhere, with the outcome s, except at the point x = 0, where we get an expression 0 / 0. This is quite curious. We tend to regard y / x as the slope (there is no constant term), and at x = 0 the line has that slope too, but we seem unable to say so.

There are at least three responses:

(i) Standard mathematics then takes off, with limits and continuity.

(ii) A quick fix might be to try to define a separate function to find the slope of a ray, but we can wonder whether this is all nice and proper, since we can only state the value s at 0 when we have solved the value elsewhere. If we substitute y when it isn’t a ray, for example x², then we get a curious construction, and thus the definition isn’t quite complete since there ought to be a test on being a ray.

(iii) The algebraic approach uses the following definition of the dynamic quotient:

y // x ≡ { y / x, unless x is a variable and then: assume x ≠ 0, simplify the expression y / x, declare the result valid also for the domain extension x = 0 }

Thus in this case we can use y // x = s x // x = s, and this slope also holds for the value x = 0, since this has now been included in the domain too.
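For polynomials in x, the recipe of the dynamic quotient can be imitated in a few lines. This is only a sketch under strong assumptions: y is a polynomial given by its coefficients (highest power first), “simplify” means cancelling the common factor x, and the function names are invented here. It is not the general definition, which relies on computer algebra.

```python
def dynamic_quotient_by_x(y_coeffs):
    """Sketch of y // x for a polynomial y, coefficients from the highest power down.

    Assume x ≠ 0, cancel the common factor x, and return the simplified
    polynomial, which is then declared valid at x = 0 as well.
    """
    if y_coeffs[-1] != 0:
        raise ValueError("y / x does not simplify: y has a nonzero constant term")
    return y_coeffs[:-1]

def evaluate(coeffs, x):
    """Horner evaluation, coefficients from the highest power down."""
    value = 0
    for coef in coeffs:
        value = value * x + coef
    return value

s = 2
ray = [s, 0]                                # y = s x, no constant term
slope_poly = dynamic_quotient_by_x(ray)     # y // x = s
slope_at_origin = evaluate(slope_poly, 0)   # the domain now includes x = 0
print(slope_at_origin)                      # 2
```

The static quotient y / x would fail at x = 0, while the simplified expression simply returns s there.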

##### In a nutshell for dy / dx

In a nutshell, we get the following situation for dy / dx:

Properties are exactly as Van Dormolen explained:

• “dy” and “dx” are names for variables, and thus they have their own realm with their own axes.
• The definition of their relationship is dy = f ‘[x] dx.

The news is:

• The mistake in history was to write dy / dx instead of dy // dx.

The latter “mistake” can be understood, since the algebraic approach uses notions of set theory, domain and range, and dynamics as in computer algebra, and thus we can forgive Newton for not getting there yet.

To link up with history, we might define that the “symbol dy / dx as a whole” is a shortcut for dy // dx. This, however, takes extra effort to develop the notion of “symbol as a whole”. My impression is that it is better to use dy // dx unless it is so accepted that it might become pedantic. (You need only explain that the Earth isn’t flat as long as people don’t know that yet.)

##### Application to Spiegel 1962 gives clarity

Let us look at Spiegel (1962) p58-59, and see how above discussion can bring clarity. The key points can all be discussed with reference to his figure 4-1.

Looking at this with a critical eye, we find:

• At the point P, there is actually the creation of two new sets of axes, namely, both the {Δx, Δy} plane and the {dx, dy} plane.
• These two new planes have both rays through the origin, one with angle θ and one with angle α.
• The two planes help to define the error. An error is commonly defined from the relation “true value = estimate + error”. The true value of the angle is θ and our estimate is α.
• Thus we get absolute error Δf = s Δx + ε where s = dy / dx. This error is a function of Δx, or ε = ε[Δx]. It solves as ε = Δf – s Δx.
• The relative error is Δf / Δx = dy / dx + r which solves as r = Δf / Δx – dy / dx. This is still a function r[Δx]. We use the quotient of the differentials instead of the true quotient of the differences.
• We had better reconsider the error in terms of the dynamic quotient, replacing / by // in the above, because at P we like the error to be zero. Thus in above figure we have ε = Δf – s Δx, where s = dy // dx.
• A source of confusion is that Spiegel suggests that dx ≈ Δx or even dx = Δx but this is numerically true only sometimes and conceptually there surely is no identity since these are different axes.
• In the algebraic approach, Δx is set to zero to create the derivative, in particular the value of f ‘[x] = tan[α] at point P.  In this situation, Δx = 0 thus clearly differs from the values of dx that are still available on dx ‘s whole own axis. This explains why the creation of the differentials is useful. For, while Δx is set to 0, then the differentials can take any value, including 0.

Just to be sure, the algebraic approach uses this definition:

f ’[x] = {Δf // Δx, then set Δx = 0}

Subsequently, we define dy = f ‘[x] dx, so that we can discuss the relative error r = Δf // Δx – dy // dx.
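For a polynomial f the recipe {Δf // Δx, then set Δx = 0} can be carried out mechanically: expand f[x + Δx] – f[x] in powers of Δx, cancel the common factor Δx, and only then set Δx = 0. The sketch below does this with the binomial expansion; the function names are assumptions, and h stands for Δx.

```python
from math import comb

def delta_f_in_powers_of_h(coeffs, x):
    """Coefficients d[k] of h**k in Δf = f[x + h] - f[x], lowest power first.

    coeffs lists the coefficients of f from the highest power down.
    """
    degree = len(coeffs) - 1
    d = [0] * (degree + 1)
    for i, c in enumerate(coeffs):
        n = degree - i                           # this term is c * x**n
        for k in range(n + 1):
            d[k] += c * comb(n, k) * x ** (n - k)
    d[0] -= sum(c * x ** (degree - i) for i, c in enumerate(coeffs))  # minus f[x]
    return d

def algebraic_derivative(coeffs, x):
    """{Δf // Δx, then set Δx = 0} for a polynomial f."""
    d = delta_f_in_powers_of_h(coeffs, x)
    if d[0] != 0:
        raise ValueError("Δf should have no constant term in h")
    quotient = d[1:]                             # Δf // Δx: cancel the factor h
    return quotient[0] if quotient else 0        # then set h = 0

def epsilon(coeffs, x, h):
    """Absolute error ε = Δf - s Δx with s from the recipe above."""
    d = delta_f_in_powers_of_h(coeffs, x)
    delta_f = sum(dk * h ** k for k, dk in enumerate(d))
    return delta_f - algebraic_derivative(coeffs, x) * h

f = [1, -2, 0, 5]                   # f[x] = x³ - 2x² + 5
print(algebraic_derivative(f, 3))   # 15
print(epsilon(f, 3, 1))             # 8
print(epsilon(f, 3, 0))             # 0
```

The error is zero at Δx = 0, as desired at the point P, while dx plays no role in the computation and remains free on its own axis.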

PM. Check COTP p224 for the discussion of (relative) error, with the same notation. This present discussion still replaces the statement on differentials in COTP p155, step number 10.

##### A subsequent point w.r.t. the standard approach

Our main point thus is that the mistake in history was to write dy / dx instead of dy // dx. There arises a subsequent point of didactics. When you have real variables y and z, then these have their own axes, and you don’t put them on the same axis just because they are both reals.

See Appendix A for a quote from Spiegel (1962), and check that it is convoluted at times.

Appendix B contains a quote from p236 from Adams & Essex (2013). We can see the same confusions as in Spiegel (1962). It really is a standard approach, and convoluted.

The standard approach takes Δx = dx and joins the axis for the variable Δy with the axis for the variable dy, with the common idea of “a change from y”. The idea of this setup is that it shows the error for values of Δx = dx.

It remains an awkward setup. It may well be true that John from Los Angeles is called Harry in New York, but when John calls his mother back home and introduces himself as “Mom, this is Harry”, then she will be confused. Eventually she can get used to this kind of phonecalls, but it remains awkward didactics to introduce students to these new concepts in this manner. (Especially when John adds: “Mom, actually I don’t exist anymore because I have been set to zero.”)

Thus, in good didactics we should drop this Δx = dx.

Alternatively put: We might define dy = f ’[x] Δx = {Δf // Δx, then set Δx = 0} Δx. In the latter expression Δx occurs twice: both as a local and bound variable within { … } and as a global free variable outside of { … }. This is okay. In the past, mathematicians apparently thought that it might make things clearer to write dx for the free global variable: dy = f ’[x] dx. In a way this is okay too. But for didactics it doesn’t work. We should rather avoid an expression in which the same variable (name) is used both locally bound and globally free.

##### Clear improvement

Remarkably, we are using 99% of the same apparatus as the standard approach, but there are clear improvements:

• There is no use of limits. All information is contained in the algebra of both the function f and the dynamic quotient. See here for continuity.
• There is a clear distinction between the three realms {x, y}, {Δx, Δy} and {dx, dy}.
• There is the new tool of the {dx, dy} space that can be used for analysis of variations.
• Didactically, it is better to first define the derivative in chapter 1, and then introduce the differentials in chapter 2, since the differentials aren’t needed to understand chapter 1.
• There is clarity about the error, that one doesn’t take dx ≈ Δx but considers ε = Δf – s Δx, where s has been found from the recipe s = f ’[x] = {Δf // Δx, then set Δx = 0}.

##### Example by Van Dormolen (1970:219)

This example assumes the total differential of the function f[x, y]:

df = (∂f // ∂x) dx + (∂f // ∂y) dy

Question. Give the slope of the tangent in the point {3, 4} of the circle x² + y²  = 25.

Answer. The point is on the circle indeed. We write the equation as f[x, y] = x² + y²  = 25. The total differential gives 2x dx + 2y dy = 0. Thus dy // dx = – x // y. Evaluation at the point {3, 4} gives the slope – 3/4.  □
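The answer can be cross-checked against the first part of this page: a line with this slope through {3, 4} should meet the circle in a double root. The sketch below assumes integer coordinates and uses invented function names; it computes dy // dx = – x // y and the discriminant of the intersection.

```python
from fractions import Fraction

def slope_from_total_differential(x, y):
    """For f[x, y] = x² + y²: 2x dx + 2y dy = 0, so dy // dx = -x // y.

    x and y are assumed to be integers in this sketch.
    """
    if y == 0:
        raise ValueError("vertical tangent: no finite slope at y = 0")
    return Fraction(-x, y)

def tangency_discriminant(x0, y0, s, r2):
    """Discriminant of x² + (y0 + s (x - x0))² - r² = 0 as a quadratic in x."""
    c = y0 - s * x0                 # intercept of the line
    A = 1 + s * s
    B = 2 * c * s
    C = c * c - r2
    return B * B - 4 * A * C

s = slope_from_total_differential(3, 4)
disc = tangency_discriminant(3, 4, s, 25)
print(s, disc)   # -3/4 0
```

The zero discriminant confirms that the line with slope – 3/4 touches the circle at {3, 4}, in agreement with the Euclidean notion of tangency.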

PM. We might develop y algebraically as a function of x and then use the +√ rather than the –√. However, more abstractly, we can use y = g[x], and use dy = g ‘[x] dx, so that the slope of the tangent is g ‘[x] at the point {3, 4}. Subsequently we use g ‘[x] = dy // dx.

PM. In the Dutch highschool programme, partial derivatives aren’t included, but when we can save time by a clear presentation, then they surely should be introduced.

##### Conclusion

The conclusion is that the algebraic approach to the derivative also settles the age-old question about the meaning of the differentials.

For texts in the past the interpretation of the differential is a mess. For the future, textbooks now have the option of above clarity.

Again, a discussion about didactics is an inspiration for better mathematics. Perhaps research mathematicians have abandoned this topic for ages, and it is only looked at by researchers on didactics.

##### Appendix A. Spiegel (1962)

Quote from Murray Spiegel (1962), “Advanced calculus (Metric edition)”, Schaum’s outline series, p58-59.

##### Appendix B. Adams & Essex (2013)

The following quote is from Robert A. Adams & Christopher Essex (2013), “Calculus. A Complete Course”, Pearson, p236.

• It is a pity that they use c as a value of x rather than as a universal name for a constant (value on the y axis).
• For them, the differential cannot be zero, while Spiegel conversely states that it is “not necessarily zero”.
• They clearly show that you can take f ‘[x] Δx in {Δx, Δy} space, and that you then need a new symbol for the outcome, since Δy already has been defined differently. However, it is awkward to say: “For such an approximation, the quantity Δx is traditionally denoted as dx (…)”. It may well be true that John from Los Angeles is called Harry in New York, … etcetera, see above.