Pierre van Hiele and Google Translate

Listening again to Girls of Ali Mountain


I had some fun today with Google Translate. For other people this is serious research and business, but a lay translator may be excused for playing a bit. Unfortunately, play raises questions; it isn’t a free lunch.

Google Translate and the pronunciation of numbers

We discussed the pronunciation of numbers in English, German, French, Dutch and Danish before. Here is a suggestion to develop a standard.

Kids of age 4-6 live and think in spoken language before they learn reading and writing. Thus proper pronunciation of numbers will help them master the written number system and arithmetic. A first phase of reading is reading aloud, a later phase is subvocalisation (i.e. becoming silent), and perhaps later still this too may disappear. Thinking would still be largely in “silent spoken language”, while only later would formulas like 1 + 1 = 2 benefit from thinking in forms (symbol sense).

Ms. Sue Shellenbarger in the Wall St. Journal of September 15 2014 discussed “The Best Language for Math. Confusing English Number Words Are Linked to Weaker Skills”.

Hence I wondered how Google Translate deals with this, with its pronunciation icon, and whether it could support the development of such a standard.

  • When you type in 11, and ask for the pronunciation, then you get eleven.
  • When you type in ten one then you get ten one.
  • Ergo, it would be feasible to create a language tab English-M so that 11 gives pronunciation ten one. (And normal English again for non-numbers.)

Speech examples

When you type in 1111 then Google speech gives eleven eleven, which is wrong. Please do not alert them to this, because I want to keep the example intact. Only 1,111 generates spoken one thousand, one hundred and eleven, which is what 1111 should also give. Except that English-M would give thousand, one hundred, ten one.
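To make the idea concrete, here is a minimal sketch in Python of how an English-M reading could be generated for numbers up to 9999. It is only an illustration, not a proposed standard. The function name english_m is hypothetical. The only readings taken from this weblog are ten one for 11 and thousand, one hundred, ten one for 1111; the treatment of other digits (for example two ten one for 21, or two thousand for 2000) is my assumption.

```python
DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]


def english_m(n: int) -> str:
    """Return a hypothetical English-M reading for 0 <= n <= 9999."""
    if not 0 <= n <= 9999:
        raise ValueError("this sketch only covers 0..9999")
    if n == 0:
        return "zero"

    thousands, rest = divmod(n, 1000)
    hundreds, rest = divmod(rest, 100)
    tens, ones = divmod(rest, 10)

    parts = []
    if thousands:
        # The 1111 example says "thousand" without "one";
        # "two thousand" etc. is an assumption.
        parts.append("thousand" if thousands == 1
                     else f"{DIGITS[thousands]} thousand")
    if hundreds:
        # The 1111 example says "one hundred", so the digit is kept here.
        parts.append(f"{DIGITS[hundreds]} hundred")

    low = []
    if tens:
        # 11 is "ten one"; "two ten" for 20 is an assumption.
        low.append("ten" if tens == 1 else f"{DIGITS[tens]} ten")
    if ones:
        low.append(DIGITS[ones])
    if low:
        parts.append(" ".join(low))

    return ", ".join(parts)


print(english_m(11))    # ten one
print(english_m(1111))  # thousand, one hundred, ten one
```

The rules are a fixed table plus place value, which is why I call the pronunciation of numbers trivial compared to the rest of translation.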

Numbers also occur in full sentences. For example, translate I will give you 11 dollars into Dutch. Again eleven and elf. Now suddenly 1111 is spoken correctly, perhaps because they are dollars?

A switch between language and language-M

It might be a single option to select mathematical pronunciation, for all languages. But the tab would need to show English-M and Dutch-M to prevent confusion. Also, at one time, one might wish for a translation from English-M to traditional Dutch. Best could be a selector icon in the row of language tabs that allows you to switch between traditional and mathematical pronunciation.
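A selector of this kind could be sketched as follows, again in Python and again only as an illustration. The names speak_text and two_digit_m and the mathematical flag are hypothetical and have nothing to do with how Google Translate actually works. The sketch assumes that only standalone numerals of one or two digits are rewritten, and that readings such as two ten one for 21 are acceptable English-M.

```python
import re

DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]

# A standalone numeral of one or two digits, not part of a longer
# number such as 1111, 1,111 or 3.14.
NUMERAL = re.compile(r"(?<![\d,.])\b\d{1,2}\b(?![,.]?\d)")


def two_digit_m(n: int) -> str:
    """Hypothetical English-M reading for 0 <= n <= 99."""
    tens, ones = divmod(n, 10)
    words = []
    if tens:
        words.append("ten" if tens == 1 else f"{DIGITS[tens]} ten")
    if ones or not tens:
        words.append(DIGITS[ones])
    return " ".join(words)


def speak_text(sentence: str, mathematical: bool) -> str:
    """Return the text that would be handed to the speech engine."""
    if not mathematical:
        # Traditional tab: leave the numerals for the engine to read.
        return sentence
    # Mathematical tab: spell out small numerals, keep all other words.
    return NUMERAL.sub(lambda m: two_digit_m(int(m.group())), sentence)


print(speak_text("I will give you 11 dollars", mathematical=True))
# I will give you ten one dollars
```

With the flag off the sentence passes through unchanged, so one switch serves both pronunciations and non-numbers remain normal English.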

Google Translate is already prim on the distinction between UK and US English. There is only one English tab, and the translation of, say, Dutch strengheid gives both rigor and rigour. But this is a spelling issue. Mathematical pronunciation of numbers isn’t spelling reform but an enrichment of language. Nor is it the difference between Oxford English and Cockney. There may be more sites explaining dialects than Oxford English.

Indeed, when we try to translate Me want money from English to English, to remove grammatical or spelling errors, with the options I want money or We want money, then Google Translate doesn’t allow this. It just doesn’t permit translation from English to English. The translation to Dutch selects the Me → I option. “Mij wil geld” is a literal translation but Google corrects this into the grammatically proper “Ik wil geld”. One would however feel that crummy English should be translated as crummy Dutch.

A bit more fun is that Google Translate accepts spoken 1 plus 1 = 3, but refuses the input of 1 + 1 = 2, perhaps because they think that + is not an accepted sign in the English language, or perhaps because they think that it doesn’t need translation.

Language research

Google Translate acknowledges use of results by numerous scientists around the world. A key source is WordNet. (In Holland Piek Vossen is involved in this.) When you look at what they are doing, it is huge and impressive.

By comparison, the pronunciation of the numbers is trivial. Let us start with the 20% of effort that generates 80% of results. It is a suggestion for WordNet and Google Translate to look into this.

Thus the WordNet research group might consider supporting the development of this standard for the pronunciation. Developing the standard might take some time, given the need for consensus to develop. Likely there will be stages: first in education, then in law.

The resources and energy of Google Translate might also make a difference for practical developments, notably by providing example implementations. Formation of English-M need not wait for French-M.

Eventually, Google Translate may develop into Google Language, with checkers on spelling and grammar, thesaurus, rhyme, and what have you. Some users might want writing support, like a warning message that a text is too abstract and that an example is required.

It shouldn’t be too difficult either to make an app that shows how to pronounce numbers in English-M, but this weblog isn’t about commerce.

Pierre van Hiele and the levels of insight

Pierre van Hiele presented a theory of levels of insight as a general theory for all epistemology. Geometry was where he started, and what he used as his key example case. Many people didn’t listen well and assumed that he thought that the levels apply only to geometry. See the error on Wikipedia that I just linked to, or the misconception by David Tall, who thinks that he was the first one to discover the generality, but who at least supports the notion.

A consequence for language

A consequence of the theory of levels is that students speak different languages.

They use the same English words but mean something else. There will generally be great confusion in the classroom and lecture hall, except for the teacher, who can mediate between students at different levels of insight, including those who are making the shift.

Thus, depending upon the particular field F ∈ {mathematics, physics, biology, economics, …} Google Translate ought to have English-F-1, …, English-F-n. Mathematics would have the highest level because of the notion of formal proof. Perhaps the majority of fields F might work with only three levels: novice, verbally fairly competent but reproductive, and reasoning informally.

These would also be the levels required for wikipedia-1, …, wikipedia-n. Wiki-articles on math topics are dominated by MIT students who copy their textbooks, which produces gibberish for novices, which isn’t quite the purpose of an encyclopedia. (And some students think they know it better anyway, see here.)

If Google Translate could translate English-M-2 to English-M-1 (as far as possible), then Google Translate would turn into a teacher’s assistant.

Language spaghetti

It may be that current translators, say from English to Spanish, might not be aware of the Van Hiele levels. The issue might not be quite urgent.

  • When translators focus on “words only” then they might translate English words into say Spanish words, and then let others deal with what those words mean to them.
  • Speakers of English-4 might use sentences that contain a few words that users of English-3 don’t use much – e.g. the very word “proof” – so that the translation from English-4 to Spanish-4 would tend to work.

Other cases might simply be spaghetti that perhaps might be neglected.

For example, users of English-2 could use terms from English-4 that they actually don’t understand. They may translate into Spanish-4 – e.g. “I got a proof” becomes “Tengo una prueba”. They wouldn’t understand either of those – since they don’t understand the notion of proof yet – so that this might not be a great loss.

It is a worrying notion that Google Translate will perhaps be mostly busy translating what people don’t understand anyway. Perhaps an exam needs to be taken before you offer something to be translated. But we live in a fast world.

It remains valuable to be aware of levels

The upshot is that it would still be a valuable idea to identify Van Hiele levels. Words that seem the same have different meanings, because of those levels.

Wikipedia already uses the term disambiguation. They seem to regard it as the minimal word that isn’t ambiguous itself, and take quite some space to explain it so that misunderstandings are excluded. I still wonder about the Van Hiele levels. A novice would only be aware that the same word has different uses (A. Einstein might also be Alfred Einstein), while a more experienced wiki disambiguator would see ripe fruits everywhere.

Google Translate already knows about different communities – say, bubble originates in the soap industry but is used metaphorically (a form of abstraction) in economics (stock market bubble). The word translates nicely into Spanish burbuja, and Google already indicates that the Spanish-speaking world would also be aware of the notion of living in a bubble – check here. But perhaps we are missing some higher levels of abstraction here, like 1 bubble + 1 bubble can have all kinds of outcomes, sometimes 0, 1, 2, 3, … bubbles. Not only in reality, but also in economics, and perhaps some topological models, or when a man in a bubble meets a woman in a bubble. For some a bubble is just a word, for others a world.


Your level of fun may increase by maintaining a lay level of insight.

