# Success rate trade-off, ctnd

The previous weblog entry discussed how a school can “improve” its success rate by ditching weak students. I used a small theoretical model to show this. It is more advantageous for a weak student to try for graduation twice, and with a corrected success rate the school is not punished for allowing that. Let us now look at some real data from a real school.

A disclaimer is that I am not at home in this field of study. My interest has been in the didactics of mathematics and the overall economics of education, and while I have looked at issues of testing, this present application is a new topic for me. The Dutch Inspectorate of Education has reported on this since 1817, and I am merely testing the waters and asking questions for my own understanding.

Jan Jimkes has been critical of the government policy on arithmetic tests in highschool, see here and here. So let us see – tongue in cheek – how his own school is doing. Jimkes got his mathematics degree in 1966, was a math teacher for 36 years, and is a former conrector of St. Bonifatiuscollege in Utrecht. Arithmetic indicates that he must have retired around 2002. Our data are from 2013-2014 and thus have not been affected by Jimkes directly. This discussion might not be impartial because I disagree with Jimkes on some points (see here), and because this is also my own highschool, where I graduated in gymnasium in 1972. We might have arrived at Boni around the same time, but perhaps at different buildings, for I have no recollection of him back then, and my math teachers were Van Gils and Andringa.

The following discussion is not about mathematics education but about equality of opportunities in general. In Holland VWO = 6 years preparation for university, HAVO = 5 years preparation for college, VMBO = preparation for trade school. Within these curricula, there again is the distinction between the humanities (A, qualitative and likely not quantitative) and science (B, quantitative and likely also qualitative).

In elementary school, at the end of grade 6 for pupils of age 12, teachers advised whether they might do VMBO, HAVO or VWO. By grade 9 at Boni, students have been allocated to HAVO or VWO, and we can see how the prediction worked out. Boni might get good graduation scores on VWO by sending weaker VWO students to HAVO.

The history of Boni is that it originated in the emancipation of Catholics in a Calvinist country. Originally Boni wanted to make sure that students capable of university got a really good VWO education (HBS, atheneum or gymnasium). The addition of HAVO is an outgrowth and originally was not core business. The option of HAVO is agreeable for students who don’t fit VWO but who can remain in the same school. A student may feel better with high grades in HAVO than with low grades in VWO. Of those who graduate from HAVO at Boni, 17% continue in VWO.

The data basically come from the school itself. Boni reports to DUO, and there is a visit by the Inspectorate of Education. The results are reported on by the Inspectorate and on the website “Scholen op de kaart” (SodK) where schools compare with each other. For Boni the link is here. I assume that these data will be updated to a new school-year, and thus I copy a graph below.

##### Report by the Inspectorate of Education

The Inspectorate of Education gives us the Boni report of 2015 about school-year 2013-2014. On page 9 of the pdf we find the following text. This text refers to the “scorecard 2014” while we will look at the scorecard 2015 with data on 2013-2014:

“The success rate for grades 7-9 is satisfactory for the scorecard of 2014 [sic]. However it is unsatisfactory for the years before. **Relatively many students with an advice for VWO transfer to HAVO**. The success rate for grades 10-12 is good.” (My emphasis and translation (English teacher Boerlage) of: “Het onderbouwrendement is volgens de opbrengstenkaart 2014 [sic] voldoende maar in de jaren daarvoor onvoldoende. Relatief veel leerlingen met een vwo-advies stromen af naar de havo. Het bovenbouwrendement is goed.”)

In the scorecard 2015, Boni had 1459 students in 2013-2014, 33% in the first two formative years (grades 7-8), 20% in HAVO (grades 9-11), 47% in VWO (grades 9-12).

We indeed find that an *initially surprising* percentage of potential VWO students are actually at HAVO. See also the graphs below.

- In grade 9 at VWO: 71% had an original VWO advice from elementary school, 27% had HAVO advice and 2% mixed.
- In grade 9 at HAVO: 52% had HAVO advice, **47% VWO advice** and 2% had a mixed advice.

##### Report by SodK, schools comparing with each other

SodK gives graphs of the above data. HAVO is on the left, and VWO is on the right.

- The red bar gives students with an original advice for VWO. Many are at HAVO indeed. However, HAVO is a smaller fraction of the school, so there is also the effect of the denominator. The “comparison” is awkward.
- The purple bar gives students with an original advice for HAVO. Surprising for me, still about a quarter of VWO classes are filled with these students. Apparently, prediction at the end of elementary school is difficult.

##### Comparison and translating the graphs into numbers

The grey bar is a “comparable group”, not necessarily the national average. This “comparison” however is distorted by the mixed HAVO / VWO advice, that is important for the “comparable group” but not relevant for Boni.

For VWO, the 2% mixed advice for Boni can be allocated equally to 27+1 = 28% HAVO advice and 71+1 = 72% VWO advice.

For VWO, the mixed advice in the “comparable group” is about 18%. Allocating this equally, we find a HAVO advice of about 18+9 = 27% and a VWO advice of about 64+9 = 73%.

Thus Boni is not exceptional.

| VWO | Boni | “Comparable” |
| --- | --- | --- |
| VWO Advice | 72 | 73 |
| HAVO Advice | 28 | 27 |
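The equal allocation of the mixed advice can be sketched in a few lines. This is only an illustration: the function name `allocate_mixed` is hypothetical, the percentages are approximate readings from the graphs, and the comparable group's pure HAVO advice is taken as 18% on the assumption that the three shares should add up to 100%.

```python
def allocate_mixed(havo, vwo, mixed):
    """Split the mixed HAVO/VWO advice equally over the two pure advices."""
    half = mixed / 2
    return havo + half, vwo + half


# Grade 9 at VWO, percentages read (approximately) from the graphs.
boni = allocate_mixed(havo=27, vwo=71, mixed=2)
# ASSUMPTION: 18% pure HAVO advice, so that 18 + 64 + 18 = 100.
comparable = allocate_mixed(havo=18, vwo=64, mixed=18)

print(boni)        # (28.0, 72.0)
print(comparable)  # (27.0, 73.0)
```

With these readings the two VWO profiles differ by only one percentage point.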

For HAVO, the 2% mixed advice for Boni creates a choice, and let us assume that the Boni split is 53% versus 47%.

For HAVO, the mixed advice in the “comparable group” is about 22%. Allocating this equally, let 55+11 = 66% have had HAVO advice, and 14+11 = 25% have had VWO advice. Then we still lack 9%. Probably this is VMBO advice, not shown here. The graph is crooked, and creates some uncertainty.

Boni may seem exceptional but when a normal outflow from its large VWO intake enters a smaller HAVO department, then this share must be higher. My suggestion is that the Inspectorate develops a better comparison for the effect of the denominator (I don’t feel like trying).

| HAVO | Boni | “Comparable” |
| --- | --- | --- |
| VWO Advice | 47 | 25 |
| HAVO Advice | 53 | 66 + missing 9 = 75 |
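The effect of the denominator can be made a bit more concrete with a small sketch. It asks a different question than the graph: not which share of the HAVO classes had VWO advice, but which share of all VWO-advised students sits at HAVO. This is not a proposal for the correction itself. The function name is hypothetical, and the calculation assumes that the grade 9 advice profiles (47% at HAVO, 72% at VWO after allocating the mixed advice) hold for the whole department. It also ignores that HAVO counts three grades and VWO four.

```python
def share_of_vwo_advised_in_havo(share_havo, share_vwo, p_havo, p_vwo):
    """Fraction of all VWO-advised students (grades 9 and up) who sit in HAVO.

    share_havo, share_vwo: department sizes as fractions of the school.
    p_havo, p_vwo: fraction with an original VWO advice within each department.
    """
    in_havo = share_havo * p_havo
    in_vwo = share_vwo * p_vwo
    return in_havo / (in_havo + in_vwo)


# Scorecard 2015 (school-year 2013-2014): 20% of students in HAVO, 47% in VWO.
# ASSUMPTION: the grade 9 advice profile applies to all grades of a department.
result = share_of_vwo_advised_in_havo(
    share_havo=0.20, share_vwo=0.47, p_havo=0.47, p_vwo=0.72
)
print(f"{100 * result:.1f}% of VWO-advised students are at HAVO")
```

Under these assumptions the same group of students is 47% of the HAVO classes but only somewhat more than a fifth of the VWO-advised students, which shows how much the choice of denominator matters.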

##### More data from the Inspectorate

The three year data show for the *central exam* (and not the school exam nor the joint result) for 2013-2014:

- At HAVO grade 9-11, 73% didn’t retake a class, and the graduation grade point average (GPA) on a scale of 10 was 6.3 (slightly below average). At SodK we find the final success rate: 89.6% or roughly 90% graduated in 2014.
- At VWO grade 9-12, 79% didn’t retake a class, and the graduation GPA was 6.5 (above average). At SodK we find the final success rate: 95.1% or roughly 95% graduated in 2014.

##### Is Boni cooking the books?

Boni’s profile for VWO hardly differs from the comparison group. If someone is cooking the books then everyone is. We have no data for the final selection point at grade 11 that we discussed earlier.

Boni’s profile for HAVO is partly explained by the denominator effect. It is unclear what a correction would look like, and thus we must postpone judgement. If strong VWO students were being moved to HAVO only to protect the VWO success rate, we would still expect a HAVO graduation GPA that is higher than the national average. However, it is a bit less. Thus it is more likely that VWO students are transferred to HAVO for the mere reason that HAVO actually fits them. Perhaps these students are disappointed and not motivated to work hard for HAVO? However, 17% of the HAVO graduates continue to VWO (see here). It is not clear to me whether these are only original HAVO students or whether there may also be former VWO students who get a second chance.

Overall, a possible explanation is that (some) elementary schools give their pupils an advice of VWO to try to get them into well respected Boni, after which the true selection happens at Boni.

Apparently, forecasting a career is difficult. Testing on mathematics skills at elementary school is not necessarily difficult but rather deliberately crooked in Holland. There are two approaches: “realistic mathematics education” (RME) and “traditional mathematics education” (TME). The tests created by CITO still allow both methods to score equally, looking only at the outcome of sums. However, only TME prepares for later algebra in highschool. Thus, CITO had better develop tests that also attach value to the intermediate steps that are used to find the outcome of sums. There is also my proposal for better “neoclassical mathematics education” (NME), see here. See my letter to KNAW and CPB.

Boni seems to have a useful school model. By selecting pupils who are closely related (core and related non-core), it can concentrate on the core, while still providing well for the non-core. This model benefits from the fact that there is no education higher than VWO. Schools with a core on HAVO must provide for surrounding non-core VWO and VMBO.

A similar question arises when one can create two classes of the same denomination: mix the students, or create a faster and a slower class? A criterion should rather be on learning styles, to allow teachers to economise their methods.

##### What is this discussion about?

This discussion is actually rather on determination. The distinction between VWO and HAVO is close to the distinction between university and college (professional school). Some people argue that medicine at university is actually a professional education and not an education in science. This indeed also has to do with the learning styles, like Kolb’s contested theory of abstract / concrete and active / passive styles. Potentially VWO and HAVO differ in characteristics as in the diagram below, and the determination test would allocate students, depending upon school capacity or prospect for graduation. It is more likely that there are more dimensions however.

The Inspectorate of Education now uses a time horizon of three years. Within this time frame, they can already link graduates to an advice three years earlier. Thus for HAVO they can compare graduation at grade 11 to the start in grade 9.

- With one year extra, they can link graduation at VWO to grade 9, and allow for resits of HAVO. This look at HAVO and VWO would test the performance of the school itself.
- With a time horizon of seven years they can link graduates (with resits and return from HAVO to VWO) to the advice at elementary school. This comparison looks into the quality of this advice (with teachers differing from official CITO) rather than performance at Boni.

This model uses graduation as the golden standard (always on top), and uses the common nomenclature for the prospective tests (always on the side). Unfortunately, Wikipedia (a portal and no source) presents this table in transposed form. It is better to look at Wikipedia’s “worked example”, which (inconsistently) has the proper orientation.

- When we spoke about the success rate above, we took VWO as the success. The success rate translates here as the positive predictive value, PPV = TP / (TP + FP).
- For determination, the discussion was too simple, since there is also the success for HAVO students, with the negative predictive value, NPV = TN / (FN + TN).
- There is a whole machinery on this kind of test analysis, and it would lead too far discussing this.

| Golden standard vs test | Graduated at VWO | Graduated at HAVO |
| --- | --- | --- |
| VWO Advice | True Positive (TP) | False Positive (FP) |
| HAVO Advice | False Negative (FN) | True Negative (TN) |
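As a minimal sketch, the two predictive values can be computed from the four cells of this table. The class name `Confusion` and the counts are hypothetical and only serve to illustrate the formulas. They are not data from Boni or the Inspectorate.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # VWO advice, graduated at VWO
    fp: int  # VWO advice, graduated at HAVO
    fn: int  # HAVO advice, graduated at VWO
    tn: int  # HAVO advice, graduated at HAVO

    def ppv(self) -> float:
        """Positive predictive value: PPV = TP / (TP + FP)."""
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        """Negative predictive value: NPV = TN / (FN + TN)."""
        return self.tn / (self.fn + self.tn)


# HYPOTHETICAL counts, for illustration only.
example = Confusion(tp=80, fp=20, fn=10, tn=40)
print(example.ppv())  # 0.8
print(example.npv())  # 0.8
```

Both values read along the rows of the table: given the advice, which fraction of the students ended up where the advice predicted.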

##### Returning to the original problem w.r.t. the success rate

Let us return to the detective job that we started out with in the earlier discussion. In this case, graduation is not the golden standard but actually only an imperfect test. We now also take account of students who graduated but shouldn’t have and only were lucky. Thus we assume that there is some golden standard that can determine whether a student is a true VWO student or not. The Inspectorate of Education should study such a golden standard. For example, when a student is not admitted or fails at Boni but succeeds later, perhaps elsewhere, then this would be a true VWO student. This gives the table below.

The statistical success rate at SodK, say the 95.1% success of the VWO graduation at Boni in 2013-2014, only compares the two rows of students who participated, and then passed or failed. Our problem was the students who were excluded from participation though they were true VWO students, while Boni erroneously thought that those were false VWO students (who would either fail deservedly or not be lucky enough to pass anyway). This information is not presented by the Inspectorate.

| Golden standard vs test | True VWO | False VWO |
| --- | --- | --- |
| Participated and graduated | True Positive (TP) | False Positive (FP) |
| Participated and failed | False Negative (FN-Part) | True Negative (TN-Part) |
| Did not participate | False Negative (FN-Nonpart) | True Negative (TN-Nonpart) |
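The difference between the published success rate and what happens to the true VWO students can be sketched as follows. The function names and all counts are hypothetical. In particular, the number of excluded true VWO students (FN-Nonpart) is exactly the figure that is not reported, so the value used here is invented for illustration.

```python
def participant_success_rate(tp, fp, fn_part, tn_part):
    """Published style of success rate: graduates over exam participants only."""
    return (tp + fp) / (tp + fp + fn_part + tn_part)


def share_of_true_vwo_graduated(tp, fn_part, fn_nonpart):
    """Of all true VWO students, the fraction that graduated (the sensitivity).

    Counts the excluded true VWO students (fn_nonpart) in the denominator.
    """
    return tp / (tp + fn_part + fn_nonpart)


# HYPOTHETICAL counts, for illustration only.
tp, fp = 90, 5            # participated and graduated
fn_part, tn_part = 3, 2   # participated and failed
fn_nonpart = 7            # true VWO students kept out of the exam (unobserved)

print(participant_success_rate(tp, fp, fn_part, tn_part))    # 0.95
print(share_of_true_vwo_graduated(tp, fn_part, fn_nonpart))  # 0.9
```

In this invented case the school shows a 95% success rate, while only 90% of its true VWO students actually graduate. The first figure does not change at all when more students are excluded from participation.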

Perhaps the Inspectorate should first clarify how big the problem is for the whole country before we look at schools. Obviously, a student who is not admitted to the senior (graduation) year is ill prepared, and after a while it makes excruciating sense not to allow this student to participate in the exams. However, why was this student not admitted to the senior year in the first place? The issue can thus be paraphrased in the familiar discussion about the risk factors for retaking a class. Still, the earlier point remains that the success rate had better be corrected so that schools are not punished for allowing students to resit graduation, and this would percolate down to lower grades too.

##### Conclusion

Boni is innocent till proven guilty. These data don’t prove that Boni doesn’t cook the books. With these data, the issue is elusive, rather more on determination than on graduation. We would need more intel based upon the individual capabilities. Potentially we need only information about students in the critical range (whose grades cause discussion in the teacher meeting), but with all the selection going on (e.g. also on A and B flows), we would include all students (also for comparison and denominator). The report by the Inspectorate is still oriented at statistics in the style of 1900 looking at the unit of the school, rather than at statistics in the style of 2000 looking at the unit of the student. It reflects the correct sentiment that schools matter, but still. Management requires measurement. When you don’t measure something, then management runs risks which otherwise might be avoided. Apparently it is not clear to the Inspectorate yet what they really want to know about students. Are you able, now, to formulate your suggestions to them? A disclaimer is that I didn’t read up on their research agenda, but now I know better what to look for.