Which Schools and Pupils Respond to Educational Achievement Surveys? A Focus on the English PISA Sample

Using logistic and multilevel logistic modelling, we examine non-response at the school and pupil levels to the Programme for International Student Assessment (PISA), an important educational achievement survey, for England. The analysis exploits unusually rich auxiliary information on all schools and pupils sampled for PISA, whether responding or not, including data from two large-scale administrative sources on pupils' results in national public exams, which correlate highly with the PISA target variable. Results show that the characteristics associated with non-response differ between the school and pupil levels. The findings have important implications for the design of educational achievement surveys.


Introduction
Results of international educational achievement surveys make it possible to compare countries' success in educational outcomes and have therefore impacted widely on education policy debate in most of the countries covered by these surveys. However, attention to the survey results has not been matched by thorough reviews of the data quality of such surveys, in particular with respect to non-response rates and non-response bias. This stands in contrast to household surveys, for which considerable research on the nature and correlates of non-response is available (for a summary see Groves (2006)).
The most prominent achievement surveys are the Programme for International Student Assessment (PISA) (Organisation for Economic Co-operation and Development (2012)), the Trends in International Mathematics and Science Study (TIMSS) (Olson et al. (2008)), and the Progress in International Reading Literacy Study (PIRLS). The typical design involves sampling schools at a first stage and then pupils within schools at a second stage. The success of the survey depends on the level and pattern of response at both stages. Although a household survey may often have a multistage element, with individuals within each contacted household choosing whether to respond, the two stages in a survey of school children are distinct in a way that is fundamental to the survey design. Non-contact at the school or pupil level is likely to be negligible for achievement surveys, while factors affecting school response may be related to the environment of the school setting as well as school commitments. Pupils' decisions to respond may depend on socio-demographic characteristics as well as the level of their school's commitment. For these reasons, response to this type of survey deserves more attention from researchers than it has received to date.
Non-response analysis often suffers from a lack of information on the non-responding units, in particular on variables correlated with the survey target variables. By contrast, our analysis exploits rich auxiliary information on all schools and pupils sampled for PISA using two external administrative data sources: the Pupil Level Annual School Census and the National Pupil Database. That is, we have additional information on schools and pupils irrespective of whether they respond to the survey. This is a rare example of linked survey and administrative data (see also, for example, Dearden et al. (2011)). In particular, we have access to administrative registers that contain information on results in national public exams. These are strongly correlated with the main PISA target variables that measure learning achievement, a characteristic highly desirable for non-response investigations (Little and Vartivarian (2005)) but rarely found in practice (Kreuter et al. (2010)). Even in the presence of linked administrative data such information is often only available for the responding cases. Moreover, in the case of schools, we have information on the exam results of all their pupils of the target age for PISA (age 15) and not just on the sample drawn for the survey. References analysing non-response in education data are limited but include Steinhauer (2014), Rust et al. (2013), Pike (2008), and Porter and Whitcomb (2005).
This paper models non-response at both the school and pupil stages of PISA in England in the first two rounds of the survey, held in 2000 and 2003. School response among schools initially sampled averaged 66% in these two rounds and pupil response 77%, below the respective targets of 85% and 80% set by the PISA organisers (our figures for school and pupil response differ slightly from the official published figures for reasons described in Section 2). In fact, international reports for the 2003 round of PISA excluded the UK following concerns that the quality of the data for England, where there is a separate survey, suffers from non-response bias due to the lower response rates (Organisation for Economic Co-operation and Development (2004)). PISA's main aim is to assess pupils' learning achievement. Hence, a particular focus of the paper is the relationship of response at both the school and pupil stages of the survey with the level of pupil ability: mean pupil ability in the school in the case of school response and the individual's own ability for pupil response.
These relationships determine the extent of non-response biases in the estimates of learning achievement that the survey provides. A companion paper estimates the size of these biases using different weighting methods, a subject we do not consider here (Micklewright et al. (2012)).
In the present paper, we provide much more detailed modelling of both school and pupil non-response. At the pupil stage we allow for the influence of the school on pupils' response behaviour. Using cross-level interactions between pupil and school characteristics we can, for example, examine whether high-performing students in a low-performing school are less likely to respond than high-performing students in a high-performing school. Furthermore, we use random slope models to assess the school influence on the most important predictor of student response: student ability.
This paper is structured as follows. The PISA survey and the linked administrative data are described in Section 2. Section 3 presents response probability models for both schools and pupils. At the school level, we estimate simple logistic regression models. To analyse pupil response we estimate multilevel models that allow investigation of the context-dependent response behaviour of pupils within schools. The results for both models are discussed in Section 4.

Schools are sampled with probability proportional to size (PPS) from a sampling frame stratified by school characteristics including mean achievement (see Sturgis et al. (2006)). The PPS sampling generates a list of 'initial schools' together with two potential replacements, the schools that come immediately before and after within the stratum. If an initial school declines to participate, its first replacement is approached. If this first replacement does not respond, the second replacement is asked to participate.

[Table 1 about here]

Table 1 shows that response rates for initial schools were low, at just 60% in 2000 and a somewhat better 73% in 2003. The majority of replacement schools did not respond: for the two years combined, only 41% of first and 29% of second replacement schools responded.
These schools faced a higher survey burden: they had less time to prepare for the survey, and the PISA data collection fell closer to the GCSE exams, nationally organised tests whose results are used for school league tables in England. (Since 2006, the PISA test in England takes place after the GCSE exams, so that the survey burden for replacement schools is considerably reduced.) Given this design, Sturgis et al. (2006) question whether replacement schools reduce non-response bias or whether they might actually increase it. They conclude that the question is untestable; the linked data used in this paper, however, allow us to examine it.

[Table 2 about here]

Auxiliary information from administrative data
The main aim of PISA is to measure pupils' achievement. If non-response is associated with ability then the survey estimates will be biased upwards or downwards. It follows that any modelling of response to the survey ideally needs to build on good auxiliary information on children's ability. If the auxiliary variables are highly correlated with the survey outcome variable and we find no association between the auxiliary variables and response, then we can assume that the final sample provides a good estimate of the target variable. England is unusual in having nationally organised exams for the population of children at several ages. Once linked to the PISA survey data, this information allows respondents and non-respondents to be compared on the basis of assessments taken shortly after the survey was conducted. [Figure 1 about here] Figure 1 shows that our auxiliary achievement measure is highly correlated with the PISA reading test score for responding pupils in 2000 and 2003 (r=0.81). The correlation of our auxiliary variable with the PISA maths test score is 0.79 (based on 2,181 responding students) and with PISA science 0.80 (based on 2,177 students); results not shown. The analysis therefore greatly benefits from access to highly associated and fully observed auxiliary information.
Further auxiliary information is also available from administrative records on the child's gender and whether he or she receives Free School Meals (FSM), a state benefit for low income families, and we use this information in Section 4. Information on FSM is not available at the pupil level for 2000 although we do know the percentage of individuals receiving the benefit in the child's school.
We have access to the auxiliary information just described for (almost) all 15 year olds.
While we have information on the number of pupils attending a school and whether it is private or public, we derive in addition the following school-level variables by averaging over the PISA target population within each school: mean KS4 score, percentage of pupils with free school meals, and percentage of male pupils. This information is contained in the Pupil Level Annual School Census and the National Pupil Database, a combination we refer to as the 'national registers'. The linked data set of PISA survey and national register data was created by us using files supplied by the Office for National Statistics.

Methodology
A sequential modelling approach is employed. First, response at the school level is analysed using a (single-level) logistic model, examining the effects of school characteristics on school response propensities. Then, for responding schools, pupil-level response within schools is modelled using a multilevel logistic model taking account of both individual- and school-level effects. The sequential model is a commonly used method in the non-response modelling literature, for example when modelling first contact and subsequent cooperation (Hawkes and Plewis (2006); Lepkowski and Couper (2002)), although sometimes only one of the two processes may be modelled (Pickery et al. (2001); Durrant and Steele (2009); Durrant et al. (2011)).
Let $y_j$ denote the binary response indicator for school $j$, coded 1 for response and 0 for non-response. The response probabilities are denoted by $\pi_j = \Pr(y_j = 1)$ and modelled as

$$\text{logit}(\pi_j) = \mathbf{z}_j' \boldsymbol{\alpha}, \qquad (1)$$

where $\mathbf{z}_j$ is a vector of school-level covariates and $\boldsymbol{\alpha}$ is a vector of coefficients. To ease interpretation, the following approximation is used for judging the importance of the coefficient $\alpha_k$ on the $k$th explanatory variable $z_k$ (Gelman and Hill (2006)): the change in the response probability associated with a change $\Delta z_k$ is approximately

$$\Delta \pi_j \approx \alpha_k \, \pi_j (1 - \pi_j) \, \Delta z_k,$$

which is largest at $\pi_j = 0.5$, where it equals $\alpha_k \Delta z_k / 4$.

To investigate non-response at the pupil level a multilevel modelling approach is adopted. For the application here, the multilevel approach has a number of attractions both methodologically and substantively. First, estimates of standard errors account for the clustering of pupils within schools in PISA, thus recognising that pupils cannot be treated as independent draws from the national pupil population. Failure to account for the clustering by school leads to downward bias in standard errors, which in turn leads to overstatement of the statistical significance of effects. The problem is especially severe for coefficients of higher-level variables, school characteristics in the present case. Second, a multilevel approach allows the investigation of the influence of school-level variance on pupil response, providing insight into whether school or pupil level factors are more important in explaining pupil response. Determining the relative importance of different levels gives important insights into the level 'at which the action lies' (Brown et al. (2005)). The variance partition coefficient (VPC) indicates the proportion of variation in pupil-level response behaviour that is due to the school. Different definitions of the VPC can be used for multilevel logistic models (see Goldstein et al. (2002); Goldstein (2011), Ch. 4.9). Here, the latent variable approach, sometimes referred to as the threshold model (Snijders and Bosker (2012)), is used.
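The Gelman and Hill (2006) device for judging coefficient importance can be illustrated numerically. The sketch below, with purely illustrative values, compares the exact change in a logistic response probability with the approximation $\alpha_k \pi (1-\pi) \Delta z_k$ evaluated at $\pi = 0.5$:

```python
import math

def inv_logit(x):
    """Inverse logit (logistic) function."""
    return 1.0 / (1.0 + math.exp(-x))

def exact_change(alpha_k, delta_z, baseline_logit=0.0):
    """Exact change in response probability when z_k rises by delta_z,
    for a unit whose linear predictor starts at baseline_logit (prob. 0.5)."""
    return inv_logit(baseline_logit + alpha_k * delta_z) - inv_logit(baseline_logit)

def approx_change(alpha_k, delta_z, pi=0.5):
    """Linear approximation: alpha_k * pi * (1 - pi) * delta_z."""
    return alpha_k * pi * (1.0 - pi) * delta_z

# Illustrative coefficient of 1.0 and a 0.1 change in the covariate
print(round(approx_change(1.0, 0.1), 4))   # 0.025
print(round(exact_change(1.0, 0.1), 4))    # 0.025, almost identical near pi = 0.5
```

The approximation is accurate near a response probability of 0.5, which is where the derivative of the logistic curve is steepest.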
Finally, the approach allows naturally for the exploration of contextual effects that we emphasised in the Introduction to be potentially important for pupil response. For example, we can test whether pupil response behaviour depends on the achievement levels of the school as a whole with higher achieving schools impacting positively on students' response probability.
More fundamentally, we can investigate whether the impact of pupil level characteristics, such as individual ability, vary across schools (random coefficient models).
Let $y_{ij}$, the dependent variable, denote the outcome for pupil $i$ in school $j$, coded 1 if the pupil responds and 0 otherwise. Denoting the probability of response by $\pi_{ij} = \Pr(y_{ij} = 1)$, and taking non-response as the reference category, the two-level logistic model for student participation with a random intercept and random slope can be written as

$$\text{logit}(\pi_{ij}) = \mathbf{x}_{ij}' \boldsymbol{\beta} + \mathbf{z}_j' \boldsymbol{\gamma} + u_{0j} + u_{1j} x_{1ij}, \qquad (2)$$

where $\mathbf{x}_{ij}$ is a vector of student covariates and cross-level interactions between pupil and school level effects, $x_{1ij}$ is a student-level covariate with a random slope, $\mathbf{z}_j$ is a vector of school-level characteristics (which may differ from the one specified in (1)), and the school-level random effects $u_{0j}$ and $u_{1j}$ are assumed to follow a bivariate normal distribution with zero means, variances $\sigma^2_{u0}$ and $\sigma^2_{u1}$, and covariance $\sigma_{u01}$.
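To make the two-level logistic model with random intercept and random slope concrete, its data-generating process can be simulated; all parameter values below are purely illustrative, not estimates from this paper:

```python
import math
import random

random.seed(42)

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

# Illustrative parameters (not estimates from the paper)
beta0, beta1 = 0.8, 0.5            # fixed intercept and slope on centred achievement
sd_u0, sd_u1, rho = 0.8, 0.3, 0.4  # random-effect SDs and intercept-slope correlation

def simulate_school(n_pupils):
    """Draw correlated school effects (u0j, u1j), then simulate pupil response."""
    u0 = random.gauss(0.0, sd_u0)
    # u1 correlated with u0 via a Cholesky-style construction
    u1 = sd_u1 * (rho * u0 / sd_u0 + math.sqrt(1.0 - rho ** 2) * random.gauss(0.0, 1.0))
    responses = []
    for _ in range(n_pupils):
        x1 = random.gauss(0.0, 1.0)  # centred pupil achievement
        p = inv_logit(beta0 + (beta1 + u1) * x1 + u0)
        responses.append(1 if random.random() < p else 0)
    return sum(responses) / n_pupils

# School-level response rates vary because of u0j; the achievement effect
# varies across schools because of u1j.
rates = [simulate_school(30) for _ in range(200)]
print(round(min(rates), 2), round(max(rates), 2))
```

The spread of school-level response rates produced by such a process is what the random intercept captures, while the random slope lets the achievement effect differ from school to school.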

Modelling school response
Given that the PISA target variable is achievement, we first investigate whether school response is associated with pupil ability within schools. Figure 2 shows a small inverted u-shaped relationship between schools' mean KS4 point score and school response, using non-parametric local polynomial regression (Fan (1992)). The graph is limited to achievement values between the 10th and 90th percentiles of the distribution. After an initial slight decrease in the response probability for schools with, on average, low-achieving students, it increases to its maximum around the median mean KS4 point score of 36 and decreases thereafter. Schools with low-achieving and schools with high-achieving students both have comparably low response rates and should therefore be equally represented in the responding school sample. This could suggest that school response does not contribute to bias in the PISA outcome variable.
However, school response might well affect the variance of the PISA achievement measure, which is then likely to be underestimated. The impact of non-response on the distribution of student achievement is estimated in Micklewright et al. (2012). It is, however, important to note that the confidence intervals are wide for all achievement values, with most of the intervals overlapping.
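A local polynomial smoother of the kind used for Figure 2 (Fan (1992)) is straightforward to sketch. The local linear estimator below, written in plain Python with a Gaussian kernel, is illustrative only; the data are simulated with an inverted-u response pattern similar to the one described above:

```python
import math
import random

def local_linear(x0, xs, ys, bandwidth):
    """Local linear regression estimate at x0 with a Gaussian kernel:
    weighted least squares fit of y on (x - x0); the intercept is the fitted value."""
    w = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
    sw = sum(w)
    swx = sum(wi * (x - x0) for wi, x in zip(w, xs))
    swxx = sum(wi * (x - x0) ** 2 for wi, x in zip(w, xs))
    swy = sum(wi * y for wi, y in zip(w, ys))
    swxy = sum(wi * (x - x0) * y for wi, x, y in zip(w, xs, ys))
    det = sw * swxx - swx ** 2
    return (swxx * swy - swx * swxy) / det  # intercept of the local fit

# Simulated data: binary school response with an inverted-u relationship
random.seed(1)
xs = [random.uniform(20, 55) for _ in range(500)]   # mean KS4 score
probs = [0.7 - 0.002 * (x - 36) ** 2 for x in xs]   # response probability peaks near 36
ys = [1 if random.random() < p else 0 for p in probs]

smoothed = [local_linear(x0, xs, ys, bandwidth=4.0) for x0 in (25, 36, 50)]
print([round(s, 2) for s in smoothed])  # rises towards 36, then falls
```

Evaluating the smoother on a grid of achievement values and plotting the result reproduces a curve of the kind shown in Figure 2.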
[Figure 2 about here] In a second step, we model school response using logistic regression. Table 3 shows the results for 5 nested models. Our first model is similar to the non-parametric regression in the variables included: we explain school response solely by the schools' mean Key Stage 4 point score (KS4ps) and its square (divided by 100). Both coefficients are significant at the 5% level and indicate a turning point of response at an average achievement of 37.5, which fits relatively well with the results displayed in Figure 2. For the 500 sampled schools for which Free School Meal (FSM) information is available, schools' average pupil achievement and the percentage of pupils with FSM eligibility are highly correlated (r=-0.68). The question therefore arises how mean achievement and FSM eligibility are jointly related to schools' response probability. Model 2 in Table 3 indicates that, conditional on the percentage of pupils with free school meals in a school, pupils' mean achievement is no longer significant.
This does not change once we condition on whether the school is private or public (which is not significant) and the year of data collection (Model 3). The considerable size of the coefficient for FSM stays quite similar once we condition on whether a school was an initial, first or second replacement school (Model 4). The estimated coefficient for FSM eligibility in Model 4 (-2.6) implies that, for a school with a predicted probability of response of 0.5, a 10 percentage point rise in free school meal eligibility decreases the probability of response by as much as 7 percentage points (-2.6 × 0.25 × 0.1 = -0.065).
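The back-of-envelope calculations of this kind can be reproduced directly; a minimal sketch applying the approximation at a baseline response probability of 0.5, using the coefficients reported for Model 4:

```python
def prob_change(coef, baseline=0.5, delta=1.0):
    """Approximate change in response probability for a change `delta`
    in a covariate, at baseline probability `baseline`."""
    return coef * baseline * (1.0 - baseline) * delta

# FSM eligibility (Model 4): coefficient -2.6, 10 percentage point rise
print(round(prob_change(-2.6, delta=0.1), 3))  # -0.065, about 7 percentage points
# First and second replacement schools (Model 4): coefficients -1.0 and -1.5
print(prob_change(-1.0))                       # -0.25
print(prob_change(-1.5))                       # -0.375, about 38 percentage points
```

The same function reproduces the replacement-school effects discussed later in this section.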
[Table 3 about here] This result has a clear implication for the survey design. As discussed above, schools on the sampling frame are stratified by mean achievement. This ensures that replacement schools match initial non-responding schools in their average pupil ability. However, given that free school meal eligibility appears more important for response than achievement, it would be an important factor to stratify on. This would ensure that first and second replacement schools match the initial non-responding schools in terms of their free school meal eligibility and therefore create a more representative school sample.
As discussed above, replacement schools have a much lower response probability. Model 4 shows that this holds even conditional on other explanatory variables. The estimated coefficients for first (-1.0) and second replacement schools (-1.5) in Model 4 imply that, compared with a school with a predicted probability of response of 0.5, the response probability is 25 percentage points lower for a first and 38 percentage points lower for a second replacement school. This high non-response of replacement schools would be of great concern if replacement schools, for example, had a higher proportion of low-performing students or more students eligible for FSM, who might be less likely to respond. In that case, the use of replacement schools would not, as intended, improve the representativeness of the sample but actually reduce it. We find that different sets of interaction terms between replacement status and mean achievement in schools are not significant. However, as the results of Model 5 indicate, replacement schools with a higher proportion of children eligible for FSM are considerably less likely to respond (interaction effect significant at the 10% level). Since the proportion of FSM pupils in the school is the most important explanatory factor of non-response (Table 3), this calls into question whether replacement schools help to improve the sample.
We also include region, the proportion of male pupils in the school, single-gender school status and KS3 scores (sat at age 14), conditional on KS4 scores, in our models (not shown). None of these variables is significant at the 10% level. Similarly, interactions of achievement with year do not explain school response.
As is often the case in social science applications and non-response modelling (Groves (2006)), the models explain only about 10 per cent of the variance in school-level response (pseudo R-squared), so we may not be very successful in predicting non-response. This may indicate that unobserved variables not in the data set, such as the characteristics and experience of headteachers and school governors, affect non-response, or that non-response has random elements.

Modelling pupil response
How is student response related to student ability? Figure 3 uses a non-parametric local polynomial regression to examine the association of KS4 point score with response probability. Students with a low KS4 point score have a considerably lower response probability than children with higher ability. The increase in response probability is large and almost linear over the achievement distribution up to the 25th percentile, at 28 KS4 points. The curve continues to increase at a lower rate up to the 75th percentile of the achievement distribution (48 KS4 points) and decreases slightly after that.
[Figure 3 about here] Results of the logistic multilevel modelling are presented in Table 4. We first explore the impact of the main student achievement variable (KS4) on pupil response. Model 2 uses a quadratic form and Model 3 the natural logarithm of the centred KS4 score. We also experimented with spline regressions. We settle on the logarithmic form since it makes the results of the random slope models (Models 7 and 8) easier to interpret. However, the results presented in Table 4 are robust to the choice of KS4 specification.
[Table 4 about here] In a first step, we examine the impact of student characteristics on pupil response. In line with the non-parametric results, we find a considerable impact of student achievement on response. This holds conditional on the year the survey was conducted and on gender, the latter coefficient indicating a significantly higher response probability for boys. Interestingly, and in contrast to the school response results, students' free school meal eligibility does not explain student response conditional on achievement (results not shown).
Across all multilevel models the random school variance is highly significant, indicating a clear school effect on pupil-level response. Using multilevel models we can measure the percentage of variation in student response due to differences between schools. Calculating the variance partition coefficient (VPC) using the threshold model shows that this is a non-negligible 15% (Models 1 to 6). [Figure 4 about here] Contextual effects at the school level are also investigated. Students are embedded in a school environment which might be more or less encouraging of sitting the test. We test this in Models 4 to 6 by including school-level variables, but no significant effects are found. Pupils in private schools, in schools with a low proportion of FSM-eligible students or in schools with high average achievement do not differ in their response behaviour from their counterparts in other schools. The geographic region of the school is also not significant (result not shown). It should be noted that our school-level variables are limited to information on the pupil population within schools. Other school characteristics, such as previous participation in surveys, school ethos, parents' background and attitudes, and headteacher characteristics, could well be associated with student non-response.
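Under the threshold (latent variable) definition, the pupil-level residual variance on the latent scale is fixed at π²/3, the variance of the standard logistic distribution, so the VPC is σ²ᵤ/(σ²ᵤ + π²/3). A minimal sketch; the school-level variance of 0.58 is our back-calculation consistent with the reported 15%, not a figure taken from the paper:

```python
import math

def vpc_threshold(school_variance):
    """Variance partition coefficient for a multilevel logistic model,
    latent-variable (threshold) definition: level-2 variance over total variance."""
    return school_variance / (school_variance + math.pi ** 2 / 3)

print(round(vpc_threshold(math.pi ** 2 / 3), 2))  # 0.5: school variance equals pupil variance
print(round(vpc_threshold(0.58), 2))              # roughly 0.15, as reported here
```

The second call shows the order of magnitude of school-level variance that would produce a VPC of 15%.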
Even though school factors are on average not significant in explaining response, their association with student response might vary across different kinds of students. For example, since the PISA test took place during preparation for the GCSE exams, schools with low-performing students (and hence at risk of a low position in the league table) might have especially encouraged higher-ability students to participate in the survey while offering teaching sessions for students refusing to take part. We therefore test for cross-level interactions between pupil and school characteristics. Again, using different specifications of the achievement variables, we cannot find any significant results (Model 6). The effect of pupil achievement is also similar across schools differing in the social background of their students (results not shown). The same is true for the cross-level interaction of type of school with pupil achievement (results not shown). We therefore conclude that, given the available variables, neither school characteristics nor cross-level interactions of school with pupil characteristics are successful in explaining pupil non-response.
Up to this point, we have used random intercept models, allowing the probability of response to depend on the school a student attends. This is achieved by allowing the model intercept to vary randomly across schools. We assume, however, that the effect of, for example, individual achievement is the same across schools. In a final step, we relax this assumption and allow the random intercept and the slope on the achievement variable (KS4 score) to covary. Results for this random slope model are provided in Models 7 and 8.
Using a likelihood ratio test, Model 7, which includes a basic set of explanatory variables and the random slope, is a considerable improvement on Model 3 (p<0.005), which contains the same set of basic variables but no random slope. The effect of log achievement on the log-odds of response in school $j$ is estimated as $\hat{\beta}_1 + \hat{u}_{1j}$, the sum of the fixed slope and the school-specific random slope. The most interesting result is the positive intercept-slope covariance estimate of 0.11, significant at the 5 per cent level (correlation coefficient 0.384). This implies that schools with above-average pupil response tend also to have an above-average effect of achievement on response probability. Figure 5 shows this relationship by plotting the random slope for the centred achievement variable against the random intercept for schools. In schools with a low proportion of responding students, the relationship between achievement and response is weak; in schools with a high proportion of responding pupils, achievement has a stronger effect on response. In other words, schools with higher student response contribute more to the bias of the data than schools with lower student response, since in the former achievement has much higher explanatory power for non-response.
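The reported covariance and correlation are linked by corr = σ_u01/(σ_u0 σ_u1), and the school-specific effect of achievement is the fixed slope plus the school's random slope. In the sketch below, the two variance values are hypothetical, chosen only so that the implied correlation matches the reported 0.384 given the covariance of 0.11; the fixed slope is likewise illustrative:

```python
import math

def intercept_slope_corr(cov, var_intercept, var_slope):
    """Correlation of random intercept and random slope from their covariance."""
    return cov / math.sqrt(var_intercept * var_slope)

def school_slope(fixed_slope, u1j):
    """Effect of (log) achievement on the log-odds of response in school j."""
    return fixed_slope + u1j

# Hypothetical variances (not reported figures) consistent with the covariance
# of 0.11 and correlation of about 0.384 given in the text.
print(round(intercept_slope_corr(0.11, 0.58, 0.1415), 3))  # 0.384

# A school with a positive random slope has a stronger achievement effect
# (illustrative fixed slope of 1.2):
print(school_slope(1.2, 0.25))  # 1.45
```

A positive covariance thus means that schools drawn with a high random intercept (high response) tend to be drawn with a high random slope (strong achievement effect) as well.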
[Figure 5 about here] This effect may be difficult to explain, but it shows clearly that higher student participation within schools comes at the price of worse data quality for the PISA target variable, achievement. There may be a 'natural' level of student interest in participating in the survey. Some schools accept this level and do not encourage students further to take part (schools with lower student response). Other schools may try to encourage more students to sit the test, succeeding only in changing the minds of higher-ability students about participation.
Model 8 shows that adding school characteristics does not change the findings on the random slope, and again confirms the lack of school-level variables available in this study to explain the random school effect.

Conclusions and implications for survey practice
Over the last 15 years educational research has been enriched by many international educational achievement surveys. Their results have had a wide impact on education policy debates and have guided educational policy formulation in participating countries. However, there is a considerable gap in the literature examining the quality of these data, in particular with respect to non-response rates and non-response biases. The nature of response at the school and student levels differs from that in household surveys, so the literature on non-response in household surveys is not sufficient to close the current gap in knowledge on non-response bias in achievement surveys. This paper examines non-response patterns at the school and pupil levels for the most prominent achievement survey, PISA, in England using logistic and multilevel logistic models.
The analysis benefits from unusually rich information about non-responding schools and pupils, exploiting data from two important administrative sources, including students' results in national examinations sat at a similar time to the PISA test. These national scores are highly correlated with students' PISA scores, the central target variables of the survey.
Access to fully observed data that is highly predictive of a survey target variable is rare in the survey methods context.
Results of the school-level response analysis show that first and second replacement schools have a considerably lower response probability than initially approached schools. There is also some evidence that replacement schools with a higher proportion of pupils eligible for free school meals are less likely to respond (significant only at the 10 per cent level). This calls into question whether the inclusion of replacement schools improves the representativeness of the sample of 15 year olds in terms of their achievement.
Surprisingly, results indicate that school response does not depend so much on how pupils perform on average but rather on the socio-economic background of pupils within schools, measured by free school meal eligibility. Assuming a school response probability of 0.5, every 10 percentage point increase in free school meal eligibility within the school decreases its response probability by about 7 percentage points. Interestingly, school-level characteristics such as gender composition, region, type of school (private vs public) and school size cannot explain school non-response.
The results have important implications for the sampling design of PISA. Currently, initial schools are matched with the replacement schools that are closest on the sampling frame in terms of type of school, size, region and mean achievement. However, our results show that what really matters for school response is the free school meal eligibility of pupils within schools. Hence, if replacement schools are used to improve the representativeness of the school sample, it seems more important to match initial and replacement schools on the socio-economic composition of their students than on any of the factors currently used.
School response patterns differ greatly from pupil response patterns, indicating that any survey design needs to consider the different response mechanisms at the two levels to achieve the best possible representative sample. In contrast to school response, students' ability is the strongest predictor of student response, while students' free school meal eligibility has no explanatory power. Results of the multilevel models show that schools matter, since 15 per cent of the variation in student response is due to differences between schools. Examining contextual effects with the aim of explaining such school-level influences, we do not find that students in private schools, in schools with high-achieving students or in schools with low free school meal eligibility behave differently from their peers in other schools. However, other school characteristics could explain differences between schools in pupil response, such as characteristics of the headteacher, school ethos, parents' attitudes, the number of requests schools have previously received to participate in surveys, and their previous response behaviour.
Given the importance of achievement for pupil response, we tested whether the association between student ability and response differed by school environment, using cross-level interactions, and across schools, using random slope models. While a variety of cross-level interactions proved not significant, we found that schools with higher pupil response tend also to have an above-average effect of achievement on response probability. This clearly shows that higher student participation within schools comes at the price of worse data quality for the PISA target variable, achievement. Existing practices of excluding schools with low student response from the dataset should therefore be revisited.
Future research aiming to evaluate the quality of educational achievement surveys needs to investigate whether the non-response patterns found for the UK can be generalised over time and across countries.