Data Representativeness: Issues and Solutions
The present document has been produced and adopted by the bodies identified above as author(s). This task has been carried out exclusively by the author(s) in the context of a contract between the European Food Safety Authority and the author(s), awarded following a tender procedure. The present document is published complying with the transparency principle to which the Authority is subject. It may not be considered as an output adopted by the Authority. The European Food Safety Authority reserves its rights, view and position as regards the issues addressed and the conclusions reached in the present document, without prejudice to the rights of the authors.
In its control programmes on compliance with maximum residue levels and on exposure assessment, EFSA requires participating countries to submit results from a specified number of food item samples analysed in each country. These data are used to estimate quantities such as the proportion of samples exceeding the maximum residue levels, and the mean and maximum residue concentration per food item for exposure assessment. The design and analysis of the programmes are therefore an important consideration. In this report, we combine elements of survey sampling methodology and statistical modelling into a benchmark framework for the programmes, covering the translation of research questions into statistical problems, the statistical analysis, and the interpretation of results. Particular focus is placed on issues that could affect the representativeness of the data, and remedial procedures are proposed. For example, where information on the sampling design is absent, a sensitivity analysis across a range of plausible designs is proposed. To address selection bias, weighted generalized linear mixed models, and generalized linear mixed models combining both conjugate and normal random effects, are proposed. Likelihood-based analysis methods are proposed to address missing and censored data problems. Suggestions for improving the design and analysis of the programmes are also identified and discussed. For instance, we propose using stratified sampling methodology to determine both the total number of samples and their allocation to the participating countries. Throughout the report, the proposed statistical models properly take into account the hierarchical (and thus correlated) structure in which the data are collected.
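As a purely illustrative sketch of two of the ideas summarised above, the Python fragment below shows (i) a proportional allocation of a fixed total number of samples across strata, and (ii) a weighted estimate of the proportion of samples exceeding a limit. The function names, the choice of proportional allocation (rather than, say, an optimal allocation), the stratum sizes, and the weights are all assumptions made for illustration; they are not taken from the report and do not represent the specific procedures it proposes.

```python
from math import floor


def proportional_allocation(total_samples, stratum_sizes):
    """Allocate total_samples across strata in proportion to stratum size.

    stratum_sizes is a hypothetical mapping, e.g. country -> size measure.
    Largest-remainder rounding keeps the allocations summing to total_samples.
    """
    total_size = sum(stratum_sizes.values())
    exact = {k: total_samples * v / total_size for k, v in stratum_sizes.items()}
    alloc = {k: floor(x) for k, x in exact.items()}
    leftover = total_samples - sum(alloc.values())
    # Hand the remaining units to the strata with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda k: exact[k] - alloc[k], reverse=True)
    for k in by_remainder[:leftover]:
        alloc[k] += 1
    return alloc


def weighted_exceedance_proportion(samples):
    """Weighted proportion of samples exceeding the limit.

    samples is an iterable of (weight, exceeds) pairs, where weight is an
    assumed design weight and exceeds is True if the residue is above the limit.
    """
    samples = list(samples)
    total_weight = sum(w for w, _ in samples)
    exceeding_weight = sum(w for w, exceeds in samples if exceeds)
    return exceeding_weight / total_weight


# Hypothetical inputs: three strata and four weighted sample results.
allocation = proportional_allocation(100, {"A": 50, "B": 30, "C": 20})
proportion = weighted_exceedance_proportion(
    [(2.0, True), (1.0, False), (1.0, False), (4.0, False)]
)
```

With unequal weights, the weighted proportion can differ noticeably from the raw fraction of exceeding samples, which is why ignoring the sampling design can distort such estimates.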