Before being applied to the new bisphenol A (BPA) re‐evaluation, the study appraisal methodology described in the 2017 BPA hazard assessment protocol (the so‐called ‘2017 methodology’) was tested on a selection of studies that EFSA had previously appraised in its 2015 and 2016 assessments of BPA. This report describes the testing phase, its outcome and the resulting refinement of the 2017 methodology. The testing phase had two goals. The first, i), was to test the functioning of the 2017 internal validity appraisal tools for human and animal studies, specifically (a) to verify whether the internal validity tier automatically assigned to each study reflected its internal validity according to expert judgement, where tiers are on a three‐tier scale with Tier 1 the highest and are assigned from pre‐defined criteria applied to the answers to the risk of bias tool questions, and (b) to fine‐tune and calibrate the 2017 appraisal tool on a sufficiently large study sample (development of the ‘2019 methodology’). The second, ii), was to assess the comparability of the study appraisal outcome of the 2019 methodology against that of the ‘2015 methodology’ applied in the EFSA BPA assessments of 2015 and 2016. Concerning the first goal, the automatic allocation of epidemiological studies to an internal validity tier, based on pre‐defined criteria for combining the scores of the appraisal questions, placed all of them in Tier 3 (the lowest tier), in full accordance with expert judgement. For animal studies, the appraisal tool was refined so that it could discriminate studies across all three tiers; after this refinement, agreement between automatic and expert judgement‐based tier allocation reached 91% (43 out of 47 appraisals). Concerning the second goal, it is acknowledged that the 2015 and 2019 methodologies differ in the elements considered when assessing study quality (reliability vs. internal validity). 
Nonetheless, the key study used to derive BPA's tolerable daily intake in the 2015 Opinion was also considered to be of high quality under the 2019 methodology. In addition, the 2019 methodology produced appraisal outcomes that were overall comparable to, or more stringent than, those of the 2015 methodology in 92% of cases (24 out of 26 appraisals). It follows that, despite some intrinsic differences, the 2015 methodology previously used by EFSA to appraise the BPA evidence is considered sufficiently robust, although less structured than the 2019 methodology. Overall, both goals of the testing phase were achieved. The amendments to the appraisal methodology are being implemented for the full re‐evaluation of the new BPA literature and will be fully documented in the final version of the protocol annexed to the new BPA Opinion.
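The agreement figures reported above are simple proportions of concordant appraisals. The following is a minimal sketch, assuming agreement is counted as an exact tier match between paired automatic and expert‐judgement allocations. The function name `percent_agreement` and the four‐study tier lists are hypothetical and for illustration only; they are not EFSA data or EFSA's tooling. The last two lines reproduce only the reported ratios (43/47 and 24/26).

```python
from typing import Sequence


def percent_agreement(automatic: Sequence[int], expert: Sequence[int]) -> float:
    """Percentage of appraisals where the automatically allocated tier
    equals the expert-judgement tier (tiers 1-3, Tier 1 = highest).

    Hypothetical helper: assumes exact tier matches count as agreement.
    """
    if len(automatic) != len(expert):
        raise ValueError("tier lists must be paired appraisal by appraisal")
    if not automatic:
        raise ValueError("no appraisals to compare")
    matches = sum(a == e for a, e in zip(automatic, expert))
    return 100.0 * matches / len(automatic)


# Hypothetical example with four studies (not real appraisal data):
auto_tiers = [1, 2, 3, 3]
expert_tiers = [1, 2, 2, 3]
print(percent_agreement(auto_tiers, expert_tiers))  # 75.0

# Reported ratios, computed from the counts given in the text:
print(round(100 * 43 / 47))  # 91 (animal studies, automatic vs expert)
print(round(100 * 24 / 26))  # 92 (2019 vs 2015 methodology)
```

Note that the 92% figure in the text counts outcomes that were "comparable or more stringent", which is a looser criterion than the exact‐match rule assumed in the helper.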