The kappa values reflecting the degree of agreement on this question (Question 2) are listed in Table IV. The kappa score for all embryologists (0.676, 95% CI: 0.617-0.724) indicates substantial agreement. The level of experience as an embryologist, the level of research experience and the number of days per week spent on embryo grading did not have a significant influence on the kappa agreement coefficient, although a small but consistent decrease from Question 1 to Question 2 was observed (Supplementary Table SI). The average number of times embryologists changed their response from Question 1 to Question 2 (based on the additional Day 3 images) was 8.4 (SD 4.2), ranging from 3 to 15. When embryologists changed their initial decision, in 20 decisions (across all embryologists, i.e. an average of two decisions per embryologist) the newly selected embryo matched the embryo that was actually transferred. In 53 decisions (average: 5.3 decisions per embryologist), embryologists changed their initial selection away from the embryo actually selected for transfer in the laboratory to another embryo. The evaluation of inter-observer agreement for the grading of inner cell mass (ICM), trophectoderm, embryo quality and developmental stage showed that agreement between the participating embryologists on these aspects of embryo grading was poor. In addition, higher agreement was observed between embryologists when they were asked to select an embryo for transfer than when they were asked to assign a morphology grade.
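For readers who wish to reproduce this type of multi-rater analysis, the following is a minimal sketch using the Fleiss kappa implementation in statsmodels. The ratings array is invented purely for illustration (it does not reproduce the study data), and no confidence interval is computed here:

```python
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# Hypothetical ratings: rows = embryo-selection decisions, columns = embryologists;
# each cell holds the index of the embryo chosen from the cohort (invented values).
ratings = np.array([
    [0, 0, 1, 0, 0, 0, 2, 0, 0, 0],
    [1, 1, 1, 1, 3, 1, 1, 1, 0, 1],
    [2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
])

table, _ = aggregate_raters(ratings)  # counts of raters per (decision, category)
print(fleiss_kappa(table))            # chance-corrected multi-rater agreement
```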
This pattern was also observed for the correspondence between the Day 5 embryo selected for transfer by each embryologist in the study and the embryo selected by the embryologist on the actual day of transfer, which was higher than the corresponding agreement for the morphological grading of embryos. One possible explanation is that it is easier to select the best embryo from a cohort of embryos than to assign the same morphology grade to the individual characteristics of an embryo (ICM, trophectoderm or developmental stage), resulting in lower agreement for the latter. It must also be taken into account when interpreting the kappa values that the kappa coefficient is a chance-corrected index. Therefore, for the same percentage agreement, the kappa coefficient is lower for a question with fewer potential responses than for a question with more potential responses, because the agreement expected by chance is higher when there are fewer categories (Bakeman et al., 1997). The inter-observer agreement for the morphological grading of Day 5 embryos was also analysed to determine which aspect of embryo assessment was the most difficult to agree on. The inter-observer agreement between the individual embryologists and the embryologist who selected the embryo on the day of transfer in the laboratory was also examined, as was the inter-observer agreement between the individual embryologists and the embryologist on the day of transfer for the morphological evaluation of embryos.
In addition to factors such as observer, timing and scanner, FDG-PET quantification itself is affected by technical factors (e.g. relative calibration between the PET scanner and the dose calibrator, paravenous administration of FDG), biological factors (e.g. blood glucose level, patient movement or breathing) and physical factors (e.g. scan acquisition and reconstruction parameters, region-of-interest (ROI) definition, blood glucose correction) [41]. In our studies, intra- and inter-observer agreement was evaluated with respect to the post-imaging process; therefore, technical, biological and most physical factors did not come into play, while the size and type of ROI used are specific to the observer and therefore cannot be modelled separately from the "observer" factor. When considering day-to-day variation of scans and multicentre trials, the PET procedure guideline [5] should be followed in order to maintain the accuracy and precision of quantitative measurements as far as possible.
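To illustrate this chance-correction effect with invented numbers (not taken from the study): Cohen's kappa is defined as $\kappa = (p_o - p_e)/(1 - p_e)$, where $p_o$ is the observed and $p_e$ the chance-expected proportion of agreement. For an observed agreement of $p_o = 0.80$, a two-category question with equally likely responses has $p_e = 0.5^2 + 0.5^2 = 0.50$, giving $\kappa = (0.80 - 0.50)/(1 - 0.50) = 0.60$, whereas a four-category question with equally likely responses has $p_e = 4 \times 0.25^2 = 0.25$, giving $\kappa = (0.80 - 0.25)/(1 - 0.25) \approx 0.73$.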
The technical, biological and physical factors discussed by Boellaard [41] can in principle be partially included in a statistical model as explanatory variables; however, only those that justify a corresponding increase in sample size should be considered (see the discussion of appropriate sample sizes above).
[Supplementary figure: scatter plot of repeated SUVmax measurements for the 30 patients of Study 1 (EPS, 24 KB).]
The data from Study 1 were presented graphically as Bland-Altman plots with the corresponding limits of agreement, defined as the estimated mean difference between readings ± 1.96 times the standard deviation of the differences between measured values. These plots were supplemented by lines derived from linear regressions of the differences on the means, in the spirit of the Bradley-Blackwood approach [18], to support the visual assessment of trends across the measurement scale. The data from Study 2 were presented as line graphs over time, by observer.
Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet. 1986;1(8476):307-10.
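A minimal sketch of how such limits of agreement and the trend line can be computed, assuming two arrays of paired readings (all variable names and values below are hypothetical):

```python
import numpy as np

# Hypothetical paired SUVmax readings from two occasions (invented values).
reading1 = np.array([4.1, 6.3, 5.2, 7.8, 3.9, 5.5])
reading2 = np.array([4.4, 6.0, 5.6, 7.5, 4.2, 5.1])

diffs = reading1 - reading2
means = (reading1 + reading2) / 2

bias = diffs.mean()                          # mean difference between readings
sd = diffs.std(ddof=1)                       # SD of the differences
loa = (bias - 1.96 * sd, bias + 1.96 * sd)   # Bland-Altman limits of agreement
print(f"bias={bias:.3f}, limits of agreement={loa[0]:.3f} to {loa[1]:.3f}")

# Linear regression of differences on means (trend line across the scale).
slope, intercept = np.polyfit(means, diffs, 1)
print(f"trend: diff ≈ {intercept:.3f} + {slope:.3f} * mean")
```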
For this reason, the objective of the present study was to evaluate the inter-observer and intra-observer agreement of 10 embryologists from different clinics when selecting the best Day 5 embryo (the one with presumably the highest implantation potential) for transfer, as well as the inter-observer agreement between these embryologists in the morphological grading of Day 5 embryos. Previous research on the morphological evaluation of early-stage embryos (from two pronuclei (2PN) to Day 3) has shown varying degrees of inter-observer and intra-observer agreement (Arce et al., 2006; Baxter Bendus et al., 2006; Paternot et al., 2009; Paternot et al., 2011). The evaluation of Day 5 embryos differs substantially from that of early-stage embryos because a blastocyst is structurally more complex and introduces more variables to consider when deciding which embryo is suitable for transfer (Gardner et al., 2000). It is currently unclear whether this affects embryologists' ability to identify the same embryo as the most suitable for transfer. If inter-observer and intra-observer agreement among embryologists in choosing the best embryo for transfer is poor, this can lead to inconsistent embryo selection and suboptimal pregnancy rates.
In Study 1, we found a repeatability coefficient (RC) of 2.46, half the width of the Bland-Altman limits of agreement. In Study 2, the RC for identical conditions (same scanner, same patient, same time point and same observer) was 2392; allowing for different scanners, the RC increased to 2543. Differences between observers were negligible compared with differences due to other factors: between observers 1 and 2, −10 (95% CI: −352 to 332), and between observers 1 and 3, 28 (95% CI: −313 to 370). The agreement between embryologists in grading ICM, trophectoderm, quality and developmental stage was also assessed in subgroup analyses. The level of experience as an embryologist, the level of research experience and the number of days per week spent on embryo grading did not significantly change the kappa agreement coefficient (Supplementary Table SII). We regard repeatability as a measure of agreement rather than of reliability (see Appendix), although the ICC is sometimes used as a measure of repeatability [38, 39]. Since the ICC depends strongly on the variation between subjects and can produce high values for heterogeneous patient groups [30, 31], it should only be used to assess reliability.
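As a brief illustration of this last point (a sketch with invented numbers, not study data): the RC here is computed as $1.96 \times \sqrt{2} \times$ the within-subject SD, while the ICC is approximated as the share of total variance attributable to subjects. With the same measurement error, the ICC rises sharply for a heterogeneous patient group while the RC stays essentially unchanged:

```python
import numpy as np

rng = np.random.default_rng(0)

def icc_and_rc(subject_means, within_sd, n_repeats=2):
    """Simplified variance-ratio ICC and repeatability coefficient (RC)
    for simulated repeated measurements (illustrative only)."""
    data = np.array([[m + rng.normal(0, within_sd) for _ in range(n_repeats)]
                     for m in subject_means])
    between_var = data.mean(axis=1).var(ddof=1)   # variance between subjects
    within_var = data.var(axis=1, ddof=1).mean()  # variance within subjects
    icc = between_var / (between_var + within_var)
    rc = 1.96 * np.sqrt(2 * within_var)           # RC = 1.96 * sqrt(2) * within-SD
    return icc, rc

# Same measurement error; homogeneous vs heterogeneous patient groups.
print(icc_and_rc(subject_means=rng.normal(50, 2, 30), within_sd=1.0))
print(icc_and_rc(subject_means=rng.normal(50, 20, 30), within_sd=1.0))
```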