Interobserver Agreement in Segmentation
Method: Four readers annotated breast tumors on the MRI examinations of 50 patients from one institution, using a bounding box to indicate each tumor. All annotated tumors were biopsy-proven cancers. The similarity of the bounding boxes was analyzed with Dice coefficients. An automatic tumor segmentation algorithm was then used to segment the tumors from the readers' annotations, and the segmented tumors were compared between readers, again using the Dice coefficient as the similarity metric. Cases with high interobserver variability after segmentation (average Dice coefficient < 0.8) were reviewed by a panel of radiologists to identify the reasons for the low agreement. In addition, an imaging feature quantifying the enhancement dynamics of tumor and breast tissue was extracted from each segmented tumor for each patient. Pearson's correlation coefficients were calculated between the feature values for each pair of readers to assess the impact of the annotation on the feature values. Finally, the authors quantified the extent of the variation in feature values caused by each reason for low agreement.
VOI differed significantly between the 3 segmentation methods.
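The reader comparison described in the Method paragraph can be illustrated with a minimal sketch. It assumes each reader's region is available as a set of pixel coordinates; the helper names (`dice`, `box`, `mean_pairwise_dice`, `pearson`) and the toy boxes are hypothetical and are not taken from the study.

```python
from itertools import combinations
from statistics import mean


def dice(a, b):
    """Dice coefficient 2*|A & B| / (|A| + |B|) for two sets of pixel coordinates."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention chosen here: two empty regions agree perfectly
    return 2.0 * len(a & b) / (len(a) + len(b))


def box(r0, r1, c0, c1):
    """Pixels covered by a bounding box (half-open row and column ranges)."""
    return {(r, c) for r in range(r0, r1) for c in range(c0, c1)}


def mean_pairwise_dice(regions):
    """Average Dice coefficient over every pair of readers for one case."""
    return mean(dice(a, b) for a, b in combinations(regions, 2))


def pearson(x, y):
    """Pearson correlation between two equally long lists of feature values."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical toy case: four readers box the same lesion slightly differently.
readers = [box(5, 15, 5, 15), box(5, 15, 6, 16), box(4, 14, 5, 15), box(6, 16, 6, 16)]
score = mean_pairwise_dice(readers)
flagged = score < 0.8  # threshold quoted in the text for high variability
print(f"mean pairwise Dice = {score:.2f}, flagged for review: {flagged}")
```

The same `dice` function applies unchanged to segmented tumors if they are also expressed as coordinate sets, and `pearson` can be applied to the per-patient feature values of any two readers.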
VOIs were consistently larger with 40% SUVmax (Bias – 11.0 ± 11.7 and Bias – 6.4 ± 4.4 for PET-EDGE versus 40% SUVmax and DAISNE versus 40% SUVmax, respectively) (Figures 1 and 2). The difference between the PET-EDGE and DAISNE methods was less marked (Bias – 4.6 ± 9.0) (Figure 3). An example showing the VOI obtained with each of the 3 methods is given in Figure 4.
With Method 1, interobserver agreement for mean ADC and mean enhancement was good to excellent (Table 2). With Method 2, interobserver agreement was excellent for mean ADC, mean AE, mean VE, RECIST diameter and tumor volume (Table 2). Method 3 led to moderate to good interobserver agreement for mean ADC, mean AE, mean VE, RECIST diameter and EASL diameter (Table 2).
Third, delineated contours can consist of several clusters when fixed or adaptive thresholding methods are used. Because the texture calculation requires a closed contour delineating a single cluster, manual intervention was necessary, either to close the contour with topological operators or to select the most representative cluster on the basis of its uptake intensity and volume. For the most heterogeneous tumors, the risk of obtaining more than one cluster was greater, so that manually adapting the VOI resulted in a loss of this heterogeneity information. Gradient-based methods could therefore be preferred insofar as they yield a single contour. Manual segmentation on CT would also eliminate the cluster problem. However, it would be less reproducible between observers and much more time-consuming.
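To make the cluster problem concrete, the sketch below applies a fixed threshold at 40% of the maximum to a small uptake map and lists the resulting connected clusters. It is an illustration under simplifying assumptions, not the procedure used in the study: the map is 2D rather than a 3D PET volume, connectivity is 4-neighbour, the values are invented, and the "most representative" cluster is taken to be simply the largest one, whereas the text also considers uptake intensity.

```python
from collections import deque


def threshold_mask(img, fraction=0.40):
    """Keep pixels at or above `fraction` of the image maximum (e.g. 40% SUVmax)."""
    cutoff = fraction * max(max(row) for row in img)
    return [[v >= cutoff for v in row] for row in img]


def clusters(mask):
    """4-connected components of a 2D boolean mask, each as a list of (row, col)."""
    rows, cols = len(mask), len(mask[0])
    seen, found = set(), []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or (r, c) in seen:
                continue
            # Breadth-first flood fill starting from an unvisited pixel.
            queue, comp = deque([(r, c)]), []
            seen.add((r, c))
            while queue:
                y, x = queue.popleft()
                comp.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] and (ny, nx) not in seen:
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            found.append(comp)
    return found


# Hypothetical uptake map: two hot areas separated by a low-uptake column.
suv = [
    [1, 1, 1, 1, 1, 1],
    [1, 9, 8, 2, 6, 1],
    [1, 8, 7, 2, 5, 1],
    [1, 1, 1, 1, 1, 1],
]
parts = clusters(threshold_mask(suv))
largest = max(parts, key=len)  # simplification: volume only, intensity ignored
print(f"{len(parts)} clusters; largest has {len(largest)} pixels")
```

Keeping only `largest` discards the second hot area, which is the loss of heterogeneity information the paragraph describes.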
In our study, interobserver reproducibility was excellent for 3 of the 5 selected indices with the 3 contouring methods, and reproducibility was poorer with PET-EDGE than with 40% SUVmax and DAISNE. In our study, interobserver agreement on measurements of functional MRI parameters, i.e. enhancement and CED, was very good when semi-automatic methods were used (ICC from 0.830 to 0.910), but weaker for manual VE assessment (ICC = 0.648) and very low for manual evaluation of AE (ICC = 0.202) and CFO (ICC = 0.157). These results confirm those of previous studies showing higher inter- or intraobserver agreement for semi-automatic volumetric measurements than for manual ROI-based measurements [24,35,36].
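The ICC values quoted above can be related to a concrete calculation. The text does not state which ICC form was used, so the sketch below assumes the two-way random-effects, absolute-agreement, single-measurement form, often written ICC(2,1). The function name `icc_2_1` and the measurement table are hypothetical.

```python
def icc_2_1(table):
    """ICC(2,1) for a table with one row per subject and one column per observer."""
    n, k = len(table), len(table[0])
    grand = sum(map(sum, table)) / (n * k)
    row_means = [sum(row) / k for row in table]
    col_means = [sum(row[j] for row in table) / n for j in range(k)]

    # Two-way ANOVA sums of squares: subjects, observers, residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in table for v in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)               # between-subject mean square
    msc = ss_cols / (k - 1)               # between-observer mean square
    mse = ss_err / ((n - 1) * (k - 1))    # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Hypothetical measurements: 5 lesions (rows) assessed by 2 observers (columns).
measurements = [
    [1.10, 1.15],
    [0.95, 0.90],
    [1.40, 1.38],
    [1.22, 1.30],
    [0.80, 0.84],
]
print(f"ICC(2,1) = {icc_2_1(measurements):.3f}")
```

Because this form penalises systematic offsets between observers as well as random disagreement, a consistent bias in one reader's manual ROI placement lowers the ICC even when the readers rank the lesions identically.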