Grading of optical coherence tomography (OCT) biomarkers can vary between readers, and the amount of variability depends on the disease, even among certified graders in an optimized clinical setting, according to a study published in Eye. These disagreements highlight the need for artificial intelligence in these feature analyses, the report suggests.
“Even among retina specialists, there is significant disagreement in identifying patients with retinal fluid or referable retinal disease, when assessed by OCT,” according to the study authors. The team acknowledges “considerable variability in OCT image assessment, with potentially sight-threatening implications for the patients.”
Researchers included 356 eyes of 207 patients with neovascular age-related macular degeneration (nAMD; 27.25% of eyes), diabetic macular edema (DME; 38.76% of eyes), and retinal vein occlusion (RVO; 33.99% of eyes) in a post-hoc analysis of 5 randomized clinical trials. Random OCT scans were selected and graded by 7 OCT-certified graders, who determined the presence and location of intraretinal fluid, subretinal fluid, pigment epithelial detachment, vitreomacular interface, epiretinal membrane, macular hole, and macular atrophy.
A total of 3 different OCT devices captured the randomly selected images.
Intergrader agreement on the presence of intraretinal fluid, subretinal fluid, and pigment epithelial detachment was almost perfect (κ range, 0.81-0.85), substantial for vitreomacular interface (κ, 0.77), and fair for epiretinal membrane (κ, 0.37). Agreement also differed between OCT devices, the report shows. Stratified by disease, agreement was often similar. Agreement on the presence of intraretinal fluid, however, was higher for RVO (κ, 0.86) than for nAMD (κ, 0.69) or DME (κ, 0.64).
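For readers unfamiliar with κ, the following is a minimal sketch of how agreement among several graders on a present/absent finding could be computed and labeled. It assumes Fleiss' κ and the Landis and Koch interpretation bands; this summary does not state which κ variant or scale the study used, although the labels quoted above ("almost perfect," "substantial," "fair") are consistent with those bands. The function names and the toy counts are hypothetical and are not study data.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for multiple graders.

    counts[i][j] is the number of graders who assigned scan i to
    category j (for example, j=0 "fluid present", j=1 "fluid absent").
    Every scan must be rated by the same number of graders.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each scan needs the same number (>= 2) of ratings")
    n_cats = len(counts[0])

    # Share of all ratings that fell into each category.
    p_j = [sum(row[j] for row in counts) / (n_items * n_raters)
           for j in range(n_cats)]
    # Agreement expected by chance.
    p_e = sum(p * p for p in p_j)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when all ratings share one category")

    # Observed agreement: proportion of agreeing grader pairs per scan.
    p_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
           for row in counts]
    p_bar = sum(p_i) / n_items
    return (p_bar - p_e) / (1 - p_e)


def landis_koch(kappa: float) -> str:
    """Map a kappa value to the Landis and Koch descriptive label (assumed scale)."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Hypothetical toy data: 4 scans, 7 graders, columns = [present, absent].
toy_counts = [[7, 0], [6, 1], [0, 7], [1, 6]]
toy_kappa = fleiss_kappa(toy_counts)
print(round(toy_kappa, 2), landis_koch(toy_kappa))
```

Under this assumed scale, the reported values of 0.85, 0.77, and 0.37 map to "almost perfect," "substantial," and "fair," matching the wording above.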
Graders showed excellent agreement on retinal thickness at the center point and in the central millimeter, and on the heights of intraretinal fluid and pigment epithelial detachment (intraclass correlation coefficient [ICC] range, 0.92-1.00), whereas agreement on subretinal fluid height was good (ICC, 0.85). Stratified by disease, agreement tended to be similar. However, intraretinal fluid height had poorer agreement in DME (ICC, 0.87) than in nAMD (ICC, 0.98) or RVO (ICC, 0.96).
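The ICC plays the same role for continuous measurements such as thickness and height. The sketch below assumes the two-way random-effects, absolute-agreement, single-rater form (often written ICC(2,1)) and the commonly cited bands of below 0.50 poor, 0.50-0.75 moderate, 0.75-0.90 good, and above 0.90 excellent. This summary does not say which ICC form or thresholds the authors used, so both are assumptions; the function names and example values are hypothetical.

```python
from typing import Sequence


def icc_2_1(ratings: Sequence[Sequence[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings[i][j] is the measurement of scan i by grader j
    (for example, a fluid height in micrometers).
    """
    n = len(ratings)        # number of scans
    k = len(ratings[0])     # number of graders
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete table with >= 2 scans and >= 2 graders")

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares from a two-way ANOVA without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    denom = ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    if denom == 0:
        raise ValueError("ICC is undefined when the measurements do not vary")
    return (ms_rows - ms_err) / denom


def icc_label(icc: float) -> str:
    """Descriptive label for an ICC value (assumed thresholds)."""
    if icc < 0.50:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.90:
        return "good"
    return "excellent"


# Hypothetical heights from 2 graders on 3 scans; grader 2 reads 1 unit higher.
offset_example = [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]
print(round(icc_2_1(offset_example), 3), icc_label(icc_2_1(offset_example)))
```

Because this form measures absolute agreement, a constant offset between graders lowers the ICC even when their rankings of the scans are identical, which is the behavior one would want when the measured height itself matters.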
Localizing intraretinal fluid demonstrated the best agreement in the central millimeter (κ, 0.83), followed by outside the central millimeter (κ, 0.81) and center point (κ, 0.78). Subretinal fluid had the best agreement in the center point (κ, 0.86), followed by the central millimeter (κ, 0.85) and outside the central millimeter (κ, 0.84). Pigment epithelial detachment showed the highest agreement outside the central millimeter (κ, 0.80), followed by the central millimeter (κ, 0.77), and center point (κ, 0.67).
“There was a substantial grading disagreement concerning [intraretinal fluid] in nAMD and DME,” according to the researchers. “Importantly, any image assessment by a human, even in the highly standardized setting of a reading center, remains laborious and to a certain degree subjective. Our goal should therefore focus on the adoption of automated imaging analysis tools for a more precise, efficient, and objective image assessment.”
Study limitations include a limited number of scans with epiretinal membranes, macular holes, and macular atrophy.
Disclosure: Multiple study authors declared affiliations with biotech, pharmaceutical, and/or clinical research organizations. Please see the original reference for a full list of authors’ disclosures.
Michl M, Neschi M, Kaider A, et al. A systematic evaluation of human expert agreement on optical coherence tomography biomarkers using multiple devices. Eye (Lond). Published online December 28, 2022. doi:10.1038/s41433-022-02376-w