Performance of explainable artificial intelligence in guiding the management of patients with a pancreatic cyst
- Juan M. Lavista Ferres,
- Felipe Oviedo,
- Caleb Robinson,
- Linda Chu,
- Satomi Kawamoto,
- Elham Afghani,
- Jin He,
- Alison P. Klein,
- Mike Goggins,
- Christopher L. Wolfgang,
- Ammar A. Javed,
- Rahul Dodhia,
- Nick Papadopoulos,
- Ken Kinzler,
- Ralph H. Hruban,
- Bill Weeks,
- Elliot K. Fishman,
- Anne Marie Lennon
Pancreatology
Background/objectives
Pancreatic cyst management can be distilled into three separate pathways, discharge, monitoring, or surgery, based on the risk of malignant transformation. This study compares the performance of artificial intelligence (AI) models with clinical care for this task.
Methods
Two explainable boosting machine (EBM) models were developed and evaluated on a publicly available dataset of 850 cases (median age 64; 65% female): one using clinical features only, and one using clinical features plus cyst fluid molecular markers (CFMM). The dataset was divided into independent training (429 cases) and holdout test (421 cases) cohorts. It comprised 137 cysts with no malignant potential, 114 malignant cysts, and 599 intraductal papillary mucinous neoplasms (IPMNs) and mucinous cystic neoplasms (MCNs).
Results
The EBM and EBM with CFMM models had higher accuracy than current clinical care for identifying patients requiring monitoring (0.88 and 0.82, respectively, vs 0.62) and surgery (0.66 and 0.82, respectively, vs 0.58). For discharge, the EBM with CFMM model had higher accuracy (0.91) than either the EBM model (0.84) or current clinical care (0.86). In the cohort of patients who underwent surgical resection, compared with clinical care, use of the EBM with CFMM model would have decreased the number of unnecessary surgeries by 59% (n = 92), increased correct surgeries by 7.5% (n = 11), increased identification of patients who require monitoring by 122% (n = 76), and increased the number of patients correctly classified for discharge by 138% (n = 18).
Conclusions
EBM models had greater sensitivity and specificity for identifying the correct management than either clinical management or previous AI models, and their predictions were shown to be interpretable by clinicians.