HLA-genotype-based Predictive Diagnosis of T-cell Responses to SARS-CoV-2 Infection Powered by Machine Learning

European Society of Medicine


Background: The COVID-19 pandemic has necessitated the development of efficient diagnostic tools to predict T-cell responses, which are crucial for viral clearance and protection against reinfection. Current diagnostic tests lack the ability to predict the epitope repertoire of an individual that induces T-cell responses.

Methods: We developed VERDI, a new machine learning-based diagnostic tool that uses the sequence data of all six HLA class I alleles of an individual to rank all putative epitopes by their potential to induce T-cell responses. VERDI was trained on a comprehensive clinical dataset of 920 SARS-CoV-2 epitopes and validated on an independent dataset collected for the FDA-approved T-Detect COVID test. We compared VERDI’s performance with existing HLA-allele-based models through statistical analyses.

Results: Our findings show that VERDI’s top-ranked epitopes accurately represent the individual’s epitope repertoire that participates in T-cell responses. VERDI outperformed current models, improving T-cell response prediction recall threefold and precision eightfold. It exhibited exceptional diagnostic accuracy, precision, and recall in predicting the potency of the top 20 epitopes. Although experimental limitations allow only 1% of putative epitopes to be tested, VERDI accurately predicted 30% of these, implying a potentially higher accuracy if broader testing were feasible. Notably, the mean potency of the top-ranked epitopes predicted by VERDI, which reflects the strength of an individual’s SARS-CoV-2-specific T-cell responses, followed a Gaussian distribution.
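
For readers unfamiliar with the metrics reported above, precision and recall over a top-ranked list can be computed as in the minimal Python sketch below. It is an illustration only and not VERDI's evaluation code: the function name `precision_recall_at_k`, the epitope labels, and the toy data are all hypothetical, and the sketch assumes each epitope is simply labeled as experimentally confirmed or not.

```python
def precision_recall_at_k(ranked_epitopes, confirmed, k=20):
    """Precision and recall of the top-k ranked epitopes.

    ranked_epitopes: epitope identifiers, ordered best first (hypothetical input).
    confirmed: set of epitopes experimentally confirmed to induce a response.
    """
    top_k = ranked_epitopes[:k]
    hits = sum(1 for epitope in top_k if epitope in confirmed)
    # Precision: fraction of the top-k predictions that were confirmed.
    precision = hits / len(top_k) if top_k else 0.0
    # Recall: fraction of all confirmed epitopes recovered in the top k.
    recall = hits / len(confirmed) if confirmed else 0.0
    return precision, recall


# Toy data, invented for illustration only.
ranked = ["EP1", "EP2", "EP3", "EP4", "EP5"]
confirmed = {"EP1", "EP3", "EP9"}
p, r = precision_recall_at_k(ranked, confirmed, k=4)
```

In this toy case two of the four top-ranked epitopes are confirmed, and two of the three confirmed epitopes are recovered. A higher-ranked list that captures more confirmed epitopes raises both values.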

Conclusions: VERDI is the first diagnostic tool that uses the complete HLA genotype data to predict the breadth and strength of an individual’s T-cell responses to SARS-CoV-2 infection. Its ability to accurately identify the potency of epitopes involved in individual T-cell responses and its superior performance compared to the state-of-the-art make it a new resource for personalized vaccine design and disease management.