PredIG: an interpretable predictor of T-cell epitope immunogenicity
Farriol-Duran R, Domínguez-Dalmases C, Cañellas-Solé A, Vazquez M, Porta-Pardo E, Guallar V.
Genome Med
Background: Cytotoxic T cells are key effectors of the immune response against pathogens and tumors. Identifying the immunogenic epitopes that drive T-cell activation is therefore a fundamental goal for antigen-based immunotherapies. T-cell antigen discovery is challenged by immense epitope landscapes that are infeasible to screen experimentally because immunogenicity validation is costly and low-throughput. Instead, immunoinformatic models with orders-of-magnitude higher throughput, such as HLA-I binding affinity tools, are used to predict the antigenic potential of T-cell epitopes. However, the resulting immunogenicity screening success rates (ISSR), defined as the capacity to rank truly immunogenic epitopes among the top-scored candidates prioritized for experimental validation, have improved only incrementally, and the immunological explainability underlying model predictions has remained limited.
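The ISSR notion described above can be illustrated with a minimal sketch. It assumes one plausible reading of the metric, namely the fraction of the k top-scored candidates that are truly immunogenic; the exact definition used by the authors may differ. The function name `top_k_success_rate` and the toy values are hypothetical and are not taken from the paper or from the PredIG code base.

```python
from typing import Sequence


def top_k_success_rate(scores: Sequence[float], labels: Sequence[int], k: int) -> float:
    """Fraction of the k highest-scored candidates that are truly immunogenic.

    Assumption: labels are 1 (immunogenic) or 0 (non-immunogenic), and a
    higher score means a higher predicted immunogenicity.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if k <= 0 or k > len(scores):
        raise ValueError("k must be between 1 and the number of candidates")
    # Rank candidate indices by descending score; ties keep their input order.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = sum(labels[i] for i in ranked[:k])
    return hits / k


# Toy, made-up values for illustration only (not data from the paper).
toy_scores = [0.91, 0.15, 0.78, 0.40, 0.66]
toy_labels = [1, 0, 0, 1, 1]
print(top_k_success_rate(toy_scores, toy_labels, k=3))
```

In this toy case the three top-scored candidates contain two immunogenic epitopes, so the rate is two thirds. In a real screening setting, k would correspond to the number of candidates that can be validated experimentally.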
Results: PredIG is an interpretable predictor of T-cell epitope immunogenicity trained on 17,448 peptide-HLA-I allele pairs (pHLAs) with reported immunogenicity in T-cell reactivity and binding assays. For each pHLA, PredIG integrates an in silico feature space of antigenic properties (proteasomal cleavage, TAP translocation, HLA-I binding affinity, and presentation) and physicochemical epitope descriptors, with a particular focus on TCR-facing central residues. Leveraging this information, we built three antigen-specific XGBoost models to compute PredIG immunogenicity scores (PredIG-NeoAntigen, PredIG-NonCanonical, and PredIG-Pathogen). We then used SHapley Additive exPlanations (SHAP) to analyze their immunological interpretability, which revealed a balanced feature importance between antigenic and physicochemical properties. This highlighted the strong contribution of antigen processing likelihood and physicochemical characteristics, which are often overlooked in T-cell epitope predictions. In benchmarks against immunogenicity, HLA-I binding, and pHLA stability predictors, PredIG obtained state-of-the-art ISSR performance on our pathogen and non-canonical cancer antigen held-out sets. In cancer neoantigens, we used PredIG to refine the success rates of HLA-I binding affinity predictions and to prioritize an additional set of immunogenic (neo)epitopes differing from top-binding candidates across the three antigen types tested.
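The idea of prioritizing candidates that differ from the top binders can be sketched as a comparison of two rankings. This is an illustrative sketch only, not the authors' pipeline: the peptide identifiers, scores, and the helper `top_k_ids` are hypothetical, and binding affinity is assumed to be expressed as an IC50-like value in nM, where lower means stronger binding.

```python
from typing import Dict, List


def top_k_ids(values: Dict[str, float], k: int, higher_is_better: bool = True) -> List[str]:
    """Return the ids of the k best candidates under the given score direction."""
    ranked = sorted(values, key=values.get, reverse=higher_is_better)
    return ranked[:k]


# Hypothetical immunogenicity scores (higher = more likely immunogenic).
immunogenicity = {"pep_A": 0.90, "pep_B": 0.20, "pep_C": 0.80, "pep_D": 0.70}
# Hypothetical binding affinities in nM (assumption: lower = stronger binder).
binding_nm = {"pep_A": 20.0, "pep_B": 35.0, "pep_C": 900.0, "pep_D": 50.0}

top_immunogenic = top_k_ids(immunogenicity, k=2)
top_binders = top_k_ids(binding_nm, k=2, higher_is_better=False)

# Candidates ranked highly for immunogenicity that a binding-only screen would miss.
additional = [p for p in top_immunogenic if p not in set(top_binders)]
print(additional)
```

Here `pep_C` is a weak binder under the assumed affinity values but scores highly for immunogenicity, so it appears only in the immunogenicity-based shortlist.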
Conclusions: Overall, we demonstrate that PredIG immunogenicity scores are instrumental in refining and expanding the prioritization of actionable T-cell (neo)epitopes in infection and cancer, including non-canonical antigens not seen during training. Furthermore, PredIG offers unprecedented immunological interpretability, identifying important immunogenicity drivers beyond HLA-I binding affinity. Finally, PredIG enables high-throughput antigen discovery in open-source containerized environments (https://github.com/BSC-CNS-EAPM/PredIG) and is accessible through a streamlined webserver (https://horus.bsc.es/predig).