Foundation models meet multimodal neuroimaging: A generative transformer-based framework for Alzheimer’s disease diagnosis

Zedda, Luca; Loddo, Andrea; Di Ruberto, Cecilia
2026-01-01

Abstract

Incomplete neuroimaging data remains a major challenge in Alzheimer’s disease diagnosis, as many patients undergo only a subset of recommended imaging protocols. This work addresses this limitation by proposing a generative transformer-based framework designed to support multimodal analysis in the presence of missing modalities. We systematically investigate multimodal performance and fairness within a unified foundation model framework for Alzheimer’s disease classification. We also introduce a generative approach that combines structural MRI, DTI, and PET data and leverages ControlNet-based diffusion models to synthesize anatomically consistent surrogate modalities when data are unavailable. These synthetic images are used exclusively as a training-time augmentation strategy for incomplete-modality settings, rather than as replacements for clinical acquisitions. Vision transformers adapted via Low-Rank Adaptation are employed for efficient feature extraction, while clinical variables are integrated through a dedicated projection module. Experimental results show that a transformer-based fusion head can improve upon simple aggregation strategies in some complex multimodal settings, achieving an F1-score of 57.8% in multiclass classification when combined with generative augmentation and clinical data. However, these benefits are not uniform: strong unimodal volumetric PET baselines remain superior in the best-case binary setting, and the effect of generative augmentation is strongly configuration-dependent, with some settings benefiting while others degrade substantially under non-selective synthetic augmentation.
2026
Multimodal learning; Foundation models; Intelligent decision support; Neurodegenerative diseases; Alzheimer’s disease; Diffusion models; ADNI
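
The abstract mentions vision transformers adapted via Low-Rank Adaptation but gives no rank, scaling factor, or choice of adapted layers. As an illustration of the general technique only, and not the authors' implementation, the following minimal sketch shows the standard low-rank update: a frozen weight matrix W is augmented with a trainable product B·A scaled by alpha/rank, with B initialised to zero so that the adapted layer initially reproduces the frozen one. The class name `LoRALinear`, the helper `matvec`, and all numeric values are hypothetical.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Linear layer with a frozen weight and a trainable low-rank update.

    Generic LoRA formulation: y = W x + (alpha / rank) * B (A x).
    Hypothetical illustration; not taken from the paper.
    """

    def __init__(self, weight, rank, alpha, seed=0):
        rng = random.Random(seed)
        out_dim, in_dim = len(weight), len(weight[0])
        self.weight = weight              # frozen pretrained weight, never updated
        self.scale = alpha / rank         # scaling applied to the low-rank update
        # A starts with small random values, B starts at zero,
        # so the update B A is exactly zero before any training.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(in_dim)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(out_dim)]

    def forward(self, x):
        base = matvec(self.weight, x)
        update = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * u for b, u in zip(base, update)]

    def trainable_parameter_count(self):
        rank = len(self.A)
        return rank * len(self.A[0]) + len(self.B) * rank


layer = LoRALinear([[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]], rank=1, alpha=2.0)
initial_output = layer.forward([1.0, 2.0, 3.0])   # equals the frozen layer's output
```

Only A and B would be trained, which for a small rank is far fewer parameters than the full weight matrix; this is the efficiency the abstract alludes to.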
Files in this record:
File: 2026_Neurocomputing_Foundation models meet multimodal neuroimaging.pdf
Access: open access
Description: Full article
Type: publisher's version (VoR)
Size: 3.71 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11584/482307