Deep Learning Architectures for Thymic Tissue Segmentation and Risk Scoring from Routine CT Scans

How nnU-Net, VGG16-MLP-Mixer, and radiomics-DL fusion models segment thymic tissue from routine CT scans and predict thymoma risk across multi-center cohorts.

By Dimitris Kyriacou, PhD Molecular Biology·05/06/2026·12 min read

Advanced Experimental Methods 5 June 2026


The thymus is small, anatomically variable, and progressively replaced by fat with age, making it one of the most challenging mediastinal structures to segment on routine CT. Specialised deep learning frameworks, from two-stage nnU-Net pipelines to VGG16-MLP-Mixer hybrids and radiomics-DL fusion models, now automate segmentation and predict thymoma pathological risk with AUC values between 0.90 and 0.95 across multi-center validation cohorts.

TL;DR Deep learning models for thymic tissue analysis use architectures including Thy-uNET (a two-stage coarse-to-fine nnU-Net achieving Dice 0.83), VGG16-MLP-Mixer hybrids for capturing global and local dependencies, and DeepLabv3 with atrous spatial pyramid pooling for thymoma segmentation. Training strategies include ImageNet transfer learning, mediastinal cropping with slice fusion, weighted loss functions for class imbalance, and habitat imaging via clustering to quantify intratumoral heterogeneity. Scoring extends beyond volume to include CT attenuation, diameter measurements, and radiomics-DL feature fusion for WHO risk subtype prediction (AUC 0.90 to 0.95). Validation relies on geographically distinct external cohorts, reader comparison studies (where AI assistance raises junior radiologist AUC from 0.702 to 0.814), and Grad-CAM interpretability.

This methods review was generated from a verified BioSkepsis research thread.

What deep learning thymic segmentation and scoring achieves

Deep learning thymic segmentation and scoring automates three tasks that were previously manual, subjective, and time-consuming: delineating the thymic contour on chest CT images, extracting quantitative morphological and textural features from the segmented region, and classifying the tissue into clinically actionable risk categories. The thymus is difficult to analyse automatically because it is small relative to the chest volume, highly variable in shape across individuals, and progressively replaced by adipose tissue during age-related involution (PMID: 40597831). Specialised architectures such as Thy-uNET address these challenges with coarse-to-fine segmentation, while radiomics-DL fusion models combine learned features with handcrafted texture and shape descriptors to predict WHO pathological risk subtypes of thymic epithelial tumours (PMID: 40079653, PMID: 41204379). The clinical output is a risk score that helps decide whether a patient is likely to need complete surgical resection alone (low-risk thymoma) or resection followed by adjuvant chemotherapy or radiation (high-risk thymoma or thymic carcinoma).

Why automated thymic analysis surpasses manual radiological assessment

Manual thymic assessment on routine CT is subjective, poorly reproducible, and dependent on reader experience. The thymus lacks the clear tissue boundaries of organs like the liver or kidneys; its borders blend into surrounding mediastinal fat, and its appearance changes substantially with age, body composition, and scan protocol. Junior radiologists, in particular, struggle with thymic risk classification: in one multi-center study, junior readers reached a baseline AUC of approximately 0.702, compared with 0.85 or higher for senior experts (PMID: 40823066).

Automated deep learning models reduce this experience-dependent variability. They process the full volumetric data rather than selected slices, extract hundreds of quantitative features that are not readily perceived by eye (texture gradients, Hounsfield unit distributions, shape irregularity metrics), and return the same score for the same input regardless of reader fatigue or caseload. In multi-center validation, AI-assisted scoring raised junior radiologist AUC from 0.702 to 0.814, narrowing much of the gap with senior experts (PMID: 40823066, PMID: 40597831).

The clinical need is real: thymic epithelial tumours are rare, so most radiologists encounter only a few cases per year and have little opportunity to develop the pattern recognition that characterises expert performance. Automated scoring provides a decision-support layer that is especially valuable in community hospitals and low-volume centres where specialist thoracic radiology expertise is unavailable.

Technical workflow: from CT acquisition to risk score

Thy-uNET: two-stage coarse-to-fine segmentation

Thy-uNET is built on the nnU-Net framework and operates in two stages. Stage one processes the full chest CT volume to localise the mediastinal region containing the thymus. Stage two receives the cropped region of interest and performs fine boundary delineation at higher resolution. This coarse-to-fine approach addresses the extreme class imbalance between thymic tissue and background thoracic anatomy. The framework automatically adapts to voxel spacing anisotropy: for routine CT scans with thick slices (ratio of largest to smallest voxel spacing greater than 3), the model uses pseudo-2D convolutions with an out-of-plane kernel size of 1 in early layers and applies nearest-neighbour interpolation for out-of-plane resampling to suppress interpolation artefacts. Segmentation achieves a Dice score of approximately 0.83 across independent validation cohorts (PMID: 40597831, PMID: 33288961).
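To make the hand-off between the two stages concrete, here is a minimal sketch of the crop step, assuming the stage-1 output is a binary mask stored in (z, y, x) order. The helpers `roi_bounding_box` and `dice` and the margin of (2, 8, 8) voxels are hypothetical illustrations, not code from Thy-uNET or nnU-Net; `dice` is the standard overlap metric behind the reported 0.83.

```python
import numpy as np


def roi_bounding_box(coarse_mask, margin=(2, 8, 8)):
    """Slices enclosing the stage-1 foreground plus a safety margin (z, y, x voxels)."""
    coords = np.argwhere(coarse_mask)
    if coords.size == 0:
        raise ValueError("stage-1 mask is empty; nothing to crop")
    lo = np.maximum(coords.min(axis=0) - np.asarray(margin), 0)
    hi = np.minimum(coords.max(axis=0) + 1 + np.asarray(margin), coarse_mask.shape)
    return tuple(slice(int(a), int(b)) for a, b in zip(lo, hi))


def dice(pred, ref):
    """Dice similarity coefficient 2|A∩B| / (|A| + |B|) for two binary masks."""
    pred, ref = pred.astype(bool), ref.astype(bool)
    denom = pred.sum() + ref.sum()
    if denom == 0:
        return 1.0  # both masks empty: treated as perfect agreement (a common convention)
    return float(2.0 * np.logical_and(pred, ref).sum() / denom)


# Toy volume: the coarse stage has flagged a 10x10x10 block as "thymus".
volume = np.zeros((40, 64, 64), dtype=np.float32)
coarse = np.zeros(volume.shape, dtype=bool)
coarse[10:20, 20:30, 25:35] = True

roi = roi_bounding_box(coarse)
crop = volume[roi]          # stage-2 input: only the region of interest
print(crop.shape)           # (14, 26, 26)
```

The margin keeps some mediastinal context around the coarse mask so that stage two can correct boundaries the localisation stage clipped.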

VGG16-MLP-Mixer hybrid for thymic classification

This architecture combines VGG16 for hierarchical spatial feature extraction with an MLP-Mixer module that captures global and local dependencies without self-attention overhead. Preprocessing involves targeted mediastinal cropping and slice stacking, where three consecutive greyscale slices are fused into a three-channel image to capture inter-slice continuity. This design is optimised for classification tasks (normal versus pathological thymus) rather than pixel-level segmentation, operating on individual mediastinal cross-sections rather than full volumetric data (PMID: 41464191).
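The slice-stacking step can be sketched with NumPy as below. The soft-tissue window (level 40 HU, width 400 HU), the clamping of edge slices, and the function names are assumptions for illustration; the cited study's exact preprocessing may differ.

```python
import numpy as np


def window_hu(slice_hu, level=40.0, width=400.0):
    """Clip Hounsfield units to a soft-tissue window and rescale to [0, 1]."""
    lo, hi = level - width / 2, level + width / 2
    return (np.clip(slice_hu, lo, hi) - lo) / (hi - lo)


def stack_three_slices(volume_hu, z, crop=None):
    """Fuse slices z-1, z, z+1 of a (Z, H, W) volume into one (H, W, 3) image.

    Edge slices are clamped, so slice 0 reuses itself as its missing neighbour.
    crop: optional (row_slice, col_slice) selecting the mediastinal region.
    """
    n = volume_hu.shape[0]
    idx = [min(max(i, 0), n - 1) for i in (z - 1, z, z + 1)]
    image = np.stack([window_hu(volume_hu[i]) for i in idx], axis=-1)
    if crop is not None:
        image = image[crop[0], crop[1], :]
    return image.astype(np.float32)


# Five synthetic 8x8 slices with constant HU values.
volume = np.stack([np.full((8, 8), hu, dtype=float) for hu in (-1000, -100, 40, 200, 400)])
img = stack_three_slices(volume, z=2, crop=(slice(2, 6), slice(1, 7)))
print(img.shape)            # (4, 6, 3)
```

The three-channel output matches the input layout an ImageNet-pretrained VGG16 expects, which is one practical reason for fusing exactly three neighbouring slices.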

DeepLabv3 and habitat imaging for thymoma risk

DeepLabv3 uses atrous spatial pyramid pooling to capture multi-scale context, achieving a Dice score of 0.76 for automated thymoma segmentation. After segmentation, habitat imaging partitions the tumour region into subregions (habitats) with distinct Hounsfield unit intensity and texture patterns using clustering algorithms. These habitat features encode intratumoral heterogeneity that single-region radiomics misses. Combined with clinical features (tumour shape, density uniformity, 3D maximum diameter), habitat-informed models achieve AUC values between 0.90 and 0.95 for WHO risk subtype prediction (PMID: 40079653).
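Habitat imaging can be illustrated with a minimal sketch that clusters tumour voxels on two per-voxel features. Using k-means on Hounsfield units plus a local standard deviation (a simple texture proxy) is an assumption; studies differ in the clustering algorithm and the features they use. All function names are hypothetical.

```python
import numpy as np


def local_std(volume, size=3):
    """Per-voxel standard deviation over a size^3 neighbourhood (edge-padded)."""
    r = size // 2
    padded = np.pad(volume, r, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, (size, size, size))
    return windows.std(axis=(-3, -2, -1))


def kmeans(x, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means on the rows of x; returns (labels, centres)."""
    rng = np.random.default_rng(seed)
    centres = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(n_iter):
        dist = ((x[:, None, :] - centres[None, :, :]) ** 2).sum(axis=-1)
        labels = dist.argmin(axis=1)
        new = np.array([x[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
                        for j in range(k)])
        if np.allclose(new, centres):
            break
        centres = new
    return labels, centres


def habitat_map(volume_hu, tumour_mask, k=3, seed=0):
    """Cluster tumour voxels on (HU, local std) into k habitats.

    Returns a label volume (-1 outside the tumour) and the fraction of
    tumour voxels in each habitat, a simple heterogeneity descriptor.
    """
    feats = np.stack([volume_hu[tumour_mask], local_std(volume_hu)[tumour_mask]], axis=1)
    feats = (feats - feats.mean(axis=0)) / (feats.std(axis=0) + 1e-8)  # put features on one scale
    labels, _ = kmeans(feats, k, seed=seed)
    habitats = np.full(volume_hu.shape, -1, dtype=int)
    habitats[tumour_mask] = labels
    fractions = np.bincount(labels, minlength=k) / labels.size
    return habitats, fractions


# Toy tumour with hypothetical lower- and higher-attenuation compartments.
vol = np.zeros((10, 10, 10))
vol[2:5] = 20.0
vol[5:8] = 80.0
mask = np.zeros(vol.shape, dtype=bool)
mask[2:8, 2:8, 2:8] = True
habitats, fractions = habitat_map(vol, mask, k=2)
```

Habitat fractions, and radiomics features computed per habitat (not shown), are the kind of heterogeneity descriptors that are then combined with clinical features.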

Radiomics-deep learning feature fusion

Fusion models concatenate deep learning-derived features (from architectures such as ResNet101 or DenseNet) with handcrafted radiomics features covering shape, first-order texture, and grey-level co-occurrence matrices. The combined feature vector is fed to classifiers for risk stratification. Transfer learning from ImageNet-pretrained weights is standard to mitigate data scarcity in rare thymic tumour datasets. Grad-CAM overlays provide spatial interpretability, showing clinicians which tumour regions drive the risk prediction (PMID: 41204379, PMID: 40520864, PMID: 40823066, PMID: 41210998).
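A minimal sketch of early feature fusion is below. Random arrays stand in for CNN embeddings (from a network such as ResNet101) and handcrafted radiomics features, and the L2-regularised logistic regression is an assumed stand-in for whichever classifier a given study used. Standardising each block with training-split statistics keeps the two feature families on a comparable scale before concatenation.

```python
import numpy as np


def zscore(train, test):
    """Standardise with statistics from the training split only (avoids leakage)."""
    mu, sd = train.mean(axis=0), train.std(axis=0) + 1e-8
    return (train - mu) / sd, (test - mu) / sd


def fuse(deep_tr, rad_tr, deep_te, rad_te):
    """Early fusion: standardise each feature block, then concatenate column-wise."""
    d_tr, d_te = zscore(deep_tr, deep_te)
    r_tr, r_te = zscore(rad_tr, rad_te)
    return np.hstack([d_tr, r_tr]), np.hstack([d_te, r_te])


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic(x, y, lr=0.1, n_iter=2000, l2=1e-2):
    """L2-regularised logistic regression fitted by batch gradient descent."""
    w, b = np.zeros(x.shape[1]), 0.0
    for _ in range(n_iter):
        err = sigmoid(x @ w + b) - y
        w -= lr * (x.T @ err / len(y) + l2 * w)
        b -= lr * err.mean()
    return w, b


# Synthetic stand-ins: 32 "deep" features and 10 "radiomics" features per patient.
rng = np.random.default_rng(0)
n = 200
y = rng.integers(0, 2, size=n)                                   # 1 = high risk (synthetic)
deep = rng.normal(size=(n, 32)) + 0.6 * y[:, None] * (np.arange(32) < 8)
rad = rng.normal(size=(n, 10)) + 0.8 * y[:, None] * (np.arange(10) < 3)

x_tr, x_te = fuse(deep[:150], rad[:150], deep[150:], rad[150:])
w, b = fit_logistic(x_tr, y[:150])
p_te = sigmoid(x_te @ w + b)                                     # held-out risk probabilities
```

With real data the deep block is usually far wider than the radiomics block, so feature selection or dimensionality reduction before concatenation is often needed to stop one family dominating.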

Where deep learning thymic analysis delivers clinical value

Thymoma WHO risk stratification

The primary clinical application is preoperative differentiation of low-risk thymomas (WHO types A, AB, B1) from high-risk thymomas (types B2, B3) and thymic carcinomas. This distinction directly determines treatment: low-risk tumours typically require complete surgical resection alone, while high-risk tumours may necessitate adjuvant chemotherapy or radiation. The RDLCSM fusion model (radiomics, deep learning, clinical, and spatial morphology) achieves AUC values of 0.90 to 0.95 across external cohorts for this binary risk classification (PMID: 40079653, PMID: 41204379).
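The binary target used in these models can be written as a small lookup. The grouping follows the WHO subtypes listed above; the label strings (including `TC` as shorthand for thymic carcinoma) and the function names are hypothetical.

```python
LOW_RISK = {"A", "AB", "B1"}
HIGH_RISK = {"B2", "B3"}
CARCINOMA = {"TC", "THYMIC CARCINOMA"}   # assumed label strings


def who_risk_group(subtype: str) -> str:
    """Map a WHO histological subtype to its clinical risk group."""
    s = subtype.strip().upper()
    if s in LOW_RISK:
        return "low-risk thymoma"
    if s in HIGH_RISK:
        return "high-risk thymoma"
    if s in CARCINOMA:
        return "thymic carcinoma"
    raise ValueError(f"unrecognised WHO subtype: {subtype!r}")


def binary_label(subtype: str) -> int:
    """Training target: 0 for low-risk (A, AB, B1), 1 for B2, B3 and thymic carcinoma."""
    return 0 if who_risk_group(subtype) == "low-risk thymoma" else 1
```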

A 2D ResNet101 model achieved an AUC of 0.876 in external testing for risk categorisation, outperforming traditional SVM-based radiomics classifiers and standalone 3D U-Net++ models reported in literature comparisons (PMID: 40823066). Multi-dimensional fusion models integrating 2D features from the largest cross-sectional slice with volumetric information further improve accuracy without the full computational cost of 3D processing (PMID: 40079653).
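One way to assemble inputs for a multi-dimensional model is sketched below: pick the axial slice with the largest tumour cross-section for the 2D branch and compute simple volumetric descriptors from the full mask. The descriptor set and names are assumptions; how the cited models combine the 2D and volumetric streams is not reproduced here.

```python
import numpy as np


def largest_cross_section(mask, axis=0):
    """Index of the slice along `axis` with the largest foreground area, plus all areas."""
    areas = np.moveaxis(mask, axis, 0).reshape(mask.shape[axis], -1).sum(axis=1)
    return int(np.argmax(areas)), areas


def multi_dimensional_inputs(volume_hu, mask, spacing_zyx):
    """2D inputs from the largest axial cross-section plus simple volumetric descriptors."""
    z, areas = largest_cross_section(mask, axis=0)
    voxel_ml = float(np.prod(spacing_zyx)) / 1000.0            # mm^3 -> mL
    volumetric = {
        "volume_ml": float(mask.sum()) * voxel_ml,
        "mean_hu_3d": float(volume_hu[mask].mean()),
        "n_slices": int((areas > 0).sum()),
    }
    return volume_hu[z], mask[z], volumetric


# Toy tumour spanning three axial slices with different areas.
mask = np.zeros((8, 10, 10), dtype=bool)
mask[3, 2:6, 2:6] = True      # 16 voxels
mask[4, 1:8, 1:8] = True      # 49 voxels (largest section)
mask[5, 3:5, 3:5] = True      # 4 voxels
volume = np.full(mask.shape, 30.0)
img2d, mask2d, vol_feats = multi_dimensional_inputs(volume, mask, (5.0, 0.7, 0.7))
```

Only one 2D slice passes through the image network, so the compute cost stays close to a 2D model while the descriptors carry some 3D information.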

Thymic health scoring and immune competence

Beyond tumour risk, thymic segmentation enables quantitative scoring of thymic health in non-oncological contexts. Automated extraction of CT attenuation, anteroposterior and transverse diameters, and lobe-specific thickness provides a morphological profile that correlates with immune output. Adults with high thymic health have approximately 50% lower risk of all-cause mortality over 12 years, and thymic health independently predicts cardiovascular mortality and immunotherapy response across multiple cancer types (PMID: 40597831).
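The morphological measurements can be sketched as follows, assuming a (z, y, x) array in which y runs anterior-posterior and x runs left-right. Diameters here are bounding extents on axial slices, which is a simplification. Lobe-specific thickness would also need left and right lobe labels, and any thresholds for turning these values into a thymic health score are not given in this review. The function name is hypothetical.

```python
import numpy as np


def thymic_profile(volume_hu, mask, spacing_zyx):
    """Mean attenuation and maximal in-plane extents of a thymic mask.

    Assumes a (z, y, x) array in which y runs anterior-posterior and x runs
    left-right; diameters are bounding extents per axial slice, maximised over slices.
    """
    _, dy, dx = spacing_zyx
    ap_mm, tr_mm = 0.0, 0.0
    for z in range(mask.shape[0]):
        rows, cols = np.nonzero(mask[z])
        if rows.size == 0:
            continue
        ap_mm = max(ap_mm, float(rows.max() - rows.min() + 1) * dy)
        tr_mm = max(tr_mm, float(cols.max() - cols.min() + 1) * dx)
    return {
        "mean_attenuation_hu": float(volume_hu[mask].mean()),
        "ap_diameter_mm": ap_mm,
        "transverse_diameter_mm": tr_mm,
    }


# Toy mask: 10 voxels anterior-posterior by 30 voxels left-right at 0.8 mm in-plane spacing.
mask = np.zeros((20, 64, 64), dtype=bool)
mask[5:10, 20:30, 10:40] = True
volume = np.where(mask, -50.0, -100.0)
profile = thymic_profile(volume, mask, (2.5, 0.8, 0.8))
```

Mean attenuation is informative here because fatty involution pulls thymic HU towards fat values, which is why it appears alongside the diameters.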

The Thy-uNET framework maintains consistent Dice scores across both thin-section and thick-section CT subgroups despite varying levels of voxel spacing anisotropy, enabling retrospective analysis of thymic health on routine clinical scans not originally acquired for thymic assessment (PMID: 40597831).

AI-assisted radiology training and decision support

Reader comparison studies across multiple centres demonstrate that AI assistance significantly improves the diagnostic accuracy and efficiency of radiology residents and junior radiologists. In one multi-center study, AI-assisted scoring raised junior radiologist AUC from 0.702 to 0.814 for thymoma risk classification, effectively narrowing the performance gap with senior experts who achieved AUC values above 0.85 (PMID: 40823066, PMID: 40597831). This has direct implications for community hospitals and low-volume centres where specialist thoracic radiology expertise is scarce.

Validating thymic models across centres, readers, and scan protocols

Multi-center independent cohort testing

Models are tested on geographically distinct external datasets to confirm generalisability beyond the training institution. Typical practice is internal holdout validation followed by external testing, which can include publicly available cohorts such as the NSCLC-Radiomics-Genomics dataset from The Cancer Imaging Archive. In the cited comparisons, 3D models achieved AUC values of 0.763 to 0.805 across external cohorts and 2D models 0.753 to 0.777, with the gap between 2D and 3D performance narrower in external than in internal validation (PMID: 40597831, PMID: 41210998, PMID: 40079653).

Reader comparison studies

AI performance is benchmarked against human radiologists of varying experience levels. These studies quantify both the standalone model performance and the incremental benefit of AI assistance. The key metric is whether AI narrows the gap between junior and senior readers without degrading senior expert performance. Multi-center studies report that AI assistance raises junior radiologist AUC by about 0.11 (from 0.702 to 0.814) while leaving senior expert AUC largely unchanged (PMID: 40823066, PMID: 40597831).
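The AUC values reported in reader studies can be computed directly from reader confidence ratings with the Mann-Whitney formulation sketched below. The ratings are invented toy data, and `auc` is a hypothetical helper; a formal comparison of aided and unaided readings on the same cases would also need a paired test (for example DeLong's method), which is not shown. For scale, the reported change for junior readers is 0.814 - 0.702 = 0.112.

```python
def auc(labels, scores):
    """Mann-Whitney AUC: P(score of a random positive > score of a random negative), ties = 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy 5-point confidence ratings for six cases (1 = high-risk on pathology).
labels = [1, 1, 1, 0, 0, 0]
unaided = [3, 2, 4, 3, 1, 2]
aided = [4, 3, 5, 2, 1, 2]
print(auc(labels, unaided), auc(labels, aided))   # 0.7777777777777778 1.0
```

Counting ties as half a win matters for ordinal reader scales, where many cases share the same rating.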

Interpretability via Grad-CAM

Gradient-weighted Class Activation Mapping (Grad-CAM) overlays heatmaps on CT images to visualise which tumour regions and boundaries influence the model's risk score. This spatial interpretability is essential for clinical adoption: radiologists can verify that the model focuses on anatomically plausible features (tumour margins, density variations) rather than scanner artefacts or non-thymic structures. Grad-CAM outputs are reported in multiple validation studies as a standard component of model evaluation (PMID: 40823066, PMID: 41210998).
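The Grad-CAM computation itself is short once a framework has supplied a layer's activations and the gradients of the class score with respect to them (capturing those tensors, for example with hooks, is not shown). The sketch below follows the standard Grad-CAM recipe of gradient-weighted channel summation followed by a ReLU; the toy arrays and the nearest-neighbour upsampling helper are illustrative assumptions.

```python
import numpy as np


def grad_cam(activations, gradients):
    """Grad-CAM map for one image from a chosen convolutional layer.

    activations: (K, H, W) feature maps A^k.
    gradients:   (K, H, W) gradients of the class score with respect to A^k.
    """
    weights = gradients.mean(axis=(1, 2))             # alpha_k: spatially averaged gradients
    cam = np.tensordot(weights, activations, axes=1)  # sum_k alpha_k * A^k -> (H, W)
    cam = np.maximum(cam, 0.0)                        # ReLU: keep regions that raise the score
    peak = cam.max()
    return cam / peak if peak > 0 else cam            # scale to [0, 1] for display


def upsample_nearest(cam, factor):
    """Nearest-neighbour upsampling so the map can be overlaid on the CT slice."""
    return np.repeat(np.repeat(cam, factor, axis=0), factor, axis=1)


# Toy layer with two channels: one fires top-left, the other bottom-right.
acts = np.zeros((2, 4, 4))
acts[0, :2, :2] = 1.0
acts[1, 2:, 2:] = 1.0
grads = np.stack([np.ones((4, 4)), -np.ones((4, 4))])   # channel 0 raises the score, channel 1 lowers it
heatmap = upsample_nearest(grad_cam(acts, grads), 2)    # (8, 8)
```

Because the map lives at the resolution of the chosen layer, deeper layers give coarser heatmaps; that trade-off is part of why the overlay shows regions rather than exact voxels.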

Scan protocol robustness

Thymic models must perform consistently across the wide range of acquisition parameters found in routine clinical CT: slice thickness from 1 mm to 5 mm, varying reconstruction kernels, and both contrast-enhanced and non-contrast protocols. Thy-uNET demonstrates robust performance across thin-section and thick-section subgroups by automatically adapting its convolutional configuration to voxel spacing anisotropy, using pseudo-2D convolutions and axis-specific resampling when spacing ratios exceed predefined thresholds (PMID: 40597831, PMID: 33288961).
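The spacing-dependent choices can be sketched as a small planning function. This is a simplified re-implementation modelled on nnU-Net's published heuristics, not nnU-Net's code or API. The ratio-of-3 rule for resampling comes from the text above; the rule that an axis keeps kernel size 1 until its spacing is within a factor of 2 of the finest axis is our assumption about how "early layers" is decided. `plan_anisotropic` is a hypothetical name.

```python
import numpy as np

ANISO_THRESHOLD = 3.0   # max/min spacing ratio above which out-of-plane resampling uses nearest neighbour
KERNEL_RATIO = 2.0      # assumed: an axis gets kernel size 1 while its spacing is >= 2x the finest axis


def plan_anisotropic(spacing_zyx, n_stages=4):
    """Choose resampling orders and per-stage conv kernel sizes from voxel spacing."""
    spacing = np.asarray(spacing_zyx, dtype=float)
    anisotropic = spacing.max() / spacing.min() > ANISO_THRESHOLD
    resampling = {
        "in_plane_order": 3,                              # cubic spline within the slice
        "out_of_plane_order": 0 if anisotropic else 3,    # nearest neighbour along the thick axis
    }
    kernels, current = [], spacing.copy()
    for _ in range(n_stages):
        coarse = current / current.min() >= KERNEL_RATIO
        kernels.append(tuple(1 if c else 3 for c in coarse))
        # Pool (halve resolution) only along axes that are not yet coarse.
        current = np.where(coarse, current, current * 2)
    return {"anisotropic": bool(anisotropic), "resampling": resampling, "kernels": kernels}


thick = plan_anisotropic((5.0, 0.7, 0.7))   # routine thick-slice CT
thin = plan_anisotropic((1.0, 0.7, 0.7))    # near-isotropic thin-section CT
```

For a 5 mm x 0.7 mm x 0.7 mm scan this plan uses (1, 3, 3) kernels for the first two stages and full 3D kernels once in-plane pooling has brought the spacings within a factor of 2; for the near-isotropic scan every stage uses 3D kernels.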

Evidence quality and methodological limitations

The evidence base for deep learning thymic segmentation and scoring is technically consistent, with independent groups reporting Dice scores of roughly 0.76 to 0.83 for segmentation and AUC values of 0.90 to 0.95 for risk stratification across external cohorts. Several findings recur across studies: the effectiveness of coarse-to-fine architectures, the benefit of radiomics-DL feature fusion over unimodal approaches, and the consistent improvement in junior radiologist performance with AI assistance. The use of publicly available external datasets (The Cancer Imaging Archive) and multi-center reader studies strengthens the generalisability claims. Transfer learning from ImageNet-pretrained weights is a standard and well-validated strategy for mitigating data scarcity in rare tumour contexts.

Most models rely on retrospective data from single or limited medical centres, introducing selection bias. Studies often exclude patients with tumour invasion into the great vessels or prior chest surgeries, potentially overestimating accuracy in complex real-world cases (PMID: 40597831). Manual segmentation by radiologists remains the ground truth standard, but inter-observer variability in manual contouring introduces noise into feature extraction and model training (PMID: 41204379). 3D models are more prone to overfitting on small datasets and show greater sensitivity to variations in CT slice thickness than 2D approaches (PMID: 41210998). Prospective validation linking AI-derived risk scores to patient survival and therapeutic outcomes has not yet been reported; the evidence reviewed here is entirely retrospective. Habitat imaging, while promising, uses different clustering algorithms across studies, limiting direct comparison of intratumoral heterogeneity metrics.

Deep learning thymic segmentation has progressed from generic U-Net adaptations to purpose-built architectures like Thy-uNET that automatically handle the voxel spacing anisotropy, extreme class imbalance, and anatomical variability intrinsic to routine chest CT. The fusion of deep learning-derived features with handcrafted radiomics and clinical morphology achieves risk discrimination (AUC 0.90 to 0.95) that is potentially actionable for surgical planning. The next critical step is prospective validation: linking AI-derived thymic health scores and thymoma risk classifications to long-term patient outcomes in multi-centre trials, and establishing these models as standard decision-support tools in both high-volume academic centres and community radiology practices.

Frequently asked questions

What is Thy-uNET and how does it segment thymic tissue?

Thy-uNET is a two-stage coarse-to-fine segmentation framework built on the nnU-Net architecture. The first stage performs localisation on full CT images to identify the mediastinal region. The second stage performs fine boundary delineation within the cropped region of interest, achieving a Dice score of approximately 0.83 in independent cohorts.

How does voxel spacing anisotropy affect 2D versus 3D model selection?

Routine CT scans often have high in-plane resolution but thick slices (high anisotropy). When the spacing ratio exceeds 3, 2D configurations are often favoured because they process full-resolution in-plane slices without out-of-plane interpolation artefacts. For 3D models on anisotropic data, convolutional kernels for the out-of-plane axis are initially set to 1 (pseudo-2D convolutions), and resampling uses nearest-neighbour interpolation along the low-resolution axis.

What is habitat imaging in thymoma analysis?

Habitat imaging partitions a segmented thymic tumour into subregions (habitats) with distinct Hounsfield unit intensity and texture patterns using clustering algorithms. This captures intratumoral heterogeneity that single-region analysis misses, improving risk stratification accuracy by encoding spatial variations in tumour composition.

What performance do radiomics-deep learning fusion models achieve?

Fusion models that combine deep learning-derived features with handcrafted radiomics features (shape, texture, intensity) achieve AUC values between 0.90 and 0.95 for thymoma risk categorisation across external multi-center cohorts, outperforming radiomics-only and deep learning-only models in the cited studies.

How does AI assistance improve radiologist diagnostic accuracy for thymic tumours?

Multi-center reader studies show that AI-assisted scoring significantly improves the diagnostic accuracy of junior radiologists and radiology residents, narrowing the performance gap with senior experts. In one study, AI assistance increased the AUC for junior radiologists from 0.702 to 0.814 for thymoma risk classification.

What is Grad-CAM and why is it used in thymic imaging models?

Gradient-weighted Class Activation Mapping (Grad-CAM) is an interpretability tool that visualises which specific tumour regions and boundaries influence the model's risk score. It overlays heatmaps on CT images to show where the network focuses attention, providing clinicians with spatial explanations for AI predictions.

How does BioSkepsis generate these methods reviews?

BioSkepsis synthesises PubMed-indexed literature into structured reviews with citation-level verification. Every citation undergoes three independent checks to confirm it directly supports the associated claim. Unverified citations are flagged and excluded from the main synthesis.


Sources and further reading

Thy-uNET two-stage nnU-Net for thymic segmentation and health scoring (PMID: 40597831)
VGG16-MLP-Mixer hybrid for thymic tissue classification (PMID: 41464191)
DeepLabv3 segmentation and habitat imaging for thymoma risk stratification (PMID: 40079653)
ResNet101 risk classification and AI-assisted reader studies (PMID: 40823066)
Radiomics-deep learning fusion for WHO thymoma subtype prediction (PMID: 41204379)
Radiomics and DL feature combination for thymic tumour scoring (PMID: 40520864)
Multi-center validation, Grad-CAM interpretability, and 3D model overfitting and data volume requirements in thymic models (PMID: 41210998)
nnU-Net self-configuring framework and anisotropy handling (PMID: 33288961)
SegNet encoder-decoder architecture for image segmentation (PMID: 28060704)