PCA for Enhanced Cross-Dataset Generalizability in Breast Ultrasound Tumor Segmentation
| dc.contributor.author | Schmidt, Christian | |
| dc.contributor.author | Overhoff, Heinrich Martin | |
| dc.contributor.editor | Skala, Václav | |
| dc.date.accessioned | 2025-07-30T09:04:04Z | |
| dc.date.available | 2025-07-30T09:04:04Z | |
| dc.date.issued | 2025 | |
| dc.description.abstract-translated | In medical image segmentation, limited external validity remains a critical obstacle when models are deployed across unseen datasets, an issue particularly pronounced in the ultrasound image domain. Existing solutions, such as domain adaptation and GAN-based style transfer, are promising but often fall short in the medical domain, where datasets are typically small and diverse. This paper presents a novel application of principal component analysis (PCA) to address this limitation. PCA preprocessing reduces noise and emphasizes essential features by retaining approximately 90% of the dataset variance. We evaluate our approach across six diverse breast tumor ultrasound datasets comprising 3,983 B-mode images and corresponding expert tumor segmentation masks. For each dataset, a corresponding dimensionality-reduced PCA dataset is created, and U-Net-based segmentation models are trained on each of the twelve datasets. Each model trained on an original dataset was evaluated on the remaining five out-of-domain original datasets (baseline results), while each model trained on a PCA dataset was evaluated on the five out-of-domain PCA datasets. Our experimental results indicate that using PCA-reconstructed datasets, instead of original images, improves the model's recall and Dice scores, particularly for model-dataset pairs where baseline performance was lowest, achieving statistically significant gains in recall (0.57 ± 0.07 vs. 0.70 ± 0.05, p = 0.0004) and Dice scores (0.50 ± 0.06 vs. 0.58 ± 0.06, p = 0.03). Our method reduced the decline in recall values due to external validation by 33%. These findings underscore the potential of PCA reconstruction as a safeguard to mitigate declines in segmentation performance, especially in challenging cases, with implications for enhancing external validity in real-world medical applications. Future studies are proposed to optimize PCA configurations for diverse imaging datasets and to explore integration with existing external validation methods. | en |
| dc.description.sponsorship | This work was funded by the German Federal Ministry of Education and Research (BMBF) under the program KMU-innovativ: Medizintechnik (project name: MammaSound, grant number 13GW0703B). | |
| dc.format | 8 s. | cs |
| dc.format.mimetype | application/pdf | |
| dc.identifier.doi | http://www.doi.org/10.24132/CSRN.2025-5 | |
| dc.identifier.issn | 2464-4617 (Print) | |
| dc.identifier.issn | 2464-4625 (online) | |
| dc.identifier.uri | http://hdl.handle.net/11025/62211 | |
| dc.language.iso | en | en |
| dc.publisher | Vaclav Skala - UNION Agency | en |
| dc.rights | © Vaclav Skala - UNION Agency | en |
| dc.rights.access | openAccess | en |
| dc.subject | segmentace nádorů prsu | cs |
| dc.subject | generalizace domén | cs |
| dc.subject | ultrazvukové zobrazování | cs |
| dc.subject | neuronové sítě | cs |
| dc.subject | analýza hlavních komponent | cs |
| dc.subject.translated | breast tumor segmentation | en |
| dc.subject.translated | domain generalization | en |
| dc.subject.translated | ultrasound imaging | en |
| dc.subject.translated | neural networks | en |
| dc.subject.translated | principal component analysis | en |
| dc.title | PCA for Enhanced Cross-Dataset Generalizability in Breast Ultrasound Tumor Segmentation | en |
| dc.type | konferenční příspěvek | cs |
| dc.type | conferenceObject | en |
| dc.type.status | Peer reviewed | en |
| dc.type.version | publishedVersion | en |
| local.files.count | 1 | * |
| local.files.size | 1490007 | * |
| local.has.files | yes | * |
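
The abstract describes reconstructing each dataset from the principal components that retain roughly 90% of its variance, then scoring segmentations by recall and Dice. The following is a minimal NumPy sketch of those two ideas, not the authors' implementation: this record does not say how the PCA was fitted, so fitting on flattened, equally sized grayscale images per dataset is an assumption, and the names `pca_reconstruct` and `dice_and_recall` are hypothetical.

```python
import numpy as np


def pca_reconstruct(images, variance_to_keep=0.90):
    """Reconstruct images from the leading principal components.

    Assumption (not stated in the record): `images` is an array of shape
    (n_images, height, width) and PCA is fitted on the flattened images.
    Returns the reconstructed stack and the number of components kept.
    """
    n, h, w = images.shape
    flat = images.reshape(n, h * w).astype(np.float64)
    mean = flat.mean(axis=0)
    centered = flat - mean

    # Thin SVD of the centered data: squared singular values are
    # proportional to the variance explained by each component.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    energy = s ** 2
    total = energy.sum()
    if total == 0.0:
        # All images identical: nothing to reduce.
        return flat.reshape(n, h, w), 0

    cumulative = np.cumsum(energy) / total
    # Smallest k whose cumulative explained variance reaches the target.
    k = min(int(np.searchsorted(cumulative, variance_to_keep)) + 1, len(s))

    # Project onto the first k components and map back to pixel space.
    reconstructed = (u[:, :k] * s[:k]) @ vt[:k] + mean
    return reconstructed.reshape(n, h, w), k


def dice_and_recall(pred_mask, true_mask):
    """Dice score and recall for a pair of binary segmentation masks."""
    pred = np.asarray(pred_mask).astype(bool)
    true = np.asarray(true_mask).astype(bool)
    tp = np.logical_and(pred, true).sum()
    denom = pred.sum() + true.sum()
    # Convention chosen here: two empty masks count as a perfect match.
    dice = 2.0 * tp / denom if denom else 1.0
    recall = tp / true.sum() if true.sum() else 1.0
    return float(dice), float(recall)
```

In a setup like the one the abstract outlines, `pca_reconstruct` would be applied to each dataset before training and inference, and `dice_and_recall` would be computed per image between predicted and expert masks. The 0.90 default mirrors the "approximately 90%" figure in the abstract; every other choice here is illustrative.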