Cross-domain diagnosis remains a major challenge in cervical cell pathology because pronounced domain shifts across institutions and subtle visual differences among disease stages jointly impair model generalization. To address these issues, this paper proposes a two-stage framework for cross-domain cervical cell detection. In the first stage, we introduce the Spatially-Continuous Unpaired Neural Schrödinger Bridge (SC-UNSB), which constructs a synthetic intermediate domain to mitigate cross-domain distribution shifts by modeling image translation as an entropy-regularized optimal transport process. In the second stage, we apply a dual-level feature alignment strategy within a knowledge distillation framework, which progressively aligns shallow structural features and deep semantic representations to facilitate the transfer of domain-invariant knowledge from the source to the target model. Experiments with CRIC as the source domain and ComparisonDetector as the target domain show that the proposed method mitigates domain shift and category ambiguity, reaching 26.9% mAP and 45.8% mAP50 in cross-domain detection.
Spatially-continuous image synthesis via entropy-regularized optimal transport
Progressive feature alignment from shallow structures to deep semantics
26.9% mAP and 45.8% mAP50 on cross-domain cervical cell detection
Combining generative domain bridging with progressive feature alignment for cross-domain cervical cell detection
To mitigate cross-domain appearance discrepancies, we build upon the Unpaired Neural Schrödinger Bridge (UNSB), which formulates unpaired image translation as an entropy-regularized optimal transport problem. Our key innovation is a Dense Normalization (DN) module that makes the normalization statistics vary continuously across spatial locations.
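For reference, the generic static form of an entropy-regularized optimal transport problem can be written as below. This is the textbook formulation, not the training objective used by UNSB or SC-UNSB, which is not reproduced here. The symbols are notation assumptions introduced for illustration: $\mu_s$ and $\mu_t$ are the source and target image distributions, $\Pi(\mu_s, \mu_t)$ is the set of couplings with those marginals, $c$ is a transport cost, and $\varepsilon > 0$ is the regularization strength.

```latex
\pi^{\star} = \arg\min_{\pi \in \Pi(\mu_s, \mu_t)}
  \int c(x, y)\, \mathrm{d}\pi(x, y) \;-\; \varepsilon\, H(\pi),
\qquad
H(\pi) = -\int \pi(x, y) \log \pi(x, y)\, \mathrm{d}x\, \mathrm{d}y .
```

As $\varepsilon \to 0$ the problem approaches unregularized optimal transport, while larger $\varepsilon$ favors more diffuse couplings.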
Standard Instance Normalization computes statistics independently for each patch, causing boundary drift errors and tiling artifacts. SC-UNSB re-parameterizes statistical moments as continuous functions of pixel coordinates using bilinear interpolation from neighboring patches.
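The idea of interpolating patch statistics can be illustrated with a minimal NumPy sketch. It is not the authors' DN module: the function names `interpolate_stats` and `dense_normalize`, the choice of placing statistics at patch centres, and the single-channel input are all assumptions made for illustration.

```python
import numpy as np

def interpolate_stats(grid, h, w, patch):
    """Bilinearly upsample a (gh, gw) grid of patch statistics to (h, w).

    Assumption: each grid value sits at the centre of its patch.
    """
    gh, gw = grid.shape
    # Pixel-centre coordinates expressed in units of the patch grid.
    ys = np.clip((np.arange(h) + 0.5) / patch - 0.5, 0, gh - 1)
    xs = np.clip((np.arange(w) + 0.5) / patch - 0.5, 0, gw - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, gh - 1)
    x1 = np.minimum(x0 + 1, gw - 1)
    wy = (ys - y0)[:, None]
    wx = (xs - x0)[None, :]
    top = grid[y0][:, x0] * (1 - wx) + grid[y0][:, x1] * wx
    bottom = grid[y1][:, x0] * (1 - wx) + grid[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy

def dense_normalize(image, patch=4, eps=1e-5):
    """Normalize a 2-D array with per-pixel mean/std interpolated from patches."""
    h, w = image.shape
    gh, gw = h // patch, w // patch
    # Per-patch moments on the coarse grid (patch-wise instance statistics).
    blocks = image[:gh * patch, :gw * patch].reshape(gh, patch, gw, patch)
    mean_grid = blocks.mean(axis=(1, 3))
    std_grid = blocks.std(axis=(1, 3))
    mean_field = interpolate_stats(mean_grid, h, w, patch)
    std_field = interpolate_stats(std_grid, h, w, patch)
    return (image - mean_field) / (std_field + eps)
```

Because the mean and standard deviation fields change smoothly between patch centres, neighbouring pixels on either side of a patch boundary are normalized with nearly identical statistics, which is the property the paragraph above attributes to SC-UNSB.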
We propose a dual-level feature alignment strategy within a knowledge distillation framework consisting of two complementary components:
Loose Feature Alignment (LFA) operates on shallow features, which carry structural information that is relatively insensitive to semantic variation but vulnerable to domain shift. The features are transformed into the frequency domain and passed through a multi-scale low-pass filter before alignment.
Compact Feature Alignment (CFA) aligns high-level semantic representations from the penultimate layer. A 1×1 convolution projects features into a unified embedding space, promoting transfer of class-discriminative knowledge.
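The two components can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the circular frequency mask, the radii `(1, 2, 4)`, the mean-squared-error distance, the choice to project both models' features, and all function names are hypothetical.

```python
import numpy as np

def low_pass(feat, radius):
    """Keep frequencies within `radius` of the DC term of a (C, H, W) feature map."""
    _, h, w = feat.shape
    spec = np.fft.fftshift(np.fft.fft2(feat, axes=(-2, -1)), axes=(-2, -1))
    yy, xx = np.ogrid[:h, :w]
    # After fftshift the zero frequency sits at index (h // 2, w // 2).
    mask = (yy - h // 2) ** 2 + (xx - w // 2) ** 2 <= radius ** 2
    filtered = np.fft.ifft2(np.fft.ifftshift(spec * mask, axes=(-2, -1)), axes=(-2, -1))
    return filtered.real

def loose_alignment_loss(student, teacher, radii=(1, 2, 4)):
    """LFA sketch: MSE between low-passed shallow features, averaged over scales."""
    losses = [np.mean((low_pass(student, r) - low_pass(teacher, r)) ** 2) for r in radii]
    return float(np.mean(losses))

def conv1x1(feat, weight):
    """1x1 convolution: a (C_out, C_in) linear map applied at every spatial position."""
    return np.einsum("oc,chw->ohw", weight, feat)

def compact_alignment_loss(student, teacher, w_student, w_teacher):
    """CFA sketch: MSE between deep features projected into a shared embedding space."""
    return float(np.mean((conv1x1(student, w_student) - conv1x1(teacher, w_teacher)) ** 2))
```

In this sketch the low-pass step makes the shallow-level loss ignore high-frequency differences between the two feature maps, which is one way to read "loose" alignment, while the deep-level loss compares every projected channel directly.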
All methods are evaluated on the CRIC (source) and ComparisonDetector (target) cervical cytology datasets.
| Method | Ds | CycleGAN | CUT | NOT | i2i-Turbo | UNSB | SC-UNSB |
|---|---|---|---|---|---|---|---|
| **Image Generation Quality** | | | | | | | |
| FID ↓ | 241.05 | 147.28 | 132.61 | 177.43 | 154.65 | 143.43 | 135.31 |
| KID×100 ↓ | 10.896 | 1.831 | 1.365 | 5.104 | 2.514 | 2.411 | 1.807 |
| NIQE ↓ | 14.72 | 13.51 | 14.60 | 12.86 | 16.39 | 13.94 | 11.38 |
| HIST ↑ | 0.384 | 0.695 | 0.722 | 0.571 | 0.473 | 0.701 | 0.754 |
| **RetinaNet Detection** | | | | | | | |
| mAP ↑ | 4.6% | 10.5% | 14.0% | 9.3% | 10.3% | 17.8% | 20.2% |
| mAP50 ↑ | 9.3% | 22.4% | 29.3% | 18.0% | 26.8% | 35.4% | 41.5% |
| **RetinaNet + LFA + CFA (Full Model)** | | | | | | | |
| mAP ↑ | 12.6% | 18.8% | 22.7% | 11.3% | 15.1% | 24.1% | 26.9% |
| mAP50 ↑ | 26.6% | 31.8% | 40.3% | 21.9% | 30.9% | 42.6% | 45.8% |
Table 1: Comparison across generation quality and detection performance metrics.
| Method | KD | DKD | SPD | Ours |
|---|---|---|---|---|
| mAP ↑ | 21.7% | 18.6% | 22.3% | 26.9% |
| mAP50 ↑ | 40.6% | 36.7% | 43.4% | 45.8% |
Table 2: Comparison with knowledge distillation methods.
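The margin over the distillation baselines can be read off Table 2 with a short calculation. The snippet below only restates the table's numbers; the helper name `gain_over_best_baseline` is introduced here for illustration.

```python
# Scores copied from Table 2, in percent: (mAP, mAP50).
scores = {
    "KD": (21.7, 40.6),
    "DKD": (18.6, 36.7),
    "SPD": (22.3, 43.4),
    "Ours": (26.9, 45.8),
}

def gain_over_best_baseline(scores, ours="Ours"):
    """Absolute gain of `ours` over the strongest baseline, per metric."""
    baselines = [v for k, v in scores.items() if k != ours]
    best_map = max(v[0] for v in baselines)
    best_map50 = max(v[1] for v in baselines)
    return (round(scores[ours][0] - best_map, 1),
            round(scores[ours][1] - best_map50, 1))

print(gain_over_best_baseline(scores))  # (4.6, 2.4)
```

In both metrics the strongest baseline is SPD, so the proposed method leads by 4.6 points in mAP and 2.4 points in mAP50.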
This work presents a two-stage framework for cross-domain cervical cell detection that explicitly addresses both appearance-level domain shift and representation-level feature misalignment. By constructing a spatially coherent intermediate domain through SC-UNSB and introducing dual-level feature alignment within a distillation framework, the proposed approach enhances the transfer of domain-invariant knowledge across institutions. These results highlight the potential of combining generative domain bridging with progressive feature alignment to enable cross-domain diagnosis in cervical cytopathology.
This work was supported by the Natural Science Foundation of Jiangsu Province (BK20251838) and the Nantong Science and Technology Program Project (JC2024055).
@inproceedings{li2026twostage,
  title        = {Two-Stage Cross-Domain Cervical Abnormality Screening with Cytopathological Image Synthesis and Knowledge Distillation},
  author       = {Li, Jincheng and He, Yuzhi and Zhan, Yihui and Zhang, Xinmei and Sun, Yifei and Liu, Zelin and Zhang, Lichi and Shao, Minye and Zhao, Lili},
  booktitle    = {International Conference on Medical Image Computing and Computer-Assisted Intervention},
  year         = {2026},
  organization = {Springer}
}