Self-supervised Contrastive Learning for Digital Pathology

August 13, 2023
Created by Neo Yin
In Progress
Reading Notes
My main goals for reading this paper are to understand: [1] how people implement SSL for histopathology and what kinds of architectures people have tried for histopathology tasks, [2] any data-specific decisions in model design, [3] evaluation tasks, [4] the data curation procedure, [5] evaluation metrics.
What are the main objectives of this paper?
  • Train SimCLR on a multi-organ pathology dataset without supervision.
  • Show that pretraining on unlabelled histopathology images can improve performance over ImageNet pretraining.
Architecture and design choices
  • SimCLR
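SimCLR trains an encoder so that two augmented views of the same patch map to nearby embeddings, using the NT-Xent (normalized temperature-scaled cross-entropy) loss from the original SimCLR paper. A minimal pure-Python sketch of that loss, assuming the encoder and projection head have already produced one embedding per view; the function name `nt_xent_loss` and the default temperature of 0.5 are illustrative assumptions, not values taken from this paper:

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def _normalize(v: Vector) -> List[float]:
    """Scale a vector to unit L2 norm so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalise a zero vector")
    return [x / norm for x in v]


def nt_xent_loss(view_a: Sequence[Vector], view_b: Sequence[Vector],
                 temperature: float = 0.5) -> float:
    """NT-Xent loss over N positive pairs (view_a[i], view_b[i]).

    The temperature default is an assumption for illustration only.
    """
    if len(view_a) != len(view_b):
        raise ValueError("both views must contain the same number of embeddings")
    n = len(view_a)
    # 2N unit embeddings: indices 0..N-1 are view A, N..2N-1 are view B.
    z = [_normalize(v) for v in list(view_a) + list(view_b)]
    total = 0.0
    for i in range(2 * n):
        positive = (i + n) % (2 * n)  # the other view of the same image
        # Temperature-scaled cosine similarity to every candidate except the anchor itself.
        logits = {
            k: sum(a * b for a, b in zip(z[i], z[k])) / temperature
            for k in range(2 * n) if k != i
        }
        # -log softmax of the positive among the 2N-1 candidates (stable log-sum-exp).
        m = max(logits.values())
        log_denominator = m + math.log(sum(math.exp(s - m) for s in logits.values()))
        total += log_denominator - logits[positive]
    # Average over all 2N anchors, as in SimCLR.
    return total / (2 * n)
```

Matching pairs give a lower loss than mismatched ones: with `a = [[1.0, 0.0], [0.0, 1.0]]`, `nt_xent_loss(a, a)` is about 0.24, while pairing `a` with its rows swapped gives about 2.24. In practice the other images in the batch act as the negatives, which is one reason SimCLR favours large batches.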
Data curation procedure and data source
  • A more diverse training set, combined with a sampling strategy that varies across datasets, improves contrastive learning outcomes, since contrastive learning benefits from visually diverse data.
  • The majority of the WSI datasets come from TCGA and CPTAC, plus a variety of public challenge datasets.
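One generic way to realise a dataset-varying sampling strategy is two-stage sampling: pick a source dataset first, then a patch within it, so that a large source such as TCGA does not dominate every batch. The following is a hypothetical sketch under that assumption, not the paper's documented procedure; `sample_balanced_batch`, its optional per-dataset `weights`, and the patch IDs are illustrative names:

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple


def sample_balanced_batch(
    datasets: Dict[str, Sequence[str]],
    batch_size: int,
    weights: Optional[Dict[str, float]] = None,
    rng: Optional[random.Random] = None,
) -> List[Tuple[str, str]]:
    """Hypothetical two-stage sampler returning (dataset_name, patch_id) pairs."""
    rng = rng or random.Random()
    names = [name for name, patches in datasets.items() if len(patches) > 0]
    if not names:
        raise ValueError("no non-empty datasets to sample from")
    # Stage 1 weights: by default every dataset is equally likely,
    # regardless of how many patches it contains.
    w = [1.0 if weights is None else weights.get(name, 0.0) for name in names]
    batch = []
    for name in rng.choices(names, weights=w, k=batch_size):
        # Stage 2: a uniformly random patch from the chosen dataset.
        batch.append((name, rng.choice(datasets[name])))
    return batch
```

With the default equal weights, a 10-patch dataset and a 1000-patch dataset each supply roughly half of the draws; passing `weights` proportional to dataset size recovers ordinary uniform sampling over all patches.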
Evaluation tasks
  • Settings Variation
    • Single Dataset Pretraining
    • Different Pretraining Dataset Sizes
  • Task Variation
    • BACH (four-class breast cancer classification)
    • Lymph (three-class malignant lymph node cancer classification)
    • BreakHisv1 (binary breast cancer classification)
    • NCT-CRC-HE100K (nine-class colorectal cancer tissue classification)
    • Gleason2019 (five-class prostate cancer classification)
    • DigestPath2019 (WSI segmentation)
    • BreastPathQ (regression; the only regression dataset)
Evaluation metrics

For later.