August 13, 2023
Hypothesis Annotations: https://hyp.is/go?url=https%3A%2F%2Farxiv.org%2Fpdf%2F2011.13971.pdf&group=o94Eonpx
My main goal in reading this paper is to understand how people implement SSL for histopathology: what kinds of architecture have been tried for histopathology tasks, any data-specific decisions in model design, the evaluation tasks, the data curation procedure, and the evaluation metrics.
What are the main objectives of this paper?
- Train SimCLR on a multiorgan pathology dataset without supervision
- The paper shows that pretraining with unlabelled histopathology images can improve performance over ImageNet pretraining.
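To make the SimCLR objective concrete for myself, here is a minimal NumPy sketch of the NT-Xent (normalized temperature-scaled cross entropy) loss that SimCLR optimizes. This is the generic loss as I understand it, not the paper's code: the function name `nt_xent_loss`, the default `temperature=0.5`, and the convention that `z1[i]` and `z2[i]` are embeddings of two augmented views of the same patch are all my own assumptions.

```python
import numpy as np

def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent loss for paired embeddings z1, z2, each of shape (N, D).

    Assumption: row i of z1 and row i of z2 come from two augmentations
    of the same image patch; every other row in the batch is a negative.
    """
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0)               # (2N, D)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # unit vectors -> dot product is cosine similarity
    sim = (z @ z.T) / temperature                      # (2N, 2N) scaled similarities
    np.fill_diagonal(sim, -np.inf)                     # a sample is never its own negative

    # Index of the positive partner for each of the 2N rows.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])

    # Numerically stable row-wise log-softmax.
    sim_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - sim_max - np.log(np.exp(sim - sim_max).sum(axis=1, keepdims=True))

    # Cross entropy against the positive partner, averaged over all 2N rows.
    return -log_prob[np.arange(2 * n), pos].mean()
```

Calling `nt_xent_loss(z1, z2)` on embeddings of matching views should give a lower value than on unrelated embeddings, which is the signal the encoder is trained on without any labels.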
Architecture and design choices
Data curation procedure and data source
- Having a more diverse training set and a dataset-varying sampling strategy helps contrastive learning, which benefits from visually diverse data.
- The majority of the WSI datasets are from TCGA and CPTAC, along with a variety of public challenge datasets.
- Settings Variation
- Single Dataset Pretraining
- Different Size of Pretraining Dataset
- Task Variation
- BACH (four-class breast cancer classification)
- Lymph (three-class malignant lymph node cancer classification)
- BreakHisv1 (binary breast cancer classification)
- NCT-CRC-HE100K (nine-class colorectal cancer tissue classification)
- Gleason2019 (five-class prostate cancer classification)
- DigestPath2019 (WSI segmentation)
- BreastPathQ (single regression dataset)