May 21, 2023
Created by
Neo Yin
Done ✨
Reading Notes

What does DeepHeme do?

DeepHeme is a ResNeXt-50 architecture.

Input is a locally magnified image with a single cell at the centre.

Output is a classification of the cell into 23 classes, each being a type of cell. I’ll think later about what these classes mean.
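To fix the input/output picture in my head: the backbone (a ResNeXt-50) would end in one logit per class, and the final step is a softmax over 23 logits. A minimal numpy sketch of just that last step — the logit values here are random stand-ins, not real network outputs:

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

NUM_CLASSES = 23
rng = np.random.default_rng(0)
logits = rng.normal(size=(1, NUM_CLASSES))  # stand-in for the ResNeXt-50 output

probs = softmax(logits)          # per-class probabilities
pred = int(probs.argmax(axis=-1)[0])  # predicted cell-type index
```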

It falls under automated


What problem does DeepHeme solve and why is it important?

is important because, in the case of hematologic disorders, it guides diagnostic and treatment decisions.

Practical applications of

involve cell counting and classification. Done by pathologists under a microscope, this is time-consuming and also prone to human error.

The AI can be much faster and more accurate.

What is the specific deep learning model used? Can I describe it layer by layer? What are the training specifications used in this paper and why are these specifications chosen?

What kind of data is used here? Where is the data from?

50 slides of BMAs from UCSF Parnassus (adult) and UCSF Benioff (children) from 2017 to 2020.

Slides show “uninvolved marrow and normal


Each slide contains around 10,000 cell images that are patched out, and labelled by the “consensus decision of an expert panel of three hematopathologists”.
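The exact patching pipeline isn’t in my notes, but mechanically it amounts to cropping a fixed-size window around each detected cell. A hedged numpy sketch, with made-up centroids and patch size, including the near-border clamping that any real implementation would need:

```python
import numpy as np

def extract_patch(slide, center, size=64):
    """Crop a size x size patch centred on (row, col), clamped to the slide bounds."""
    h, w = slide.shape[:2]
    half = size // 2
    r0 = min(max(center[0] - half, 0), h - size)
    c0 = min(max(center[1] - half, 0), w - size)
    return slide[r0:r0 + size, c0:c0 + size]

# Toy "slide" and hypothetical detected cell centroids (includes border cases):
slide = np.zeros((1000, 1000, 3), dtype=np.uint8)
centroids = [(100, 200), (5, 5), (995, 990)]

patches = [extract_patch(slide, c) for c in centroids]
```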

Some Questions:
How are the images patched out? By hand or by segmentation software? If the former, by whom? If the latter, by what model? How does the quality of the segmentation affect the quality of the classification? Shouldn't the segmentation software be considered to be an important part of the system? How can we use image augmentation to improve the robustness of the classifier to potentially poor segmentation?
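On the last question: one cheap way to harden the classifier against sloppy segmentation is to augment training patches with random off-centre crops, so the cell is not always perfectly centred. A sketch of that idea — `max_shift` is a made-up hyperparameter, not from the paper:

```python
import numpy as np

def jitter_crop(patch, max_shift=8, rng=None):
    """Simulate imperfect segmentation by re-cropping at a jittered offset.

    Pads by reflection, then crops back to the original size at a random
    shift, so the cell lands slightly off-centre the way a sloppy detector
    might place it.
    """
    rng = rng or np.random.default_rng()
    h, w = patch.shape[:2]
    padded = np.pad(patch,
                    ((max_shift, max_shift), (max_shift, max_shift), (0, 0)),
                    mode="reflect")
    dr, dc = rng.integers(0, 2 * max_shift + 1, size=2)
    return padded[dr:dr + h, dc:dc + w]

rng = np.random.default_rng(42)
patch = np.arange(64 * 64 * 3, dtype=np.uint8).reshape(64, 64, 3)
aug = jitter_crop(patch, rng=rng)
```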

The object detection model

is employed for the detection of cells.

How is the consensus labelling achieved? By double reading and majority rule? By sitting together and discussing? What are the potential implications of different consensus-labelling methods for the correctness, generalizability, future update-ability, usability, etc. of the model?
What is the intended user interface of the model, and how does it fit seamlessly into perhaps an existing morphological examination workflow by human pathologists?
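The paper only says labels come from the “consensus decision of an expert panel of three hematopathologists”, not the exact rule. One plausible procedure (my assumption, not the paper’s) is majority vote with ties escalated to discussion:

```python
from collections import Counter

def consensus_label(votes):
    """Majority vote among annotators; ties are flagged for adjudication.

    A sketch of one possible consensus rule: return the label only if a
    strict majority of annotators agree, otherwise return None to signal
    that the panel must discuss the case.
    """
    counts = Counter(votes)
    label, n = counts.most_common(1)[0]
    if n > len(votes) / 2:
        return label
    return None  # no majority: send to panel discussion

a = consensus_label(["blast", "blast", "lymphocyte"])   # strict majority
b = consensus_label(["blast", "lymphocyte", "monocyte"])  # three-way tie
```

The choice of rule matters for exactly the reasons in the question above: majority vote is cheap and auditable, while discussion-based consensus may encode shared biases that are harder to detect later.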

DeepHeme is then trained via supervised learning.

What if we didn’t use supervised learning?

e.g. self-supervised representation learning, latent-space clustering, GANs, …
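As a toy version of the latent-space clustering idea: if a self-supervised encoder produced embeddings for unlabelled cells, even plain k-means could surface candidate cell-type groups without any labels. A self-contained sketch on synthetic 2-D “embeddings” (the blobs stand in for two morphologically distinct cell populations):

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Plain k-means: one simple way to cluster unlabelled cell embeddings."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
        labels = d.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

# Two well-separated synthetic blobs standing in for embedding clusters:
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(5, 0.3, (50, 2))])
labels, _ = kmeans(X, k=2)
```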

What are the cell types included in the classification task and what do they mean?

The 23 cell types include:

Some Questions
Dumb question: if any of these cell types can go through a mitotic state, does that mean we have another axis of binary classification? And what about artifacts: are they treated as a separate cell type?

Describe the mathematics behind the performance metrics used in this paper to evaluate DeepHeme

Here are the performance metrics used in this paper:
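The paper’s specific metric list got elided from my notes, but for a multiclass classifier like this the standard quantities all fall out of the confusion matrix: per class c, precision = TP/(TP+FP), recall = TP/(TP+FN), and F1 is their harmonic mean. A worked sketch on a tiny 3-class toy example:

```python
import numpy as np

def per_class_metrics(y_true, y_pred, num_classes):
    """Precision, recall, F1 per class, computed from the confusion matrix.

    cm[t, p] counts examples with true class t predicted as p, so the
    diagonal holds true positives, column sums give predicted counts
    (precision denominator), and row sums give true counts (recall
    denominator).
    """
    cm = np.zeros((num_classes, num_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    tp = np.diag(cm).astype(float)
    precision = tp / np.maximum(cm.sum(axis=0), 1)
    recall = tp / np.maximum(cm.sum(axis=1), 1)
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
    return precision, recall, f1

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
p, r, f1 = per_class_metrics(y_true, y_pred, 3)
```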

What evidence is there that the model is generalizable?

The model is tested on a held-out validation set, as well as on datasets from different hospitals (MSKCC, UCSF, and Brigham and Women’s Hospital), and is benchmarked against human experts, achieving faster, more accurate, and more precise results.

Since the human experts would be using the AI labeller as an adjunct tool, would it be possible to conceive of a continued cross-validation procedure across hospitals to regularly monitor and refine the model? Could conformal prediction help?
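To make the conformal prediction idea concrete: in split-conformal classification you calibrate a score threshold on held-out data, then emit a *set* of plausible cell types per image with a coverage guarantee (roughly, the true label lands in the set with probability at least 1 − α). A sketch with synthetic calibration probabilities, not real DeepHeme outputs:

```python
import numpy as np

def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Split-conformal threshold: score each calibration example as
    1 - p(true class), then take the ceil((n+1)(1-alpha))-th smallest score."""
    n = len(cal_labels)
    scores = np.sort(1.0 - cal_probs[np.arange(n), cal_labels])
    idx = min(int(np.ceil((n + 1) * (1 - alpha))) - 1, n - 1)
    return scores[idx]

def prediction_set(probs, qhat):
    """All classes whose score 1 - p(class) falls within the threshold."""
    return np.where(1.0 - probs <= qhat)[0]

# Synthetic calibration set: 100 examples, 23 classes, confident model.
n, k = 100, 23
rng = np.random.default_rng(0)
cal_labels = rng.integers(0, k, size=n)
cal_probs = np.full((n, k), 0.3 / (k - 1))
cal_probs[np.arange(n), cal_labels] = 0.7

qhat = conformal_threshold(cal_probs, cal_labels, alpha=0.1)
test_probs = np.full(k, 0.3 / (k - 1))
test_probs[5] = 0.7
pset = prediction_set(test_probs, qhat)
```

A monitoring loop across hospitals could then track the empirical coverage and average set size: growing sets on a new site would be a concrete distribution-shift alarm.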

What is the UMAP employed in the paper? Can I describe the mathematical underpinning of this technique? What does the UMAP result for DeepHeme tell us?

For the mathematics, I will need to know more about

To understand the implications of the UMAP results for blood cell morphology, I’m going to need to read

to understand more about the biology behind it all.
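As a rough sketch of the mathematics in the meantime (my summary of the standard UMAP formulation, not taken from the paper): UMAP builds a fuzzy graph of high-dimensional neighbours, then finds a low-dimensional layout minimizing the cross-entropy between the two sets of edge weights.

```latex
% Membership strength from x_i to neighbour x_j, with rho_i the distance
% to x_i's nearest neighbour and sigma_i a per-point bandwidth:
p_{j \mid i} = \exp\!\left( -\frac{\max\bigl(0,\; d(x_i, x_j) - \rho_i\bigr)}{\sigma_i} \right),
\qquad
p_{ij} = p_{j \mid i} + p_{i \mid j} - p_{j \mid i}\, p_{i \mid j}

% Low-dimensional similarity between embedded points y_i, y_j
% (a, b are curve-fit constants set by the min_dist parameter):
q_{ij} = \left( 1 + a \,\lVert y_i - y_j \rVert^{2b} \right)^{-1}

% UMAP minimizes the fuzzy-set cross-entropy between the two graphs:
C = \sum_{i \neq j} \left[ p_{ij} \log \frac{p_{ij}}{q_{ij}}
  + (1 - p_{ij}) \log \frac{1 - p_{ij}}{1 - q_{ij}} \right]
```

For DeepHeme, well-separated UMAP clusters of the penultimate-layer features would suggest the network has learned morphology-discriminative representations, though the embedding itself is only a visualization aid.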

What is the role of the saliency map? Describe the mathematical underpinning of the saliency map used in this paper. In this paper what does it tell us?

techniques are employed for saliency mapping.
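The specific saliency technique got elided from my notes, but the simplest family is vanilla gradient saliency: relevance of each input pixel is |∂(class score)/∂(pixel)|. For a toy linear scorer the gradient is available in closed form, so the idea can be shown without autograd (a real CNN would backpropagate the class logit to the input instead):

```python
import numpy as np

def saliency_linear(W, x, cls):
    """Gradient saliency for a toy linear scorer f(x) = W @ x.

    For class cls, d f_cls / d x = W[cls] exactly, so the saliency map
    is just |W[cls]| -- independent of the input, unlike a real CNN,
    where the gradient depends on x and must be backpropagated.
    """
    del x  # the gradient of a linear model does not depend on the input
    return np.abs(W[cls])

rng = np.random.default_rng(0)
W = rng.normal(size=(23, 64 * 64))  # 23 classes, flattened 64x64 patch
x = rng.normal(size=64 * 64)
s = saliency_linear(W, x, cls=3)
```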

What are some limitations of the current work? How does that inform future research direction?
  • The BMA cell images used in this paper come from BMAs that exhibit quite normal
    . The model is expected to perform poorly on cells with more abnormal or deviant morphology. We need a much more diverse dataset that includes different kinds of pathological and abnormal BMAs.
  • Though labelled cell images are hard to come by, we might have a lot more unlabelled BMAs than labelled ones. We could potentially use techniques beyond supervised learning to improve the downstream supervised learning results.
  • As a thought toward practical generalizability: since the current DeepHeme model is so high-performing, have we considered incorporating conformal prediction, both as a method of checking model performance and as a form of model adaptation with theoretical performance guarantees?