Methods of Deep Clustering

May 25, 2023
Created by Neo Yin
In Progress
Broad Research

Dimensionality Reduction


Metric Learning/Contrastive Framework


See the linked reference (triplet loss) for details.

FaceNet, used verbatim, may not be optimal. However, the idea of class-level contrastive learning sounds like quite a promising way to promote clustering. For problems where we expect more latent-space structure in the data than the simple "in-the-same-class-or-not" clustering, this method may not suffice. It only promotes interclass distance maximization and intraclass distance minimization, so in cases where some classes are closer than others, such as cell types along the lineages, one would expect the FaceNet framework to neglect such structural information. My worry might be unwarranted, since FaceNet is still minimizing a continuous objective; and if one is to worry, then instead of the L2 loss, a linear-Softmax loss like that used in could be a good alternative that may be easier to train. All in all, it is hard to tell, and the only way to find out is to actually implement the model.
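For concreteness, here is a minimal numpy sketch of the triplet loss that FaceNet minimizes; the margin value and the toy embeddings are illustrative, not taken from the paper:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """FaceNet-style triplet loss on batches of embedding vectors.

    Pulls each anchor toward its positive (same class) and pushes it
    away from its negative (different class) by at least `margin`.
    """
    d_pos = np.sum((anchor - positive) ** 2, axis=-1)  # squared L2 distances
    d_neg = np.sum((anchor - negative) ** 2, axis=-1)
    return np.maximum(d_pos - d_neg + margin, 0.0).mean()

# Toy embeddings: the anchor is close to the positive and far from the
# negative, so the margin is already satisfied and the loss is zero.
a = np.array([[0.0, 0.0]])
p = np.array([[0.1, 0.0]])
n = np.array([[2.0, 0.0]])
print(triplet_loss(a, p, n))  # -> 0.0 (margin satisfied)
```

Note how the loss says nothing about how far apart two *different* negative classes should be from each other, which is exactly the missing-structure worry above.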

Another issue with FaceNet is that the clustering is still label-dependent — so the latent space structure is post hoc and not clearly inherent to the input data. This might not be important for a classification problem — but is certainly important in cases when we care about interpretability (specifically in the sense of caring about scientifically meaningful representations).


Self-Supervised Learning (SSL)

Although these SSL methods do not proactively promote clustering, they allow one to leverage unlabelled data (which often exists in greater abundance than labelled data) to learn better, lower-dimensional latent representations of the input data. Clustering methods such as UMAP can then be applied to the latent space to see whether the model is able to capture the known structures (such as what we know about the process) of the input data.

If we see promising results from UMAP on SSL-learned features, that is in some sense significantly more valuable than structure found in features learned from supervised problems or derivatives of supervised problems (like the FaceNet-style setup above). The SSL-learned features are trained without labels, so any emergent structure should be indicative of an inherent structure that is independent of the classification task, rather than a post hoc structure that is a shadow of the classification task.
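To make the pipeline concrete, here is a minimal numpy sketch of the idea: project latent features to 2D, then cluster. PCA (via SVD) stands in for UMAP, which lives in the external umap-learn package; a naive k-means stands in for whichever clustering method is chosen; and the well-separated toy blobs stand in for SSL-learned features:

```python
import numpy as np

def pca_2d(X):
    """Linear 2D projection via SVD; a stand-in for UMAP here."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T

def kmeans(X, k, n_iter=50, seed=0):
    """Minimal k-means on the (projected) latent features."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        centers = np.stack([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
    return labels

# Toy "latent features": two well-separated blobs in 8 dimensions.
rng = np.random.default_rng(1)
X = np.concatenate([rng.normal(0, 0.1, (50, 8)), rng.normal(3, 0.1, (50, 8))])
labels = kmeans(pca_2d(X), k=2)
```

If the SSL features genuinely encode the structure we care about, the recovered `labels` should line up with it without any label ever entering the pipeline.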

There are several canonical examples of SSL models; I have already implemented one in the package I have been working on — dev_wsi_patch_ssl.


Clustering Loss

Similar to FaceNet, what I see from DEC (Deep Embedded Clustering) is that it might suffer from the same kind of issue, where it misses more delicate structures not captured by the simple "in-the-same-class-or-not" clustering. DEC, however, is not post hoc, as it uses only the unlabelled input data.
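For reference, the DEC objective can be sketched in a few lines of numpy, following Xie et al. (2016): soft assignments from a Student's t kernel, a sharpened target distribution, and the KL divergence between the two. The toy embeddings and centroids are illustrative:

```python
import numpy as np

def dec_soft_assign(Z, mu, alpha=1.0):
    """Soft assignments q_ij: Student's t kernel between embeddings
    Z (n, d) and cluster centroids mu (k, d), normalized over clusters."""
    d2 = ((Z[:, None] - mu[None]) ** 2).sum(-1)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1) / 2)
    return q / q.sum(axis=1, keepdims=True)

def dec_target(q):
    """Sharpened target p_ij proportional to q_ij^2 / f_j, where
    f_j = sum_i q_ij is the soft cluster frequency."""
    w = q ** 2 / q.sum(axis=0)
    return w / w.sum(axis=1, keepdims=True)

def dec_loss(q, p):
    """Clustering loss KL(P || Q), minimized w.r.t. Z and mu."""
    return np.sum(p * np.log(p / q))

Z = np.array([[0.0, 0.0], [0.1, 0.0], [2.0, 2.0]])  # toy embeddings
mu = np.array([[0.0, 0.0], [2.0, 2.0]])             # toy centroids
q = dec_soft_assign(Z, mu)
p = dec_target(q)
loss = dec_loss(q, p)
```

Since the whole objective is built from `q` alone, no label ever enters it — which is exactly why DEC is not post hoc.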

The intention here is very much like DEC's. I originally thought it was a kind of self-supervised learning plus a clustering objective, but it is just the clustering objective (although it would not hurt to add a self-supervised component, like VAE reconstruction, to the clustering loss).
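A hypothetical sketch of such a combination, assuming a DEC-style clustering term added to a reconstruction term with an illustrative weight `gamma` (this is my assumption about how one might combine them, not a method from any of the papers above):

```python
import numpy as np

def joint_loss(x, x_recon, q, p, gamma=0.1):
    """Hypothetical joint objective: reconstruction (self-supervised
    component) plus a DEC-style clustering term KL(P || Q).

    gamma trades off the two terms; its value here is illustrative.
    """
    recon = np.mean((x - x_recon) ** 2)   # reconstruction term
    cluster = np.sum(p * np.log(p / q))   # clustering term, KL(P || Q)
    return recon + gamma * cluster
```

With perfect reconstruction and matched distributions (p = q), the loss is zero; in training, the reconstruction term would keep the latent space faithful to the inputs while the clustering term sharpens it.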


Graph Neural Networks

For introductory information about Graph Neural Networks (GNNs), see


It seems that GNNs are not directly relevant to the clustering task we are interested in.

They do, however, seem potentially useful for that type of data when automating differential cell analyses, because, based on that idea, a GNN can be used to obtain local (Euclidean-distance-based) cell contexts.
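As a sketch of what "local cell contexts" could mean operationally, here is a pure-numpy k-nearest-neighbour graph over hypothetical 2D cell coordinates; a real pipeline might instead use scipy's cKDTree or a GNN library's own graph constructors:

```python
import numpy as np

def knn_graph(coords, k=3):
    """Boolean adjacency matrix of the k nearest neighbours (Euclidean)
    for each cell: the local "cell context" graph a GNN could consume."""
    d2 = ((coords[:, None] - coords[None]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)          # exclude self-edges
    nbrs = np.argsort(d2, axis=1)[:, :k]  # k nearest neighbours per cell
    n = len(coords)
    adj = np.zeros((n, n), dtype=bool)
    adj[np.arange(n)[:, None], nbrs] = True
    return adj

# Hypothetical coordinates: three nearby cells and one distant cell.
coords = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0]])
A = knn_graph(coords, k=2)
```

Message passing over this adjacency would then aggregate each cell's features with those of its spatial neighbours, which is the "local cell context" idea in graph form.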