The DEC model learns two things simultaneously: an embedding function and a set of cluster centers. For a fixed number of clusters $k$, DEC searches for a dimensionality-reducing embedding function $f$, parameterized as a deep neural network (DNN), together with $k$ cluster centers in the lower-dimensional latent space.

First, for each data point $x_i$ with latent embedding $z_i = f(x_i)$ and each learnable cluster center $\mu_j$, we define the t-distributed latent cluster assignment probability (soft assignment) in a t-SNE-like fashion:

$$q_{ij} = \frac{\left(1 + \|z_i - \mu_j\|^2 / \alpha\right)^{-\frac{\alpha+1}{2}}}{\sum_{j'} \left(1 + \|z_i - \mu_{j'}\|^2 / \alpha\right)^{-\frac{\alpha+1}{2}}},$$

where for the experiments the authors chose the degrees of freedom $\alpha = 1$.
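The soft assignment can be sketched in NumPy as follows (a minimal sketch; the function name, array shapes, and toy data are my own, not from the paper):

```python
import numpy as np

def soft_assign(z, mu, alpha=1.0):
    """Student's t soft assignment q_ij of point i to cluster center j.

    z:  (n, d) latent embeddings z_i = f(x_i)
    mu: (k, d) cluster centers
    """
    # Squared Euclidean distances ||z_i - mu_j||^2, shape (n, k)
    d2 = ((z[:, None, :] - mu[None, :, :]) ** 2).sum(axis=-1)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)  # normalize over clusters

rng = np.random.default_rng(0)
z = rng.normal(size=(6, 2))    # toy latent points, n = 6, d = 2
mu = rng.normal(size=(3, 2))   # toy centers, k = 3
q = soft_assign(z, mu)         # each row of q sums to 1
```

In a real implementation both $z$ and $\mu$ would be learnable tensors in an autodiff framework, so that gradients flow into the DNN weights and the centers alike.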

The soft assignment probabilities are matched to a target distribution $p_{ij}$ by minimizing the KL-divergence loss

$$L = \mathrm{KL}(P \,\|\, Q) = \sum_i \sum_j p_{ij} \log \frac{p_{ij}}{q_{ij}}.$$

The target distribution is computed from $q_{ij}$ as follows:

$$p_{ij} = \frac{q_{ij}^2 / f_j}{\sum_{j'} q_{ij'}^2 / f_{j'}},$$

where $f_j = \sum_i q_{ij}$ are the soft cluster frequencies.
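The target distribution and the loss can be sketched in NumPy like so (a minimal sketch with hand-picked toy assignments; function names are my own):

```python
import numpy as np

def target_distribution(q):
    """DEC target p_ij: square q to sharpen it, normalize by frequency f_j."""
    f = q.sum(axis=0)          # soft cluster frequencies f_j = sum_i q_ij
    w = q ** 2 / f             # emphasize confident assignments, rebalance clusters
    return w / w.sum(axis=1, keepdims=True)

def kl_loss(p, q):
    """KL(P || Q), summed over all points and clusters."""
    return float(np.sum(p * np.log(p / q)))

q = np.array([[0.7, 0.2, 0.1],   # toy soft assignments, rows sum to 1
              [0.1, 0.8, 0.1],
              [0.4, 0.3, 0.3]])
p = target_distribution(q)       # sharpened: p[0, 0] > q[0, 0]
loss = kl_loss(p, q)             # non-negative by Gibbs' inequality
```

Squaring $q_{ij}$ pushes each row of $p$ toward its dominant cluster, while dividing by $f_j$ down-weights clusters that already absorb much of the mass, which is exactly what the three goals below ask for.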

The authors want the target distribution to:

- Strengthen predictions
- Put more emphasis on data points assigned with high confidence
- Normalize the loss contribution of each centroid to prevent large clusters from distorting the hidden feature space.