  # tSNE

Created: May 25, 2023
Created by: Neo Yin
Status: Done ✨
My goal for this reading is only to have a high-level step-by-step understanding of the algorithm.

# How do you use t-SNE?

## Step 1:

You have a collection of high-dimensional points $P_1,...,P_n$ which are fixed. You want to learn a collection of two-dimensional points $Q_1,...,Q_n$ that exhibit metric and clustering behaviour similar to that of their high-dimensional counterparts.

Fixing a point $P_i$, for each other point $P_j$ you compute the Euclidean distance $||P_i-P_j||_2$, pass it through an unnormalized Gaussian kernel, and normalize by the sum of all such pairwise quantities (ranging over $k \neq i$):

$p_{j|i} = \frac{\exp (-||P_i - P_j||_2^2 / 2\sigma_i^2)}{\sum_{k\neq i} \exp (-||P_i-P_k||_2^2 / 2\sigma_i^2)} .$
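As an illustrative sketch (not a reference implementation), the conditional probabilities can be computed with NumPy. The function name `conditional_probs` is mine, and the bandwidths $\sigma_i$ are assumed to be already known (they are chosen in Step 2):

```python
import numpy as np

def conditional_probs(X, sigma):
    """Compute p_{j|i} for every pair, given one bandwidth sigma_i per point.

    X: (n, d) array of high-dimensional points P_1..P_n.
    sigma: (n,) array of per-point Gaussian bandwidths.
    Returns an (n, n) matrix P with P[i, j] = p_{j|i} and a zero diagonal.
    """
    # Squared Euclidean distances ||P_i - P_j||^2 for all pairs.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    # Unnormalized Gaussian affinities; row i is scaled by its own sigma_i.
    logits = -sq_dists / (2.0 * sigma[:, None] ** 2)
    np.fill_diagonal(logits, -np.inf)        # exclude j = i from the kernel
    P = np.exp(logits)
    return P / P.sum(axis=1, keepdims=True)  # normalize each row over k != i
```

Each row of the returned matrix is a probability distribution over the other points, peaked around point $i$'s nearest neighbours.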
Symmetrization:

Then we define the symmetrized joint probabilities, where $n$ is the number of points and $p_{j|i}$ is interpreted as the probability that point $j$ would be chosen as a cluster neighbour of point $i$:

$p_{ij} = \frac{p_{i|j} + p_{j|i}}{2n}.$
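The symmetrization itself is a one-liner; this sketch assumes `P_cond` is the row-stochastic matrix of conditional probabilities $p_{j|i}$ from Step 1. Dividing by $2n$ (twice the number of points) makes the entries sum to 1 over all pairs:

```python
import numpy as np

def symmetrize(P_cond):
    """Turn conditional probabilities p_{j|i} into joint probabilities p_{ij}.

    P_cond: (n, n) row-stochastic matrix with P_cond[i, j] = p_{j|i}.
    Returns the symmetric matrix with entries (p_{i|j} + p_{j|i}) / (2n).
    """
    n = P_cond.shape[0]
    return (P_cond + P_cond.T) / (2.0 * n)
```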

## Step 2:

The parameters $\sigma_i$ need to be chosen based on a user-specified perplexity parameter. The perplexity of a point $P_i$ is defined as the binary exponential of the Shannon entropy of the conditional distribution $p_{\cdot|i}$,

$Perp(P_i) = 2^{-\sum p_{j|i} \log_2 p_{j|i}}.$

The user chooses a target value for $Perp(P_i)$ (typically the same for every $i$). A search is then performed to find the $\sigma_i$ that gives rise to this perplexity value; a larger $\sigma_i$ flattens the distribution and raises the perplexity, so the search is monotone.
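A minimal sketch of this search using bisection on $\sigma_i$ (actual implementations may use a different search strategy, and the function name and tolerances here are assumptions):

```python
import numpy as np

def sigma_for_perplexity(sq_dists_i, target_perp, tol=1e-5, max_iter=64):
    """Binary-search sigma_i so that row i's perplexity matches target_perp.

    sq_dists_i: squared distances from point i to every other point
                (point i itself excluded).
    """
    lo, hi = 1e-10, 1e10
    for _ in range(max_iter):
        sigma = 0.5 * (lo + hi)
        # Conditional distribution p_{.|i} for this candidate sigma.
        p = np.exp(-sq_dists_i / (2.0 * sigma ** 2))
        p /= p.sum()
        # Perplexity = 2 ** (Shannon entropy of the row, in bits).
        entropy = -np.sum(p * np.log2(p + 1e-12))
        perp = 2.0 ** entropy
        if abs(perp - target_perp) < tol:
            break
        if perp > target_perp:
            hi = sigma   # distribution too flat: shrink sigma
        else:
            lo = sigma   # distribution too peaked: grow sigma
    return sigma
```

Because perplexity increases monotonically with $\sigma_i$, bisection converges quickly; the target must lie between 1 and $n-1$ (the perplexity of a uniform row).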

## Step 3:

For the low-dimensional points $Q_i$, we compute a similar quantity as in Step 1, but with a Student t-distribution with one degree of freedom instead of a Gaussian:

$q_{ij} = \frac{(1 + ||Q_i - Q_j||_2^2)^{-1}}{\sum_{k \neq l} (1 + ||Q_k - Q_l||_2^2)^{-1}}.$

The cost function is the Kullback-Leibler divergence between the joint distributions,

$C = \sum_i \sum_j p_{ij} \log \frac{p_{ij}}{q_{ij}},$

which is then minimized over the choice of low-dimensional point coordinates $Q_i$, typically by gradient descent.
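One gradient-descent step can be sketched as follows, using the closed-form gradient of the KL cost from the original t-SNE paper, $\frac{\partial C}{\partial Q_i} = 4 \sum_j (p_{ij} - q_{ij})(Q_i - Q_j)(1 + ||Q_i - Q_j||^2)^{-1}$. The function name and the plain constant learning rate are assumptions (real implementations add momentum and early exaggeration):

```python
import numpy as np

def tsne_step(P, Q, lr=1.0):
    """One plain gradient-descent step on C = sum_ij p_ij log(p_ij / q_ij).

    P: (n, n) symmetric joint probabilities from the high-dimensional side.
    Q: (n, 2) current low-dimensional embedding.
    Returns (updated Q, cost before the step).
    """
    # Student-t (1 d.o.f.) affinities: (1 + ||Q_i - Q_j||^2)^{-1}, normalized.
    sq_dists = np.sum((Q[:, None, :] - Q[None, :, :]) ** 2, axis=-1)
    num = 1.0 / (1.0 + sq_dists)
    np.fill_diagonal(num, 0.0)
    Qprob = num / num.sum()
    # KL divergence; small epsilon guards against log(0).
    cost = np.sum(P * np.log((P + 1e-12) / (Qprob + 1e-12)))
    # Matrix form of the gradient: 4 * (diag(rowsum(PQ)) - PQ) @ Q,
    # where PQ[i, j] = (p_ij - q_ij) * (1 + ||Q_i - Q_j||^2)^{-1}.
    PQ = (P - Qprob) * num
    grad = 4.0 * (np.diag(PQ.sum(axis=1)) - PQ) @ Q
    return Q - lr * grad, cost
```

Iterating this step shrinks the KL divergence, pulling the embedding's pairwise similarities toward the high-dimensional ones.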

# What are some strengths and weaknesses of using t-SNE (for clustering applications)?

Strengths of using t-SNE for clustering:

1. Visualization: t-SNE is particularly effective at creating a visual representation of high-dimensional data. It can provide intuitive insights about the structure of the data, including potential clusters.
2. Preservation of Local Structure: t-SNE is designed to preserve the local structure of the data, making it effective at keeping similar instances close together in the reduced space.

Weaknesses of using t-SNE for clustering:

1. Arbitrariness of Clusters: t-SNE does not provide explicit cluster assignments. Any clusters are visually interpreted and can be subjective or change with different runs of the algorithm, especially given t-SNE's non-deterministic nature.
2. Difficulty Interpreting Distances and Densities: t-SNE is not designed to preserve distances between clusters or global structure. The distances between clusters or the relative sizes of clusters in a t-SNE plot may not hold any meaningful interpretation.
3. Sensitivity to Hyperparameters: The output of t-SNE is significantly influenced by its hyperparameters, particularly the perplexity parameter. Different settings can lead to different visualizations, which can make the clustering interpretation challenging.
4. Lack of Out-of-Sample Extension: t-SNE doesn't provide a straightforward way to map new, unseen data points to the reduced space, which is often needed in clustering tasks.
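In practice one rarely implements the steps above by hand; scikit-learn provides an implementation in `sklearn.manifold.TSNE`. A minimal usage example (assuming scikit-learn is installed; the blob data here is synthetic):

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Two well-separated Gaussian blobs in 10 dimensions.
X = np.vstack([rng.normal(0.0, 1.0, (25, 10)),
               rng.normal(8.0, 1.0, (25, 10))])

# perplexity must be smaller than the number of samples; random_state
# pins the otherwise stochastic initialization for reproducibility.
embedding = TSNE(n_components=2, perplexity=10,
                 random_state=0).fit_transform(X)
print(embedding.shape)  # (50, 2)
```

Note that, per the weaknesses above, `TSNE` has no `transform` method for unseen points: the embedding is learned once per `fit_transform` call.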