# How do you use t-SNE?

## Step 1:

You have a collection of high-dimensional points $P_1,...,P_n$ which are fixed. You want to learn a collection of two-dimensional points $Q_1,...,Q_n$ that have similar metric and clustering behaviour as their higher dimensional counterpart.

Fixing a point $P_i$, for each of the other points $P_j$, you calculate the Euclidean distance $||P_i-P_j||_2$ which is then passed into a Gaussian distribution (unnormalized) and normalized by the sum of all such pairwise quantities (ranging over $j$):

Then we define the symmetrization, where $N$ is the ambient dimension, (where the quantity $p_{j|i}$ is to be interpreted as the probability that point $j$ would be chosen as cluster neighbour of point $i$),

## Step 2:

The parameters $\sigma_i$ need to be chosen, based on a user-specified perplexity parameter. The perplexity of a point $P_i$ is defined as the binary exponential of the Shannon entropy,

The user would choose what she wants to be $Perp(P_i)$ for each $i$. Then, a search would be performed to find the $\sigma_i$ value that gives rise to this perplexity value.

## Step 3:

For each of the $Q_i$, we calculate a similar quantity as in **Step 1, **but with a student t-distribution with 1 degree of freedom instead of Gaussian. You compute a cost function using a Kullback-Leibler divergence,

Which can then be optimized over the choice of low-dimensional point coordinates $Q_{i}$.

Strengths of using t-SNE for clustering:

- Visualization: t-SNE is particularly effective at creating a visual representation of high-dimensional data. It can provide intuitive insights about the structure of the data, including potential clusters.
- Preservation of Local Structure: t-SNE is designed to preserve the local structure of the data, making it effective at keeping similar instances close together in the reduced space.

Weaknesses of using t-SNE for clustering:

- Arbitrariness of Clusters: t-SNE does not provide explicit cluster assignments. Any clusters are visually interpreted and can be subjective or change with different runs of the algorithm, especially given t-SNE's non-deterministic nature.
- Difficulty Interpreting Distances and Densities: t-SNE is not designed to preserve distances between clusters or global structure. The distances between clusters or the relative sizes of clusters in a t-SNE plot may not hold any meaningful interpretation.
- Sensitivity to Hyperparameters: The output of t-SNE is significantly influenced by its hyperparameters, particularly the perplexity parameter. Different settings can lead to different visualizations, which can make the clustering interpretation challenging.
- Lack of Out-of-Sample Extension: t-SNE doesn't provide a straightforward way to map new, unseen data points to the reduced space, which is often needed in clustering tasks.