Visualizing Emergent Concepts in Transformer Hidden States
I analyzed how DistilGPT2 represents and organizes information internally by extracting hidden-state activations from each of its transformer layers. The goal was to better understand how concepts form and evolve as text passes through the model. These representations are high-dimensional (768 values per token in DistilGPT2), so they cannot be inspected directly.
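As a rough sketch, the extraction step could look like the following. It assumes the Hugging Face `transformers` library and PyTorch, which this write-up does not name; the function name `extract_hidden_states` and the example sentence are illustrative, not taken from the project.

```python
import torch
from transformers import AutoModel, AutoTokenizer


def extract_hidden_states(text, model_name="distilgpt2"):
    """Return per-layer hidden states for one input string.

    Output shape: (n_layers + 1, seq_len, hidden_size). Index 0 holds the
    embedding output; the remaining entries are the transformer blocks.
    """
    # Assumption: the model is loaded from the Hugging Face Hub by name.
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()

    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs, output_hidden_states=True)

    # outputs.hidden_states is a tuple of tensors shaped (1, seq_len, hidden).
    # Stack them along a new layer axis and drop the batch axis of size 1.
    return torch.stack(outputs.hidden_states).squeeze(1)


states = extract_hidden_states("The cat sat on the mat")
print(states.shape)  # (layers + 1, tokens, hidden size)
```

This keeps one vector per token per layer. Whether to pool tokens into a single vector per sentence (for example by averaging) is a separate design choice that the sketch leaves open.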
To make these representations visible, I projected them into 3D space with PCA, t-SNE, and UMAP. This let me study how different concepts are structured and how the apparent structure, and therefore its interpretability, changes depending on the projection method.
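To illustrate the projection step, here is a minimal PCA sketch in NumPy. It runs on random data standing in for real hidden states, so the output has no meaningful structure; the helper name `pca_project` and the array sizes are assumptions made for the example. t-SNE and UMAP are not shown; implementations exist in libraries such as scikit-learn (`sklearn.manifold.TSNE`) and umap-learn (`umap.UMAP`), though the write-up does not say which ones were used.

```python
import numpy as np


def pca_project(X, n_components=3):
    """Project rows of X onto their top principal components.

    Returns the projected coordinates and the fraction of total variance
    explained by each kept component.
    """
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)

    # SVD of the centered data: rows of vt are the principal directions,
    # ordered by decreasing singular value.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)

    coords = centered @ vt[:n_components].T
    explained = (s ** 2) / np.sum(s ** 2)
    return coords, explained[:n_components]


# Stand-in for real activations: 50 token vectors of size 768 (hypothetical).
rng = np.random.default_rng(0)
fake_states = rng.normal(size=(50, 768))

coords, ratio = pca_project(fake_states)
print(coords.shape)
print(ratio)
```

PCA is linear and deterministic, so it is a useful baseline to compare against t-SNE and UMAP, which are nonlinear and depend on settings such as perplexity or neighborhood size.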