Improved Image Encoding Using Transformers: Self-Distillation with No Labels (DINO) & DINOv2

Jehill Parikh
7 min read · Feb 11, 2025


Introduction to Transformers

Transformers are a type of deep learning model introduced in the paper “Attention Is All You Need” by Vaswani et al. in 2017. They have since become the backbone of modern natural language processing (NLP) and other AI applications, including computer vision and reinforcement learning.

Key Concept: Self-Attention

Transformers rely on a mechanism called self-attention, which lets the model weigh the importance of every other token in a sequence when building the representation of each token. Unlike earlier models such as RNNs (Recurrent Neural Networks), which process sequences one step at a time, transformers process all tokens in parallel, making them far more efficient to train on modern hardware.
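
To make this concrete, here is a minimal sketch of scaled dot-product self-attention in PyTorch. The function name, the weight matrices `w_q`, `w_k`, `w_v`, and the toy dimensions are illustrative assumptions for this example, not taken from any particular library:

```python
import torch
import torch.nn.functional as F

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over one sequence (no batching)."""
    q = x @ w_q                                    # queries: (seq_len, d_k)
    k = x @ w_k                                    # keys:    (seq_len, d_k)
    v = x @ w_v                                    # values:  (seq_len, d_k)
    d_k = q.size(-1)
    # All pairwise token-to-token scores in a single matrix multiply
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5  # (seq_len, seq_len)
    weights = F.softmax(scores, dim=-1)            # each row sums to 1
    return weights @ v                             # weighted sum of values

# Toy example: 4 tokens, 8-dimensional embeddings
torch.manual_seed(0)
x = torch.randn(4, 8)
w_q, w_k, w_v = (torch.randn(8, 8) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)
print(out.shape)  # torch.Size([4, 8])
```

Note that the full (seq_len, seq_len) score matrix is computed at once, which is exactly what lets the model attend over the whole sequence in parallel rather than stepping through it token by token.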

Key Components of a Transformer

  1. Input Embeddings: Converts words or tokens into numerical vectors.
  2. Positional Encoding: Adds order information to the embeddings, since self-attention by itself is permutation-invariant and carries no built-in notion of token order (see the sketch after this list).
  3. Multi-Head Self-Attention: Enables the model to attend to information from several representation subspaces in parallel, with each attention head learning its own pattern of relationships between tokens.
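
As an illustration of items 1 and 2, here is a hedged sketch of how token embeddings and the sinusoidal positional encoding from the original paper can be combined. All sizes and names are made up for the example:

```python
import math
import torch

def sinusoidal_positional_encoding(seq_len, d_model):
    """Sinusoidal positional encoding as defined in "Attention Is All You Need"."""
    position = torch.arange(seq_len).unsqueeze(1).float()         # (seq_len, 1)
    div_term = torch.exp(
        torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model)
    )                                                             # (d_model/2,)
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)  # even dimensions
    pe[:, 1::2] = torch.cos(position * div_term)  # odd dimensions
    return pe

# Token embeddings plus position information (sizes are illustrative)
vocab_size, d_model, seq_len = 1000, 16, 10
embedding = torch.nn.Embedding(vocab_size, d_model)
tokens = torch.randint(0, vocab_size, (seq_len,))
x = embedding(tokens) + sinusoidal_positional_encoding(seq_len, d_model)
print(x.shape)  # torch.Size([10, 16])
```

Because the encoding is added rather than concatenated, position information lives in the same d_model-dimensional space as the token content, and the attention layers can use both jointly.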



Written by Jehill Parikh

Neuroscientist | ML Practitioner | Physicist
