Category: LLMs from Scratch

LLM_log #010: Understanding Diffusion Models Through 1D Experiments — From DDPM to Manifold Compactness

Highlights: We implement a complete DDPM from scratch on 1D sine waves — same math as image diffusion, but every intermediate state is plottable. We track 100 parallel trajectories, measure when the model “commits” to a specific sample, then design a controlled experiment that reveals manifold compactness as the key factor determining whether diffusion succeeds or fails. So let’s begin! Tutorial Overview: Why 1D? · The Dataset · Forward Process · Model and Training · Generating from Noise · What…
Read more
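The forward process the post builds on has a closed form: you can jump straight from a clean sample to any noise level t via x_t = √(ᾱ_t)·x₀ + √(1−ᾱ_t)·ε. A minimal sketch on a 1D sine wave, assuming the standard linear β schedule (the function name and schedule values here are illustrative, not the post's exact code):

```python
import numpy as np

def forward_diffuse(x0, t, betas, rng):
    """Sample x_t ~ q(x_t | x_0) in closed form:
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    alpha_bar = np.cumprod(1.0 - betas)          # cumulative product of alphas
    eps = rng.standard_normal(x0.shape)          # fresh Gaussian noise
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

rng = np.random.default_rng(0)
xs = np.linspace(0, 2 * np.pi, 128)
x0 = np.sin(xs)                        # one clean 1D sample: a sine wave
betas = np.linspace(1e-4, 0.02, 1000)  # linear schedule, T = 1000 (illustrative)
x_noisy = forward_diffuse(x0, 999, betas, rng)  # at t = T-1, nearly pure noise
```

Because every `x_noisy` is a 1D array, each intermediate state can be plotted directly — which is exactly what makes the 1D setting useful.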

LLM_log #009: An Image is Worth 16×16 Words — From Transformers to Vision Transformers and SWIN

Highlights: In this post, we take a deep dive into the architecture that changed everything — the Transformer — and trace its evolution from NLP into computer vision. We start with the original encoder-decoder model, walk through self-attention and multi-head attention step by step, and then show how Vision Transformers (ViT) apply the exact same mechanism to image patches instead of words. Along the way, we answer the questions that trip everyone up: if we…
Read more
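The “16×16 words” in the title is literal: a ViT slices the image into non-overlapping 16×16 patches and flattens each into a vector, which then plays the role a token embedding plays in NLP. A minimal sketch of that patchify step (pure NumPy, before any learned projection):

```python
import numpy as np

def patchify(img, p=16):
    """Split an (H, W, C) image into non-overlapping p x p patches,
    each flattened to a vector -- the 'words' a ViT attends over."""
    H, W, C = img.shape
    assert H % p == 0 and W % p == 0
    img = img.reshape(H // p, p, W // p, p, C)
    img = img.transpose(0, 2, 1, 3, 4)   # -> (H/p, W/p, p, p, C)
    return img.reshape(-1, p * p * C)    # -> (num_patches, p*p*C)

img = np.zeros((224, 224, 3))            # standard ViT input resolution
tokens = patchify(img)
print(tokens.shape)  # (196, 768): a 14x14 grid of patches, each a 768-dim vector
```

From here, self-attention operates on the 196 patch tokens exactly as it would on 196 word tokens.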

LLM_log #008: CLIP — Understanding Multimodal AI Through Step-by-Step Experiments

Highlights: In this post, you’ll learn how CLIP connects images and text in a shared embedding space — enabling zero-shot image classification, semantic search, and visual perception scoring without any task-specific training. We start from the ground up with Vision Transformers, walk through CLIP’s contrastive learning architecture, run hands-on embedding experiments, and then push CLIP to its limits with a real-world challenge: can it tell cheap bedrooms from expensive ones using actual house sale…
Read more
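Zero-shot classification with CLIP reduces to one operation: compare the image embedding against a set of text-prompt embeddings by cosine similarity and pick the winner. A minimal sketch of that step, using random toy vectors in place of CLIP's actual encoders (the function name and embeddings are illustrative):

```python
import numpy as np

def zero_shot_classify(image_emb, text_embs):
    """Pick the text prompt whose L2-normalized embedding has the
    highest cosine similarity with the image embedding."""
    image_emb = image_emb / np.linalg.norm(image_emb)
    text_embs = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    sims = text_embs @ image_emb          # cosine similarity per prompt
    return int(np.argmax(sims)), sims

# Toy stand-ins for CLIP's image and text encoder outputs
rng = np.random.default_rng(1)
cat = rng.standard_normal(64)             # embedding of "a photo of a cat"
dog = rng.standard_normal(64)             # embedding of "a photo of a dog"
image_emb = cat + 0.1 * rng.standard_normal(64)  # an image close to "cat"
idx, sims = zero_shot_classify(image_emb, np.stack([cat, dog]))
```

With real CLIP weights the prompts would be encoded by the text tower, but the classification logic is exactly this argmax over cosine similarities.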

LLM_log #007: From Random Text to Coherent Language – Pretraining Your First Large Language Model

Highlights: In this guide, you’ll learn how to pretrain a large language model from scratch — implementing training loops, evaluation metrics, and advanced text generation strategies. We’ll build a complete GPT-style training pipeline, watch it evolve from random gibberish to coherent text, and explore techniques like temperature scaling and top-k sampling. By the end, you’ll load professional pretrained weights into your own architecture. Source: This is part of our ongoing “Building LLMs from Scratch” series…
Read more
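The two sampling techniques named above compose naturally: temperature rescales the logits (lower = greedier), and top-k zeroes out everything outside the k most likely tokens before sampling. A minimal sketch, assuming raw logits as input (the function name is illustrative):

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, top_k=None, rng=None):
    """Temperature-scale the logits, optionally keep only the top-k,
    then sample from the resulting softmax distribution."""
    if rng is None:
        rng = np.random.default_rng()
    logits = np.asarray(logits, dtype=float) / temperature
    if top_k is not None:
        cutoff = np.sort(logits)[-top_k]               # k-th largest logit
        logits = np.where(logits < cutoff, -np.inf, logits)
    probs = np.exp(logits - np.max(logits))            # stable softmax
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

logits = [2.0, 1.0, 0.1, -1.0]
tok = sample_next_token(logits, temperature=0.7, top_k=2,
                        rng=np.random.default_rng(0))
# With top_k=2, only tokens 0 and 1 can ever be chosen.
```

Setting `temperature` below 1.0 sharpens the distribution toward the argmax; above 1.0 it flattens toward uniform — the trade-off the post explores between coherence and diversity.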

LLM_log #006: Implementing GPT-2 from Scratch – Raschka

Highlights: In this post, we build a complete GPT-2 model (124 million parameters) from scratch in PyTorch. We implement every component — layer normalization, GELU activations, the feed-forward network, shortcut connections — and wire them into a transformer block that we stack 12 times to create the full architecture. By the end, you will have a structurally complete GPT model that can generate text token by token. We also weave in key insights from…
Read more
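The components listed above — layer normalization, GELU, the feed-forward network, and the two shortcut connections — assemble into the single transformer block that GPT-2 stacks 12 times. A minimal pre-LN sketch in NumPy with single-head causal attention (the post builds this in PyTorch with multi-head attention; weight names here are illustrative):

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    return (x - x.mean(-1, keepdims=True)) / np.sqrt(x.var(-1, keepdims=True) + eps)

def gelu(x):
    # tanh approximation of GELU, as used in GPT-2
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def transformer_block(x, Wq, Wk, Wv, Wo, W1, W2):
    """Pre-LN block: x + Attn(LN(x)), then x + FFN(LN(x)) --
    the unit GPT-2 stacks 12 times."""
    h = layer_norm(x)
    q, k, v = h @ Wq, h @ Wk, h @ Wv
    T, d = q.shape
    mask = np.triu(np.full((T, T), -np.inf), k=1)      # causal mask
    attn = softmax(q @ k.T / np.sqrt(d) + mask) @ v
    x = x + attn @ Wo                                  # shortcut connection 1
    h = layer_norm(x)
    x = x + gelu(h @ W1) @ W2                          # shortcut connection 2 (FFN)
    return x

rng = np.random.default_rng(0)
T, D = 4, 8                                            # toy sequence length / width
x = rng.standard_normal((T, D))
Ws = [rng.standard_normal(s) * 0.1 for s in
      [(D, D), (D, D), (D, D), (D, D), (D, 4 * D), (4 * D, D)]]
out = transformer_block(x, *Ws)
```

Note the shape is preserved end to end, which is what lets identical blocks be stacked: the output of one is a valid input to the next.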