Category: GEN AI

LLM_log #010: Understanding Diffusion Models Through 1D Experiments — From DDPM to Manifold Compactness

Highlights: We implement a complete DDPM from scratch on 1D sine waves — same math as image diffusion, but every intermediate state is plottable. We track 100 parallel trajectories, measure when the model “commits” to a specific sample, then design a controlled experiment that reveals manifold compactness as the key factor determining whether diffusion succeeds or fails. So let’s begin! Tutorial Overview: Why 1D? · The Dataset · Forward Process · Model and Training · Generating from Noise · What…
Read more

LLM_log #009: An Image is Worth 16×16 Words — From Transformers to Vision Transformers and SWIN

Highlights: In this post, we take a deep dive into the architecture that changed everything — the Transformer — and trace its evolution from NLP into computer vision. We start with the original encoder-decoder model, walk through self-attention and multi-head attention step by step, and then show how Vision Transformers (ViT) apply the exact same mechanism to image patches instead of words. Along the way, we answer the questions that trip everyone up: if we…
Read more

LLM_log #008: CLIP — Understanding Multimodal AI Through Step-by-Step Experiments

Highlights: In this post, you’ll learn how CLIP connects images and text in a shared embedding space — enabling zero-shot image classification, semantic search, and visual perception scoring without any task-specific training. We start from the ground up with Vision Transformers, walk through CLIP’s contrastive learning architecture, run hands-on embedding experiments, and then push CLIP to its limits with a real-world challenge: can it tell cheap bedrooms from expensive ones using actual house sale…
Read more

dH #027: A Unified Framework for Deep Learning Architectures: From Sequences to Graphs

🎯 What You’ll Learn: In this comprehensive guide, we’ll explore a unified framework for understanding deep learning architectures across different data types. You’ll learn how to design models based on fundamental principles of invariance and equivariance, understand the spectrum from domain-specific to general-purpose approaches, master the building blocks of temporal sequence models including RNNs and Transformers, and discover how spatial convolution models and graph neural networks all fit into one coherent paradigm. By the end,…
Read more

Enhancing LLMs using Retrieval-Augmented Generation (RAG) models

Highlights: Retrieval-Augmented Generation (RAG) models are transforming AI by combining Large Language Models (LLMs) with external memory to improve accuracy, reduce hallucinations, and provide context-aware outputs. In today’s post, we’ll explore RAG’s key concepts, challenges, and cutting-edge advancements, and learn how RAG models enhance the reliability of LLMs and increase their adaptability across diverse applications. So let’s begin! Tutorial Overview: 1. LLMs Today & Their Many Challenges · Output Accuracy: While language models are impressive,…
Read more