Category: LLMs from Scratch

LLMs from Scratch #004: Mixture of Experts (MoE) Models: The Architecture Powering 2025’s Best AI Systems

🎯 What You’ll Learn: This comprehensive guide takes you from MoE fundamentals to state-of-the-art implementations like DeepSeek V3. You’ll understand why sparse architectures outperform dense models at every compute scale, master the critical routing mechanisms that determine expert selection, and learn the training techniques that make these complex systems work. We’ll examine real benchmark results from Llama 4, Grok, and DeepSeek, explore load-balancing challenges and solutions, and walk through the complete evolution of DeepSeek’s…

LLMs from Scratch #003: Modern Transformer Architectures: A Deep Dive into Design Principles and Training

🎯 What You’ll Learn: In this comprehensive guide, we’ll explore the evolution of transformer architectures from the original “Attention is All You Need” paper to modern implementations. You’ll discover why today’s language models use specific design choices like RoPE position embeddings and SwiGLU activations, understand the trade-offs between serial and parallel layer arrangements, and learn how to make informed decisions about hyperparameters like head…

LLMs from Scratch #002: PyTorch Fundamentals: Building Efficient Language Models from Scratch

🎯 What You’ll Learn: In this comprehensive guide, we’ll explore the fundamental building blocks of PyTorch for language model development. You’ll learn how to account for memory usage across different floating-point representations, understand tensor operations and their computational costs, master efficient data movement between CPU and GPU, and develop the mindset of resource accounting that’s essential for training large-scale models. This is the practical foundation you need…

LLMs from Scratch #001: Introduction to LLMs and Tokenization

🎯 What You’ll Learn: In this comprehensive introduction to large language models, we’ll explore why efficiency at scale is just as critical as raw compute power, showing how algorithmic improvements have outpaced Moore’s Law by 44X. You’ll understand why the “bitter lesson” is misunderstood, learn the critical difference between small- and large-scale phenomena, and trace the fascinating evolution from Shannon’s entropy estimates through Google’s massive N-gram models…