Raschka's LLM architecture gallery
Sebastian Raschka released an LLM Architecture Gallery that collects config files, tech reports, and explainers for models like Llama 3, Qwen3, and Grok 2.5, offering a compact way to study real model setups and training tricks. (x.com)
Sebastian Raschka announced the new LLM Architecture Gallery on March 14, 2026, and the live gallery page records a last-updated timestamp of March 26, 2026. (sebastianraschka.com) The gallery’s source metadata lives in the rasbt/llm-architecture-gallery GitHub repository, where the central file is models.yml, and the repo shows commits that added new model entries on March 25, 2026. (github.com) Raschka’s site aggregates architecture figures and compact fact sheets for more than 40 open-weight models; several coverage pieces list the gallery as cataloguing about 43 architectures for side-by-side comparison. (agent-wars.com)
Each models.yml entry is structured with fields such as date, scale, context_tokens, config (repo + label + url), and tech_report (paper URL). Entries like DeepSeek V3 and Arcee Trinity illustrate the pattern by including Hugging Face config.json and arXiv links. (github.com) The gallery bundles short explainers for primitives like Grouped-Query Attention (GQA), Multi-Head Latent Attention (MLA), and hybrid-attention patterns, and Raschka links those explainers to his from-scratch notebooks that implement Llama 3 and Qwen3 variants for hands-on inspection. (sebastianraschka.com)
The public repo invites contributions and issue reports: it has an open issue tracker and a changelog documenting cards added in March 2026, such as Nemotron and Kimi, so architecture entries and config links are actively curated. (github.com) As of the current repository snapshot, the gallery has drawn hundreds of stars and dozens of forks on GitHub and continues to receive updates that add recent releases and config links, making it a centralized index of real model configs, diagrams, and tech-report pointers. (github.com)
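To make the models.yml entry structure described above more concrete, here is a minimal Python sketch of how one such entry might be validated once it has been loaded into a dictionary. The field names (date, scale, context_tokens, config with repo/label/url, tech_report) come from the description above; the exact nesting, the value types, the ModelEntry and ConfigLink classes, the parse_entry helper, and every value in the sample dictionary are assumptions made for illustration and are not taken from the repository.

```python
from dataclasses import dataclass

# Field names follow the ones listed in the text; how they are nested and
# typed in the real models.yml is an assumption made for this sketch.
REQUIRED_FIELDS = ("date", "scale", "context_tokens", "config", "tech_report")


@dataclass
class ConfigLink:
    """Hypothetical container for the config sub-fields (repo, label, url)."""
    repo: str
    label: str
    url: str


@dataclass
class ModelEntry:
    """Hypothetical typed view of a single gallery entry."""
    name: str
    date: str
    scale: str
    context_tokens: int
    config: ConfigLink
    tech_report: str


def parse_entry(name: str, raw: dict) -> ModelEntry:
    """Check that the expected fields exist and build a typed entry."""
    missing = [key for key in REQUIRED_FIELDS if key not in raw]
    if missing:
        raise ValueError(f"{name}: missing fields {missing}")
    cfg = raw["config"]
    return ModelEntry(
        name=name,
        date=str(raw["date"]),
        scale=str(raw["scale"]),
        context_tokens=int(raw["context_tokens"]),
        config=ConfigLink(repo=cfg["repo"], label=cfg["label"], url=cfg["url"]),
        tech_report=str(raw["tech_report"]),
    )


# Placeholder data only: none of these values describe a real model.
sample = {
    "date": "2026-01-01",
    "scale": "placeholder-scale",
    "context_tokens": 8192,
    "config": {
        "repo": "example-org/example-model",
        "label": "config.json",
        "url": "https://example.com/config.json",
    },
    "tech_report": "https://example.com/report",
}

entry = parse_entry("example-model", sample)
print(entry.name, entry.context_tokens, entry.config.url)
```

Reading the real file would first require a YAML parser (for example PyYAML's yaml.safe_load) to turn models.yml into dictionaries like the sample above. The check is kept strict on purpose, so an entry that lacks a config or tech_report link fails loudly instead of silently producing an incomplete card.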