RYS tricks: repeat mid‑layers
The RYS technique duplicates a contiguous block of transformer mid-layers. In the cited example, Qwen2-72B with layers 45–51 duplicated, it improved the MuSR benchmark score by 17.72% without any fine-tuning, which points to functional structure inside LLMs that can be exploited directly (x.com). That suggests structural tweaks can be as potent as data or compute increases for reasoning gains (x.com).
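The duplication described above can be pictured as a change to the order in which layers are executed. The following is a minimal sketch, not Ng's implementation: the function name `relayered_order`, the inclusive reading of "layers 45–51", and the 80-layer depth used for Qwen2-72B are assumptions made for illustration.

```python
def relayered_order(num_layers: int, start: int, end: int) -> list[int]:
    """Return layer indices with the inclusive block [start, end] run twice.

    Hypothetical helper for illustration only; it computes an execution
    order and does not touch any model weights.
    """
    if not (0 <= start <= end < num_layers):
        raise ValueError("block must lie inside the layer stack")
    block = list(range(start, end + 1))
    # Layers 0..end, then the block again, then the remaining layers.
    return list(range(0, end + 1)) + block + list(range(end + 1, num_layers))


# Assumption: an 80-layer stack with layers 45-51 (inclusive) duplicated.
order = relayered_order(80, 45, 51)
print(len(order))  # 80 original layers + 7 repeated = 87
```

In a real model the repeated entries would refer to the same weights as the originals, which is why no fine-tuning is involved; how the layers are actually copied or shared is left out of this sketch.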
David Noel Ng published the follow-up post "LLM Neuroanatomy II" on March 21, 2026, presenting the relayering experiments and the RYS workflow (dnhkng.github.io). The RYS-XLarge artifact is hosted on Hugging Face, and its model card lists MaziyarPanahi/calme-2.1-qwen2-72b as the base checkpoint for the relayered variant (huggingface.co). Ng describes discovering the relayering pattern with EQ-Bench-style probes and says the initial scans ran on two NVIDIA RTX 4090 GPUs (dnhkng.github.io).
Independent replication efforts published on GitHub applied the same relayering search to other weights, including Qwen2.5-32B and Devstral-24B, and reported large metric shifts, such as BBH rising from 0.22 to 0.76 in one replication (github.com). Multiple writeups and threads characterize the effective modules as contiguous "reasoning circuits" of roughly 3–7 layers and note that duplicating single layers or overly large blocks produced little or negative change (news.ycombinator.com) (lumenhunt.com).
Ng's relayering work is credited with producing a model that reached #1 on the Hugging Face Open LLM Leaderboard in mid-2024 and with spawning several descendants that remained near the top into 2026 (dnhkng.github.io) (news.ycombinator.com). The ecosystem responded quickly with a YouTube explainer and multiple public toolkits and repositories for automated layer searches and relayering, including Ng's own RYS repository and community toolkits (youtube.com) (github.com).
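To give a sense of the search space behind an automated layer search, the sketch below enumerates every contiguous block of 3–7 layers in a stack. It is an illustration under stated assumptions, not the procedure used by Ng or by any of the toolkits mentioned: the function name `candidate_blocks`, the default length bounds, and the 80-layer depth are all hypothetical choices, and scoring each candidate on a benchmark is left out.

```python
def candidate_blocks(num_layers: int, min_len: int = 3, max_len: int = 7):
    """List (start, end) pairs, inclusive, for every contiguous block
    whose length lies between min_len and max_len.

    Hypothetical helper: a real search would score each candidate
    (for example with a probe benchmark) rather than just list them.
    """
    blocks = []
    for length in range(min_len, max_len + 1):
        # A block of this length can start at 0 .. num_layers - length.
        for start in range(0, num_layers - length + 1):
            blocks.append((start, start + length - 1))
    return blocks


# Assumption: an 80-layer model, block lengths 3 to 7.
blocks = candidate_blocks(80)
print(len(blocks))  # 78 + 77 + 76 + 75 + 74 = 380 candidates
```

Even this restricted range yields a few hundred candidates for one model, so each evaluation has to be cheap for an exhaustive scan to be practical.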