LLMs fail accessibility tests — until prompted
A developer post reports that LLMs perform poorly on accessible-code benchmarks unless the prompt explicitly mentions accessibility; with detailed guidance, reported scores rise above 90%. Separately, AI-driven WCAG auditor prompts shared elsewhere are circulating as toolkits for automated checks and remediation, which highlights how materially prompt design changes AI audit outcomes.
FeedA11y, introduced in a 2025 arXiv study, uses a feedback-driven ReAct loop; the authors report reducing contrast-related accessibility failures by about 53% compared with baseline prompting. (arxiv.org)

AIMAC (AI Model Accessibility Checker) is an open-source test harness described in mid‑2025 that evaluates whether LLMs produce standards-aligned HTML when given neutral, non‑accessibility-specific prompts. (dubbot.com)

A peer‑reviewed benchmark published by Springer evaluated 11 web components using a three-stage prompt workflow (foundation, main, follow‑up) and then verified outputs via keyboard and screen‑reader testing. (link.springer.com)

Multiple developer toolkits and prompt libraries have been posted publicly this year, including a GitHub wcag‑ai‑auditor repository for automating WCAG checks (github.com) and the AIA11y prompt library, which packages copy/paste prompts for ChatGPT, Gemini, and Copilot. (aia11y.com)

Commercial and marketplace entries are emerging: PromptBase lists ready‑made WCAG audit prompts, and an Apify agent advertises AI WCAG audits at approximately $0.25 per run for basic pages. (promptbase.com)

Survey and implementation guides published in 2025–2026 recommend combining LLM‑driven checks with human review for WCAG 2.1/2.2 compliance, while research projects like A11YN argue for training models to natively emit accessible UI code to reduce reliance on iterative prompt engineering. (testparty.ai)
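
For readers unfamiliar with what a "contrast-related failure" means in the FeedA11y item, the sketch below computes the WCAG 2.x contrast ratio between a text color and its background and checks it against the AA thresholds (4.5:1 for normal text, 3:1 for large text). It is only an illustration of the underlying WCAG formula, not the checker used by FeedA11y or any tool named above; the function names are hypothetical.

```python
def _linearize(channel: int) -> float:
    """Convert one 8-bit sRGB channel to linear light (WCAG 2.x definition)."""
    c = channel / 255.0
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a '#rrggbb' color, from 0.0 (black) to 1.0 (white)."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(foreground: str, background: str) -> float:
    """WCAG contrast ratio, (L_lighter + 0.05) / (L_darker + 0.05)."""
    l1 = relative_luminance(foreground)
    l2 = relative_luminance(background)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa(foreground: str, background: str, large_text: bool = False) -> bool:
    """True if the pair meets WCAG AA contrast (4.5:1 normal, 3:1 large text)."""
    threshold = 3.0 if large_text else 4.5
    return contrast_ratio(foreground, background) >= threshold


if __name__ == "__main__":
    # Hypothetical color pairs, chosen only to show a pass and a fail.
    for fg, bg in [("#767676", "#ffffff"), ("#777777", "#ffffff")]:
        print(fg, bg, round(contrast_ratio(fg, bg), 2), passes_aa(fg, bg))
```

A check like this is deterministic, so it can sit alongside LLM-driven review as a fixed signal: the model proposes colors, and the ratio decides whether they pass.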