AI agents for flaky tests

New testing demos show Marketrix-style AI agents simulating users to reproduce flaky Selenium tests and suggest fixes, while other posts push "self-healing" practices that reduce maintenance overhead. The combination of agents that rerun flows and heuristics that auto-repair brittle selectors aims to cut the time QA spends chasing intermittent failures. (x.com) (x.com)

A flaky test is a browser test that passes on one run and fails on the next run against the same code, which is why teams lose hours rerunning pipelines just to find out whether a failure is real. Selenium’s own docs say bad waiting strategy is one of the primary causes, because the page can keep changing after the browser says navigation is “complete.” (selenium.dev)

Selenium works by finding a page element, storing a handle to it, and then clicking or reading it later. If the page redraws that button in between, Selenium can throw a “stale element reference” error because the old handle points to something that no longer exists in the Document Object Model, which is the browser’s internal page tree. (selenium.dev) (developer.mozilla.org)

That is why so many flaky failures look fake to humans. A checkout button can still be visible on screen, but the test is holding yesterday’s map pin for a page that quietly re-rendered half a second ago. (developer.mozilla.org) (selenium.dev)

The other weak spot is the locator, which is the clue a test uses to find an element in the first place. If a developer renames a class, wraps a button in a new container, or changes the order of similar elements, a brittle locator can break even when the feature still works for real users. (selenium.dev) (browserstack.com)

The new demos are built around an artificial intelligence agent that behaves less like a static script and more like a patient tester. Playwright, another browser automation framework, now documents a “healer” agent that replays the failing steps, inspects the current user interface, suggests a patch such as a locator update or wait adjustment, and reruns the test until it passes or hits guardrails. (playwright.dev)

That matters because the agent is not guessing from a stack trace alone. It reruns the exact flow inside a real browser session, watches where the page changed, and then proposes a concrete repair instead of handing a human a red build and a vague error message. (playwright.dev)

A separate idea in the same wave is “self-healing,” which means the test framework tries a better locator when the original one fails. BrowserStack’s Selenium feature says it can detect a broken locator, find the right element using historical context and artificial intelligence signals, recover the test, and produce a healed selector report for later cleanup. (browserstack.com)

Put those two pieces together and you get a two-step repair loop. The agent handles the big question of what the user was trying to do in the flow, and the self-healing layer handles the smaller question of which exact button, field, or link now matches that intent on the page. (playwright.dev) (browserstack.com)

This does not make flaky tests disappear, because some failures come from real race conditions, bad test data, or product bugs that no locator patch should hide. Even BrowserStack frames self-healing around locator changes and NoSuchElement failures, not as a cure for every intermittent failure in a test suite. (browserstack.com) (semaphore.io)

So the shift here is not “artificial intelligence writes tests now.” The shift is that browser testing tools are starting to do the first round of detective work themselves: replay the failure, inspect the live page, swap in a sturdier locator, suggest a code patch, and hand the engineer a smaller, more specific problem than “it failed again overnight.” (playwright.dev) (browserstack.com)
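
To make the "self-healing" idea concrete, here is a minimal Python sketch of the simplest possible version: try a list of locators in priority order and record whenever a fallback had to rescue the lookup. It is not BrowserStack's or Playwright's implementation, which the sources describe as using historical context and AI signals. Every name here (`find_with_fallback`, `HealReport`, `ElementNotFound`, `fake_find`) is hypothetical, and the "page" is a plain dictionary so the example runs without a browser.

```python
from dataclasses import dataclass, field


class ElementNotFound(Exception):
    """Stand-in for a driver's 'no such element' error (hypothetical name)."""


@dataclass
class HealReport:
    """Records every time a fallback locator rescued a lookup."""
    healed: list = field(default_factory=list)


def find_with_fallback(find, locators, report):
    """Try locators in priority order and return the first element found.

    `find` is any callable that takes a locator and either returns an
    element or raises ElementNotFound. In a real suite it could wrap the
    browser driver's lookup; here it stays abstract (an assumption of
    this sketch) so the example runs on its own.
    """
    if not locators:
        raise ValueError("at least one locator is required")
    primary = locators[0]
    for locator in locators:
        try:
            element = find(locator)
        except ElementNotFound:
            continue  # this clue no longer matches; try the next one
        if locator != primary:
            # Log the substitution so a human can update the test later.
            report.healed.append((primary, locator))
        return element
    raise ElementNotFound(f"no locator matched: {locators}")


# Toy page: a developer renamed the button's class, but the test id survived.
page = {"[data-testid='checkout']": "<button>Checkout</button>"}


def fake_find(locator):
    """Look the locator up in the toy page, mimicking a driver lookup."""
    try:
        return page[locator]
    except KeyError:
        raise ElementNotFound(locator) from None


report = HealReport()
button = find_with_fallback(
    fake_find,
    [".btn-checkout", "[data-testid='checkout']"],
    report,
)
print(button)
print(report.healed)
```

The report matters as much as the recovery: a healed lookup keeps the build green, but the recorded pair of broken and working locators is what lets an engineer fix the test itself. Note that this kind of fallback only addresses locator drift. It would do nothing for race conditions, bad test data, or real product bugs.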
