
Case Study - Furniture Retailer
The regression marathon that ran 9am to midnight, every two weeks

COMPANY
Large Furniture Retailer
INDUSTRY
E-Commerce ($675M ARR)
COMPANY SIZE
1001-5000
FOUNDED
2003
0>95%

Automation coverage
in less than 3 weeks.
10×

Faster deployment velocity,
replacing 25+ hours of manual regression.
5×

More A/B tests per week,
as QA shifted to strategic experimentation.
The Problem
Every release night, one person vs. hundreds of test cases
Living Spaces is a major furniture retailer, over a thousand employees, a complex e-commerce site, and a bi-weekly release schedule. The problem was what happened in the 14 hours before each deploy.
One QA engineer. One sprawling spreadsheet. Hundreds of test cases spanning Account Management, PLP, PDP, Search, Cart, Checkout, and every desktop and mobile variation. They'd start at 9am and work through until the 11pm release, checking every row by hand.
"Pre-release days were really, really stressful. One person would go in and do progression testing from end to end on our website, trying to cover all of the user flows."

If a bug showed up mid-test, and they often did, the engineer had to stop, document it, wait for a fix, retest, and restart from wherever they left off. Midnight finishes weren't unusual. Neither was the nagging worry about what they'd missed. Three things kept breaking down:25+ hours of manual testing per release, bugs still reaching production because continuous monitoring was impossible, and the rest of the team's velocity held hostage by QA's capacity.
What Changed
From zero automation to 300+ tests running overnight
When Chloe started evaluating tools, the bar was specific. Living Spaces didn't need a framework that could check a few scripted flows. They needed something that could handle the full complexity of a major e-commerce site. Hundreds of flows, desktop and mobile, dynamic content, without requiring a team of engineers to maintain it.

Spur's AI browser agents behave like real users, they navigate flows visually, evaluate what they see, and flag things that look wrong. No code. No maintenance burden. Chloe's team wrote every test in plain English, and Spur handled execution. The whole regression spreadsheet was rebuilt as automated suites in about three weeks.
"We went from zero to 90% coverage with over 300 tests. The team wrote them in plain English. No scripts, no maintenance burden."

What it Unlocked
When QA stops being a bottleneck, everything else speeds up
The most immediate result was time: 25+ hours of manual regression per release, gone. But the more interesting result is what the team started doing with that time. With regression automated, QA shifted from button-clicking to strategic work the team had always wanted to do but couldn't.

The 5× increase in A/B tests per week is the number that tells that story most clearly. It's not just that QA got faster, it's that work that wasn't getting done before is now getting done. Experiments that would have waited weeks are shipping in days.
Spur's AI browser agents behave like real users, they navigate flows visually, evaluate what they see, and flag things that look wrong. Here's an example of exactly that: a product gets added to a wishlist, the user navigates to the wishlist page, and it's empty. The kind of bug that's easy to miss in a manual checklist, and embarrassing to have a customer find. This required no code and no maintenance burden. Chloe's team wrote every test in plain English.

"Spur is our first big win company-wide in terms of implementing the use of AI agents. When we were able to share this with our greater team, everybody was almost in awe of what we were able to achieve."
That word "awe" is what sticks. This wasn't a marginal improvement on a process. It was a before-and-after moment for the entire company, the kind that makes people stop and think about what else AI agents could be doing for them.

95%
Automation coverage reached in 3 weeks

10x
Faster deployment velocity

5x
More A/B tests per week



Key Insights
Every two weeks, one person vs. hundreds of test cases. 9am to midnight, hoping nothing slipped through. Now the tests run while everyone sleeps, the results are waiting at 9am, and the job looks completely different. And so does the team's ambition.





















