Case Study - Furniture Retailer

The regression marathon that ran 9am to midnight, every two weeks

A conversation between
Sneha Sivakumar
CEO of Spur
Chloe Lu
Manager, Living Spaces

COMPANY

Large Furniture Retailer

INDUSTRY

E-Commerce ($675M ARR)

COMPANY SIZE

1001-5000

FOUNDED

2003

0→95%

Automation coverage

in less than 3 weeks.

10×

Faster deployment velocity,

replacing 25+ hours of manual regression.

5×

More A/B tests per week,

as QA shifted to strategic experimentation.

The Problem

Every release night, one person vs. hundreds of test cases

Living Spaces is a major furniture retailer with over a thousand employees, a complex e-commerce site, and a release every two weeks. The problem was what happened in the 14 hours before each deploy.

One QA engineer. One sprawling spreadsheet. Hundreds of test cases spanning Account Management, PLP (product listing pages), PDP (product detail pages), Search, Cart, Checkout, and every desktop and mobile variation. They'd start at 9am and work through to the 11pm release, checking every row by hand.

"Pre-release days were really, really stressful. One person would go in and do progression testing from end to end on our website, trying to cover all of the user flows."

Bugs often surfaced mid-test, and when one did, the engineer had to stop, document it, wait for a fix, retest, and pick up again from wherever they left off. Midnight finishes weren't unusual. Neither was the nagging worry about what they'd missed. Three problems kept compounding: 25+ hours of manual testing per release, bugs still reaching production because continuous monitoring was impossible, and the rest of the team's velocity held hostage by QA's capacity.

What Changed

From zero automation to 300+ tests running overnight

When Chloe started evaluating tools, the bar was specific. Living Spaces didn't need a framework that could check a few scripted flows. They needed something that could handle the full complexity of a major e-commerce site (hundreds of flows, desktop and mobile, dynamic content) without requiring a team of engineers to maintain it.

Spur's AI browser agents behave like real users: they navigate flows visually, evaluate what they see, and flag anything that looks wrong. No code. No maintenance burden. Chloe's team wrote every test in plain English, and Spur handled execution. The whole regression spreadsheet was rebuilt as automated suites in about three weeks.

"We went from zero to 90% coverage with over 300 tests. The team wrote them in plain English. No scripts, no maintenance burden."

What It Unlocked

When QA stops being a bottleneck, everything else speeds up

The most immediate result was time: 25+ hours of manual regression per release, gone. But the more interesting result is what the team started doing with that time. With regression automated, QA shifted from button-clicking to strategic work the team had always wanted to do but couldn't.

The 5× increase in A/B tests per week is the number that tells that story most clearly. It's not just that QA got faster; it's that work that wasn't getting done before is now getting done. Experiments that would have waited weeks are shipping in days.

Because the agents evaluate what they see and flag things that look wrong, they catch bugs that scripted checks and tired eyes tend to miss. One example: a product gets added to a wishlist, the user navigates to the wishlist page, and it's empty. It's the kind of bug that's easy to miss in a manual checklist, and embarrassing to have a customer find.

"Spur is our first big win company-wide in terms of implementing the use of AI agents. When we were able to share this with our greater team, everybody was almost in awe of what we were able to achieve."

That word "awe" is what sticks. This wasn't a marginal improvement on a process. It was a before-and-after moment for the entire company, the kind that makes people stop and think about what else AI agents could be doing for them.

95%

Automation coverage reached in 3 weeks

10×

Faster deployment velocity

5×

More A/B tests per week

Critical e-commerce flows across 30+ regions
Every regional price, discount rule, and product variant automatically tested before your sale goes live, with no manual spot-checking required.
Hundreds of partner landing pages
Ensuring that every audience arriving from podcasts, newsletters, and other partnerships lands on a page that is on-brand and error-free.
Staging and production environments
Running tests in staging for high confidence before launch, then validating again on production as a final safety net.

Key Insights

Every two weeks, one person vs. hundreds of test cases. 9am to midnight, hoping nothing slipped through. Now the tests run while everyone sleeps, the results are waiting at 9am, and the job looks completely different. And so does the team's ambition.
