Case Study - UncommonGoods

How UncommonGoods stopped spending 50% of their QA time on Selenium

A conversation between
Sneha Sivakumar
CEO of Spur
Solomon Ademuwagun
QA Manager, UncommonGoods

COMPANY

UncommonGoods is an online retailer known for thoughtfully curated, design-forward products, connecting customers with unique goods from independent makers around the world.

INDUSTRY

E-Commerce ($222 ARR)

COMPANY SIZE

51–200

FOUNDED

1999

50%

Reduction in release time,

after adopting Spur.

90%+

Test accuracy

achieved in weeks vs. months with Selenium.

$300K

Saved in QA costs

while optimizing the process.

The Problem

Half of every working day went to maintaining the infrastructure that was supposed to test the product.

UncommonGoods has been selling unique, thoughtfully curated goods from independent makers since 1999. Their e-commerce site is the product, and keeping it working reliably across checkout, browsing, and discovery flows is what QA exists to do.

But before Spur, QA at UncommonGoods wasn't really doing that. They had 150 tests built on Selenium, a DevOps dependency to keep the infrastructure running, and an offshore support arrangement just to maintain what they had. Around 50% of the QA team's time was going to maintenance: keeping the test suite alive rather than running it productively.

"You're not spending 50% of your time doing maintenance anymore… you're spending maybe 1%, and boom, you run it."

The tests that did exist were brittle. Any UI change could break them. Complex releases required weeks of QA preparation and coordination just to reach a starting point. And despite all of that effort, site reliability was landing at 89–92%, below industry standards for a retailer that depends entirely on its website. Bugs were still reaching production. QA was a bottleneck without being a safety net.

The Solution

Maintenance became 1% of the job and coverage actually improved.

The shift from Selenium to Spur wasn't just a tool swap; it was a rethink of the whole testing approach. UncommonGoods consolidated 150 redundant, overlapping, brittle Selenium tests into around 30 dynamic, adaptive Spur tests. Fewer tests, covering more ground, with virtually no maintenance overhead.

What made that possible is the difference in how Spur works. Spur's agents navigate like real users and adapt to UI changes automatically, rather than breaking when a selector shifts. Writing a test means describing what you want to verify in plain language, not maintaining a fragile script. The infrastructure dependency disappeared entirely.
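To make "breaking when a selector shifts" concrete, here is a minimal toy sketch in Python. It is not Spur's mechanism and not Selenium's API; the page model, element ids, and both lookup functions are hypothetical, invented purely for illustration. It shows why a check tied to an implementation detail (an element id) fails after a redesign, while a check tied to what the user sees (a button labeled "Checkout") keeps passing.

```python
from typing import Dict, List, Optional

# Toy page model (an assumption for illustration): each element is a plain dict.
Element = Dict[str, str]
Page = List[Element]

# The page before a hypothetical redesign.
page_before: Page = [
    {"id": "btn-checkout", "role": "button", "text": "Checkout"},
]

# The same page after the redesign renamed the element id.
page_after: Page = [
    {"id": "cta-primary", "role": "button", "text": "Checkout"},
]


def find_by_id(page: Page, element_id: str) -> Optional[Element]:
    """Selector-style lookup: coupled to an implementation detail (the id)."""
    return next((el for el in page if el["id"] == element_id), None)


def find_by_intent(page: Page, role: str, text: str) -> Optional[Element]:
    """Intent-style lookup: matches what a user sees (role and visible label)."""
    wanted = text.lower()
    return next(
        (el for el in page if el["role"] == role and el["text"].lower() == wanted),
        None,
    )


if __name__ == "__main__":
    # The id-based check finds the button before the redesign, but not after.
    print(find_by_id(page_before, "btn-checkout") is not None)
    print(find_by_id(page_after, "btn-checkout") is not None)
    # The intent-based check still finds the button after the redesign.
    print(find_by_intent(page_after, "button", "checkout") is not None)
```

In the toy model, only the id-based lookup needs to be rewritten after the redesign. That kind of rework is the maintenance cost the team describes with their Selenium suite.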

Within weeks, UncommonGoods reached 90%+ test accuracy, a benchmark that took months to achieve with Selenium. Regression moved from a once-per-release event to something the team could run multiple times per week with minimal overhead.

"The more you use Spur, the smarter it gets. The smarter it gets, the faster you can write tests and find bugs."

Crucial Moment

A complex release that would have taken weeks of QA was automated in 1–2 days.

This is the number Solomon comes back to most: a specific release that previously required weeks of QA preparation was handled by Spur in one to two days. That's roughly 10 business days saved on a single release. For a retailer where time to deploy directly affects revenue, that's not just an operational improvement; it's a strategic one.

"Time is money, and that's the strength of Spur."

Spur also started surfacing clusters of bugs in checkout, the highest-stakes flow on any e-commerce site, that were previously reaching production. Site reliability climbed from 89–92% to 95–98%.

"That's above industry standards… a pretty good indicator of how good Spur is."

The Shift

With maintenance gone, QA became what it was always supposed to be.

The 50% of time that used to go to Selenium maintenance didn't disappear; it got redirected. With regression running reliably and automatically, Solomon's team shifted to the work that actually requires human judgment:

  • Edge case and exploratory testing, the scenarios no automated suite will think to try
  • Expanding automation coverage into new areas of the product
  • Evaluating internal tools for further automation opportunities
  • Working toward a further 25–40% reduction in manual QA

"It's allowed employees to focus on what they're really good at instead of just busy work."

The longer-term goal is catching blockers earlier, in development, not at release. That's the shift from QA as a release gate to QA as a development accelerator.

"If we can catch blockers early… that's the whole ball game."

Critical e-commerce flows across 30+ regions
Every regional price, discount rule, and product variant automatically tested before your sale goes live, no manual spot-checking required.
Hundreds of partner landing pages
Ensuring that every audience coming from podcasts, newsletters, and other partnerships lands on a page that is on brand and error free.
Staging and production environments
Running tests in staging for high confidence before launch, then validating again on production as a final safety net.

Key Insights

UncommonGoods didn't just replace a tool. They replaced a way of working, one where half the job was keeping the test infrastructure alive, with one where tests run themselves and the team focuses on what actually takes judgment. 150 tests became 30. Maintenance became 1%. Site reliability crossed industry benchmarks. That's what happens when QA stops being a burden and starts being a system.
