Case Study - Eight Sleep

How Eight Sleep Turned Black Friday QA From All-Nighters to Automated Confidence

A conversation between
Sneha Sivakumar
CEO of Spur
Alanah Anderson
Product Manager, Eight Sleep

COMPANY

Eight Sleep is an American company that develops products for "sleep fitness."

INDUSTRY

E-commerce ($1.5B)

COMPANY SIZE

51-200

FOUNDED

2014

20x

Release velocity.

"Spur has allowed us to iterate and ship much faster."

95%

Manual QA time eliminated before Black Friday peak.

1.5

Weeks to reach full automated coverage before Black Friday.

The Problem

"There was always this underlying anxiety that something would break. To the best of my ability everything was covered, but you always worry there is an edge case you did not think about."

Every year, Black Friday meant one giant spreadsheet and a lot of crossed fingers

Eight Sleep sells premium sleep technology (Pod covers, mattresses, and temperature systems) to people in more than 30 countries. That's a lot of regions, a lot of currencies, a lot of partner landing pages, and a lot of things that can quietly break right before the biggest sale of the year.

Alanah Anderson runs e-commerce product at Eight Sleep. And every year, in the weeks before Black Friday, she'd find herself alone in a room with a spreadsheet the size of a small novel. Tabs for every region. Rows for every product variant and discount. Columns for partner pages from podcast sponsors, newsletter deals, and influencer campaigns.

She and one other person would go through it manually. Every row. Every page. Every region.

They had some automated tests for core flows, but anything visual or qualitative (Does this discount look right? Does this partner page feel on-brand? Does the German checkout show the right currency?) was all Alanah, in that room, checking manually.

That anxiety is real. A broken discount in Germany during Black Friday isn't a small bug; it's lost revenue, a bad customer experience, and a support team scrambling at the worst possible time. And no spreadsheet, no matter how detailed, completely takes that feeling away.

Why Spur

She'd looked at other tools. None of them felt like they could actually replace a human.

Before Spur, Alanah evaluated the usual options: Playwright-style setups, traditional automation frameworks, and a few QA platforms. Most of them are good at one thing: asserting that event A leads to event B. Click this button, expect this response.

But that's not the kind of QA Eight Sleep needed. The questions they were trying to answer were more like: Does this page feel right? Does the pricing look correct for this region and this variant? Does this podcast partner's landing page actually match what they promised their audience? Those aren't event assertions. Those are judgment calls.

Most solutions also needed heavy engineering involvement to get off the ground. That was a dealbreaker. Alanah's team is product and e-commerce, not engineering. She needed something she could actually own.

Spur was different for three reasons. First, it behaves like a user: it looks at a page the way a human does, visually evaluating what it sees instead of just asserting events. Second, it's built for the messy reality of e-commerce: multiple regions, currencies, product variants, and hundreds of partner pages. That's exactly where traditional tools fall apart. Third, and this was big, the product team could own the onboarding, with no engineering lift to get started.

"The other tools I looked at did not feel like they could replace human QA. With Spur, I realized it was actually possible to solve those problems."

How it Happened

They went from spreadsheet to full automated coverage in a week and a half. No engineers required.

Here's what's remarkable: Eight Sleep onboarded Spur just weeks before Black Friday, arguably the worst possible time to be adopting new tooling. And they still made it work, because Spur's team was hands-on from day one, helping set up tests even before Eight Sleep had fully logged in.

[timeline]

WEEK 1 | Turning the spreadsheet into a knowledge base | Everything that lived in Alanah's tabs (regions and their rules, discount configurations, personas like "first-time visitor" and "podcast partner referral") moved into Spur as structured test context.

WEEK 1-2 | Building test suites across staging and production | Tests ran first on staging. By the time feature flags flipped for the actual sale, Eight Sleep already had high confidence from staging results. No starting from scratch on launch day.

LAUNCH | Black Friday, finally, without the anxiety | Once the sale went live, Spur triple-checked across live environments as a final safety net. Engineering only got involved to fix issues, not to set anything up.

[/timeline]

A pricing discrepancy caught by an end-to-end checkout test in Spur during regular regression testing.

This is what Spur actually catches: a checkout flow that looks fine until it doesn't. A price that shows correctly at checkout but not on the main page. The kind of case that used to live in Alanah's spreadsheet as a worried question mark, now flagged automatically before anyone has to stay late.

"It now feels like running the tests is just part of the process. I already have high confidence from staging before we even set things live."

The Results

Last year: five out of ten confidence.
This year: ten out of ten.

That's the line that sticks with us. Not "we reduced manual QA time by 95%," though they did. Not "we now cover 30+ regions automatically," though they do. It's the shift from "I hope nothing breaks" to "I know it won't."

Today, Eight Sleep runs Spur on a regular cadence. Even during code freezes, when no one on Eight Sleep's team has pushed anything, Spur keeps running, so unexpected changes from third-party tools or partners get caught immediately rather than surfacing on launch day.

That's the whole story, really. Not a product demo, not a feature list, just a product manager who used to lose sleep over Black Friday, and now doesn't.

95%

Manual QA time eliminated before Black Friday

30+

Countries and regions covered by Spur tests

1.5

Weeks to reach full automation before Black Friday

Critical e-commerce flows across 30+ regions
Every regional price, discount rule, and product variant is automatically tested before the sale goes live, with no manual spot-checking required.
Hundreds of partner landing pages
Every audience arriving from podcasts, newsletters, and other partnerships lands on a page that is on-brand and error-free.
Staging and production environments
Running tests in staging for high confidence before launch, then validating again on production as a final safety net.

Key Insights

Eight Sleep went into its biggest sale period with 10/10 confidence instead of hoping everything would hold, after onboarding in just 1.5 weeks with no engineering involvement.
