A/B testing and experiments

Create and run A/B tests natively in Ordinary — variants, traffic split, page and audience targeting — scored on your real reconciled revenue with peek-safe statistics. Tests from third-party tools are picked up automatically too.

Written by The Ordinary Team

Run A/B tests directly in Ordinary and score every variant on the real orders your store actually shipped — not clicks, not sessions, not a vanity uptick. Set a test up in a couple of minutes; Ordinary handles the traffic split, the statistics, and the revenue attribution.

To find it, go to Left nav → Experiments.

Create a test in Ordinary

Click New experiment and fill in:

  • Name + hypothesis — what you’re changing and what you expect.
  • Variants — add 2–10 variants and set the traffic split (50/50, 80/20, whatever you need). One is your control.
  • Where it runs — a URL pattern for the page(s) under test. Use a trailing * to match a whole section (e.g. /products/*), or leave it blank to run site-wide.
  • Who’s in it (optional) — limit the test to a traffic channel, so you can test only paid-Meta visitors, or only shoppers who arrived from a Klaviyo email. Everyone else sees your normal site.
  • Test type — same-page (variants swap content on the same URL) or multi-page (each variant has its own URL and Ordinary redirects).
  • Primary metric — revenue per visitor (default), conversion rate, or average order value.
  • Guardrails (optional) — secondary metrics checked alongside the primary so you catch trade-offs: refund rate, bounce rate, add-to-cart rate, 30-day repeat purchase.
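
To make the "Where it runs" pattern concrete, here is a minimal sketch of how a trailing * could be evaluated. The helper name matchesUrlPattern and the exact rules (prefix match for a trailing *, exact match otherwise, blank means site-wide) are assumptions for illustration; Ordinary's own matcher may behave differently at the edges.

```typescript
// Hypothetical helper; the matching rules below are assumptions for illustration.
function matchesUrlPattern(pattern: string, path: string): boolean {
  const trimmed = pattern.trim();

  // Blank pattern: the test runs site-wide.
  if (trimmed === "") {
    return true;
  }

  // Trailing *: match every path that starts with the part before the star.
  if (trimmed.charAt(trimmed.length - 1) === "*") {
    const prefix = trimmed.slice(0, -1);
    return path.slice(0, prefix.length) === prefix;
  }

  // No wildcard: require an exact match.
  return path === trimmed;
}

const pathsToCheck = ["/products/red-shirt", "/collections/all"];
for (const p of pathsToCheck) {
  console.log(p, matchesUrlPattern("/products/*", p));
}
```

Under these assumptions, /products/* covers every product page but leaves collection pages on your normal site.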

Save it as a draft, then click Start when you’re ready. Ordinary serves the split through your store — no separate testing tool, no DNS or tag-manager setup. Bucketing is sticky (a shopper always sees the same variant), and results start flowing within minutes.
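
If you are curious how sticky bucketing can work in principle, the sketch below hashes the visitor and experiment IDs to a stable point and maps that point onto the traffic split. This is not Ordinary's documented implementation: the names VariantWeight, hashString, and assignVariant are hypothetical, and the Jenkins one-at-a-time hash is simply a well-known stand-in.

```typescript
interface VariantWeight {
  label: string;
  weight: number; // relative share of traffic, e.g. 80 and 20
}

// Jenkins one-at-a-time hash, returned as an unsigned 32-bit integer.
function hashString(input: string): number {
  let h = 0;
  for (let i = 0; i < input.length; i++) {
    h = (h + input.charCodeAt(i)) | 0;
    h = (h + (h << 10)) | 0;
    h ^= h >>> 6;
  }
  h = (h + (h << 3)) | 0;
  h ^= h >>> 11;
  h = (h + (h << 15)) | 0;
  return h >>> 0;
}

// Hypothetical assignment function: same visitor + experiment always gives the same variant.
function assignVariant(visitorId: string, experimentId: string, variants: VariantWeight[]): string {
  let totalWeight = 0;
  for (const v of variants) {
    totalWeight += v.weight;
  }

  // Scale the hash into the range [0, totalWeight).
  const point = (hashString(experimentId + ":" + visitorId) / 4294967296) * totalWeight;

  // Walk the cumulative weights until the point falls inside a variant's slice.
  let cumulative = 0;
  let chosen = "";
  for (const v of variants) {
    cumulative += v.weight;
    chosen = v.label;
    if (point < cumulative) {
      break;
    }
  }
  return chosen;
}

const split: VariantWeight[] = [
  { label: "control", weight: 80 },
  { label: "new-hero", weight: 20 },
];
console.log(assignVariant("visitor-123", "hero-test", split));
```

Because the assignment depends only on the two IDs and the weights, no per-visitor state has to be stored to keep a shopper in the same variant. Including the experiment ID in the hash keeps buckets independent across tests.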

Already using another testing tool?

You don’t have to move your tests into Ordinary to get Ordinary’s scoring. If your store already runs tests through a Shopify app (Shoplift, Intelligems, VWO, and the like), a theme that sets a data-ab-bucket attribute on <html>, or your own window.Ordinary.experiment({ id, variant }) hook, Ordinary recognises those automatically and scores them with the exact same revenue and statistics as the tests you build here. Run a mix if you like — they all land in the same Experiments list.
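
For the custom-code route, a minimal sketch of calling the hook might look like this. Only the window.Ordinary.experiment({ id, variant }) call comes from this article; the wrapper reportExposure, the HostWindow type, and the example IDs are hypothetical, and the guard simply avoids an error if Ordinary's script has not loaded yet.

```typescript
// Shape of the hook named in this article; nothing beyond { id, variant } is assumed.
interface OrdinaryHook {
  experiment(exposure: { id: string; variant: string }): void;
}

// Hypothetical view of the global object: Ordinary may or may not be present yet.
interface HostWindow {
  Ordinary?: OrdinaryHook;
}

// Reports an exposure if the hook exists; returns whether the call was made.
function reportExposure(host: HostWindow, id: string, variant: string): boolean {
  if (!host.Ordinary || typeof host.Ordinary.experiment !== "function") {
    return false;
  }
  host.Ordinary.experiment({ id: id, variant: variant });
  return true;
}
```

In a storefront script you would pass the browser's window object as the host, for example reportExposure(window as HostWindow, "hero-test", "b"), right after your own code decides which variant the shopper sees.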

What the dashboard shows

The Experiments list shows every test, with a headline metric and a plain-English verdict. Clicking into a test opens the detail view:

  • Per-arm results for the visitors who saw each variant — orders, conversion rate, average order value, revenue per visitor.
  • Lift — how the variant performs against the control, with a plain-English “range of likely results” (the statistical confidence interval, named so it’s actually readable).
  • Peek-safe significance — a sequential-testing method that lets you check results early without inflating the false-positive rate the way classic significance tests do.
  • Faster conclusions — for revenue tests with enough data, Ordinary applies a variance-reduction technique (CUPED) that can shorten the time to call a winner by 20-30%. The display just shows tighter numbers; the math is in the About the math expander.
  • Allocation check — if your split looks suspicious (you set 50/50 but actual exposure is 65/35), the dashboard flags it so you don’t trust biased results.
  • Funnel by arm — sessions → add-to-cart → checkout → orders, broken out per variant.
  • Revenue per visitor over time — a line per variant; drift can hint at allocation problems or seasonality.
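
To show the idea behind the allocation check, here is a sketch of a standard chi-square test for sample ratio mismatch on a two-arm test. The function name checkAllocation and the p = 0.001 cut-off are assumptions; the article does not say which test or threshold Ordinary uses.

```typescript
interface AllocationResult {
  chiSquare: number;
  suspicious: boolean;
}

// Chi-square critical value for 1 degree of freedom at p = 0.001 (assumed threshold).
const CHI_SQUARE_CRITICAL_1DF = 10.828;

// Compares observed exposure counts with the counts the planned split would predict.
function checkAllocation(
  controlCount: number,
  variantCount: number,
  plannedControlShare: number
): AllocationResult {
  const total = controlCount + variantCount;
  const expectedControl = total * plannedControlShare;
  const expectedVariant = total - expectedControl;

  const chiSquare =
    Math.pow(controlCount - expectedControl, 2) / expectedControl +
    Math.pow(variantCount - expectedVariant, 2) / expectedVariant;

  return { chiSquare: chiSquare, suspicious: chiSquare > CHI_SQUARE_CRITICAL_1DF };
}

// A planned 50/50 split that actually delivered 650 vs 350 exposures.
console.log(checkAllocation(650, 350, 0.5));
```

With 1,000 visitors, a 65/35 outcome on a planned 50/50 split lands far above the threshold, while something like 505/495 does not, which matches the intuition that small wobbles are normal and large skews are not.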

Every number uses the same reconciled revenue and attribution as the rest of your reports — every order counted is a real, paid Shopify order.

How to interpret “Too early”

Ordinary won’t pretend a test is significant when it isn’t. The dashboard shows “Too early” until at least 1,000 visitors have been exposed and the statistical confidence threshold is met. There’s no way to dismiss this — the math is the math.

To plan a test before running it, use the public sample size calculator: plug in your baseline conversion rate and the smallest lift you’d want to detect, and it tells you how many visitors per variant you need.
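
For a rough sense of what such a calculator does, the sketch below uses the classic fixed-horizon formula for comparing two conversion rates at 5% two-sided significance and 80% power. Treat it as an approximation only: the function name visitorsPerVariant is hypothetical, the lift is assumed to be relative (10% means 3.0% to 3.3%), and Ordinary's calculator and its sequential statistics may give different numbers.

```typescript
const Z_ALPHA = 1.959964; // two-sided 5% significance (assumed)
const Z_BETA = 0.841621; // 80% power (assumed)

// Visitors needed in each variant to detect a relative lift over the baseline rate.
function visitorsPerVariant(baselineRate: number, relativeLift: number): number {
  const p1 = baselineRate;
  const p2 = baselineRate * (1 + relativeLift);
  const varianceSum = p1 * (1 - p1) + p2 * (1 - p2);
  const effect = p2 - p1;
  return Math.ceil((Math.pow(Z_ALPHA + Z_BETA, 2) * varianceSum) / (effect * effect));
}

// Baseline conversion rate of 3%, smallest lift worth detecting: 10% relative.
console.log(visitorsPerVariant(0.03, 0.1));
```

The required sample grows with the inverse square of the effect, so halving the lift you want to detect roughly quadruples the visitors you need.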

Settings and defaults

Open Settings → Experiments for org-wide preferences:

  • Default primary metric — usually revenue per visitor; some teams prefer conversion rate or AOV.
  • Default guardrails — secondary metrics checked alongside the primary so you catch trade-offs (a variant might “win” on conversion rate but lose on refund rate, which is a wash).
  • Conversion event — by default Ordinary counts checkout_completed. Switch to checkout_started or add_to_cart for upper-funnel tests.
  • Bridge attribute names — only relevant if a third-party theme test uses non-standard attribute names.

You can also override defaults per-test from the About this experiment card on the test detail page.

Experiment data is captured via the same pixel events that power the rest of Ordinary’s analytics. If your storefront uses a cookie banner integrated with Shopify’s Customer Privacy framework — common for EU/UK/EEA-facing stores — shoppers who decline cookies generate no pixel events, so they’re invisible to the experiments platform.

In practice:

  • Sample sizes for tests with strict-region traffic will be lower than your overall visitor count suggests. Plan accordingly.
  • The shoppers in your results are the accept-cookies subset. People who decline might behave differently — a technical selection bias, but every analytics platform on Shopify has the same limitation, and the framework is what Shopify requires for compliance.
  • Non-strict-region traffic (US, Canada, Australia, most of Asia and South America outside specific markets) sees no banner by default and produces a full event stream.

If a test needs a specific sample size, factor in your accept-cookies rate. A test with 40% EU traffic and a 60% accept rate captures roughly 84% of your visitors (the 60% from outside the EU, plus 60% of the EU's 40%), so plan for ~20% more traffic, or run it ~20% longer.
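
To run this adjustment with your own numbers, here is a small sketch. It assumes that visitors outside strict regions are always captured and that strict-region visitors are captured only when they accept cookies; the function names captureRate and trafficMultiplier are hypothetical.

```typescript
// Share of all visitors that end up visible to the experiment.
function captureRate(strictRegionShare: number, acceptRate: number): number {
  return (1 - strictRegionShare) + strictRegionShare * acceptRate;
}

// How much more traffic (or time) is needed to reach the planned sample size.
function trafficMultiplier(strictRegionShare: number, acceptRate: number): number {
  return 1 / captureRate(strictRegionShare, acceptRate);
}

// Example: a quarter of traffic sees a banner and half of those shoppers accept.
const exampleStrictShare = 0.25;
const exampleAcceptRate = 0.5;
console.log(captureRate(exampleStrictShare, exampleAcceptRate));
console.log(trafficMultiplier(exampleStrictShare, exampleAcceptRate));
```

Multiply the visitors-per-variant figure from your sample size plan by the multiplier to get the raw traffic you need to send to the test.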

See How Ordinary handles your visitor and customer data for the full posture.

Split your other reports by variant

While a test runs, Ordinary can split your existing reports by variant. Look for the Experiment dropdown in the page actions on Orders, Customers, and the GMV chart — selecting a test breaks every row out by which variant the visitor saw. Handy for spot-checking that the per-variant aggregates agree with live order data.

Frequently asked

Q: My test isn’t collecting data. Why not?

A: If it’s a test you built in Ordinary, make sure you clicked Start (drafts don’t run) and that real traffic is hitting the URL pattern you set — give it ~10 minutes after the first shopper lands. If it’s a test from a third-party tool or your own code, the most common cause is a framework that doesn’t set the data-ab-bucket attribute or call window.Ordinary.experiment(...).

Q: I changed my test variants mid-flight. What happens?

A: Ordinary flags this as an assignment conflict on the affected visitor records and surfaces a warning if the conflict rate is non-zero. Best practice is to end the original test and start a fresh one rather than reshuffling buckets — but Ordinary won’t quietly mix the data.

Q: How long does Ordinary count an order as belonging to a test?

A: 14 days from when a shopper first saw a variant. A purchase 18 days after exposure falls outside the window and isn’t attributed to the test. 14 days is the industry-standard experiment attribution window.
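
As a sketch of that rule, the check below compares an order's timestamp with the shopper's first exposure. The function name isAttributed is hypothetical, and counting an order placed exactly at the 14-day mark as inside the window is an assumption.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// True when the order falls within the window that starts at first exposure.
function isAttributed(firstExposure: Date, orderTime: Date, windowDays: number = 14): boolean {
  const elapsedMs = orderTime.getTime() - firstExposure.getTime();
  return elapsedMs >= 0 && elapsedMs <= windowDays * DAY_MS;
}

const exposure = new Date(Date.UTC(2024, 0, 1));
const orderAfter10Days = new Date(exposure.getTime() + 10 * DAY_MS);
const orderAfter18Days = new Date(exposure.getTime() + 18 * DAY_MS);
console.log(isAttributed(exposure, orderAfter10Days));
console.log(isAttributed(exposure, orderAfter18Days));
```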

Q: Can I export per-visitor exposure data?

A: Yes — admins can open the Raw exposure log from each test’s detail page. The export includes the visitor’s anonymous identifier, the variant they saw, when they first saw it, and which event triggered the exposure.
