# A/B testing and experiments

> Create and run A/B tests natively in Ordinary — variants, traffic split, page and audience targeting — scored on your real reconciled revenue with peek-safe statistics. Tests from third-party tools are picked up automatically too.

Source: https://help.tryordinary.com/features/experiments

---

Run A/B tests directly in Ordinary and score every variant on the real
orders your store actually shipped — not clicks, not sessions, not a
vanity uptick. Set a test up in a couple of minutes; Ordinary handles
the traffic split, the statistics, and the revenue attribution.

Left nav → **Experiments**.

## Create a test in Ordinary

Click **New experiment** and fill in:

- **Name + hypothesis** — what you're changing and what you expect.
- **Variants** — add 2–10 variants and set the traffic split (50/50,
  80/20, whatever you need). One is your control.
- **Where it runs** — a URL pattern for the page(s) under test. Use a
  trailing `*` to match a whole section (e.g. `/products/*`), or leave
  it blank to run site-wide.
- **Who's in it** *(optional)* — limit the test to a traffic channel, so
  you can test only paid-Meta visitors, or only shoppers who arrived
  from a Klaviyo email. Everyone else sees your normal site.
- **Test type** — *same-page* (variants swap content on the same URL)
  or *multi-page* (each variant has its own URL and Ordinary redirects).
- **Primary metric** — revenue per visitor (default), conversion rate,
  or average order value.
- **Guardrails** *(optional)* — secondary metrics checked alongside the
  primary so you catch trade-offs: refund rate, bounce rate, add-to-cart
  rate, 30-day repeat purchase.
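
The **Where it runs** rule can be pictured as a simple prefix match. The sketch below is illustrative only: `matchesPattern` is a hypothetical name, and Ordinary's real matcher may treat query strings, trailing slashes, or case differently.

```typescript
// Hypothetical matcher for the "Where it runs" field (not Ordinary's actual code).
function matchesPattern(pattern: string, path: string): boolean {
  const trimmed = pattern.trim();
  // A blank pattern means the test runs site-wide.
  if (trimmed === "") return true;
  // A trailing "*" matches every path under that prefix.
  if (trimmed.endsWith("*")) {
    return path.startsWith(trimmed.slice(0, -1));
  }
  // Otherwise require an exact path match (an assumption).
  return path === trimmed;
}
```

With this reading, `/products/*` covers `/products/blue-shirt` but not `/collections/all`.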

Save it as a **draft**, then click **Start** when you're ready. Ordinary
serves the split through your store — no separate testing tool, no DNS
or tag-manager setup. Bucketing is sticky (a shopper always sees the
same variant), and results start flowing within minutes.
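
Sticky bucketing is typically achieved by hashing a stable visitor identifier rather than rolling a random number on each visit. The sketch below shows one common approach under that assumption; the hash choice, the `assignVariant` name, and the identifier format are all hypothetical and not a description of Ordinary's internals.

```typescript
interface Variant {
  name: string;
  weight: number; // relative weight, e.g. 50 and 50, or 80 and 20
}

// FNV-1a-style 32-bit hash over UTF-16 code units. It is deterministic,
// so the same input always produces the same number.
function hash32(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

function assignVariant(experimentId: string, visitorId: string, variants: Variant[]): string {
  const total = variants.reduce((sum, v) => sum + v.weight, 0);
  // Map the hash to a point in [0, total), then walk the cumulative weights.
  const point = (hash32(`${experimentId}:${visitorId}`) / 0x100000000) * total;
  let cumulative = 0;
  let chosen = "";
  for (const v of variants) {
    cumulative += v.weight;
    chosen = v.name;
    if (point < cumulative) break;
  }
  return chosen;
}
```

Including the experiment ID in the hashed string keeps a visitor's bucket in one test independent of their bucket in another.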

## Already using another testing tool?

You don't have to move your tests into Ordinary to get Ordinary's
scoring. If your store already runs tests through a Shopify app
(Shoplift, Intelligems, VWO, and the like), a theme that sets a
`data-ab-bucket` attribute on `<html>`, or your own
`window.Ordinary.experiment({ id, variant })` hook, Ordinary recognises
those automatically and scores them with the exact same revenue and
statistics as the tests you build here. Run a mix if you like — they all
land in the same **Experiments** list.
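
If you report exposures from your own code, a small guard avoids errors when the Ordinary script has not loaded yet. This is a sketch, assuming only the `{ id, variant }` call shape shown above; the `OrdinaryHook` interface, the `reportExposure` helper, and the example IDs are hypothetical.

```typescript
// Shape taken from the call shown above; the void return type is an assumption.
interface OrdinaryHook {
  experiment(exposure: { id: string; variant: string }): void;
}

// Hypothetical helper: reports an exposure only if the hook is present.
function reportExposure(
  host: { Ordinary?: OrdinaryHook },
  id: string,
  variant: string,
): boolean {
  if (!host.Ordinary) return false; // script not loaded; the caller may retry later
  host.Ordinary.experiment({ id, variant });
  return true;
}

// In the browser you would pass `window`, for example:
// reportExposure(window as unknown as { Ordinary?: OrdinaryHook }, "hero-copy-test", "variant-b");
```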

## What the dashboard shows

The **Experiments** list shows every test, with a headline metric and a
plain-English verdict. Clicking into a test opens the detail view:

- **Per-arm results** for the visitors who saw each variant — orders,
  conversion rate, average order value, revenue per visitor.
- **Lift** — how the variant performs against the control, with a
  plain-English "range of likely results" (the statistical confidence
  interval, named so it's actually readable).
- **Peek-safe significance** — a sequential-testing method that lets you
  check results early without inflating the false-positive rate the way
  repeatedly checking a classic fixed-sample significance test does.
- **Faster conclusions** — for revenue tests with enough data, Ordinary
  applies a variance-reduction technique (CUPED) that can shorten the
  time to call a winner by 20–30%. The display just shows tighter
  numbers; the math is in the *About the math* expander.
- **Allocation check** — if your split looks suspicious (you set 50/50
  but actual exposure is 65/35), the dashboard flags it so you don't
  trust biased results.
- **Funnel by arm** — sessions → add-to-cart → checkout → orders, broken
  out per variant.
- **Revenue per visitor over time** — a line per variant; drift can hint
  at allocation problems or seasonality.
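
An allocation check of this kind is commonly implemented as a chi-square goodness-of-fit test for sample ratio mismatch. The sketch below shows that general technique, not necessarily what Ordinary runs; the function name and the strict 0.001 significance level are assumptions.

```typescript
// Chi-square critical values at alpha = 0.001, keyed by degrees of freedom (arms - 1).
const CRITICAL_AT_0_001: Record<number, number> = {
  1: 10.828, 2: 13.816, 3: 16.266, 4: 18.467, 5: 20.515,
  6: 22.458, 7: 24.322, 8: 26.124, 9: 27.877,
};

// Returns true when observed exposure counts deviate from the configured split
// by more than chance would plausibly explain.
function allocationLooksBiased(observed: number[], targetWeights: number[]): boolean {
  const arms = observed.length;
  if (arms !== targetWeights.length || arms < 2 || arms > 10) {
    throw new Error("expected 2 to 10 arms with one weight per arm");
  }
  const totalVisitors = observed.reduce((a, b) => a + b, 0);
  const totalWeight = targetWeights.reduce((a, b) => a + b, 0);
  let chiSquare = 0;
  for (let i = 0; i < arms; i++) {
    const expected = totalVisitors * (targetWeights[i] / totalWeight);
    chiSquare += (observed[i] - expected) ** 2 / expected;
  }
  return chiSquare > CRITICAL_AT_0_001[arms - 1];
}
```

For a 50/50 test with 10,000 exposed visitors, a 6,500/3,500 split is flagged, while 5,050/4,950 is well within normal variation.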

Every number uses the same reconciled revenue and attribution as the
rest of your reports — every order counted is a real, paid Shopify order.
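
For reference, the per-arm figures follow the usual definitions. The sketch below assumes you already have visitor, order, and revenue totals per arm; the type and function names are hypothetical, and it computes point estimates only, not the range of likely results.

```typescript
interface ArmTotals {
  visitors: number; // visitors exposed to this variant
  orders: number;   // paid orders attributed to those visitors
  revenue: number;  // reconciled revenue from those orders
}

function armMetrics(arm: ArmTotals) {
  return {
    conversionRate: arm.orders / arm.visitors,
    averageOrderValue: arm.orders > 0 ? arm.revenue / arm.orders : 0,
    revenuePerVisitor: arm.revenue / arm.visitors,
  };
}

// Relative lift of a variant over the control, e.g. 0.10 means +10%.
function relativeLift(controlValue: number, variantValue: number): number {
  return (variantValue - controlValue) / controlValue;
}
```

Revenue per visitor equals conversion rate multiplied by average order value, which is why it captures trade-offs between the two.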

## How to interpret "Too early"

Ordinary won't pretend a test is significant when it isn't. The
dashboard shows "Too early" until at least 1,000 visitors have been
exposed and the statistical confidence threshold is met. There's no way
to dismiss this — the math is the math.

To plan a test before running it, use the public [sample size
calculator](/experiments/calculator): plug in your baseline conversion
rate and the smallest lift you'd want to detect, and it tells you how
many visitors per variant you need.
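
For a rough sense of scale, the textbook fixed-horizon formula for comparing two conversion rates is sketched below. It is an approximation, not the calculator's actual method: it assumes a two-sided 5% significance level, 80% power, and that the lift is entered as a relative change. Sequential (peek-safe) tests can need somewhat more traffic.

```typescript
// Approximate visitors needed per variant to detect a relative lift in
// conversion rate with a classic two-proportion z-test.
function visitorsPerVariant(
  baselineRate: number,  // e.g. 0.03 for a 3% conversion rate
  relativeLift: number,  // e.g. 0.10 for a +10% relative lift (assumed meaning)
  zAlpha = 1.96,         // two-sided significance level of 5%
  zBeta = 0.8416,        // statistical power of 80%
): number {
  if (baselineRate <= 0 || baselineRate >= 1 || relativeLift <= 0) {
    throw new Error("baselineRate must be in (0, 1) and relativeLift must be positive");
  }
  const variantRate = baselineRate * (1 + relativeLift);
  const variance = baselineRate * (1 - baselineRate) + variantRate * (1 - variantRate);
  const delta = variantRate - baselineRate;
  return Math.ceil(((zAlpha + zBeta) ** 2 * variance) / delta ** 2);
}
```

Halving the lift you want to detect roughly quadruples the required sample, so small expected effects need long tests.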

## Settings and defaults

Open **Settings → Experiments** for org-wide preferences:

- **Default primary metric** — usually revenue per visitor; some teams
  prefer conversion rate or AOV.
- **Default guardrails** — secondary metrics checked alongside the
  primary so you catch trade-offs (a variant might "win" on conversion
  rate but lose on refund rate, which can net out to a wash).
- **Conversion event** — by default Ordinary counts `checkout_completed`.
  Switch to `checkout_started` or `add_to_cart` for upper-funnel tests.
- **Bridge attribute names** — only relevant if a *third-party* theme
  test uses an attribute name other than `data-ab-bucket`.

You can also override defaults per-test from the **About this
experiment** card on the test detail page.

## Data coverage and cookie consent

Experiment data is captured via the same pixel events that power the
rest of Ordinary's analytics. If your storefront uses a cookie banner
integrated with Shopify's Customer Privacy framework — common for
EU/UK/EEA-facing stores — shoppers who decline cookies generate no pixel
events, so they're invisible to the experiments platform.

In practice:

- **Sample sizes for tests with strict-region traffic will be lower**
  than your overall visitor count suggests. Plan accordingly.
- **The shoppers in your results are the accept-cookies subset.** People
  who decline might behave differently. That is technically a selection
  bias, but every analytics platform on Shopify has the same limitation,
  and the framework is what Shopify requires for compliance.
- **Non-strict-region traffic** (US, Canada, Australia, most of Asia and
  South America outside specific markets) sees no banner by default and
  produces a full event stream.

If a test needs a specific sample size, factor in your accept-cookies
rate. A test with 40% EU traffic and a 60% accept rate captures roughly
84% of your visitors (60% + 40% × 60%), so plan for ~19% more traffic, or
run it ~19% longer.
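
The adjustment is simple arithmetic. The sketch below assumes visitors outside strict regions are always captured and that strict-region visitors are captured only when they accept cookies; the function names are illustrative.

```typescript
// Share of all visitors that produce pixel events.
function capturedShare(strictRegionShare: number, acceptRate: number): number {
  return (1 - strictRegionShare) + strictRegionShare * acceptRate;
}

// Extra traffic (as a fraction) needed to reach the same captured sample size.
function extraTrafficNeeded(strictRegionShare: number, acceptRate: number): number {
  return 1 / capturedShare(strictRegionShare, acceptRate) - 1;
}

// Example inputs: 40% strict-region traffic, 60% of those visitors accept cookies.
const share = capturedShare(0.4, 0.6);
const extra = extraTrafficNeeded(0.4, 0.6);
console.log(`captured: ${(share * 100).toFixed(0)}%, extra traffic: ${(extra * 100).toFixed(0)}%`);
```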

See [How Ordinary handles your visitor and customer
data](https://help.tryordinary.com/concepts/data-handling-and-privacy) for the full posture.

## Split your other reports by variant

While a test runs, Ordinary can split your existing reports by variant.
Look for the **Experiment** dropdown in the page actions on **Orders**,
**Customers**, and the GMV chart — selecting a test breaks every row out
by which variant the visitor saw. Handy for spot-checking that the
per-variant aggregates agree with live order data.

## Frequently asked

**Q: My test isn't collecting data. Why not?**

A: If it's a test you built in Ordinary, make sure you clicked **Start**
(drafts don't run) and that real traffic is hitting the URL pattern you
set — give it ~10 minutes after the first shopper lands. If it's a test
from a third-party tool or your own code, the most common cause is a
framework that doesn't set the `data-ab-bucket` attribute or call
`window.Ordinary.experiment(...)`.

**Q: I changed my test variants mid-flight. What happens?**

A: Ordinary flags this as an *assignment conflict* on the affected
visitor records and surfaces a warning if the conflict rate is non-zero.
Best practice is to end the original test and start a fresh one rather
than reshuffling buckets — but Ordinary won't quietly mix the data.

**Q: How long does Ordinary count an order as belonging to a test?**

A: 14 days from when a shopper first saw a variant. A purchase 18 days
after exposure falls outside the window and isn't attributed to the
test. 14 days is a widely used experiment attribution window.
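
If you are reconciling orders against exported exposure data yourself, the window check can be sketched as below. Treating the boundary as inclusive and measuring in exact elapsed time (rather than calendar days) are assumptions; the function name is hypothetical.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// True when the order falls within the attribution window after first exposure.
function isWithinAttributionWindow(
  firstExposure: Date,
  orderPlaced: Date,
  windowDays = 14,
): boolean {
  const elapsedMs = orderPlaced.getTime() - firstExposure.getTime();
  // Orders placed before exposure never count; the upper bound is inclusive here.
  return elapsedMs >= 0 && elapsedMs <= windowDays * MS_PER_DAY;
}
```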

**Q: Can I export per-visitor exposure data?**

A: Yes — admins can open the **Raw exposure log** from each test's detail
page. The export includes the visitor's anonymous identifier, the variant
they saw, when they first saw it, and which event triggered the exposure.