A/B Test Setup

Design rigorous A/B tests, calculate sample sizes, and validate results with statistical confidence.

What This Skill Does

The Challenge: Most A/B tests fail due to underpowered samples, premature stopping, or poorly defined hypotheses. Teams waste months on inconclusive tests that produce no actionable insight.

The Solution: The A/B Test Setup skill provides structured experiment design with hypothesis frameworks, statistical power calculators, segmentation strategies, and result interpretation guidelines. It covers landing pages, emails, ads, and in-product experiments.

Activation

Implicit: Activates when the user mentions “A/B test”, “split test”, “experiment”, or “test variant”.

Explicit: Activate via prompt:

Activate ab-test-setup skill to design an experiment for [describe goal]

Capabilities

1. Hypothesis Framework

Structure testable hypotheses with clear success metrics.

Hypothesis template:

We believe [changing X] for [user segment]
will result in [metric change] because [reason].
We'll know this is true when [measurable outcome].

2. Sample Size Calculator

Determine minimum detectable effect and required traffic.

Key inputs:

Parameter	Typical Value	Notes
Baseline CVR	Current rate	From analytics
MDE	5-20% relative	Minimum meaningful lift
Statistical power	80%	Probability of detecting true effect
Significance level (α)	0.05 (95% confidence)	False positive tolerance
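
The inputs above can be combined into a per-variant sample size. The following is a minimal sketch, assuming a two-sided test on two conversion rates and the standard normal-approximation formula; the function name `sample_size_per_variant` is illustrative and not part of the skill. Dedicated calculators use slightly different variance terms, so expect their results to differ by a few percent.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline_cvr, relative_mde, alpha=0.05, power=0.80):
    """Approximate visitors needed per variant (two-sided two-proportion test)."""
    p1 = baseline_cvr
    p2 = baseline_cvr * (1 + relative_mde)         # rate if the lift is real
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # false positive tolerance
    z_beta = NormalDist().inv_cdf(power)           # chance of detecting a true effect
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2
    return math.ceil(n)


# Example: 3.2% baseline CVR, 15% relative MDE, 80% power, alpha = 0.05
print(sample_size_per_variant(0.032, 0.15))  # roughly 22,600 per variant
```

Because the required sample scales with the inverse square of the absolute difference, halving the MDE roughly quadruples the traffic needed.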

3. Test Design Patterns

Pre-built frameworks for common marketing experiments.

Experiment types:

Landing page: Hero copy, CTA color, form length, social proof placement
Email: Subject line, send time, CTA text, personalization
Ad creative: Headline, image, audience segment, bid strategy
Pricing page: Price anchoring, plan names, feature ordering

4. Results Interpretation

Analyze outcomes with statistical rigor.

Checklist before calling a winner:

Reached minimum sample size
Run for at least 2 full business cycles
No external events skewing data
Segment analysis completed (no Simpson’s paradox)
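
Once the checklist is satisfied, significance can be checked with a two-proportion z-test. This is a minimal sketch under stated assumptions: a pooled standard error, a two-sided p-value, and samples large enough for the normal approximation. The function name and the counts in the example are hypothetical.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    # Standard error under the null hypothesis that both rates are equal
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts, for illustration only
z, p = two_proportion_z_test(conv_a=720, n_a=22_600, conv_b=835, n_b=22_600)
print(f"z = {z:.2f}, p = {p:.4f}")
```

Running the same function on each segment (device, source, new vs returning) is one way to spot a segment whose direction disagrees with the aggregate result.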

Prerequisites

Access to analytics platform (GA4, Mixpanel, Amplitude)
A/B testing tool (Optimizely, VWO, LaunchDarkly, or custom)
Defined baseline conversion rate
Estimated weekly traffic volume

Best Practices

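1. One variable at a time: Multivariate tests split traffic across every combination of changes (a 2x2 design has four cells), so the required traffic grows quickly with each added variable. Start simple.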

2. Never stop early: Peeking at results inflates false positives. Commit to a sample size upfront.

3. Document everything: Record hypothesis, dates, traffic split, and results. Build institutional memory.
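
The cost of peeking (practice 2) can be illustrated with a small simulation. The sketch below runs A/A tests, where both arms share the same true conversion rate, and compares two decision rules: stopping at the first significant interim look versus checking once at the planned sample size. All parameter values and function names are assumptions chosen for illustration.

```python
import math
import random
from statistics import NormalDist


def z_stat(conv_a, conv_b, n):
    """Pooled two-proportion z statistic with n visitors in each arm."""
    pooled = (conv_a + conv_b) / (2 * n)
    if pooled in (0, 1):
        return 0.0
    se = math.sqrt(pooled * (1 - pooled) * 2 / n)
    return (conv_b / n - conv_a / n) / se


def false_positive_rates(cvr=0.05, n_per_arm=2000, looks=10, sims=500,
                         alpha=0.05, seed=1):
    """Simulate A/A tests: any significant peek vs. a single final look."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    step = n_per_arm // looks
    peeking_hits = final_hits = 0
    for _ in range(sims):
        conv_a = conv_b = 0
        crossed = False
        for look in range(1, looks + 1):
            conv_a += sum(rng.random() < cvr for _ in range(step))
            conv_b += sum(rng.random() < cvr for _ in range(step))
            significant = abs(z_stat(conv_a, conv_b, look * step)) > z_crit
            crossed = crossed or significant
        peeking_hits += crossed      # would have stopped at some peek
        final_hits += significant    # significant at the final look only
    return peeking_hits / sims, final_hits / sims


peeking, final = false_positive_rates()
print(f"false positives with peeking: {peeking:.1%}, single look: {final:.1%}")
```

Since no real difference exists in an A/A test, every "winner" is a false positive. The single-look rate should land near alpha, while the peeking rate is usually several times higher.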

Common Use Cases

Use Case 1: Landing Page CTA Test

Scenario: Test “Start Free Trial” vs “Get Started Free” on homepage.

Workflow:

Define baseline CVR (e.g., 3.2%)
Set MDE to 15% relative lift → target 3.68%
Calculate sample: ~22,600 visitors per variant (80% power, two-sided α = 0.05)
Run for 3 weeks minimum
Analyze with segmentation (device, source, new vs returning)
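
The sample size from the calculation step and the estimated weekly traffic together determine how long the test must run. A rough sketch, assuming all traffic enters the experiment and is split evenly between variants; `weeks_to_run` and the example numbers are hypothetical.

```python
import math


def weeks_to_run(visitors_per_variant, weekly_traffic, variants=2, min_weeks=3):
    """Whole weeks needed to reach the sample size, never below the minimum."""
    weeks_for_sample = math.ceil(visitors_per_variant * variants / weekly_traffic)
    return max(weeks_for_sample, min_weeks)


# Hypothetical inputs: 20,000 visitors per variant at two traffic levels
print(weeks_to_run(visitors_per_variant=20_000, weekly_traffic=20_000))  # 3
print(weeks_to_run(visitors_per_variant=20_000, weekly_traffic=8_000))   # 5
```

Rounding up to whole weeks keeps every day of the week equally represented in both variants.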

Use Case 2: Email Subject Line Test

Scenario: Personalized vs generic subject line for onboarding sequence.

Workflow:

Split list 50/50 by cohort (not random per send)
Track 48-hour open rate and 7-day conversion
Run on 3 consecutive sends before declaring winner
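
One way to keep the split fixed across sends is to derive each subscriber's variant from a hash of their ID rather than drawing randomly per send. This sketch reads "by cohort" as "each subscriber stays in the same variant for the whole test", which is an interpretation; the function name, experiment key, and variant labels are hypothetical.

```python
import hashlib


def assign_variant(subscriber_id, experiment="onboarding-subject-line",
                   variants=("generic", "personalized")):
    """Deterministically map a subscriber to a variant, stable across sends."""
    key = f"{experiment}:{subscriber_id}".encode("utf-8")
    bucket = int(hashlib.sha256(key).hexdigest(), 16) % len(variants)
    return variants[bucket]


# The same subscriber always lands in the same variant
print(assign_variant("user-1001") == assign_variant("user-1001"))  # True
```

Including the experiment name in the hashed key means a subscriber's assignment in one test does not determine their assignment in the next.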

Troubleshooting

Issue: Test shows a significant result but no business impact
Solution: Check whether the tested metric (e.g., clicks) correlates with revenue. If it does not, move measurement downstream.

Issue: Results flip week over week
Solution: Extend test duration. Segment by day-of-week to detect cyclical behavior.

Related Skills

Analytics - Baseline metrics and tracking
Form CRO - Conversion rate optimization
Marketing Psychology - Behavioral science for hypotheses
Copywriting - Copy variants for tests

Related Commands

/ckm:analyze - Analyze test results
/ckm:plan - Plan experiment roadmap