
Models

How Levered's multi-armed bandit (MAB) and contextual multi-armed bandit (CMAB) models learn to select optimal variants.

Every optimization has a model that learns which variants produce the best outcomes. Levered supports two model types: MAB (multi-armed bandit) and CMAB (contextual multi-armed bandit).

Multi-Armed Bandit (MAB)

A MAB model learns a single reward distribution for each variant, without considering who the user is. It answers the question: across all users, which variant performs best?

Use a MAB when:

  • All users should see the same optimal variant.
  • You do not have strong reason to believe the best variant differs by user segment.
  • You have limited traffic and want the model to converge quickly.
  • You are optimizing a simple decision with a small number of variants.

Because a MAB treats every user identically, it needs less data to produce confident estimates. If you have 6 variants and a few hundred observations, a MAB can start making meaningful allocation decisions.
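To make the "meaningful decisions from a few hundred observations" point concrete, here is a minimal sketch of per-variant posterior learning. It assumes a conjugate Beta-Bernoulli update as a simplified stand-in for Levered's actual MCMC inference, and the variant count, conversion rates, and observation count are illustrative, not from the docs:

```python
import random

random.seed(0)

# Hypothetical true conversion rates for 6 variants (illustrative only).
true_rates = [0.05, 0.07, 0.06, 0.10, 0.04, 0.08]

# Start each variant with a uniform Beta(1, 1) prior:
# alpha tracks successes + 1, beta tracks failures + 1.
alpha = [1] * 6
beta = [1] * 6

# Spread ~300 observations evenly across the 6 variants.
for i in range(300):
    v = i % 6
    reward = 1 if random.random() < true_rates[v] else 0
    alpha[v] += reward
    beta[v] += 1 - reward

# Posterior mean reward estimate per variant.
means = [a / (a + b) for a, b in zip(alpha, beta)]
best = max(range(6), key=lambda v: means[v])
print("posterior means:", [round(m, 3) for m in means])
print("current best variant:", best)
```

Even with only ~50 observations per variant, the posterior means already separate the stronger variants from the weaker ones, which is why a MAB can begin shifting allocation early.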

Contextual Multi-Armed Bandit (CMAB)

A CMAB model incorporates context features -- user attributes like country, device type, or traffic source. It learns conditional reward distributions and can personalize: which variant works best for this specific type of user?

Use a CMAB when:

  • You believe the optimal variant differs across user segments.
  • You have enough traffic to support learning across segments.
  • Personalization is a goal, not just finding a single winner.

A CMAB with country (4 levels) and device_type (3 levels) is effectively learning across 12 user segments, each needing sufficient data. The payoff is more precise targeting, but the cost is slower convergence.

Example: A CMAB might discover that mobile users from Germany convert best with a short headline and a prominent "Get started" button, while desktop users from the US prefer a longer headline with social proof. A MAB would pick whichever variant performs best on average, missing these segment-level differences.

How training works

Both model types use Bayesian inference to learn from observations. Rather than producing a single "this variant has a 12% conversion rate" point estimate, the model maintains a full probability distribution over each variant's reward rate.

The training process:

  1. Collect observations. Levered queries your warehouse and builds the joined exposure-reward dataset.
  2. Run inference. The model uses Markov chain Monte Carlo (MCMC) methods -- specifically Gibbs sampling -- to estimate posterior distributions over the reward parameters.
  3. Produce posteriors. The output is a distribution for each variant (MAB) or each variant-context combination (CMAB) that represents the model's belief about the true reward rate, given the data seen so far.

Why distributions matter: a variant with 5 conversions out of 10 exposures (50%) is not the same as one with 500 out of 1,000 (50%). The first has high uncertainty; the second is well-established. The posterior distribution captures this difference, and the serving algorithm uses it.
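The 5/10 versus 500/1,000 comparison can be checked directly. This sketch assumes a Beta posterior with a uniform prior (a simplification of the MCMC inference described above) and computes the posterior mean and standard deviation for both cases:

```python
from math import sqrt

def beta_stats(successes, failures):
    """Posterior mean and standard deviation of Beta(1 + s, 1 + f)."""
    a, b = 1 + successes, 1 + failures
    mean = a / (a + b)
    var = (a * b) / ((a + b) ** 2 * (a + b + 1))
    return mean, sqrt(var)

# 5/10 vs 500/1000: identical point estimates, very different uncertainty.
m1, sd1 = beta_stats(5, 5)
m2, sd2 = beta_stats(500, 500)
print(f"5/10     -> mean {m1:.3f}, sd {sd1:.3f}")  # wide posterior
print(f"500/1000 -> mean {m2:.3f}, sd {sd2:.3f}")  # tight posterior
```

Both posteriors are centered at 50%, but the first is roughly an order of magnitude wider -- exactly the difference the serving algorithm exploits.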

Thompson Sampling

Thompson Sampling is how Levered decides which variant to serve to each user. It naturally balances exploration (trying uncertain variants to learn more) and exploitation (using variants known to perform well).

The algorithm for each request:

  1. For each variant, draw a random sample from its posterior distribution.
  2. Pick the variant whose sample is highest.
  3. Serve that variant to the user.
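The three steps above can be sketched as a short loop. The Beta-shaped posteriors and their parameters below are illustrative assumptions, not Levered's actual model output:

```python
import random

random.seed(42)

# Hypothetical posteriors, one Beta(alpha, beta) per variant.
posteriors = {
    "A": (40, 360),  # ~10% reward rate, fairly confident
    "B": (12, 88),   # ~12% reward rate, less data
    "C": (3, 17),    # ~15% posterior mean but very uncertain
}

def thompson_pick(posteriors):
    # Step 1: draw one random sample from each variant's posterior.
    samples = {v: random.betavariate(a, b) for v, (a, b) in posteriors.items()}
    # Steps 2-3: serve the variant whose sample is highest.
    return max(samples, key=samples.get)

# Over many requests, allocation tracks the posterior beliefs.
picks = [thompson_pick(posteriors) for _ in range(10_000)]
for v in posteriors:
    print(v, picks.count(v))
```

Note that even variant C, with the widest posterior, keeps getting served occasionally -- its wide distribution sometimes produces the highest sample, which is exactly how exploration happens.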

This simple process has powerful properties:

  • Variants with high expected reward get served more often, because their distributions are centered on higher values.
  • Uncertain variants get explored, because wide distributions occasionally produce high samples, even if the mean is mediocre.
  • As confidence grows, exploitation dominates. Once the model has seen enough data, the posteriors tighten and the best variant wins almost every time.
  • If conditions change, exploration resumes. A previously losing variant that starts performing better will be picked up as new data shifts its distribution.

For CMAB models, Thompson Sampling works the same way but uses context-specific posteriors. The model draws from the distributions conditioned on the current user's context, so different users may receive different variants.
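A minimal sketch of the CMAB case, assuming the posteriors are keyed by (country, device_type) segment -- the segment keys, variant names, and Beta parameters are hypothetical:

```python
import random

random.seed(7)

# Hypothetical context-conditional posteriors: one Beta(alpha, beta)
# per variant, within each (country, device_type) segment.
posteriors = {
    ("DE", "mobile"):  {"short_headline": (30, 170), "long_headline": (15, 185)},
    ("US", "desktop"): {"short_headline": (18, 182), "long_headline": (34, 166)},
}

def serve(context):
    # Same Thompson Sampling loop, but the draws come from the
    # posteriors conditioned on this user's context.
    variant_posteriors = posteriors[context]
    samples = {v: random.betavariate(a, b)
               for v, (a, b) in variant_posteriors.items()}
    return max(samples, key=samples.get)

# Different users can land on different variants.
print(serve(("DE", "mobile")))
print(serve(("US", "desktop")))
```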

Model lifecycle

A model progresses through these states:

  • created -- The model exists but has not been trained yet. No observations are available or training has not been triggered.
  • training -- Inference is running. The model is processing observations and fitting posterior distributions.
  • trained -- Training is complete. The model has posterior distributions and can serve variants via Thompson Sampling.

After reaching trained, the model can be retrained as more data arrives in your warehouse. Retraining incorporates the latest observations and updates the posterior distributions. The model transitions back through training to trained with each cycle.

While a model is retraining, the previous trained version continues serving variants. There is no downtime.

In-sample metrics

After training, Levered reports metrics that help you understand how the optimization is performing:

  • Expected reward per variant -- the posterior mean reward rate for each variant (or variant-context combination for CMAB).
  • Lift -- the expected improvement of the best variant over a baseline (typically the worst-performing or a control variant).
  • Confidence intervals -- credible intervals around the lift estimate, derived from the posterior distributions.

These metrics are Bayesian, not frequentist. A "95% credible interval" means there is a 95% probability that the true value falls within the interval, given the observed data and prior. This is a direct probability statement, unlike frequentist confidence intervals.
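A credible interval for lift can be derived directly from posterior samples. This sketch assumes Beta posteriors with uniform priors and hypothetical observation counts -- it illustrates the idea, not Levered's reporting pipeline:

```python
import random

random.seed(1)

def posterior_samples(successes, failures, n=20_000):
    """Draw samples from the Beta(1 + s, 1 + f) posterior."""
    return [random.betavariate(1 + successes, 1 + failures) for _ in range(n)]

# Hypothetical observed data: best variant vs. a baseline variant.
best = posterior_samples(120, 880)       # ~12% conversion
baseline = posterior_samples(100, 900)   # ~10% conversion

# Lift samples: difference of the two posteriors, draw by draw.
lift = sorted(b - c for b, c in zip(best, baseline))

# 95% credible interval: central 95% of the lift distribution.
lo = lift[int(0.025 * len(lift))]
hi = lift[int(0.975 * len(lift))]
prob_positive = sum(l > 0 for l in lift) / len(lift)
print(f"95% credible interval for lift: [{lo:.3f}, {hi:.3f}]")
print(f"P(best beats baseline) = {prob_positive:.2%}")
```

Because the interval comes straight from the posterior, the probability statement is direct: given the data and prior, the true lift falls inside the interval with 95% probability.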

Choosing between MAB and CMAB

  • Personalization -- MAB: no, one best variant for everyone. CMAB: yes, best variant per user segment.
  • Data requirements -- MAB: lower, converges with fewer observations. CMAB: higher, needs data across all context segments.
  • Setup complexity -- MAB: simpler, just define design features. CMAB: more involved, also define context features and levels.
  • Best for -- MAB: simple decisions, low traffic, quick wins. CMAB: high-traffic experiences where user segments behave differently.

When in doubt, start with a MAB. If the results suggest room for personalization (e.g., you notice different user segments responding differently in your analytics), switch to a CMAB and add context features.