Percentage-Based Rollout with Sticky Bucketing

This how-to is part of Implementing Progressive Delivery Workflows. It solves a specific, silent failure mode: as you ramp a flag from 1% to 100%, a user can flip between the control and treatment variant on consecutive requests if the bucketing is not deterministic. The user experience becomes inconsistent, your experiment metrics are corrupted, and you cannot trust the rollout signals that gate further promotion.

The root cause is one of two things: a bucketing input that changes between requests (a random UUID, a wall-clock timestamp, a per-request session token), or a bucketing function that is not consistent across replicas. Fix both by hashing a stable identity attribute with the same function everywhere and mapping the hash to a bucket that does not change as the percentage climbs.

The targeting key is hashed deterministically; the bucket number (hash mod 100) is compared to the rollout threshold to assign the variant consistently across all replicas.

Prerequisites

OpenFeature server SDK installed (@openfeature/server-sdk ≥ 1.x for Node.js, or openfeature-sdk for Python)
flagd ≥ 0.6 as the provider (its fractionalEvaluation uses MurmurHash3 internally)
A stable user identity attribute available in the evaluation context (userId, sessionId, or a stable anonymous ID)
Flag definitions follow the namespace.service.feature key schema from your progressive delivery setup
Evaluation context assembled before the first flag call per request boundary

Step-by-Step Procedure

Step 1 — Hash the `targetingKey` deterministically

The targetingKey in the OpenFeature evaluation context is the single input that drives bucketing. Choose an attribute that is stable for the lifetime of the rollout — for logged-in users this is userId; for anonymous sessions use a persistent cookie value set on first visit.

import { OpenFeature, EvaluationContext } from '@openfeature/server-sdk';

// Build context once per request; reuse across all flag calls in that request
function buildEvalContext(req: Request): EvaluationContext {
  return {
    targetingKey: req.user?.id ?? req.cookies['anon_id'],  // stable, not random
    tenantId: req.user?.tenantId,
    region: req.headers['x-region'] ?? 'us-east-1',
  };
}

// Inside the request handler, where `req` is in scope:
const client = OpenFeature.getClient();
const ctx = buildEvalContext(req);
const enabled = await client.getBooleanValue('checkout.payments.express-pay', false, ctx);

Do not generate a new UUID per request as the targetingKey. Calling crypto.randomUUID() on every request is the most common cause of user flipping: each call yields a different hash, so the user lands in an effectively random bucket every time.
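To see why a per-request key breaks stickiness, the sketch below buckets five simulated requests twice: once with a stable key and once with a fresh UUID each time. It uses SHA-256 from the Python standard library as a stand-in hash so it runs without dependencies; the bucket numbers will not match flagd's, and `bucket_for` is a hypothetical helper name.

```python
import hashlib
import uuid

FLAG_KEY = "checkout.payments.express-pay"

def bucket_for(flag_key: str, targeting_key: str) -> int:
    """Map a key to a bucket in [0, 99]. SHA-256 is a stand-in, not flagd's hash."""
    digest = hashlib.sha256((flag_key + targeting_key).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100

# Stable key: every request resolves to the same bucket.
stable_buckets = {bucket_for(FLAG_KEY, "user-7829") for _ in range(5)}

# Fresh UUID per request: each request is bucketed independently.
random_buckets = [bucket_for(FLAG_KEY, str(uuid.uuid4())) for _ in range(5)]

print("stable key buckets:", stable_buckets)   # a single-element set
print("random key buckets:", random_buckets)   # typically unrelated values
```

With a 10% threshold, the stable key is either always below it or always above it, while the random keys fall below it on roughly one request in ten.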

Step 2 — Map the hash to a stable bucket

flagd’s fractionalEvaluation operator applies MurmurHash3 to the concatenation of the flag key and the targetingKey, then takes modulo 100. The result is a bucket in [0, 99]. Because the hash and the modulo are deterministic, the same key always lands in the same bucket across every flagd replica.

# flagd flag definition — stable fractional bucketing
flags:
  checkout.payments.express-pay:
    state: ENABLED
    variants:
      "on": true
      "off": false
    defaultVariant: "off"
    targeting:
      fractionalEvaluation:
        - { "var": "targetingKey" }   # hashed input — must be stable
        - ["on",  10]   # 10% get "on"
        - ["off", 90]   # 90% get "off"

The fractionalEvaluation key is a flagd-specific custom operation in its JsonLogic-based targeting rules; it is not part of the OpenFeature specification. If you are writing a custom provider, implement the same hash: MurmurHash3_x86_32(flagKey + targetingKey) mod 100.
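If a custom provider cannot take a MurmurHash3 dependency, a dependency-free version is short. The sketch below implements MurmurHash3 x86 32-bit in pure Python and applies the formula as stated in this guide (flag key concatenated with targetingKey, seed 0, modulo 100). The names `murmur3_x86_32` and `bucket_for` are illustrative, and the bucketing formula is taken from the text above rather than checked against flagd's source, so compare results with a real flagd instance before relying on it for parity.

```python
def murmur3_x86_32(data: bytes, seed: int = 0) -> int:
    """Pure-Python MurmurHash3 x86 32-bit; returns an unsigned 32-bit integer."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    mask = 0xFFFFFFFF
    h = seed & mask
    n_blocks = len(data) // 4

    def rotl(x: int, r: int) -> int:
        return ((x << r) | (x >> (32 - r))) & mask

    # Body: mix each complete 4-byte little-endian block.
    for i in range(n_blocks):
        k = int.from_bytes(data[4 * i : 4 * i + 4], "little")
        k = (k * c1) & mask
        k = rotl(k, 15)
        k = (k * c2) & mask
        h ^= k
        h = rotl(h, 13)
        h = (h * 5 + 0xE6546B64) & mask

    # Tail: mix the remaining 1-3 bytes, if any.
    tail = data[4 * n_blocks :]
    if tail:
        k = int.from_bytes(tail, "little")
        k = (k * c1) & mask
        k = rotl(k, 15)
        k = (k * c2) & mask
        h ^= k

    # Finalization: fold in the length and avalanche the bits.
    h ^= len(data)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & mask
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & mask
    h ^= h >> 16
    return h


def bucket_for(flag_key: str, targeting_key: str) -> int:
    """Bucket in [0, 99]: hash of flag key + targeting key, no separator, seed 0."""
    return murmur3_x86_32((flag_key + targeting_key).encode("utf-8")) % 100


print(bucket_for("checkout.payments.express-pay", "user-7829"))
```

The result should agree with `mmh3.hash(..., signed=False) % 100` from the mmh3 package for the same input, which gives a quick cross-check for any other port of the hash.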

Step 3 — Define a ramp schedule

Advance the percentage in discrete steps after verifying metrics at each level. Document the schedule so every operator knows what to expect.

# Ramp schedule — run each step only after observing the previous for ≥ 1 hour
# Step 1: 1%  (canary — catch catastrophic defects)
flagctl set checkout.payments.express-pay --percentage 1 --env prod

# Step 2: 5%  (after p95 latency and error rate stable)
flagctl set checkout.payments.express-pay --percentage 5 --env prod

# Step 3: 20%
flagctl set checkout.payments.express-pay --percentage 20 --env prod

# Step 4: 50%
flagctl set checkout.payments.express-pay --percentage 50 --env prod

# Step 5: 100%
flagctl set checkout.payments.express-pay --percentage 100 --env prod

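A user whose targetingKey hashes to bucket 7 sees "on" at 10% and still sees "on" at 50%: their bucket does not change, only the threshold moves. This is sticky bucketing. Once a user enters "on" they stay there as the percentage climbs, and users still in "off" cross over exactly once, when the threshold passes their bucket. This holds only while the variant order in the fractionalEvaluation list stays the same.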
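The monotonic behaviour of the ramp can be checked offline. The sketch below walks the percentages from the schedule above and asserts that each cohort of "on" users contains the previous one. It assumes the bucket-below-threshold rule described in this guide and uses SHA-256 as a stand-in hash, so individual bucket numbers will differ from flagd's; `bucket_for` and `users_on` are hypothetical helper names.

```python
import hashlib

FLAG_KEY = "checkout.payments.express-pay"
RAMP = [1, 5, 20, 50, 100]  # percentages from the ramp schedule

def bucket_for(flag_key: str, targeting_key: str) -> int:
    """Stand-in bucketing: first 4 bytes of SHA-256, mod 100 (not flagd's hash)."""
    digest = hashlib.sha256((flag_key + targeting_key).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100

def users_on(percentage: int, users: list[str]) -> set[str]:
    """Users whose bucket falls below the threshold receive "on"."""
    return {u for u in users if bucket_for(FLAG_KEY, u) < percentage}

users = [f"user-{i}" for i in range(1000)]
cohorts = [users_on(pct, users) for pct in RAMP]

# Each cohort must contain the previous one: nobody drops back to "off".
for smaller, larger in zip(cohorts, cohorts[1:]):
    assert smaller <= larger

print([len(c) for c in cohorts])  # cohort sizes grow with the percentage
```

The subset property depends only on the bucket being fixed per user, so it holds for any deterministic hash.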

Step 4 — Verify stickiness across replicas

Deterministic hashing guarantees stickiness only if every replica runs the same hash function on the same inputs and serves the same flag configuration. Verify by resolving the same targetingKey against multiple flagd instances and asserting they all return the same variant.

# Resolve the same key on every replica; expect all to return "on" for a known bucket
TARGETING_KEY="user-7829"
for host in $(cat flagd-replicas.txt); do
  result=$(curl -s -X POST "http://$host:8013/schema.v1.Service/ResolveBoolean" \
    -H 'Content-Type: application/json' \
    -d "{\"flagKey\":\"checkout.payments.express-pay\",\"context\":{\"targetingKey\":\"$TARGETING_KEY\"}}" \
    | jq -r '.value')
  echo "$host -> $result"
done
# Every line must show the same value — if any differ, the hash function or flag config diverged

Run this check after each flagd deployment and after any flag-config update to catch accidental divergence early.
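If you want the check to fail a pipeline rather than rely on someone reading the output, the comparison itself is a few lines. The sketch below takes host-to-value pairs such as those printed by the loop above and returns the hosts that disagree with the majority. The host names and the `find_divergent` helper are hypothetical.

```python
from collections import Counter

def find_divergent(results: dict[str, str]) -> dict[str, str]:
    """Return the hosts whose value differs from the most common value."""
    if not results:
        return {}
    majority, _ = Counter(results.values()).most_common(1)[0]
    return {host: value for host, value in results.items() if value != majority}

# Hypothetical values collected from three replicas.
observed = {"flagd-0": "true", "flagd-1": "true", "flagd-2": "false"}

divergent = find_divergent(observed)
print(divergent)  # hosts to investigate; empty when all replicas agree
```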

Verification Step

Assert that the observed percentage matches the configured percentage by bucketing a large synthetic population of targetingKey values:

import mmh3  # pip install mmh3

flag_key = "checkout.payments.express-pay"
target_pct = 10   # configured percentage

count_on = 0
samples = 10_000

for i in range(samples):
    targeting_key = f"user-{i}"
    bucket = mmh3.hash(flag_key + targeting_key, signed=False) % 100
    if bucket < target_pct:
        count_on += 1

observed_pct = count_on / samples * 100
assert abs(observed_pct - target_pct) < 1.5, f"Expected ~{target_pct}%, got {observed_pct:.1f}%"
print(f"Observed {observed_pct:.1f}% — within tolerance of {target_pct}%")

This test runs offline against the same hash function the provider uses, so it validates both the formula and the distribution without touching production.

Gotchas & Edge Cases

Anonymous → logged-in identity transition: if a user moves from an anonymous anon_id to a userId mid-session, they may switch buckets and see a different variant. Decide before rollout whether to preserve the anonymous bucket (pass anon_id even after login) or accept the transition. Document the decision in the flag metadata.
Flag-key changes invalidate buckets: changing the flag key changes the hash input and rebuckets every user. Treat a flag-key rename as a new experiment; never rename a live rollout flag. See polling vs streaming flag synchronization — a key rename is a config change that must propagate before the old key goes dead.
Custom providers must replicate the hash exactly: if you write a custom server-side SDK provider, use the same algorithm (MurmurHash3 x86 32-bit, seed 0) and the same input format (flagKey + targetingKey, no separator). A different hash or seed produces different buckets and breaks consistency.
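The second and third gotchas are easy to demonstrate offline. The sketch below counts how many of 1,000 synthetic users change bucket when only the flag key changes, and when only the input format changes (a separator is added). It uses SHA-256 as a stand-in hash, so exact counts will differ from flagd's; the renamed flag key and the `bucket_for` helper are hypothetical.

```python
import hashlib

def bucket_for(flag_key: str, targeting_key: str, separator: str = "") -> int:
    """Stand-in bucketing (SHA-256 mod 100); flagd's hash gives other numbers."""
    raw = (flag_key + separator + targeting_key).encode("utf-8")
    return int.from_bytes(hashlib.sha256(raw).digest()[:4], "big") % 100

users = [f"user-{i}" for i in range(1000)]
old_key = "checkout.payments.express-pay"
new_key = "checkout.payments.express-pay-v2"  # hypothetical renamed flag

# Users whose bucket moves when only the flag key changes.
moved_by_rename = sum(bucket_for(old_key, u) != bucket_for(new_key, u) for u in users)

# Users whose bucket moves when only the input format changes.
moved_by_separator = sum(
    bucket_for(old_key, u) != bucket_for(old_key, u, separator=":") for u in users
)

print(f"rename moved {moved_by_rename} of {len(users)} users")
print(f"separator moved {moved_by_separator} of {len(users)} users")
```

In both cases nearly every user moves, since two independent buckets out of 100 coincide only about 1% of the time.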

Troubleshooting & FAQ

A user reports seeing the new variant sometimes and the old one other times — how do I debug?

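Log the resolved variant and the targetingKey on every flag call. With a fixed flag configuration, a consistent key always yields a consistent bucket, so a flip usually means the targetingKey is changing between requests (anonymous ID regenerated, logged-in vs anonymous identity, or a provider bug). Compare the logged keys across the flipping requests. If they differ, fix the key source; if they match, check whether the percentage changed between the two requests or whether one replica is serving stale flag configuration (see Step 4).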
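Once the logs exist, the comparison can be scripted. The sketch below groups evaluation records by user and reports anyone who saw more than one variant, along with the targeting keys involved. The record shape (`user`, `targeting_key`, `variant`) and the sample values are hypothetical; adapt them to whatever your logging pipeline emits.

```python
from collections import defaultdict

# Hypothetical log records: one per flag evaluation.
records = [
    {"user": "alice", "targeting_key": "anon-91f2", "variant": "off"},
    {"user": "alice", "targeting_key": "u-1001", "variant": "on"},
    {"user": "bob", "targeting_key": "u-1002", "variant": "off"},
    {"user": "bob", "targeting_key": "u-1002", "variant": "off"},
]

def find_flipping_users(records):
    """Return {user: (keys, variants)} for users who saw more than one variant."""
    keys, variants = defaultdict(set), defaultdict(set)
    for r in records:
        keys[r["user"]].add(r["targeting_key"])
        variants[r["user"]].add(r["variant"])
    return {u: (keys[u], variants[u]) for u in variants if len(variants[u]) > 1}

for user, (user_keys, user_variants) in find_flipping_users(records).items():
    if len(user_keys) > 1:
        cause = "targetingKey changed"
    else:
        cause = "key stable: check config and replicas"
    print(user, sorted(user_keys), sorted(user_variants), cause)
```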

How do I run a holdout group that permanently stays in control?

Reserve a bucket range explicitly: configure the flag so that buckets 0–4 are always "off" (holdout), and the rollout percentage draws from buckets 5–99. In flagd you can nest a fractionalEvaluation with adjusted weights after filtering out the holdout range with an if clause.
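The bucket arithmetic behind a holdout can be sketched independently of any provider syntax. The example below is not flagd configuration; it only illustrates the range split, under the assumption that the rollout percentage applies to the 95 non-holdout buckets and is rounded down. `HOLDOUT_BUCKETS` and `variant_for` are hypothetical names.

```python
HOLDOUT_BUCKETS = 5  # buckets 0-4 are permanently "off"

def variant_for(bucket: int, rollout_pct: int) -> str:
    """Assign a variant from a bucket in [0, 99] with a fixed holdout range.

    Assumption: rollout_pct applies to the 95 non-holdout buckets, so 100%
    means every bucket from 5 through 99 is "on".
    """
    if bucket < HOLDOUT_BUCKETS:
        return "off"  # holdout: never enters treatment
    eligible = 100 - HOLDOUT_BUCKETS
    on_buckets = eligible * rollout_pct // 100  # rounded down
    return "on" if bucket < HOLDOUT_BUCKETS + on_buckets else "off"

for pct in (10, 50, 100):
    on_count = sum(variant_for(b, pct) == "on" for b in range(100))
    print(f"{pct}% rollout -> {on_count} of 100 buckets on")
```

Because the "on" range always starts at bucket 5 and only its upper edge moves, users outside the holdout keep the same sticky behaviour as in the plain ramp.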

Does sticky bucketing work across distributed caching tiers?

Yes. Because the bucket is computed from the flag key and the targetingKey at evaluation time, the cache does not need to store a per-user bucket assignment. Any cache layer that holds the current flag configuration will produce the same bucket when asked. The cache only needs the current percentage threshold to be fresh, not any per-user state.