CollectiveX Health Partner API

Idempotency

Every POST /v1/TENANT/recommendation is idempotent on the (client_id, request_id) pair for 5 minutes.

TL;DR

  • Generate a fresh UUID (v7 preferred, v4 acceptable) for request_id per logical request.
  • If your request times out or you hit a retryable error (429 / 5xx), retry with the same request_id.
  • Within 5 minutes, the same request_id returns the cached response — no re-processing, no double-charge, no duplicate audit-trail row.

How the cache key is computed

Code
cache_key = hash(client_id + ":" + request_id)
TTL       = 300 seconds
storage   = Redis (origin-side, EU region)
  • client_id is the partner tenant identifier (cxh-sandbox-TENANT or cxh-prod-TENANT). The gateway derives it from your API key via the injected X-Zuplo-Partner-Id header — you don't send it explicitly.
  • request_id is the UUID string you send.
  • Scope: per-partner. Your request_id = "abc" and another partner's request_id = "abc" are different cache keys — no cross-partner collision.
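The per-partner scoping above can be sketched as follows. This is illustrative only: the gateway's actual hash function and key layout are not documented, so SHA-256 and the example tenant identifiers are assumptions.

```python
import hashlib

def cache_key(client_id: str, request_id: str) -> str:
    # Illustrative only: the gateway's real hash is internal.
    return hashlib.sha256(f"{client_id}:{request_id}".encode()).hexdigest()

# The same request_id under different partners yields different keys,
# so there is no cross-partner collision.
assert cache_key("cxh-prod-a", "abc") != cache_key("cxh-prod-b", "abc")
```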

When the cache serves a replay

A repeat request hits the cache iff all three hold:

  1. Same client_id (automatic — derived from your API key).
  2. Same request_id in the body.
  3. Less than 5 minutes since the original request was served.

Cache hit: response carries X-CxH-Cache: hit and served_at reflects the original processing time, not the replay time.

Cache miss: response carries X-CxH-Cache: miss and the request is processed from scratch.

You can therefore drive cache-aware logic off the header rather than parsing served_at deltas:

Code
r = httpx.post(...)
if r.headers.get("X-CxH-Cache") == "hit":
    # idempotent replay — same recommendation as before
    ...

When the cache does NOT serve a replay

  • request_id is different — fresh request.
  • More than 5 minutes have passed — cache entry expired; fresh request.
  • The original request returned 422 invalid_request or 422 out_of_scope — validation failures are not cached. Retry with the same request_id re-runs validation.
  • The original request returned 401 / 403 (gateway or consent) — gateway rejections are not cached. Fix the auth issue first.

UUID version recommendations

  • UUID v7 (preferred): time-sortable. Easier to correlate with your own logs if you're debugging a 48-hour window of requests — sorting by request_id sorts by time.
  • UUID v4: random. Works fine. Harder to correlate visually.
  • Anything else (v1, custom schemes, incrementing integers): technically valid as long as it's a string, but fights the spirit of idempotency. Use a real UUID.
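A minimal helper for generating IDs per the recommendation above, assuming Python: prefer uuid.uuid7 where the standard library provides it (recent interpreter versions) and fall back to v4 otherwise.

```python
import uuid

def new_request_id() -> str:
    # Prefer time-sortable UUIDv7 when the stdlib provides uuid.uuid7;
    # otherwise fall back to random UUIDv4, which works fine too.
    gen = getattr(uuid, "uuid7", uuid.uuid4)
    return str(gen())
```

Generate one ID per logical request, then reuse it across every retry of that request.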

Choosing your retry strategy

Scenario A: client-side timeout (no response from us)

Your client sent the request but timed out before our response reached you. Unclear whether we processed it.

  • Retry with the same request_id.
  • If we had already processed it: you get the cached response (200 OK with the original citations).
  • If we hadn't: we process fresh.
  • Either way, no duplicate audit-trail row, no duplicate recommendation.

Scenario B: 429 rate-limited

You hit 30 req/min. Zuplo rejected before the origin saw the request.

  • Wait for Retry-After seconds.
  • Retry with the same request_id.
  • The origin never saw the earlier attempt, so the cache has no entry. First real processing happens on retry.

Scenario C: 502 persistence_failed

We generated a recommendation but couldn't persist it to the audit-trail. We fail-closed, so we didn't serve the response.

  • Retry after 1s with the same request_id.
  • The cache has no entry (we failed before caching).
  • On retry, we regenerate. In practice the result is deterministic given the same inputs, so you'll get functionally the same recommendation.

Scenario D: 500 internal_error

Unexpected failure somewhere in the pipeline.

  • Retry after 1–2s with the same request_id.
  • Exponential backoff if it persists.
  • Include the trace_id in any support ticket.

Anti-patterns

  • Do not generate a new request_id for retries. That bypasses idempotency and risks a duplicate recommendation + duplicate audit-trail row.
  • Do not reuse a request_id across logically distinct requests. If the user asks a different question, that's a different request_id. Reusing an ID for different content will serve the cached (old) response and silently ignore your new content.
  • Do not parse served_at as the response generation time on a replay. It reflects the original processing time. If the difference matters for your UI, track your local send time instead.
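If your UI needs the actual round-trip latency, one approach (a sketch, not part of the API) is to time the call locally rather than trusting served_at:

```python
import time

def timed_call(fn, *args, **kwargs):
    # On a cache-hit replay, served_at reflects the ORIGINAL processing
    # time, so measure latency with a local monotonic clock instead.
    start = time.monotonic()
    result = fn(*args, **kwargs)
    return result, time.monotonic() - start

# e.g.: r, elapsed = timed_call(httpx.post, url, json=body)
```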

Audit-trail implication

Every served response (cache hit or miss) corresponds to exactly one row in our partner audit-trail collection. The row is written on the cache miss that produced the original response. Cache hits do not write new rows — they serve what's already stored.

This matters for:

  • Billing: one row = one billable unit. Retries within the 5-minute window are free.
  • Audit: a repeat request_id that served from cache will not show up as a new audit-trail event.
  • Rate limit: cache hits still count against your 30 req/min. The rate limit is enforced at the Zuplo edge before the cache lookup. If you're hitting rate limits on replays, back off per Retry-After.

Example: safe retry loop

Code
import time
import uuid

import httpx

def request_recommendation(body: dict, *, max_retries: int = 3) -> dict:
    request_id = str(uuid.uuid4())  # or uuid7 if you have it
    body = {**body, "request_id": request_id}

    for attempt in range(max_retries + 1):
        try:
            r = httpx.post(
                f"{CXH_BASE_URL}/v1/{TENANT}/recommendation",
                headers={"Authorization": f"ApiKey {CXH_API_KEY}"},
                json=body,
                timeout=30.0,
            )
        except httpx.TimeoutException:
            if attempt < max_retries:
                time.sleep(2 ** attempt)  # 1s, 2s, 4s
                continue
            raise

        if r.status_code == 429:
            retry_after = int(r.headers.get("Retry-After", 1))
            time.sleep(retry_after)
            continue

        if r.status_code in (500, 502, 503) and attempt < max_retries:
            time.sleep(2 ** attempt)
            continue

        r.raise_for_status()
        return r.json()

    raise RuntimeError(f"Exceeded {max_retries} retries for request_id={request_id}")

Note the request_id is generated once, outside the retry loop. Every retry reuses it. This is the correct pattern.

Last modified on April 29, 2026