
Rate Limits

Current limit

30 requests per minute per partner API key.

This is a hard ceiling enforced at the Zuplo edge before the origin receives the request.

How it's measured

  • Sliding window, not a calendar-minute bucket. The 30-request budget is measured over the trailing 60 seconds from the moment each request arrives (a client-side sketch of this accounting follows the list).
  • Per API key. If you have two keys (e.g. a dev key and a prod key from the same tenant), they have independent 30-req/min budgets.
  • Every HTTP request counts, regardless of response status. A 422 that bounces on validation still consumes one unit of budget.
  • Idempotent replays count. Re-sending a request_id that hits the 5-min cache still counts against the rate limit (Zuplo enforces before the cache lookup).
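
For intuition, here is a minimal sketch of that trailing-window bookkeeping. It is illustrative only: the authoritative counter lives at the Zuplo edge, and the class and method names here are ours, not part of the API.

Code
import time
from collections import deque

class SlidingWindowMeter:
    """Client-side estimate of the trailing-60s budget (illustrative)."""

    def __init__(self, limit: int = 30, window_s: float = 60.0):
        self.limit = limit
        self.window_s = window_s
        self.sent = deque()  # monotonic timestamps of recent requests

    def would_exceed(self) -> bool:
        now = time.monotonic()
        # Drop timestamps that have aged out of the trailing window.
        while self.sent and now - self.sent[0] > self.window_s:
            self.sent.popleft()
        return len(self.sent) >= self.limit

    def record(self) -> None:
        self.sent.append(time.monotonic())

Call would_exceed() before dispatch and record() after every request, including 4xx responses and idempotent replays, since both count against the budget.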

What happens on breach

HTTP 429 Too Many Requests with a Retry-After response header. The Retry-After value is the number of seconds to wait before the next safe request.

Code
HTTP/1.1 429 Too Many Requests
Retry-After: 17
Content-Type: application/json

<Zuplo-emitted rate-limit body — see docs.collectivex.health/api for schema>

The exact body is Zuplo-emitted, so its shape is documented in the auto-generated OpenAPI reference at docs.collectivex.health/api rather than here. The Retry-After header is what your code should key off of — not the body.

Recommended client handling

Minimum viable

Honor Retry-After literally:

Code
if r.status_code == 429:
    time.sleep(int(r.headers["Retry-After"]))
    # Then retry with the same request_id

That's correct but pessimistic at low concurrency.

Exponential backoff with jitter (preferred)

For request bursts or batch jobs, wrap retries in exponential backoff with full jitter to avoid a thundering herd:

Code
import time
import random

import httpx

# Assumes CXH_BASE_URL, TENANT, and CXH_API_KEY are already configured.

def backoff_with_jitter(attempt: int, retry_after: int | None) -> float:
    """Base case: honor Retry-After. Fallback: exponential with jitter."""
    if retry_after is not None:
        return retry_after
    base = min(2 ** attempt, 32)  # 1, 2, 4, 8, 16, 32 (cap)
    return random.uniform(0, base)  # full jitter

def call_with_retry(request_body, max_retries=5):
    for attempt in range(max_retries + 1):
        r = httpx.post(
            f"{CXH_BASE_URL}/v1/{TENANT}/recommendation",
            headers={"Authorization": f"ApiKey {CXH_API_KEY}"},
            json=request_body,
        )
        if r.status_code != 429:
            return r
        retry_after = int(r.headers.get("Retry-After", 0)) or None
        sleep_s = backoff_with_jitter(attempt, retry_after)
        time.sleep(sleep_s)
    raise RuntimeError(f"Exceeded {max_retries} 429 retries")
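
Usage is then a drop-in replacement for a bare httpx.post. The payload below is a hypothetical placeholder; use whatever your /recommendation call normally sends:

Code
request_body = {"example": "payload"}  # hypothetical placeholder
resp = call_with_retry(request_body)
resp.raise_for_status()  # non-429 errors still surface normally
result = resp.json()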

Node.js / JavaScript

Code
async function callWithRetry(body, { maxRetries = 5 } = {}) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const r = await fetch(`${CXH_BASE_URL}/v1/${TENANT}/recommendation`, {
      method: "POST",
      headers: {
        "Authorization": `ApiKey ${CXH_API_KEY}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(body),
    });
    if (r.status !== 429) return r;
    const retryAfter = parseInt(r.headers.get("Retry-After") || "0", 10);
    const base = Math.min(2 ** attempt, 32);
    const sleepMs = (retryAfter || Math.random() * base) * 1000;
    await new Promise((resolve) => setTimeout(resolve, sleepMs));
  }
  throw new Error(`Exceeded ${maxRetries} 429 retries`);
}

Planning your throughput

At 30 req/min sustained:

  • 30 requests/min = 1 request every 2 seconds.
  • ~43k requests/day if evenly paced.
  • Burst budget is soft — the sliding window allows ~30 requests in a sub-second burst, then throttles until the window recovers.

For batch workloads (e.g. a nightly digest or re-processing run), rate-limit your own dispatcher at 25 req/min, which leaves 5 requests of headroom for user-initiated traffic. A simple fixed-interval pacer works (a burst-tolerant token bucket follows it):

Code
# Pace at 25 requests per minute = 1 request every 2.4 seconds
MIN_INTERVAL_S = 60.0 / 25  # 2.4
last_send = 0.0

for item in batch:
    elapsed = time.monotonic() - last_send
    if elapsed < MIN_INTERVAL_S:
        time.sleep(MIN_INTERVAL_S - elapsed)
    send(item)
    last_send = time.monotonic()
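
If you want burst tolerance rather than strict pacing (say, a user-initiated request landing mid-batch), the step up is a true token bucket. A sketch under the same 25-req/min budget; batch and send are the same placeholders as above:

Code
import time

class TokenBucket:
    """Refill at `rate` tokens/sec up to `capacity`; spend one per request."""

    def __init__(self, rate: float = 25 / 60, capacity: int = 25):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def acquire(self) -> None:
        while True:
            now = time.monotonic()
            self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return
            time.sleep((1 - self.tokens) / self.rate)  # sleep until a token accrues

bucket = TokenBucket()
for item in batch:
    bucket.acquire()
    send(item)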

Getting a higher limit

30 req/min is the default for sandbox integration. Prod cutover may have a different ceiling — decided per partner contract.

To request a higher limit:

  1. Open a ticket via support with subject Rate limit increase — <tenant-id>.
  2. Include: current peak req/min (from your own metrics), projected peak, and a justification (user growth, feature launch, batch workload).
  3. Response time: typically 2–3 business days. Rate-limit changes require CollectiveX-side review of origin capacity.
  4. Sandbox limits can be raised temporarily for load testing — request a time window (e.g. "weekdays 10am–12pm UTC for 2 weeks").

Anti-patterns

  • Don't retry immediately on 429. You'll just consume more budget and get throttled harder.
  • Don't retry forever. Cap at 5–7 retries. If you're genuinely hitting the limit, your architecture needs a queue, not longer backoff.
  • Don't parallelize requests for the same partner tenant across multiple callers without coordinating rate. Two independent callers each doing 25 req/min = 50 req/min total = constant throttling. Put a shared token-bucket in front (see the sketch after this list).
  • Don't confuse 429 with 503. 429 = you're going too fast; back off. 503 = we have a transient outage; also back off but the retry-after semantics differ. Both merit honoring Retry-After if present.
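
For the shared-throttle point above, a fixed-window counter in shared storage is usually enough. A sketch assuming a Redis instance reachable by every caller and the redis-py client; the key name and 25-req budget are ours:

Code
import time

import redis

r = redis.Redis()  # assumed: one Redis instance shared by all callers

def acquire_shared_slot(limit: int = 25, window_s: int = 60) -> None:
    """Block until the shared fixed-window counter has budget left."""
    key = "cxh:shared-ratelimit"  # hypothetical key; use one per tenant
    while True:
        count = r.incr(key)
        if count == 1:
            r.expire(key, window_s)  # first request opens a new window
        if count <= limit:
            return
        time.sleep(max(r.ttl(key), 1))  # wait out the rest of the window

Fixed windows are coarser than the edge's sliding window, so keep the shared limit below 30 to absorb the boundary effect.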

Monitoring your own headroom

Every 200/4xx response carries:

Code
X-RateLimit-Remaining: 27
X-RateLimit-Limit: 30
X-RateLimit-Reset: 42   # seconds until the sliding window resets

Use X-RateLimit-Remaining to drive client-side throttling — back off proactively when remaining drops below your safety margin (e.g. 5 or 6) instead of waiting for the 429.
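
A sketch of that proactive check, wrapping whatever call you already make; the margin and header fallbacks are illustrative:

Code
import time

SAFETY_MARGIN = 6  # back off before the edge has to say 429

def send_with_headroom(make_request):
    """make_request: zero-arg callable that performs one API call."""
    r = make_request()
    remaining = int(r.headers.get("X-RateLimit-Remaining", SAFETY_MARGIN + 1))
    if remaining <= SAFETY_MARGIN:
        # Pause until the sliding window recovers before dispatching more.
        time.sleep(int(r.headers.get("X-RateLimit-Reset", 2)))
    return r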

A 429 indicates you exceeded the window by at least 1 request. Treat hitting any 429 as a sign your dispatch logic is too aggressive and tighten the throttle.

Last modified on April 29, 2026