
Rate Limits

Current limit

30 requests per minute per partner API key.

This is a hard ceiling enforced at the Zuplo edge before the origin receives the request.

How it's measured

  • Sliding window, not a calendar-minute bucket. The 30-request budget is measured over the trailing 60 seconds from the moment each request arrives (a client-side sketch of this accounting follows the list).
  • Per API key. If you have two keys (e.g. a dev key and a prod key from the same tenant), they have independent 30-req/min budgets.
  • Every HTTP request counts, regardless of response status. A 422 that bounces on validation still consumes one unit of budget.
  • Idempotent replays count. Re-sending a request_id that hits the 5-min cache still counts against the rate limit (Zuplo enforces before the cache lookup).
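
For intuition, here is a minimal sketch of that trailing-window bookkeeping. It is illustrative only: the authoritative counter lives at the Zuplo edge, and the class and method names here are ours, not part of the API.

Code
import time
from collections import deque

class SlidingWindowMeter:
    """Client-side estimate of the trailing-60s budget (illustrative)."""

    def __init__(self, limit: int = 30, window_s: float = 60.0):
        self.limit = limit
        self.window_s = window_s
        self.sent = deque()  # monotonic timestamps of recent requests

    def would_exceed(self) -> bool:
        now = time.monotonic()
        # Drop timestamps that have aged out of the trailing window.
        while self.sent and now - self.sent[0] > self.window_s:
            self.sent.popleft()
        return len(self.sent) >= self.limit

    def record(self) -> None:
        self.sent.append(time.monotonic())

Call would_exceed() before dispatch and record() after every request, including 4xx responses and idempotent replays, since both count against the budget.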

What happens on breach

HTTP 429 Too Many Requests with a Retry-After response header. The Retry-After value is the number of seconds to wait before the next safe request.

Code
HTTP/1.1 429 Too Many Requests
Retry-After: 17
Content-Type: application/json

<Zuplo-emitted rate-limit body — see docs.collectivex.health/api for schema>

The exact body is Zuplo-emitted, so its shape is documented in the auto-generated OpenAPI reference at docs.collectivex.health/api rather than here. The Retry-After header is what your code should key off of — not the body.

Recommended client handling

Minimum viable

Honor Retry-After literally:

Code
if r.status_code == 429:
    time.sleep(int(r.headers["Retry-After"]))
    # Then retry with the same request_id

That's correct but pessimistic at low concurrency.

Exponential backoff with jitter (preferred)

For request bursts or batch jobs, wrap retries in exponential backoff with full jitter to avoid a thundering herd:

Code
import time
import random

import httpx

# Assumes CXH_BASE_URL, TENANT, and CXH_API_KEY are already configured.

def backoff_with_jitter(attempt: int, retry_after: int | None) -> float:
    """Base case: honor Retry-After. Fallback: exponential with jitter."""
    if retry_after is not None:
        return retry_after
    base = min(2 ** attempt, 32)  # 1, 2, 4, 8, 16, 32 (cap)
    return random.uniform(0, base)  # full jitter

def call_with_retry(request_body, max_retries=5):
    for attempt in range(max_retries + 1):
        r = httpx.post(
            f"{CXH_BASE_URL}/v1/{TENANT}/recommendation",
            headers={"Authorization": f"ApiKey {CXH_API_KEY}"},
            json=request_body,
        )
        if r.status_code != 429:
            return r
        retry_after = int(r.headers.get("Retry-After", 0)) or None
        sleep_s = backoff_with_jitter(attempt, retry_after)
        time.sleep(sleep_s)
    raise RuntimeError(f"Exceeded {max_retries} 429 retries")
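
Usage is then a drop-in replacement for a bare httpx.post. The payload below is a hypothetical placeholder; use whatever your /recommendation call normally sends:

Code
request_body = {"example": "payload"}  # hypothetical placeholder
resp = call_with_retry(request_body)
resp.raise_for_status()  # non-429 errors still surface normally
result = resp.json()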

Node.js / JavaScript

Code
async function callWithRetry(body, { maxRetries = 5 } = {}) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const r = await fetch(`${CXH_BASE_URL}/v1/${TENANT}/recommendation`, {
      method: "POST",
      headers: {
        "Authorization": `ApiKey ${CXH_API_KEY}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(body),
    });
    if (r.status !== 429) return r;
    const retryAfter = parseInt(r.headers.get("Retry-After") || "0", 10);
    const base = Math.min(2 ** attempt, 32);
    const sleepMs = (retryAfter || Math.random() * base) * 1000;
    await new Promise((resolve) => setTimeout(resolve, sleepMs));
  }
  throw new Error(`Exceeded ${maxRetries} 429 retries`);
}

Planning your throughput

At 30 req/min sustained:

  • 30 requests/min = 1 request every 2 seconds.
  • ~43k requests/day if evenly paced.
  • Burst budget is soft — the sliding window allows ~30 requests in a sub-second burst, then throttles until the window recovers.

For batch workloads (e.g. a nightly digest or re-processing run), rate-limit your own dispatcher at 25 req/min, which leaves 5 requests of headroom for user-initiated traffic. A simple fixed-interval pacer works (a burst-tolerant token bucket follows it):

Code
# Pace at 25 requests per minute = 1 request every 2.4 seconds
MIN_INTERVAL_S = 60.0 / 25  # 2.4
last_send = 0.0

for item in batch:
    elapsed = time.monotonic() - last_send
    if elapsed < MIN_INTERVAL_S:
        time.sleep(MIN_INTERVAL_S - elapsed)
    send(item)
    last_send = time.monotonic()
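
If you want burst tolerance rather than strict pacing (say, a user-initiated request landing mid-batch), the step up is a true token bucket. A sketch under the same 25-req/min budget; batch and send are the same placeholders as above:

Code
import time

class TokenBucket:
    """Refill at `rate` tokens/sec up to `capacity`; spend one per request."""

    def __init__(self, rate: float = 25 / 60, capacity: int = 25):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def acquire(self) -> None:
        while True:
            now = time.monotonic()
            self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return
            time.sleep((1 - self.tokens) / self.rate)  # sleep until a token accrues

bucket = TokenBucket()
for item in batch:
    bucket.acquire()
    send(item)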

Getting a higher limit

30 req/min is the default for sandbox integration. Prod cutover may have a different ceiling — decided per partner contract.

To request a higher limit:

  1. Open a ticket via support with subject Rate limit increase — <tenant-id>.
  2. Include: current peak req/min (from your own metrics), projected peak, and a justification (user growth, feature launch, batch workload).
  3. Response time: typically 2–3 business days. Rate-limit changes require CollectiveX-side review of origin capacity.
  4. Sandbox limits can be raised temporarily for load testing — request a time window (e.g. "weekdays 10am–12pm UTC for 2 weeks").

Anti-patterns

  • Don't retry immediately on 429. You'll just consume more budget and get throttled harder.
  • Don't retry forever. Cap at 5–7 retries. If you're genuinely hitting the limit, your architecture needs a queue, not longer backoff.
  • Don't parallelize requests for the same partner tenant across multiple callers without coordinating rate. Two independent callers each doing 25 req/min = 50 req/min total = constant throttling. Put a shared token-bucket in front (see the sketch after this list).
  • Don't confuse 429 with 503. 429 = you're going too fast; back off. 503 = we have a transient outage; also back off but the retry-after semantics differ. Both merit honoring Retry-After if present.
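
For the shared-throttle point above, a fixed-window counter in shared storage is usually enough. A sketch assuming a Redis instance reachable by every caller and the redis-py client; the key name and 25-req budget are ours:

Code
import time

import redis

r = redis.Redis()  # assumed: one Redis instance shared by all callers

def acquire_shared_slot(limit: int = 25, window_s: int = 60) -> None:
    """Block until the shared fixed-window counter has budget left."""
    key = "cxh:shared-ratelimit"  # hypothetical key; use one per tenant
    while True:
        count = r.incr(key)
        if count == 1:
            r.expire(key, window_s)  # first request opens a new window
        if count <= limit:
            return
        time.sleep(max(r.ttl(key), 1))  # wait out the rest of the window

Fixed windows are coarser than the edge's sliding window, so keep the shared limit below 30 to absorb the boundary effect.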

Monitoring your own headroom

Every 200/4xx response carries:

Code
X-RateLimit-Remaining: 27
X-RateLimit-Limit: 30
X-RateLimit-Reset: 42   # seconds until the sliding window resets

Use X-RateLimit-Remaining to drive client-side throttling — back off proactively when remaining drops below your safety margin (e.g. 5 or 6) instead of waiting for the 429.
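
A sketch of that proactive check, wrapping whatever call you already make; the margin and header fallbacks are illustrative:

Code
import time

SAFETY_MARGIN = 6  # back off before the edge has to say 429

def send_with_headroom(make_request):
    """make_request: zero-arg callable that performs one API call."""
    r = make_request()
    remaining = int(r.headers.get("X-RateLimit-Remaining", SAFETY_MARGIN + 1))
    if remaining <= SAFETY_MARGIN:
        # Pause until the sliding window recovers before dispatching more.
        time.sleep(int(r.headers.get("X-RateLimit-Reset", 2)))
    return r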

A 429 indicates you exceeded the window by at least 1 request. Treat hitting any 429 as a sign your dispatch logic is too aggressive and tighten the throttle.

Last modified on April 29, 2026