## Retry schedule
Every batch gets up to 3 attempts with exponential backoff and up to 50% jitter.

| Attempt | Base delay | With jitter |
|---|---|---|
| 1 | — | (no delay) |
| 2 | 100ms | 100-150ms |
| 3 | 400ms | 400-600ms |
If every attempt fails, `onError` fires once for the batch.
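The schedule above can be sketched as a small helper. The function name and the injectable `rand` parameter are illustrative, not the SDK's internals:

```typescript
// Base delays per attempt, per the table above; attempt 1 sends immediately.
const BASE_DELAYS_MS = [0, 100, 400];

// Jitter stretches the base delay by a random factor in [1.0, 1.5).
// `rand` is injectable purely so the sketch is testable.
function backoffDelay(attempt: number, rand: () => number = Math.random): number {
  const base = BASE_DELAYS_MS[attempt - 1] ?? 0;
  return Math.round(base * (1 + 0.5 * rand()));
}
```

So attempt 2 waits somewhere in 100–150ms and attempt 3 somewhere in 400–600ms, matching the table.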
## What retries
| Condition | Action |
|---|---|
| Network error (`ENOTFOUND`, `ECONNRESET`, etc.) | Retry |
| 5xx response | Retry |
| 408, 429 | Retry (429 uses Retry-After, see below) |
| 401 / 403 | No retry. Logger disabled. |
| Other 4xx | No retry. Batch dropped. |
| Timeout (per attempt) | Retry |
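The table reads naturally as a decision function. The `Decision` labels and `classify` name below are assumptions for illustration; a `null` status stands for an attempt that never got an HTTP response:

```typescript
type Decision = "retry" | "drop" | "disable";

// status === null: no HTTP response at all — a network error such as
// ENOTFOUND/ECONNRESET, or the per-attempt timeout.
function classify(status: number | null): Decision {
  if (status === null) return "retry";
  if (status >= 500) return "retry";
  if (status === 408 || status === 429) return "retry";
  if (status === 401 || status === 403) return "disable"; // logger disabled
  if (status >= 400) return "drop"; // other 4xx: batch dropped, no retry
  return "drop"; // non-error statuses never reach this path in practice
}
```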
## 429 + Retry-After
On HTTP 429, the transport honors the server's `Retry-After` header before the next attempt:

- Seconds (`Retry-After: 30`) → wait 30 seconds.
- HTTP-date (`Retry-After: Wed, 21 Oct 2026 07:28:00 GMT`) → wait until that time.
- Missing / unparseable → fall back to the default backoff.

In either form, the wait is capped at 60 seconds.
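A minimal parser for the two header forms might look like this. The function name is illustrative, and the 60-second cap mirrors the summary at the end of this page:

```typescript
// Returns the wait in milliseconds, or null to fall back to default backoff.
function retryAfterMs(header: string | null, now: number = Date.now()): number | null {
  if (header === null || header.trim() === "") return null;
  const seconds = Number(header);
  if (Number.isFinite(seconds) && seconds >= 0) {
    return Math.min(seconds * 1000, 60_000); // seconds form, capped at 60s
  }
  const date = Date.parse(header); // HTTP-date form
  if (!Number.isNaN(date)) {
    return Math.min(Math.max(date - now, 0), 60_000);
  }
  return null; // unparseable
}
```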
## Circuit breaker
After 5 consecutive failures, the breaker opens. While open (for 30 seconds), the transport short-circuits: no network, no retry. `onError` fires with a `BreakerOpenError`.
When the cooldown ends, the breaker flips to half-open. The next batch becomes a probe:
- Probe succeeds → breaker closes, normal operation resumes.
- Probe fails → breaker opens again for another 30s.
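The lifecycle above can be sketched as a toy state machine. The 5-failure threshold and 30s cooldown mirror the doc; the class shape and method names are assumptions:

```typescript
type BreakerState = "closed" | "open" | "half-open";

class Breaker {
  private state: BreakerState = "closed";
  private failures = 0;
  private openedAt = 0;

  constructor(private threshold = 5, private cooldownMs = 30_000) {}

  // Called before each batch; returns false when the transport must short-circuit.
  canSend(now: number): boolean {
    if (this.state === "open" && now - this.openedAt >= this.cooldownMs) {
      this.state = "half-open"; // next batch becomes the probe
    }
    return this.state !== "open";
  }

  onSuccess(): void {
    this.state = "closed"; // probe succeeded (or normal operation)
    this.failures = 0;
  }

  onFailure(now: number): void {
    // A failed probe reopens immediately; otherwise count toward the threshold.
    if (this.state === "half-open" || ++this.failures >= this.threshold) {
      this.state = "open";
      this.openedAt = now;
      this.failures = 0;
    }
  }
}
```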
## Auth failures (401/403)
The first 401 or 403 disables the logger permanently for the process's lifetime:

- The specific batch that hit the error is dropped.
- Every subsequent `onRunSuccess` / `onRunFailure` is a no-op: no events are enqueued.
- `onError` fires once with an `AuthError`.
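A sketch of the permanent-disable behavior; the class, field names, and public queue are hypothetical stand-ins for the SDK's internals:

```typescript
class SketchLogger {
  queue: unknown[] = []; // stand-in for the real event queue
  private disabled = false;

  onRunSuccess(event: unknown): void {
    if (this.disabled) return; // no-op after the first 401/403
    this.queue.push(event);
  }

  // Called by the transport when a batch comes back with an HTTP status.
  handleAuthStatus(status: number): void {
    if (status === 401 || status === 403) {
      // The failing batch is dropped; onError fires once with an AuthError.
      this.disabled = true;
    }
  }
}
```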
## Per-attempt timeout
Every HTTP request runs under a 10-second `AbortController` timeout. If the request doesn't complete in 10s, the fetch is aborted, the attempt counts as a failure, and the backoff begins.
This is not configurable today. It’s tuned conservatively because the ingest endpoint is designed to respond in < 500ms under normal load.
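One way to implement a per-attempt timeout with native `fetch` and `AbortController`. The default constant mirrors the doc's 10s figure, but the helper itself is a sketch, not the SDK's code:

```typescript
async function fetchWithTimeout(
  url: string,
  init: RequestInit = {},
  timeoutMs = 10_000,
): Promise<Response> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    // An abort here rejects the fetch; the caller counts the rejection
    // as a failed attempt and starts the backoff.
    return await fetch(url, { ...init, signal: controller.signal });
  } finally {
    clearTimeout(timer); // don't leak the timer on fast responses
  }
}
```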
## onError — permanent drops
`onError` is called when the transport gives up on a batch: after retries are exhausted, when the breaker is open, or when a non-retryable status returns.
Default behavior (no `onError` supplied): the SDK calls `console.warn` once per process, then falls silent to avoid flooding logs during an outage.
Errors that surface here:
| Error name | Meaning |
|---|---|
| `AuthError` | 401/403 — logger is now disabled |
| `RateLimitError` | 429 after all retries exhausted |
| `BreakerOpenError` | Breaker short-circuited the batch |
| `NonRetryableStatusError` | 4xx other than 401/403/408/429 |
| Generic `Error` | Network/timeout/5xx after all retries |
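When wiring an `onError` handler, the error's `name` distinguishes the cases. This dispatcher is a hypothetical example, and the returned strings are illustrative:

```typescript
// Map each documented error name to a short operator-facing description.
function describeDrop(err: Error): string {
  switch (err.name) {
    case "AuthError":
      return "auth failed: logger is now disabled; check the API key";
    case "RateLimitError":
      return "rate limited: all retries exhausted";
    case "BreakerOpenError":
      return "breaker open: batch skipped without a network call";
    case "NonRetryableStatusError":
      return "non-retryable 4xx: batch dropped";
    default:
      return "network/timeout/5xx after all retries";
  }
}
```

Passing something like `(err) => console.warn(describeDrop(err))` as the handler keeps the default once-per-process `console.warn` from being your only signal during an outage.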
## Keep-alive
Native `fetch` in Node 18+, and in every other runtime the SDK targets, reuses TCP connections automatically. There's no `Agent` to configure and no connection pool to tune.
## Summary
- 3 attempts per batch (backoff 100ms, 400ms), 50% jitter
- 429 honors `Retry-After` (capped at 60s)
- Breaker: 5 fails → open 30s → probe
- 401/403 → disable logger permanently, no retry
- 10s timeout per attempt via `AbortController`
- Permanent drops surface via `onError`
## See also
- Batching: queue overflow behavior during outages
- Shutdown: drain timeouts during exit