A contract is schema + rules + retry config — none of that needs a live LLM to test. Your contract logic is pure, and the entire pipeline is exposed through verify and a mockable RunFn.
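The tests below assume a contract module like `src/contracts/lead`. As a rough sketch of what its rules might look like (the `Lead` type and the null-or-message rule shape here are illustrative assumptions, not the library's API), rules can be plain functions that return `null` on success or a violation message:

```typescript
// Hypothetical sketch of rules such as those in src/contracts/lead.
// The Lead type and the null-or-message convention are assumptions for
// illustration; adapt to your actual contract definitions.
type Lead = { tier: "hot" | "warm" | "cold"; score: number; reason: string };

const leadRules = [
  (d: Lead) =>
    d.tier !== "hot" || d.score > 70
      ? null
      : `hot leads require score > 70, got ${d.score}`,
  (d: Lead) => (d.reason.trim().length > 0 ? null : "reason must be non-empty"),
];

// Collect every violated rule's message for a candidate object.
const violations = (d: Lead) =>
  leadRules.map((rule) => rule(d)).filter((v): v is string => v !== null);
```

Because rules are pure functions of the parsed data, they need no network, no mocks, and no API key.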

Test schema + rules without an LLM

`verify` validates data directly:

```typescript
import { describe, it, expect } from "vitest";
import { verify } from "@withboundary/contract";
import { leadSchema, leadRules } from "../src/contracts/lead";

describe("lead contract rules", () => {
  it("accepts cold tier with low score", () => {
    const result = verify({ tier: "cold", score: 25, reason: "Low intent" }, leadSchema, leadRules);
    expect(result.ok).toBe(true);
  });

  it("rejects hot tier below threshold", () => {
    const result = verify({ tier: "hot", score: 25, reason: "x" }, leadSchema, leadRules);
    expect(result.ok).toBe(false);
    if (result.ok) return;
    expect(result.error.attempts[0].category).toBe("RULE_ERROR");
    expect(result.error.attempts[0].issues).toContain(
      "hot leads require score > 70, got 25"
    );
  });

  it("rejects empty reason", () => {
    const result = verify({ tier: "cold", score: 25, reason: "" }, leadSchema, leadRules);
    expect(result.ok).toBe(false);
  });
});
```
Fast, deterministic, no mocks. Run these in CI without an API key.

Test the full loop with a fake RunFn

`RunFn` is just `(attempt) => Promise<string | null>`. Replace it with a function that returns canned strings:

```typescript
import { describe, it, expect } from "vitest";
import { leadContract } from "../src/contracts/lead";

describe("lead contract loop", () => {
  it("accepts on first attempt", async () => {
    const run = async () => JSON.stringify({ tier: "cold", score: 25, reason: "Low intent" });
    const result = await leadContract.accept(run);
    expect(result.ok).toBe(true);
    if (!result.ok) return;
    expect(result.attempts).toBe(1);
  });

  it("repairs then accepts", async () => {
    const responses = [
      JSON.stringify({ tier: "hot", score: 25, reason: "" }),            // attempt 1: fails
      JSON.stringify({ tier: "cold", score: 25, reason: "Low intent" }), // attempt 2: ok
    ];
    const run = async () => responses.shift()!;
    const result = await leadContract.accept(run);
    expect(result.ok).toBe(true);
    if (!result.ok) return;
    expect(result.attempts).toBe(2);
  });

  it("fails after retries exhausted", async () => {
    const run = async () => JSON.stringify({ tier: "hot", score: 25, reason: "" });
    const result = await leadContract.accept(run);
    expect(result.ok).toBe(false);
    if (result.ok) return;
    expect(result.error.attempts).toHaveLength(3); // default maxAttempts
    expect(result.error.attempts.every((a) => a.category === "RULE_ERROR")).toBe(true);
  });
});
```
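If many tests need canned response sequences, a small factory keeps them tidy. Note that `cannedRun` is a hypothetical helper sketched here, not part of `@withboundary/contract`:

```typescript
// Hypothetical test helper: replays canned responses in order, then
// returns null once they are exhausted.
type Attempt = { repairs: { content: string }[] };
type RunFn = (attempt: Attempt) => Promise<string | null>;

function cannedRun(...responses: (string | null)[]): RunFn {
  let i = 0;
  return async () => (i < responses.length ? responses[i++] : null);
}
```

The "repairs then accepts" test above then reduces to a single `const run = cannedRun(firstResponse, secondResponse);` with no mutable array in the test body.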

Assert on repair context

Your `RunFn` receives `attempt.repairs`, so you can assert that the loop is actually sending repair messages back:

```typescript
it("sends violation back to the model on retry", async () => {
  const seenRepairs: string[] = [];
  const responses = [
    JSON.stringify({ tier: "hot", score: 25, reason: "" }),
    JSON.stringify({ tier: "cold", score: 25, reason: "Low intent" }),
  ];
  const run = async (attempt) => {
    seenRepairs.push(...attempt.repairs.map((r) => r.content));
    return responses.shift()!;
  };

  await leadContract.accept(run);

  // Attempt 1 has no repairs. Attempt 2's repairs include the violation.
  expect(seenRepairs.some((s) => s.includes("hot leads require score > 70"))).toBe(true);
});
```

Test without defineContract at all

Use `enforce` inline for one-shot tests:

```typescript
import { z } from "zod";
import { enforce } from "@withboundary/contract";

it("ad-hoc schema works", async () => {
  const schema = z.object({ n: z.number() });
  const result = await enforce(schema, async () => '{"n": 42}', { name: "test" });
  expect(result).toMatchObject({ ok: true, data: { n: 42 } });
});
```

Test failure categories

The 8 failure categories each have distinct triggers. Use fake RunFn outputs to hit them:
| Category | `RunFn` returns | Why |
| --- | --- | --- |
| `EMPTY_RESPONSE` | `null` or `""` | nothing to parse |
| `REFUSAL` | `"I can't help with that."` | detected refusal language |
| `NO_JSON` | `"just prose, no json"` | no parseable JSON |
| `TRUNCATED` | `"{\"a\": 1, \"b\":"` | obviously cut off |
| `PARSE_ERROR` | `"{a: 1}"` | malformed JSON |
| `VALIDATION_ERROR` | valid JSON but wrong types | schema rejected |
| `RULE_ERROR` | passes schema, fails a rule | rule rejected |
| `RUN_ERROR` | `throw new Error("boom")` | your `RunFn` threw |
```typescript
it.each([
  ["EMPTY_RESPONSE", async () => null],
  ["NO_JSON", async () => "just prose"],
  ["PARSE_ERROR", async () => "{a:1}"],
  ["VALIDATION_ERROR", async () => '{"tier": "xyz", "score": 10, "reason": "x"}'],
])("categorises %s", async (expected, run) => {
  const result = await leadContract.accept(run);
  expect(result.ok).toBe(false);
  if (result.ok) return;
  expect(result.error.attempts[0].category).toBe(expected);
});
```
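To build intuition for the trigger order, here is a simplified, hypothetical categoriser covering the parse-stage categories. It illustrates the logic in the table, not the library's actual `classify` implementation; the refusal regex and brace-counting heuristic are assumptions for the sketch:

```typescript
// Simplified sketch of the parse-stage trigger order from the table above.
// The refusal pattern and truncation heuristic are illustrative assumptions.
type ParseCategory =
  | "EMPTY_RESPONSE" | "REFUSAL" | "NO_JSON" | "TRUNCATED" | "PARSE_ERROR";

function categorise(raw: string | null): ParseCategory | "PARSED" {
  if (raw === null || raw.trim() === "") return "EMPTY_RESPONSE";
  if (/\b(can't|cannot|won't)\s+help\b/i.test(raw)) return "REFUSAL";
  const start = raw.indexOf("{");
  if (start === -1) return "NO_JSON";
  try {
    JSON.parse(raw.slice(start));
    return "PARSED";
  } catch {
    // A brace that opens but never closes looks truncated rather than malformed.
    const open = (raw.match(/\{/g) ?? []).length;
    const close = (raw.match(/\}/g) ?? []).length;
    return open > close ? "TRUNCATED" : "PARSE_ERROR";
  }
}
```

Each branch maps one table row to a concrete check, which is why a single canned string per category is enough to exercise it.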

Disable retries in tests

Default maxAttempts: 3 means a failing RunFn runs three times. For tighter feedback, lower it:
```typescript
it("fails fast in tests", async () => {
  const run = async () => "not json"; // always fails
  const result = await leadContract.accept(run, {
    retry: { maxAttempts: 1 },
  });
  // runs once, fails, reports, moves on
  expect(result.ok).toBe(false);
});
```

Turn off the logger

If your test environment has the Boundary API key set (e.g. CI env leaks BOUNDARY_API_KEY), you don’t want tests shipping events to the real dashboard. Three options:
  1. Scrub the env: `vi.stubEnv("BOUNDARY_API_KEY", "")` so `createBoundaryLogger` returns `null`.
  2. Pass `logger: undefined` at call time, which overrides the defined logger for this test.
  3. Use a capture sink:
```typescript
const events: BoundaryLogEvent[] = [];
const testLogger = createBoundaryLogger({
  write(batch) { events.push(...batch); },
  flushOnExit: false,
});

await leadContract.accept(run, { logger: testLogger! });

await testLogger!.flush();
expect(events).toHaveLength(1);
expect(events[0].ok).toBe(true);
```
The third approach is great for asserting that the observability layer does what you expect.

See also

- Engine primitives: `verify`, `classify`, and `clean` used directly
- ContractLogger hooks: assert on specific lifecycle events