A contract is schema + rules + retry config — none of that needs a live LLM to test. Your contract logic is pure, and the entire pipeline is exposed through verify and a mockable RunFn.
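The tests below assume a contract module like `src/contracts/lead`. As a rough sketch of what its rules might look like (the `Lead` type and the null-or-message rule shape here are illustrative assumptions, not the library's API), rules can be plain functions that return `null` on success or a violation message:

```typescript
// Hypothetical sketch of rules such as those in src/contracts/lead.
// The Lead type and the null-or-message convention are assumptions for
// illustration; adapt to your actual contract definitions.
type Lead = { tier: "hot" | "warm" | "cold"; score: number; reason: string };

const leadRules = [
  (d: Lead) =>
    d.tier !== "hot" || d.score > 70
      ? null
      : `hot leads require score > 70, got ${d.score}`,
  (d: Lead) => (d.reason.trim().length > 0 ? null : "reason must be non-empty"),
];

// Collect every violated rule's message for a candidate object.
const violations = (d: Lead) =>
  leadRules.map((rule) => rule(d)).filter((v): v is string => v !== null);
```

Because rules are pure functions of the parsed data, they need no network, no mocks, and no API key.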

Test schema + rules without an LLM

`verify` validates data directly:

```typescript
import { describe, it, expect } from "vitest";
import { verify } from "@withboundary/contract";
import { leadSchema, leadRules } from "../src/contracts/lead";

describe("lead contract rules", () => {
  it("accepts cold tier with low score", () => {
    const result = verify({ tier: "cold", score: 25, reason: "Low intent" }, leadSchema, leadRules);
    expect(result.ok).toBe(true);
  });

  it("rejects hot tier below threshold", () => {
    const result = verify({ tier: "hot", score: 25, reason: "x" }, leadSchema, leadRules);
    expect(result.ok).toBe(false);
    if (result.ok) return;
    expect(result.error.attempts[0].category).toBe("RULE_ERROR");
    expect(result.error.attempts[0].issues).toContain(
      "hot leads require score > 70, got 25"
    );
  });

  it("rejects empty reason", () => {
    const result = verify({ tier: "cold", score: 25, reason: "" }, leadSchema, leadRules);
    expect(result.ok).toBe(false);
  });
});
```
Fast, deterministic, no mocks. Run these in CI without an API key.

Test the full loop with a fake RunFn

`RunFn` is just `(attempt) => Promise<string | null>`. Replace it with a function that returns canned strings:

```typescript
import { describe, it, expect } from "vitest";
import { leadContract } from "../src/contracts/lead";

describe("lead contract loop", () => {
  it("accepts on first attempt", async () => {
    const run = async () => JSON.stringify({ tier: "cold", score: 25, reason: "Low intent" });
    const result = await leadContract.accept(run);
    expect(result.ok).toBe(true);
    if (!result.ok) return;
    expect(result.attempts).toBe(1);
  });

  it("repairs then accepts", async () => {
    const responses = [
      JSON.stringify({ tier: "hot", score: 25, reason: "" }),            // attempt 1: fails
      JSON.stringify({ tier: "cold", score: 25, reason: "Low intent" }), // attempt 2: ok
    ];
    const run = async () => responses.shift()!;
    const result = await leadContract.accept(run);
    expect(result.ok).toBe(true);
    if (!result.ok) return;
    expect(result.attempts).toBe(2);
  });

  it("fails after retries exhausted", async () => {
    const run = async () => JSON.stringify({ tier: "hot", score: 25, reason: "" });
    const result = await leadContract.accept(run);
    expect(result.ok).toBe(false);
    if (result.ok) return;
    expect(result.error.attempts).toHaveLength(3); // default maxAttempts
    expect(result.error.attempts.every((a) => a.category === "RULE_ERROR")).toBe(true);
  });
});
```
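If many tests need canned response sequences, a small factory keeps them tidy. Note that `cannedRun` is a hypothetical helper sketched here, not part of `@withboundary/contract`:

```typescript
// Hypothetical test helper: replays canned responses in order, then
// returns null once they are exhausted.
type Attempt = { repairs: { content: string }[] };
type RunFn = (attempt: Attempt) => Promise<string | null>;

function cannedRun(...responses: (string | null)[]): RunFn {
  let i = 0;
  return async () => (i < responses.length ? responses[i++] : null);
}
```

The "repairs then accepts" test above then reduces to a single `const run = cannedRun(firstResponse, secondResponse);` with no mutable array in the test body.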

Assert on repair context

Your `RunFn` receives `attempt.repairs`, so you can assert that the loop is actually sending repair messages back:

```typescript
it("sends violation back to the model on retry", async () => {
  const seenRepairs: string[] = [];
  const responses = [
    JSON.stringify({ tier: "hot", score: 25, reason: "" }),
    JSON.stringify({ tier: "cold", score: 25, reason: "Low intent" }),
  ];
  const run = async (attempt) => {
    seenRepairs.push(...attempt.repairs.map((r) => r.content));
    return responses.shift()!;
  };

  await leadContract.accept(run);

  // Attempt 1 has no repairs. Attempt 2's repairs include the violation.
  expect(seenRepairs.some((s) => s.includes("hot leads require score > 70"))).toBe(true);
});
```

Test without defineContract at all

Use `enforce` inline for one-shot tests:

```typescript
import { z } from "zod";
import { enforce } from "@withboundary/contract";

it("ad-hoc schema works", async () => {
  const schema = z.object({ n: z.number() });
  const result = await enforce(schema, async () => '{"n": 42}', { name: "test" });
  expect(result).toMatchObject({ ok: true, data: { n: 42 } });
});
```

Test failure categories

The 8 failure categories each have distinct triggers. Use fake RunFn outputs to hit them:
| Category | `RunFn` returns | Why |
| --- | --- | --- |
| `EMPTY_RESPONSE` | `null` or `""` | nothing to parse |
| `REFUSAL` | `"I can't help with that."` | detected refusal language |
| `NO_JSON` | `"just prose, no json"` | no parseable JSON |
| `TRUNCATED` | `"{\"a\": 1, \"b\":"` | obviously cut off |
| `PARSE_ERROR` | `"{a: 1}"` | malformed JSON |
| `VALIDATION_ERROR` | valid JSON but wrong types | schema rejected |
| `RULE_ERROR` | passes schema, fails a rule | rule rejected |
| `RUN_ERROR` | `throw new Error("boom")` | your `RunFn` threw |
```typescript
it.each([
  ["EMPTY_RESPONSE", async () => null],
  ["NO_JSON", async () => "just prose"],
  ["PARSE_ERROR", async () => "{a:1}"],
  ["VALIDATION_ERROR", async () => '{"tier": "xyz", "score": 10, "reason": "x"}'],
])("categorises %s", async (expected, run) => {
  const result = await leadContract.accept(run);
  expect(result.ok).toBe(false);
  if (result.ok) return;
  expect(result.error.attempts[0].category).toBe(expected);
});
```
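To build intuition for the trigger order, here is a simplified, hypothetical categoriser covering the parse-stage categories. It illustrates the logic in the table, not the library's actual `classify` implementation; the refusal regex and brace-counting heuristic are assumptions for the sketch:

```typescript
// Simplified sketch of the parse-stage trigger order from the table above.
// The refusal pattern and truncation heuristic are illustrative assumptions.
type ParseCategory =
  | "EMPTY_RESPONSE" | "REFUSAL" | "NO_JSON" | "TRUNCATED" | "PARSE_ERROR";

function categorise(raw: string | null): ParseCategory | "PARSED" {
  if (raw === null || raw.trim() === "") return "EMPTY_RESPONSE";
  if (/\b(can't|cannot|won't)\s+help\b/i.test(raw)) return "REFUSAL";
  const start = raw.indexOf("{");
  if (start === -1) return "NO_JSON";
  try {
    JSON.parse(raw.slice(start));
    return "PARSED";
  } catch {
    // A brace that opens but never closes looks truncated rather than malformed.
    const open = (raw.match(/\{/g) ?? []).length;
    const close = (raw.match(/\}/g) ?? []).length;
    return open > close ? "TRUNCATED" : "PARSE_ERROR";
  }
}
```

Each branch maps one table row to a concrete check, which is why a single canned string per category is enough to exercise it.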

Disable retries in tests

Default maxAttempts: 3 means a failing RunFn runs three times. For tighter feedback, lower it:
```typescript
it("fails fast in tests", async () => {
  const run = async () => "not json"; // always fails
  const result = await leadContract.accept(run, {
    retry: { maxAttempts: 1 },
  });
  // runs once, fails, reports, moves on
  expect(result.ok).toBe(false);
});
```

Turn off the logger

If your test environment has the Boundary API key set (e.g. CI env leaks BOUNDARY_API_KEY), you don’t want tests shipping events to the real dashboard. Three options:
  1. Scrub the env: `vi.stubEnv("BOUNDARY_API_KEY", "")` so `createBoundaryLogger` returns `null`.
  2. Pass `logger: undefined` at call time, which overrides the defined logger for this test.
  3. Use a capture sink:
```typescript
const events: BoundaryLogEvent[] = [];
const testLogger = createBoundaryLogger({
  write(batch) { events.push(...batch); },
  flushOnExit: false,
});

await leadContract.accept(run, { logger: testLogger! });

await testLogger!.flush();
expect(events).toHaveLength(1);
expect(events[0].ok).toBe(true);
```
The third approach is great for asserting that the observability layer does what you expect.

See also

- Engine primitives: `verify`, `classify`, and `clean` used directly
- ContractLogger hooks: assert on specific lifecycle events