defineContract and enforce wrap a pipeline of five primitives. Each is exported individually so you can build custom flows — testing, manual prompt engineering, or entirely custom retry strategies.
import {
  clean,
  verify,
  classify,
  repair,
  instructions,
} from "@withboundary/contract";
All five are pure, synchronous, and side-effect free.

clean(raw)

function clean(raw: string | null): unknown;
Normalizes raw LLM output into a parsed JSON value. Handles:
  • Stripping Markdown code fences (```json … ```)
  • De-prosing (removing leading “Here’s the JSON:” chatter)
  • Finding the first valid JSON object or array
  • Returning null when nothing parseable is found
clean("```json\n{\"score\": 85}\n```");
// → { score: 85 }

clean("Here's your answer: {\"score\": 85}. Hope this helps!");
// → { score: 85 }

clean("I can't answer that.");
// → null
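The behavior above can be approximated in a few lines. Here is a minimal sketch, illustrative only and not the library's actual implementation: strip Markdown fences, then scan for the first parseable JSON value, dropping trailing prose.

```typescript
// Illustrative sketch of clean's normalization steps; not the
// library's implementation.
function cleanSketch(raw: string | null): unknown {
  if (!raw || !raw.trim()) return null;
  // 1. Strip Markdown code fences, keeping their contents.
  const unfenced = raw.replace(/```(?:json)?\s*([\s\S]*?)```/g, "$1");
  // 2. Find the first "{" or "[" and try progressively shorter
  //    slices until one parses (this drops trailing chatter).
  const start = unfenced.search(/[{[]/);
  if (start === -1) return null;
  for (let end = unfenced.length; end > start; end--) {
    try {
      return JSON.parse(unfenced.slice(start, end));
    } catch {
      // keep shrinking until a valid JSON value is found
    }
  }
  return null;
}
```

The shrinking scan is quadratic in the worst case; the real implementation presumably does something smarter, but the sketch reproduces the three behaviors shown above.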

verify(data, schema, rules?)

function verify<T>(
  data: unknown,
  schema: ZodType<T>,
  rules?: Rule<T>[],
): ContractResult<T>;
Validates data against a schema and rules. No LLM involved — pure sync validation. Returns the same ContractResult<T> shape that contract.accept returns.
const schema = z.object({ tier: z.enum(["hot", "warm", "cold"]), score: z.number() });
const rules = [(d) => d.tier !== "hot" || d.score > 70];

const result = verify({ tier: "hot", score: 25 }, schema, rules);
// { ok: false, error: { message: "...", attempts: [...] } }

const ok = verify({ tier: "cold", score: 25 }, schema, rules);
// { ok: true, data: { tier: "cold", score: 25 }, attempts: 1, raw: "...", durationMS: 0 }
Perfect for unit tests — no mocking, no network.

classify(raw, cleaned)

function classify(raw: string, cleaned: unknown): FailureCategory;
Given the raw LLM output and the result of clean(raw), returns the failure category:
classify("", null);                         // → "EMPTY_RESPONSE"
classify("I can't help with that.", null);  // → "REFUSAL"
classify("some prose, no json", null);      // → "NO_JSON"
classify("{\"a\": 1,", null);               // → "TRUNCATED" or "PARSE_ERROR"
classify("{\"a\": 1}", { a: 1 });           // → "VALIDATION_ERROR" (caller decides)
Useful when you’re building your own repair loop and want the same categorization as Boundary’s built-in loop.
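For intuition, the categories can be approximated with a few heuristics. This is a hedged sketch, not the library's actual decision logic (the real classifier presumably inspects the parse failure more carefully):

```typescript
type FailureCategory =
  | "EMPTY_RESPONSE" | "REFUSAL" | "NO_JSON"
  | "TRUNCATED" | "PARSE_ERROR" | "VALIDATION_ERROR";

// Rough heuristics only; not the library's implementation.
function classifySketch(raw: string, cleaned: unknown): FailureCategory {
  if (cleaned != null) return "VALIDATION_ERROR"; // parsed fine; schema/rules failed
  if (!raw.trim()) return "EMPTY_RESPONSE";
  if (/\b(can't|cannot|won't|unable to)\b/i.test(raw)) return "REFUSAL";
  if (!/[{[]/.test(raw)) return "NO_JSON";
  // JSON started but never parsed: unbalanced brackets suggest truncation.
  const opens = (raw.match(/[{[]/g) ?? []).length;
  const closes = (raw.match(/[}\]]/g) ?? []).length;
  return opens > closes ? "TRUNCATED" : "PARSE_ERROR";
}
```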

repair(detail, overrides?)

function repair(
  detail: AttemptDetail,
  overrides?: Partial<Record<FailureCategory, RepairFn | false>>,
): Message[] | false;
Given a failed attempt detail, generates the repair messages for the next retry. Returns false if the category is explicitly disabled via overrides.
const messages = repair({
  raw: "{\"tier\": \"hot\", \"score\": 25}",
  cleaned: { tier: "hot", score: 25 },
  issues: ["hot leads require score > 70"],
  category: "RULE_ERROR",
});
// → [{ role: "user", content: "...the specific violations..." }]
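Per the signature, each overrides entry is either false (skip repair for that category) or a custom generator. As a sketch of what such a generator might look like — the exact RepairFn signature is an assumption here, not confirmed by this page:

```typescript
type Message = { role: "system" | "user" | "assistant"; content: string };

// Hypothetical custom generator in the spirit of RepairFn; the
// signature is assumed from the AttemptDetail fields shown above.
function ruleErrorRepair(detail: { issues: string[] }): Message[] {
  return [{
    role: "user",
    content:
      "Your previous JSON violated these rules:\n- " +
      detail.issues.join("\n- ") +
      "\nResend corrected JSON only, no prose.",
  }];
}
```

Assuming overrides accepts per-category functions as the signature suggests, you would pass it as repair(detail, { RULE_ERROR: ruleErrorRepair, REFUSAL: false }) to customize rule-violation repairs while disabling refusal retries.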

instructions(schema, options?)

function instructions<T>(
  schema: ZodType<T>,
  options?: { suffix?: string },
): string;
Generate prompt instructions derived from a Zod schema — the same text that contract.accept auto-injects into attempt.instructions.
const schema = z.object({
  tier: z.enum(["hot", "warm", "cold"]),
  score: z.number().min(0).max(100).describe("0 means no signal, 100 is strong intent"),
});

console.log(instructions(schema));
// → Return JSON matching:
//   {
//     "tier": one of "hot" | "warm" | "cold",
//     "score": number (0-100) — 0 means no signal, 100 is strong intent
//   }
Call it once and embed the output in your system prompt yourself when you need to control placement and timing — for example, keeping the prompt prefix byte-stable so repeated requests hit a provider's prompt cache.

Recipes

Schema-first unit tests

Test that your rules behave correctly without running an LLM:
import { describe, it, expect } from "vitest";
import { verify } from "@withboundary/contract";

describe("leadContract", () => {
  it("rejects hot tier with low score", () => {
    const result = verify({ tier: "hot", score: 25 }, schema, rules);
    expect(result.ok).toBe(false);
    expect(result.error.attempts[0].category).toBe("RULE_ERROR");
  });

  it("accepts cold tier with low score", () => {
    const result = verify({ tier: "cold", score: 25 }, schema, rules);
    expect(result.ok).toBe(true);
  });
});

Manual prompt, no loop

Skip contract.accept’s loop and drive the LLM yourself when you need full control (token budgeting, streaming, custom retry policy):
import { clean, verify, classify, repair, instructions } from "@withboundary/contract";

const systemPrompt = `You are a lead-scoring assistant.\n${instructions(schema)}`;

async function runWithCustomLoop(userPrompt: string) {
  let messages: Message[] = [
    { role: "system", content: systemPrompt },
    { role: "user", content: userPrompt },
  ];

  for (let attempt = 1; attempt <= 5; attempt++) {
    const raw = await callYourLLM(messages);
    const cleaned = clean(raw);
    const result = verify(cleaned, schema, rules);

    if (result.ok) return result;

    const detail = {
      raw,
      cleaned,
      issues: result.error.attempts[0].issues,
      category: classify(raw, cleaned),
    };
    const repairMessages = repair(detail);
    if (repairMessages === false) break;

    messages = [...messages, { role: "assistant", content: raw }, ...repairMessages];
  }
  throw new Error("all attempts failed");
}

Classify a failure from another system

Got a raw LLM output and a validation error from somewhere else? Use classify to bucket it the same way Boundary would:
function onLegacyLLMFailure(raw: string, parsed: unknown) {
  const category = classify(raw, parsed);
  metrics.increment("llm.failure", { category });
}

See also

enforce vs defineContract

Which entrypoint to pick

Testing contracts

No LLM, no network, deterministic tests