defineContract and enforce wrap a pipeline of five primitives. Each is exported individually so you can build custom flows — testing, manual prompt engineering, or entirely custom retry strategies.
import {
  clean,
  verify,
  classify,
  repair,
  instructions,
} from "@withboundary/contract";
All five are pure, synchronous, and side-effect free.

clean(raw)

function clean(raw: string | null): unknown;
Normalizes raw LLM output into a parsed JSON value. Handles:
  • Stripping Markdown code fences (```json … ```)
  • De-prosing (removing leading “Here’s the JSON:” chatter)
  • Finding the first valid JSON object or array
  • Returning null when nothing parseable is found
clean("```json\n{\"score\": 85}\n```");
// → { score: 85 }

clean("Here's your answer: {\"score\": 85}. Hope this helps!");
// → { score: 85 }

clean("I can't answer that.");
// → null
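The behavior above can be approximated in a few lines. Here is a minimal sketch, illustrative only and not the library's actual implementation: strip Markdown fences, then scan for the first parseable JSON value, dropping trailing prose.

```typescript
// Illustrative sketch of clean's normalization steps; not the
// library's implementation.
function cleanSketch(raw: string | null): unknown {
  if (!raw || !raw.trim()) return null;
  // 1. Strip Markdown code fences, keeping their contents.
  const unfenced = raw.replace(/```(?:json)?\s*([\s\S]*?)```/g, "$1");
  // 2. Find the first "{" or "[" and try progressively shorter
  //    slices until one parses (this drops trailing chatter).
  const start = unfenced.search(/[{[]/);
  if (start === -1) return null;
  for (let end = unfenced.length; end > start; end--) {
    try {
      return JSON.parse(unfenced.slice(start, end));
    } catch {
      // keep shrinking until a valid JSON value is found
    }
  }
  return null;
}
```

The shrinking scan is quadratic in the worst case; the real implementation presumably does something smarter, but the sketch reproduces the three behaviors shown above.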

verify(data, schema, rules?)

function verify<T>(
  data: unknown,
  schema: ZodType<T>,
  rules?: Rule<T>[],
): ContractResult<T>;
Validates data against a schema and rules. No LLM involved — pure sync validation. Returns the same ContractResult<T> shape that contract.accept returns.
const schema = z.object({ tier: z.enum(["hot", "warm", "cold"]), score: z.number() });
const rules = [(d) => d.tier !== "hot" || d.score > 70];

const result = verify({ tier: "hot", score: 25 }, schema, rules);
// { ok: false, error: { message: "...", attempts: [...] } }

const ok = verify({ tier: "cold", score: 25 }, schema, rules);
// { ok: true, data: { tier: "cold", score: 25 }, attempts: 1, raw: "...", durationMS: 0 }
Perfect for unit tests — no mocking, no network.

classify(raw, cleaned)

function classify(raw: string, cleaned: unknown): FailureCategory;
Given the raw LLM output and the result of clean(raw), returns the failure category:
classify("", null);                         // → "EMPTY_RESPONSE"
classify("I can't help with that.", null);  // → "REFUSAL"
classify("some prose, no json", null);      // → "NO_JSON"
classify("{\"a\": 1,", null);               // → "TRUNCATED" or "PARSE_ERROR"
classify("{\"a\": 1}", { a: 1 });           // → "VALIDATION_ERROR" (caller decides)
Useful when you’re building your own repair loop and want the same categorization as Boundary’s built-in loop.
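For intuition, the categories can be approximated with a few heuristics. This is a hedged sketch, not the library's actual decision logic (the real classifier presumably inspects the parse failure more carefully):

```typescript
type FailureCategory =
  | "EMPTY_RESPONSE" | "REFUSAL" | "NO_JSON"
  | "TRUNCATED" | "PARSE_ERROR" | "VALIDATION_ERROR";

// Rough heuristics only; not the library's implementation.
function classifySketch(raw: string, cleaned: unknown): FailureCategory {
  if (cleaned != null) return "VALIDATION_ERROR"; // parsed fine; schema/rules failed
  if (!raw.trim()) return "EMPTY_RESPONSE";
  if (/\b(can't|cannot|won't|unable to)\b/i.test(raw)) return "REFUSAL";
  if (!/[{[]/.test(raw)) return "NO_JSON";
  // JSON started but never parsed: unbalanced brackets suggest truncation.
  const opens = (raw.match(/[{[]/g) ?? []).length;
  const closes = (raw.match(/[}\]]/g) ?? []).length;
  return opens > closes ? "TRUNCATED" : "PARSE_ERROR";
}
```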

repair(detail, overrides?)

function repair(
  detail: AttemptDetail,
  overrides?: Partial<Record<FailureCategory, RepairFn | false>>,
): Message[] | false;
Given a failed attempt detail, generates the repair messages for the next retry. Returns false if the category is explicitly disabled via overrides.
const messages = repair({
  raw: "{\"tier\": \"hot\", \"score\": 25}",
  cleaned: { tier: "hot", score: 25 },
  issues: ["hot leads require score > 70"],
  category: "RULE_ERROR",
});
// → [{ role: "user", content: "...the specific violations..." }]
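Per the signature, each overrides entry is either false (skip repair for that category) or a custom generator. As a sketch of what such a generator might look like — the exact RepairFn signature is an assumption here, not confirmed by this page:

```typescript
type Message = { role: "system" | "user" | "assistant"; content: string };

// Hypothetical custom generator in the spirit of RepairFn; the
// signature is assumed from the AttemptDetail fields shown above.
function ruleErrorRepair(detail: { issues: string[] }): Message[] {
  return [{
    role: "user",
    content:
      "Your previous JSON violated these rules:\n- " +
      detail.issues.join("\n- ") +
      "\nResend corrected JSON only, no prose.",
  }];
}
```

Assuming overrides accepts per-category functions as the signature suggests, you would pass it as repair(detail, { RULE_ERROR: ruleErrorRepair, REFUSAL: false }) to customize rule-violation repairs while disabling refusal retries.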

instructions(schema, options?)

function instructions<T>(
  schema: ZodType<T>,
  options?: { suffix?: string },
): string;
Generate prompt instructions derived from a Zod schema — the same text that contract.accept auto-injects into attempt.instructions.
const schema = z.object({
  tier: z.enum(["hot", "warm", "cold"]),
  score: z.number().min(0).max(100).describe("0 means no signal, 100 is strong intent"),
});

console.log(instructions(schema));
// → Return JSON matching:
//   {
//     "tier": one of "hot" | "warm" | "cold",
//     "score": number (0-100) — 0 means no signal, 100 is strong intent
//   }
Call it once and embed the output in your system prompt yourself when you need to control placement and timing — for example, keeping the prompt prefix byte-stable so repeated requests hit a provider's prompt cache.

Recipes

Schema-first unit tests

Test that your rules behave correctly without running an LLM:
import { describe, it, expect } from "vitest";
import { verify } from "@withboundary/contract";

describe("leadContract", () => {
  it("rejects hot tier with low score", () => {
    const result = verify({ tier: "hot", score: 25 }, schema, rules);
    expect(result.ok).toBe(false);
    expect(result.error.attempts[0].category).toBe("RULE_ERROR");
  });

  it("accepts cold tier with low score", () => {
    const result = verify({ tier: "cold", score: 25 }, schema, rules);
    expect(result.ok).toBe(true);
  });
});

Manual prompt, no loop

Skip contract.accept’s loop and drive the LLM yourself when you need full control (token budgeting, streaming, custom retry policy):
import { clean, verify, classify, repair, instructions } from "@withboundary/contract";

const systemPrompt = `You are a lead-scoring assistant.\n${instructions(schema)}`;

async function runWithCustomLoop(userPrompt: string) {
  let messages: Message[] = [
    { role: "system", content: systemPrompt },
    { role: "user", content: userPrompt },
  ];

  for (let attempt = 1; attempt <= 5; attempt++) {
    const raw = await callYourLLM(messages);
    const cleaned = clean(raw);
    const result = verify(cleaned, schema, rules);

    if (result.ok) return result;

    const detail = {
      raw,
      cleaned,
      issues: result.error.attempts[0].issues,
      category: classify(raw, cleaned),
    };
    const repairMessages = repair(detail);
    if (repairMessages === false) break;

    messages = [...messages, { role: "assistant", content: raw }, ...repairMessages];
  }
  throw new Error("all attempts failed");
}

Classify a failure from another system

Got a raw LLM output and a validation error from somewhere else? Use classify to bucket it the same way Boundary would:
function onLegacyLLMFailure(raw: string, parsed: unknown) {
  const category = classify(raw, parsed);
  metrics.increment("llm.failure", { category });
}

See also

enforce vs defineContract

Which entrypoint to pick

Testing contracts

No LLM, no network, deterministic tests