The silent failure

Your pipeline extracts provider and billing data from healthcare claim documents:
{
  "npi": "12345",
  "providerName": "Dr. Sarah Chen",
  "patientDob": "1990-13-01",
  "procedureCode": "99213",
  "diagnosisCode": "J06.9",
  "amount": 150.00,
  "lineItems": [
    { "code": "99213", "amount": 85.00 },
    { "code": "36415", "amount": 65.00 }
  ]
}
Valid JSON. Schema passes. Your claims system accepts it. But 12345 is not a valid NPI: National Provider Identifiers are exactly 10 digits, and the last digit is a check digit that must satisfy the Luhn algorithm. The patient DOB has month 13, which doesn’t exist. Your system has ingested bad data that the payer will reject downstream, costing time and money and ending in a denied claim.

With Boundary

Attempt 1 → { npi: "12345", patientDob: "1990-13-01", amount: 150 }
             ✕ NPI must be exactly 10 digits
             ✕ patientDob month 13 is invalid
             ↻ violations sent back to model

Attempt 2 → { npi: "1234567893", patientDob: "1990-03-01", amount: 150 }
             ✔ all rules pass
             ✔ ACCEPTED — safe for claims processing
The model corrected the NPI to a valid 10-digit format and fixed the impossible date. Your claims system only saw clean data.

The contract

import { z } from "zod";
import { defineContract } from "@withboundary/contract";

const lineItemSchema = z.object({
  code: z.string(),
  amount: z.number().positive(),
});

const schema = z.object({
  npi: z.string(),
  providerName: z.string(),
  patientDob: z.string(),
  procedureCode: z.string(),
  diagnosisCode: z.string(),
  amount: z.number().positive(),
  lineItems: z.array(lineItemSchema).min(1),
});

const claimContract = defineContract({
  schema,
  rules: [
    // NPI must be exactly 10 digits
    (claim) => /^\d{10}$/.test(claim.npi)
      || `NPI "${claim.npi}" is not a valid 10-digit National Provider Identifier`,

    // diagnosis code must match ICD-10-CM format:
    // letter, digit, alphanumeric, then an optional dot and 1-4 alphanumerics (e.g. J06.9, S72.001A)
    (claim) => /^[A-Z]\d[0-9A-Z](\.[0-9A-Z]{1,4})?$/.test(claim.diagnosisCode)
      || `diagnosisCode "${claim.diagnosisCode}" does not match ICD-10 format`,

    // procedure code must be 5 digits (CPT format)
    (claim) => /^\d{5}$/.test(claim.procedureCode)
      || `procedureCode "${claim.procedureCode}" is not a valid 5-digit CPT code`,

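    // date of birth must be a real calendar date (YYYY-MM-DD) that is not in the future
    (claim) => {
      const match = /^(\d{4})-(\d{2})-(\d{2})$/.exec(claim.patientDob);
      if (!match) return `patientDob "${claim.patientDob}" is not in YYYY-MM-DD format`;
      const [y, m, d] = match.slice(1).map(Number);
      if (m < 1 || m > 12) return `patientDob month ${m} is invalid`;
      // day 0 of the following month is the last day of month m (handles leap years)
      const daysInMonth = new Date(Date.UTC(y, m, 0)).getUTCDate();
      if (d < 1 || d > daysInMonth) return `patientDob day ${d} is invalid`;
      if (Date.UTC(y, m - 1, d) > Date.now()) return "patientDob is in the future";
      return true;
    },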

    // line items must sum to the total amount
    (claim) => {
      const lineTotal = claim.lineItems.reduce((sum, li) => sum + li.amount, 0);
      return Math.abs(lineTotal - claim.amount) < 0.01
        || `line items total $${lineTotal.toFixed(2)} != claim amount $${claim.amount.toFixed(2)}`;
    },

    // procedure code must appear in line items
    (claim) => claim.lineItems.some((li) => li.code === claim.procedureCode)
      || `procedureCode "${claim.procedureCode}" not found in line items`,
  ],
});
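
The NPI rule in the contract checks only the 10-digit shape. A stricter rule could also verify the check digit. The sketch below assumes the standard NPI scheme, in which the Luhn algorithm is applied to the NPI prefixed with 80840. The names hasValidNpiCheckDigit and npiCheckDigitRule are hypothetical and not part of @withboundary/contract; the rule simply follows the same "return true or a message" shape as the rules above.

```typescript
// Hypothetical helper, not part of @withboundary/contract.
// Assumption: NPIs are checked with Luhn over "80840" + the 10-digit NPI.
function hasValidNpiCheckDigit(npi: string): boolean {
  if (!/^\d{10}$/.test(npi)) return false;

  // Reverse so that index 0 is the check digit.
  const digits = ("80840" + npi).split("").map(Number).reverse();

  let sum = 0;
  digits.forEach((digit, i) => {
    // Double every second digit, counting from the one left of the check digit.
    let d = i % 2 === 1 ? digit * 2 : digit;
    if (d > 9) d -= 9; // same as adding the two digits of the product
    sum += d;
  });

  return sum % 10 === 0;
}

// Same shape as the contract rules: true on success, a violation message otherwise.
const npiCheckDigitRule = (claim: { npi: string }): true | string =>
  hasValidNpiCheckDigit(claim.npi)
    || `NPI "${claim.npi}" fails the Luhn check digit test`;

const passes = hasValidNpiCheckDigit("1234567893"); // true
const fails = hasValidNpiCheckDigit("1234567890");  // false: right length, wrong check digit
```

A rule like this could sit next to the 10-digit rule in the rules array, so that a well-formed but mistyped NPI is also sent back to the model.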

Full example

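import Anthropic from "@anthropic-ai/sdk";

const anthropic = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

// pdfText holds the text already extracted from the claim document
const result = await claimContract.accept(async (attempt) => {
  const res = await anthropic.messages.create({
    model: "claude-sonnet-4-20250514",
    max_tokens: 1024,
    system: attempt.instructions,
    messages: [
      {
        role: "user",
        content: `Extract claim data from this medical document:\n\n${pdfText}`,
      },
      ...attempt.repairs,
    ],
  });
  // return the text of the first content block; an empty string cannot pass the schema
  const block = res.content[0];
  return block?.type === "text" ? block.text : "";
});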

if (result.ok) {
  // ✔ safe for claims submission
  await claimsSystem.submit({
    npi: result.data.npi,
    providerName: result.data.providerName,
    procedureCode: result.data.procedureCode,
    diagnosisCode: result.data.diagnosisCode,
    amount: result.data.amount,
    lineItems: result.data.lineItems,
  });
}

Result: safe for claims processing

Every extracted claim is validated for identifier format (10-digit NPI, ICD-10 diagnosis codes, 5-digit CPT codes), date validity, and cross-field consistency (line items sum to total). Malformed data never reaches your payer.
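
The line-item rule in the contract compares the sum against the total with a small tolerance instead of strict equality, because dollar amounts stored as floating-point numbers do not always add up exactly. An alternative, sketched here with the hypothetical helpers toCents and lineItemsMatchTotal (neither comes from the library), is to compare in whole cents:

```typescript
// Hypothetical helper: convert a dollar amount to whole cents.
const toCents = (amount: number): number => Math.round(amount * 100);

// Compare the line-item sum with the claim total using integer cents.
function lineItemsMatchTotal(lineItems: { amount: number }[], total: number): boolean {
  const sumCents = lineItems.reduce((sum, li) => sum + toCents(li.amount), 0);
  return sumCents === toCents(total);
}

const items = [{ amount: 0.1 }, { amount: 0.2 }];
const floatSum = items.reduce((sum, li) => sum + li.amount, 0);

const exactFloatMatch = floatSum === 0.3;               // false: floatSum is 0.30000000000000004
const centsMatch = lineItemsMatchTotal(items, 0.3);     // true
const claimMatch = lineItemsMatchTotal(
  [{ amount: 85.0 }, { amount: 65.0 }],
  150.0,
);                                                      // true
```

Either approach works for the claim shown earlier; the tolerance check is shorter, while integer cents avoids having to choose a tolerance.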

Why regex rules matter for healthcare extraction

Schema validation tells you a field is a string. Rules tell you it’s the right string.
// Schema says: npi is a string ✓
// Rule says: npi is exactly 10 digits ✓

// Schema says: diagnosisCode is a string ✓
// Rule says: diagnosisCode matches ICD-10 format (e.g., J06.9) ✓

// Schema says: procedureCode is a string ✓
// Rule says: procedureCode is a valid 5-digit CPT code ✓
LLMs frequently return plausible-looking but incorrectly formatted identifiers. A 5-digit NPI looks reasonable. An ICD-10 code without the letter prefix looks like a number. Rules catch these before they enter your system and cause claim denials.
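
To make the failure modes concrete, the sketch below runs format patterns against one malformed and one well-formed extraction. The patterns are illustrative and may differ from the ones in your contract, and malformedFields is a hypothetical helper, not a library function.

```typescript
// Illustrative patterns; adjust them to the code sets your payer accepts.
const patterns: Record<string, RegExp> = {
  npi: /^\d{10}$/,
  diagnosisCode: /^[A-Z]\d[0-9A-Z](\.[0-9A-Z]{1,4})?$/,
  procedureCode: /^\d{5}$/,
};

// Returns the names of the fields whose values do not match their pattern.
function malformedFields(claim: Record<string, string>): string[] {
  return Object.entries(patterns)
    .filter(([field, pattern]) => !pattern.test(claim[field] ?? ""))
    .map(([field]) => field);
}

// A 5-digit NPI and an ICD-10 code missing its letter prefix both look plausible.
const bad = malformedFields({
  npi: "12345",
  diagnosisCode: "069",
  procedureCode: "99213",
});
// bad is ["npi", "diagnosisCode"]

const good = malformedFields({
  npi: "1234567893",
  diagnosisCode: "J06.9",
  procedureCode: "99213",
});
// good is []
```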

When to use this pattern

  • Medical claims extraction and validation
  • Insurance document parsing
  • Provider credentialing data extraction
  • Patient record structured data extraction
  • Lab report parsing with LOINC code validation
  • Any healthcare pipeline with regulated identifier formats