Guide

Bounded Agent Blueprint: Lead‑Qual, Weekly Reporting, and Inbox Triage — SOPs, Schemas, Evals, KPIs

Operator‑grade playbook to stand up three bounded agents that make money: copy‑ready SOPs, strict JSON Schemas, acceptance tests, n8n/Make maps, a golden‑set eval harness, and a KPI + pricing model. Built for nomad founders who want reliable, low‑maintenance revenue units.

Build three revenue‑relevant, bounded agents you can run from anywhere: Lead Qualification, Weekly Reporting, and Inbox Triage. This guide includes copy‑ready SOPs, JSON Schemas, acceptance tests, n8n/Make maps, a lightweight evaluation harness, and a KPI + pricing model so you can ship one agent in 14 days and keep it stable on café Wi‑Fi and async hours.

How to use this blueprint

Use this like a Notion playbook. Clone the sections you need, paste the schemas/prompts into your stack, and follow the 14‑day sprint.

14‑Day ship plan:

  • Days 1–2: Pick one agent and define its single job + inputs/outputs. Write the ICP and success criteria.
  • Days 3–4: Wire the connectors (CRM/Databox/Inbox) and drop in the JSON Schemas + prompts.
  • Days 5–6: Build a 50‑row golden set (5–10 edge cases included). Save as CSV.
  • Days 7–8: Add acceptance tests + eval harness. Schedule nightly runs.
  • Days 9–10: Add error handling: retries/backoff + global error workflow + alerts. Mask PII in logs.
  • Day 11: Soft‑launch behind human‑in‑the‑loop (100% review of outputs).
  • Day 12: Drop to 20% spot‑check on low‑risk cases; route sensitive cases to human queue.
  • Days 13–14: Track KPIs (parse rate, FP rate, FRT, cost/run). If pass, package price + SLOs and ship to first client.

Lisbon Test (pass/fail):

  • Survives shaky Wi‑Fi (async job design + retries).
  • Handles auth expiry and 429s without human rescue (re‑auth + backoff paths).
  • Produces strict JSON/HTML the first time (schemas + validators).
  • Escalates edge cases to a human with context attached (HITL + incident SOP).

Blueprint 1 — Lead Qualification Agent (WhatsApp/Web/Email)

Single job: qualify inbound interest using a 3–5 question ruleset, score fit, and either auto‑book or hand off. Channel examples: WhatsApp, email reply, web form, chat widget.

Recommended stack:

  • Transport: WhatsApp Business API (Twilio/WATI) or web form/email.
  • Orchestration: n8n or Make.com.
  • LLM: any model with JSON Schema support or strong JSON adherence.
  • Calendar/CRM: Google Calendar/Calendly + HubSpot/Pipedrive/Airtable.
  • Observability: Langfuse (events + eval traces) + Slack alerts.

n8n/Make module map:

  1. Trigger: Webhook (form) or WhatsApp message received.
  2. Normalize: Transform to input JSON per schema.
  3. LLM: Ask only the allowed questions if missing; otherwise score.
  4. Decision: If confidence ≥ 0.75 and next_step=book_call → create calendar event + send link; else route to nurture/disqualify or human queue.
  5. Log: Post result + payload to Langfuse; emit metrics.
  6. Alerts: Confidence < 0.5 or PII/sensitive → Slack channel + assign owner.

Input JSON Schema (v1):

{
  &quot;$schema&quot;: &quot;https://json-schema.org/draft/2020-12/schema&quot;,
  &quot;$id&quot;: &quot;https://statelessfounder.com/schemas/lead-qual-input.v1.json&quot;,
  &quot;type&quot;: &quot;object&quot;,
  &quot;required&quot;: [&quot;name&quot;, &quot;contact&quot;, &quot;source&quot;, &quot;answers&quot;, &quot;icp&quot;],
  &quot;properties&quot;: {
    &quot;name&quot;: {&quot;type&quot;: &quot;string&quot;, &quot;minLength&quot;: 1},
    &quot;contact&quot;: {&quot;type&quot;: &quot;object&quot;, &quot;required&quot;: [&quot;channel&quot;],
      &quot;properties&quot;: {
        &quot;channel&quot;: {&quot;enum&quot;: [&quot;email&quot;, &quot;whatsapp&quot;, &quot;sms&quot;, &quot;chat&quot;, &quot;webform&quot;]},
        &quot;email&quot;: {&quot;type&quot;: &quot;string&quot;, &quot;format&quot;: &quot;email&quot;},
        &quot;phone&quot;: {&quot;type&quot;: &quot;string&quot;},
        &quot;handle&quot;: {&quot;type&quot;: &quot;string&quot;}
      }
    },
    &quot;source&quot;: {&quot;type&quot;: &quot;string&quot;},
    &quot;answers&quot;: {&quot;type&quot;: &quot;object&quot;, &quot;additionalProperties&quot;: {&quot;type&quot;: [&quot;string&quot;, &quot;number&quot;, &quot;boolean&quot;]}},
    &quot;icp&quot;: {&quot;type&quot;: &quot;object&quot;, &quot;description&quot;: &quot;Ideal customer profile summary used for scoring.&quot;,
      &quot;properties&quot;: {
        &quot;industry&quot;: {&quot;type&quot;: &quot;string&quot;},
        &quot;company_size_min&quot;: {&quot;type&quot;: &quot;integer&quot;},
        &quot;company_size_max&quot;: {&quot;type&quot;: &quot;integer&quot;},
        &quot;geo&quot;: {&quot;type&quot;: &quot;string&quot;}
      },
      &quot;required&quot;: [&quot;industry&quot;]
    }
  },
  &quot;additionalProperties&quot;: false
}

Output JSON Schema (v1):

{
  &quot;$schema&quot;: &quot;https://json-schema.org/draft/2020-12/schema&quot;,
  &quot;$id&quot;: &quot;https://statelessfounder.com/schemas/lead-qual-output.v1.json&quot;,
  &quot;type&quot;: &quot;object&quot;,
  &quot;required&quot;: [&quot;score&quot;, &quot;reasons&quot;, &quot;next_step&quot;, &quot;confidence&quot;],
  &quot;properties&quot;: {
    &quot;status&quot;: {&quot;enum&quot;: [&quot;ok&quot;, &quot;need_more_info&quot;]},
    &quot;missing&quot;: {&quot;type&quot;: &quot;array&quot;, &quot;items&quot;: {&quot;type&quot;: &quot;string&quot;}},
    &quot;score&quot;: {&quot;type&quot;: &quot;integer&quot;, &quot;minimum&quot;: 0, &quot;maximum&quot;: 100},
    &quot;reasons&quot;: {&quot;type&quot;: &quot;array&quot;, &quot;items&quot;: {&quot;type&quot;: &quot;string&quot;}, &quot;minItems&quot;: 1},
    &quot;next_step&quot;: {&quot;enum&quot;: [&quot;book_call&quot;, &quot;nurture&quot;, &quot;disqualify&quot;]},
    &quot;confidence&quot;: {&quot;type&quot;: &quot;number&quot;, &quot;minimum&quot;: 0, &quot;maximum&quot;: 1},
    &quot;booking_link&quot;: {&quot;type&quot;: &quot;string&quot;, &quot;format&quot;: &quot;uri&quot;}
  },
  &quot;additionalProperties&quot;: false
}

Prompt scaffold (system):

You are a lead qualification assistant. Return STRICT JSON matching lead-qual-output.v1.json. 
Use only data in lead-qual-input.v1.json. If required answers are missing, set status:&quot;need_more_info&quot; and list missing. 
Do not invent data. If next_step is &quot;book_call&quot;, confidence must be ≥ 0.75.

Acceptance tests (ship with these):

  • Parse rate: 100 valid JSON outputs over 100 test runs on golden set.
  • Business rule: Auto‑fail if next_step=book_call AND confidence < 0.75.
  • Safety: Redact email/phone in logs; no PII in Slack messages.
  • QA: Randomly sample 10% of “book_call” for human review daily.

Escalation matrix:

  • Confidence < 0.5 → Human review queue.
  • “High deal value” keywords or competitor mentions → Tag owner and escalate immediately.
  • Three consecutive failures in 10 minutes (same node) → open incident, notify on‑call, pause auto‑booking.

Core KPIs:

  • Contact→Qualified% (7‑day rolling), Auto‑book rate, False‑positive rate (human reversals), First‑response time (FRT).

Blueprint 2 — Weekly Client Reporting Agent (Slack + HTML Email)

Single job: pull weekly KPIs from trusted sources, validate math, summarize to Slack and an HTML email. No dashboards to click, no hallucinated numbers.

Recommended stack:

  • Data: Databox MCP (to GA4/Ads/CRM) or native connectors.
  • Orchestration: n8n (CRON Monday 09:00 local) or Make scheduler.
  • LLM: model capable of strict JSON + HTML generation.
  • Output: Slack channel + HTML email to client list.

n8n/Make module map:

  1. Trigger: Schedule.
  2. Fetch: GA4/Ads/CRM metrics via Databox MCP.
  3. Validate: Check totals = sum(children); percentages in [0,100]. If missing data, set value="unknown".
  4. LLM: Produce JSON with {slack[], html}.
  5. HTML validator: fail fast on malformed tags/styles; if fail → human QA + retry.
  6. Send: Slack + email. Log artifacts + costs to Langfuse.

Output JSON Schema (v1):

{
  &quot;$schema&quot;: &quot;https://json-schema.org/draft/2020-12/schema&quot;,
  &quot;$id&quot;: &quot;https://statelessfounder.com/schemas/weekly-report-output.v1.json&quot;,
  &quot;type&quot;: &quot;object&quot;,
  &quot;required&quot;: [&quot;slack&quot;, &quot;html&quot;],
  &quot;properties&quot;: {
    &quot;slack&quot;: {
      &quot;type&quot;: &quot;array&quot;,
      &quot;items&quot;: {&quot;type&quot;: &quot;string&quot;, &quot;maxLength&quot;: 140},
      &quot;minItems&quot;: 3, &quot;maxItems&quot;: 12
    },
    &quot;html&quot;: {&quot;type&quot;: &quot;string&quot;, &quot;minLength&quot;: 200}
  },
  &quot;additionalProperties&quot;: false
}

Prompt scaffold (system):

Return STRICT JSON per weekly-report-output.v1.json. 
- Slack: ≤12 bullet lines, each &lt; 140 chars. 
- HTML: table-based layout, inline CSS only, no external assets. If any metric source is missing, write &quot;unknown&quot; and explain briefly at the bottom. 
Validate numeric sums; if inconsistency, emit a &quot;consistency_note&quot; in an HTML footer.

Acceptance tests:

  • HTML passes validator; no unclosed tags.
  • Numeric checks recompute within ±0.1% tolerance.
  • If any dataset is missing, HTML includes an “unknown” callout and no fabricated values.
  • Slack array contains 3–12 lines; each < 140 characters.

Core KPIs:

  • On‑time delivery (% Mondays by 09:15), Parse/validation pass rate, Client replies within 24h, Support tickets opened from report (should be low).

Blueprint 3 — Inbox Triage Agent (Classification → Routing → Draft)

Single job: classify incoming messages, set priority, and route to the right owner/folder. Drafting replies is optional and should require human send.

Recommended stack:

  • Inbox: Gmail/Outlook or Helpdesk (Help Scout/Zendesk).
  • Orchestration: n8n/Make with per‑module error handling.
  • LLM: classification‑focused with deterministic labels.
  • Storage: Airtable/DB table for message log + outcomes.

Allowed labels: {support, sales, billing, spam, personal} Sensitive intents: {legal, refund, escalation}

Input JSON (expect):

{
  &quot;message_id&quot;: &quot;str&quot;,
  &quot;thread_id&quot;: &quot;str&quot;,
  &quot;subject&quot;: &quot;str&quot;,
  &quot;from&quot;: &quot;email&quot;,
  &quot;body&quot;: &quot;str&quot;,
  &quot;attachments&quot;: [{&quot;filename&quot;: &quot;str&quot;, &quot;mime&quot;: &quot;str&quot;}],
  &quot;received_at&quot;: &quot;ISO-8601&quot;
}

Output JSON Schema (v1):

{
  &quot;$schema&quot;: &quot;https://json-schema.org/draft/2020-12/schema&quot;,
  &quot;$id&quot;: &quot;https://statelessfounder.com/schemas/inbox-triage-output.v1.json&quot;,
  &quot;type&quot;: &quot;object&quot;,
  &quot;required&quot;: [&quot;label&quot;, &quot;priority&quot;, &quot;reason&quot;, &quot;sensitive&quot;],
  &quot;properties&quot;: {
    &quot;label&quot;: {&quot;enum&quot;: [&quot;support&quot;, &quot;sales&quot;, &quot;billing&quot;, &quot;spam&quot;, &quot;personal&quot;]},
    &quot;priority&quot;: {&quot;type&quot;: &quot;integer&quot;, &quot;minimum&quot;: 1, &quot;maximum&quot;: 5},
    &quot;reason&quot;: {&quot;type&quot;: &quot;string&quot;},
    &quot;sensitive&quot;: {&quot;type&quot;: &quot;boolean&quot;},
    &quot;route_to&quot;: {&quot;type&quot;: &quot;string&quot;, &quot;description&quot;: &quot;Mailbox/folder/owner ID&quot;}
  },
  &quot;additionalProperties&quot;: false
}

Prompt scaffold (system):

Classify the message into one allowed label. Return STRICT JSON per inbox-triage-output.v1.json. 
Never write to customers. If refund/legal/escalation is detected, set sensitive=true and raise priority (1=highest).

Acceptance tests:

  • Labels limited to the allowed set; any other value auto‑fails.
  • PII masking in logs (email/phone regex) before storage/alerts.
  • Sensitive=true always routes to human queue; never send auto‑replies.
  • Duplicates: same thread_id within 10 minutes → dedupe and skip.

Core KPIs:

  • First response time (FRT), Deflection rate (auto‑resolved without human), Sensitive‑case catch rate, Manual review hit rate.

Lightweight evaluation harness (golden set + nightly gates)

Own your reliability. The eval harness keeps you from shipping regressions and gives you numbers for clients.

Golden set (per agent):

  • Size: 50 rows minimum (10 edge cases: missing fields, OOO auto‑replies, malformed HTML, rate‑limit simulation, duplicate thread, high‑value exceptions).
  • Format: CSV with columns: input_json, expected_output_json, notes, tags.

Sample CSV (3 rows shown):

input_json,expected_output_json,notes,tags
&quot;{\&quot;name\&quot;:\&quot;Ada\&quot;,\&quot;contact\&quot;:{\&quot;channel\&quot;:\&quot;webform\&quot;,\&quot;email\&quot;:\&quot;ada@example.com\&quot;},\&quot;source\&quot;:\&quot;LP\&quot;,\&quot;answers\&quot;:{\&quot;budget\&quot;:\&quot;2k\&quot;},\&quot;icp\&quot;:{\&quot;industry\&quot;:\&quot;coaching\&quot;}}&quot;,&quot;{\&quot;status\&quot;:\&quot;need_more_info\&quot;,\&quot;missing\&quot;:[\&quot;timeline\&quot;],\&quot;score\&quot;:0,\&quot;reasons\&quot;:[\&quot;Missing timeline\&quot;],\&quot;next_step\&quot;:\&quot;nurture\&quot;,\&quot;confidence\&quot;:0.4}&quot;,&quot;Lead-qual missing timeline&quot;,&quot;edge,lead-qual&quot;
&quot;{\&quot;slack\&quot;:[],\&quot;html\&quot;:\&quot;&lt;html&gt;&lt;body&gt;&lt;table&gt;&lt;tr&gt;&lt;td&gt;T&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&lt;/body&gt;&lt;/html&gt;\&quot;}&quot;,&quot;SCHEMA_FAIL&quot;,&quot;Reporting with empty Slack block&quot;,&quot;edge,reporting&quot;
&quot;{\&quot;message_id\&quot;:\&quot;1\&quot;,\&quot;thread_id\&quot;:\&quot;A\&quot;,\&quot;subject\&quot;:\&quot;Refund\&quot;,\&quot;from\&quot;:\&quot;c@example.com\&quot;,\&quot;body\&quot;:\&quot;I want a refund\&quot;}&quot;,&quot;{\&quot;label\&quot;:\&quot;billing\&quot;,\&quot;priority\&quot;:1,\&quot;reason\&quot;:\&quot;Refund keyword\&quot;,\&quot;sensitive\&quot;:true,\&quot;route_to\&quot;:\&quot;billing_queue\&quot;}&quot;,&quot;Inbox refund path&quot;,&quot;edge,inbox&quot;

Metrics to track:

  • Schema parse rate = valid_outputs / total_runs
  • Business rule pass rate = passes_business_rules / total_runs
  • False‑positive rate (agent booked call but human reversed) = fp / total_book_calls
  • Deflection rate (triage only) = auto_resolved / total_tickets
  • Manual review hit rate = sent_to_human / total_runs
  • Latency p95 and cost/run (sum of LLM + platform costs)

Harness setup (Langfuse/OpenAI‑evals friendly):

  • Store golden rows as dataset. Nightly job runs agent on dataset; compare outputs; upload traces + costs.
  • Gate deploys: if parse rate or business rule pass rate drops by >2% vs last green, auto‑block and alert.
  • Weekly: sample 5 outputs per agent for human rubric scoring (1–5) and trend.

Command sketch (pseudocode):

make eval AGENT=lead-qual DATA=golden/lead_qual.csv OUT=reports/lead_qual_$(date).json

Governance defaults:

  • Keep 90 days of eval artifacts.
  • Tag runs by model+prompt version.
  • Treat model changes like code changes (review + rollback path).

Observability + Incident SOP (retries, backoff, escalation)

Most failures aren’t “AI problems.” They’re ops: timeouts, 429s, auth expiry, brittle HTML. Put guardrails in writing.

Global error workflow (copy‑ready SOP):

  1. Detection: Route all n8n/Make errors to a global Error Workflow. Capture: workflow_id, node, message, execution_url, retryOf.
  2. Classify:
    • HTTP 5xx/timeout/429 → Retry path.
    • 4xx auth (401/403) → Re‑auth path.
    • Schema/validation fail → Human QA path.
  3. Retry: 3 attempts with exponential backoff (1m, 5m, 15m). Jitter recommended.
  4. Escalation: Still failing after retries or confidence < threshold → open ticket, notify on‑call (Slack/PagerDuty). Attach execution_url and a redacted payload snippet.
  5. Post‑incident: Create a postmortem card. Add a regression case to the golden set. Tag the run with a new incident ID.

Operational safeguards:

  • PII: Mask emails/phones in logs and alerts.
  • Rate limits: Centralize backoff settings; never parallelize beyond provider quotas.
  • Auth: Monitor token age; auto‑rotate credentials with least privilege; alert before expiry.
  • Zapier caveat: Keep trigger work under 30s; push long calls to async callbacks/queues.

Expected outcomes:

  • Incidents resolved within 30 minutes during business hours.
  • <1 false‑positive booking per 100 auto‑booked calls.
  • Weekly reporting failure rate <2% (auto‑retry recovers most).

KPI dashboard + pricing calculator (copy the math)

Track the numbers that pay the bills and price with a margin, not vibes.

KPI sheet (minimum set):

  • Parse rate (schema): valid_json / total_runs
  • Business rule pass rate: passes_rules / total_runs
  • False‑positive rate (lead‑qual): fp / total_book_calls
  • FRT (inbox/lead‑qual): p50 and p95 in minutes
  • On‑time delivery (reporting): deliveries_before_09_15 / total_weeks
  • Cost/run: llm_cost + platform_cost + infra_cost

Pricing calculator (logic): Inputs:

  • setup_hours
  • hourly_rate
  • platform_monthly (n8n/Make/Zapier tier)
  • avg_runs_per_month
  • llm_cost_per_run (tokens × price/1k + vector/db/query costs)
  • maintenance_hours_per_month
  • target_gross_margin (e.g., 0.6)

Formulas:

  • setup_fee = setup_hours × hourly_rate + one‑time tooling (if any)
  • monthly_direct_cost = platform_monthly + (avg_runs_per_month × llm_cost_per_run) + (maintenance_hours_per_month × hourly_rate)
  • recommended_monthly_price = ceil(monthly_direct_cost / (1 - target_gross_margin))
  • anchor_ranges (market‑checked): entry $99–$299/mo per unit; mid €500–€3,000/mo; high‑touch $2,500–$5,000+/mo. Use your cost model to pick a lane.

Packaging notes:

  • Productize by agent: one setup + one monthly per agent. Offer a bundle discount for 2–3 agents.
  • SLOs in proposal: parse rate ≥ 98%, FP rate ≤ 1%, reporting on Mondays by 09:15, incident response < 2 hours during support window.
  • Change management: 90‑day review to tune thresholds and cut human review from 20% → 10% once stable.

Implementation checklist + Lisbon Test

Run this before you turn on fully unattended mode.

Go‑live checklist:

  1. One job per agent; JSON Schemas validated and versioned.
  2. Golden set (≥50) loaded; nightly eval passing; deploy gate active.
  3. Retries/backoff implemented globally; rate‑limit quotas set.
  4. Auth rotation tested; failure alerts include re‑auth instructions.
  5. PII masking verified end‑to‑end (logs, alerts, data store).
  6. Human‑in‑the‑loop paths wired; sensitive cases always human‑owned.
  7. HTML validation (reporting) green for two consecutive Mondays.
  8. KPIs visible to client (shared dashboard or weekly footer block).
  9. Incident SOP rehearsed once with a forced failure.
  10. Lisbon Test: pull laptop power and Wi‑Fi for 10 minutes mid‑run — does it recover without you?

Minimal SLO card to paste in your client doc:

  • Lead‑qual: ≥98% parse rate; FP ≤1%; median FRT ≤5 min.
  • Reporting: On‑time Mondays by 09:15; validation pass ≥98%.
  • Inbox: Sensitive‑case catch rate ≥95%; deflection rate target agreed per client.

What to iterate next:

  • Reduce manual review from 20% → 10% once KPIs hold for 4 weeks.
  • Add canary checks (1 synthetic message/report per day) to detect silent failures.
  • Graduate autonomy only after three green eval cycles post‑change (model or prompt).