Bounded Agent Blueprint: Lead‑Qual, Weekly Reporting, and Inbox Triage — SOPs, Schemas, Evals, KPIs
Operator‑grade playbook to stand up three bounded agents that make money: copy‑ready SOPs, strict JSON Schemas, acceptance tests, n8n/Make maps, a golden‑set eval harness, and a KPI + pricing model. Built for nomad founders who want reliable, low‑maintenance revenue units.
Build three revenue‑relevant, bounded agents you can run from anywhere: Lead Qualification, Weekly Reporting, and Inbox Triage. This guide includes copy‑ready SOPs, JSON Schemas, acceptance tests, n8n/Make maps, a lightweight evaluation harness, and a KPI + pricing model so you can ship one agent in 14 days and keep it stable on café Wi‑Fi and async hours.
How to use this blueprint
Use this like a Notion playbook. Clone the sections you need, paste the schemas/prompts into your stack, and follow the 14‑day sprint.
14‑Day ship plan:
- Days 1–2: Pick one agent and define its single job + inputs/outputs. Write the ICP and success criteria.
- Days 3–4: Wire the connectors (CRM/Databox/Inbox) and drop in the JSON Schemas + prompts.
- Days 5–6: Build a 50‑row golden set (5–10 edge cases included). Save as CSV.
- Days 7–8: Add acceptance tests + eval harness. Schedule nightly runs.
- Days 9–10: Add error handling: retries/backoff + global error workflow + alerts. Mask PII in logs.
- Day 11: Soft‑launch behind human‑in‑the‑loop (100% review of outputs).
- Day 12: Drop to 20% spot‑check on low‑risk cases; route sensitive cases to human queue.
- Days 13–14: Track KPIs (parse rate, FP rate, FRT, cost/run). If pass, package price + SLOs and ship to first client.
Lisbon Test (pass/fail):
- Survives shaky Wi‑Fi (async job design + retries).
- Handles auth expiry and 429s without human rescue (re‑auth + backoff paths).
- Produces strict JSON/HTML the first time (schemas + validators).
- Escalates edge cases to a human with context attached (HITL + incident SOP).
Blueprint 1 — Lead Qualification Agent (WhatsApp/Web/Email)
Single job: qualify inbound interest using a 3–5 question ruleset, score fit, and either auto‑book or hand off. Channel examples: WhatsApp, email reply, web form, chat widget.
Recommended stack:
- Transport: WhatsApp Business API (Twilio/WATI) or web form/email.
- Orchestration: n8n or Make.com.
- LLM: any model with JSON Schema support or strong JSON adherence.
- Calendar/CRM: Google Calendar/Calendly + HubSpot/Pipedrive/Airtable.
- Observability: Langfuse (events + eval traces) + Slack alerts.
n8n/Make module map:
- Trigger: Webhook (form) or WhatsApp message received.
- Normalize: Transform to input JSON per schema.
- LLM: Ask only the allowed questions if missing; otherwise score.
- Decision: If confidence ≥ 0.75 and next_step=book_call → create calendar event + send link; else route to nurture/disqualify or human queue.
- Log: Post result + payload to Langfuse; emit metrics.
- Alerts: Confidence < 0.5 or PII/sensitive → Slack channel + assign owner.
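The Decision and Alerts branches above can be sketched as one routing function. This is a minimal sketch assuming the output already passed schema validation; the branch names (`human_review_queue`, `create_calendar_event`, `ask_missing_questions`) are illustrative, not fixed n8n node names.

```python
# Sketch of the Decision step. Assumes `output` already passed schema
# validation; branch names are illustrative placeholders.
def route_lead(output: dict) -> str:
    """Map a validated lead-qual output to a workflow branch."""
    if output.get("status") == "need_more_info":
        return "ask_missing_questions"
    if output["confidence"] < 0.5:
        return "human_review_queue"
    if output["next_step"] == "book_call" and output["confidence"] >= 0.75:
        return "create_calendar_event"
    if output["next_step"] in ("nurture", "disqualify"):
        return output["next_step"]
    return "human_review_queue"  # default: never auto-act on ambiguity
```

Note the default branch: a `book_call` verdict with confidence below 0.75 falls through to the human queue rather than auto-booking.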
Input JSON Schema (v1):
```json
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"$id": "https://statelessfounder.com/schemas/lead-qual-input.v1.json",
"type": "object",
"required": ["name", "contact", "source", "answers", "icp"],
"properties": {
"name": {"type": "string", "minLength": 1},
"contact": {"type": "object", "required": ["channel"],
"properties": {
"channel": {"enum": ["email", "whatsapp", "sms", "chat", "webform"]},
"email": {"type": "string", "format": "email"},
"phone": {"type": "string"},
"handle": {"type": "string"}
}
},
"source": {"type": "string"},
"answers": {"type": "object", "additionalProperties": {"type": ["string", "number", "boolean"]}},
"icp": {"type": "object", "description": "Ideal customer profile summary used for scoring.",
"properties": {
"industry": {"type": "string"},
"companysizemin": {"type": "integer"},
"companysizemax": {"type": "integer"},
"geo": {"type": "string"}
},
"required": ["industry"]
}
},
"additionalProperties": false
}
```
Output JSON Schema (v1):
```json
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"$id": "https://statelessfounder.com/schemas/lead-qual-output.v1.json",
"type": "object",
"required": ["score", "reasons", "next_step", "confidence"],
"properties": {
"status": {"enum": ["ok", "needmoreinfo"]},
"missing": {"type": "array", "items": {"type": "string"}},
"score": {"type": "integer", "minimum": 0, "maximum": 100},
"reasons": {"type": "array", "items": {"type": "string"}, "minItems": 1},
"nextstep": {"enum": ["bookcall", "nurture", "disqualify"]},
"confidence": {"type": "number", "minimum": 0, "maximum": 1},
"booking_link": {"type": "string", "format": "uri"}
},
"additionalProperties": false
}
```
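Before any downstream branch runs, gate the raw LLM reply through a parse-and-validate step. In production you would use a full validator (e.g., the `jsonschema` package against lead-qual-output.v1.json); the dependency-free sketch below checks only the required keys, enum, and bounds, and returns `None` on any failure so the workflow can retry or escalate.

```python
import json

# Hand-rolled sanity check for lead-qual outputs. A full JSON Schema
# validator (e.g., the jsonschema package) is preferable in production.
ALLOWED_NEXT_STEPS = {"book_call", "nurture", "disqualify"}

def parse_or_fail(raw: str):
    """Parse a raw LLM reply; return the dict if it passes, else None."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not {"score", "reasons", "next_step", "confidence"} <= data.keys():
        return None
    if data["next_step"] not in ALLOWED_NEXT_STEPS:
        return None
    if not (isinstance(data["score"], int) and 0 <= data["score"] <= 100):
        return None
    if not (isinstance(data["confidence"], (int, float)) and 0 <= data["confidence"] <= 1):
        return None
    if not (isinstance(data["reasons"], list) and data["reasons"]):
        return None
    return data
```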
Prompt scaffold (system):
```
You are a lead qualification assistant. Return STRICT JSON matching lead-qual-output.v1.json.
Use only data in lead-qual-input.v1.json. If required answers are missing, set status:"need_more_info" and list missing.
Do not invent data. If next_step is "book_call", confidence must be ≥ 0.75.
```
Acceptance tests (ship with these):
- Parse rate: 100 valid JSON outputs over 100 test runs on golden set.
- Business rule: Auto‑fail if next_step=book_call AND confidence < 0.75.
- Safety: Redact email/phone in logs; no PII in Slack messages.
- QA: Randomly sample 10% of “book_call” for human review daily.
Escalation matrix:
- Confidence < 0.5 → Human review queue.
- “High deal value” keywords or competitor mentions → Tag owner and escalate immediately.
- Three consecutive failures in 10 minutes (same node) → open incident, notify on‑call, pause auto‑booking.
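The "three failures in 10 minutes" rule is easy to get wrong without a sliding window. A minimal in-memory sketch (state would live in your workflow store or Redis in production; names are illustrative):

```python
from collections import defaultdict, deque

# Sliding-window incident trigger: 3 failures on the same node within
# 10 minutes. Timestamps are epoch seconds; state is in-memory here.
WINDOW_S = 600
THRESHOLD = 3

_failures = defaultdict(deque)  # node id -> recent failure timestamps

def record_failure(node: str, ts: float) -> bool:
    """Record a failure; return True when the incident threshold is hit."""
    q = _failures[node]
    q.append(ts)
    while q and ts - q[0] > WINDOW_S:
        q.popleft()  # drop failures outside the 10-minute window
    return len(q) >= THRESHOLD
```

When it returns `True`, open the incident, notify on-call, and pause auto-booking as per the matrix above.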
Core KPIs:
- Contact→Qualified% (7‑day rolling), Auto‑book rate, False‑positive rate (human reversals), First‑response time (FRT).
Blueprint 2 — Weekly Client Reporting Agent (Slack + HTML Email)
Single job: pull weekly KPIs from trusted sources, validate math, summarize to Slack and an HTML email. No dashboards to click, no hallucinated numbers.
Recommended stack:
- Data: Databox MCP (to GA4/Ads/CRM) or native connectors.
- Orchestration: n8n (CRON Monday 09:00 local) or Make scheduler.
- LLM: model capable of strict JSON + HTML generation.
- Output: Slack channel + HTML email to client list.
n8n/Make module map:
- Trigger: Schedule.
- Fetch: GA4/Ads/CRM metrics via Databox MCP.
- Validate: Check totals = sum(children); percentages in [0,100]. If missing data, set value="unknown".
- LLM: Produce JSON with {slack[], html}.
- HTML validator: fail fast on malformed tags/styles; if fail → human QA + retry.
- Send: Slack + email. Log artifacts + costs to Langfuse.
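The Validate step above can be a plain function run before the LLM ever sees the numbers. A sketch, assuming each metric is a dict with optional `total`, `children`, and `pct` keys (the shape is illustrative; adapt to your Databox payload):

```python
# Sketch of the Validate step: recompute totals and bound-check
# percentages. Metric shape is an assumption, not the Databox format.
def validate_metrics(metrics: dict, tolerance: float = 0.001) -> list:
    """Return consistency notes; an empty list means all checks pass."""
    notes = []
    for name, m in metrics.items():
        total, children = m.get("total"), m.get("children")
        if total is not None and children is not None:
            # ±0.1% relative tolerance, matching the acceptance tests
            if abs(sum(children) - total) > tolerance * max(abs(total), 1):
                notes.append(f"{name}: total != sum(children)")
        pct = m.get("pct")
        if pct is not None and not 0 <= pct <= 100:
            notes.append(f"{name}: percentage outside [0, 100]")
    return notes
```

Non-empty notes feed the "consistency_note" footer in the HTML output rather than blocking delivery.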
Output JSON Schema (v1):
```json
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"$id": "https://statelessfounder.com/schemas/weekly-report-output.v1.json",
"type": "object",
"required": ["slack", "html"],
"properties": {
"slack": {
"type": "array",
"items": {"type": "string", "maxLength": 140},
"minItems": 3, "maxItems": 12
},
"html": {"type": "string", "minLength": 200}
},
"additionalProperties": false
}
```
Prompt scaffold (system):
```
Return STRICT JSON per weekly-report-output.v1.json.
- Slack: ≤12 bullet lines, each < 140 chars.
- HTML: table-based layout, inline CSS only, no external assets. If any metric source is missing, write "unknown" and explain briefly at the bottom.
Validate numeric sums; if inconsistency, emit a "consistency_note" in an HTML footer.
```
Acceptance tests:
- HTML passes validator; no unclosed tags.
- Numeric checks recompute within ±0.1% tolerance.
- If any dataset is missing, HTML includes an “unknown” callout and no fabricated values.
- Slack array contains 3–12 lines; each < 140 characters.
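The "no unclosed tags" check can be a coarse stack-based gate built on the standard library. This is a fail-fast sketch, not a full HTML validator (it ignores attributes and most HTML5 quirks); the void-element set is a minimal assumption:

```python
from html.parser import HTMLParser

# Elements that never take a closing tag (minimal, illustrative set).
VOID = {"br", "hr", "img", "meta", "link", "input"}

class TagBalanceChecker(HTMLParser):
    """Coarse unclosed/mismatched-tag check — a fail-fast gate only."""
    def __init__(self):
        super().__init__()
        self.stack, self.errors = [], []
    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append(tag)
    def handle_startendtag(self, tag, attrs):
        pass  # self-closing tags are balanced by definition
    def handle_endtag(self, tag):
        if not self.stack or self.stack.pop() != tag:
            self.errors.append(f"mismatched </{tag}>")

def html_is_balanced(html: str) -> bool:
    checker = TagBalanceChecker()
    checker.feed(html)
    checker.close()
    return not checker.errors and not checker.stack
```

A `False` here routes the run to human QA + retry, per the module map.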
Core KPIs:
- On‑time delivery (% Mondays by 09:15), Parse/validation pass rate, Client replies within 24h, Support tickets opened from report (should be low).
Blueprint 3 — Inbox Triage Agent (Classification → Routing → Draft)
Single job: classify incoming messages, set priority, and route to the right owner/folder. Drafting replies is optional and should require human send.
Recommended stack:
- Inbox: Gmail/Outlook or Helpdesk (Help Scout/Zendesk).
- Orchestration: n8n/Make with per‑module error handling.
- LLM: classification‑focused with deterministic labels.
- Storage: Airtable/DB table for message log + outcomes.
Allowed labels: {support, sales, billing, spam, personal}
Sensitive intents: {legal, refund, escalation}
Input JSON (expect):
```json
{
"message_id": "str",
"thread_id": "str",
"subject": "str",
"from": "email",
"body": "str",
"attachments": [{"filename": "str", "mime": "str"}],
"received_at": "ISO-8601"
}
```
Output JSON Schema (v1):
```json
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"$id": "https://statelessfounder.com/schemas/inbox-triage-output.v1.json",
"type": "object",
"required": ["label", "priority", "reason", "sensitive"],
"properties": {
"label": {"enum": ["support", "sales", "billing", "spam", "personal"]},
"priority": {"type": "integer", "minimum": 1, "maximum": 5},
"reason": {"type": "string"},
"sensitive": {"type": "boolean"},
"route_to": {"type": "string", "description": "Mailbox/folder/owner ID"}
},
"additionalProperties": false
}
```
Prompt scaffold (system):
```
Classify the message into one allowed label. Return STRICT JSON per inbox-triage-output.v1.json.
Never write to customers. If refund/legal/escalation is detected, set sensitive=true and raise priority (1=highest).
```
Acceptance tests:
- Labels limited to the allowed set; any other value auto‑fails.
- PII masking in logs (email/phone regex) before storage/alerts.
- Sensitive=true always routes to human queue; never send auto‑replies.
- Duplicates: same thread_id within 10 minutes → dedupe and skip.
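The dedupe rule translates to a small guard keyed on `thread_id`. In-memory state shown for clarity; in n8n/Make you would back this with your Airtable/DB message log:

```python
# Dedupe guard: skip a message when its thread_id was seen within the
# last 10 minutes. In-memory state; use the message-log DB in production.
DEDUPE_WINDOW_S = 600
_last_seen = {}  # thread_id -> epoch seconds of last processed message

def should_process(thread_id: str, now: float) -> bool:
    last = _last_seen.get(thread_id)
    _last_seen[thread_id] = now
    return last is None or now - last > DEDUPE_WINDOW_S
```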
Core KPIs:
- First response time (FRT), Deflection rate (auto‑resolved without human), Sensitive‑case catch rate, Manual review hit rate.
Lightweight evaluation harness (golden set + nightly gates)
Own your reliability. The eval harness keeps you from shipping regressions and gives you numbers for clients.
Golden set (per agent):
- Size: 50 rows minimum (10 edge cases: missing fields, OOO auto‑replies, malformed HTML, rate‑limit simulation, duplicate thread, high‑value exceptions).
- Format: CSV with columns: input_json, expected_output_json, notes, tags.
Sample CSV (3 rows shown):
```csv
input_json,expected_output_json,notes,tags
"{\"name\":\"Ada\",\"contact\":{\"channel\":\"webform\",\"email\":\"ada@example.com\"},\"source\":\"LP\",\"answers\":{\"budget\":\"2k\"},\"icp\":{\"industry\":\"coaching\"}}","{\"status\":\"needmoreinfo\",\"missing\":[\"timeline\"],\"score\":0,\"reasons\":[\"Missing timeline\"],\"next_step\":\"nurture\",\"confidence\":0.4}","Lead-qual missing timeline","edge,lead-qual"
"{\"slack\":[],\"html\":\"<html><body><table><tr><td>T</td></tr></table></body></html>\"}","SCHEMA_FAIL","Reporting with empty Slack block","edge,reporting"
"{\"messageid\":\"1\",\"threadid\":\"A\",\"subject\":\"Refund\",\"from\":\"c@example.com\",\"body\":\"I want a refund\"}","{\"label\":\"billing\",\"priority\":1,\"reason\":\"Refund keyword\",\"sensitive\":true,\"routeto\":\"billingqueue\"}","Inbox refund path","edge,inbox"
```
Metrics to track:
- Schema parse rate = valid_outputs / total_runs
- Business rule pass rate = passes_business_rules / total_runs
- False‑positive rate (agent booked call but human reversed) = fp / total_book_calls
- Deflection rate (triage only) = auto_resolved / total_tickets
- Manual review hit rate = sent_to_human / total_runs
- Latency p95 and cost/run (sum of LLM + platform costs)
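The metrics above reduce to a small rollup over per-run result rows. A sketch, assuming each row carries boolean flags set by the harness (`valid_json`, `passes_rules`, `booked_call`, `human_reversed`, `sent_to_human` — illustrative names):

```python
# Nightly metric rollup over golden-set results. Flag names are
# illustrative assumptions about the harness's per-run output.
def rollup(results: list) -> dict:
    total = len(results)
    def rate(flag):
        return sum(1 for r in results if r.get(flag)) / total if total else 0.0
    booked = [r for r in results if r.get("booked_call")]
    return {
        "parse_rate": rate("valid_json"),
        "rule_pass_rate": rate("passes_rules"),
        # FP rate is computed over booked calls only, per the definition
        "fp_rate": (sum(1 for r in booked if r.get("human_reversed"))
                    / len(booked)) if booked else 0.0,
        "manual_review_rate": rate("sent_to_human"),
    }
```

These are the numbers the deploy gate compares against the last green run.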
Harness setup (Langfuse/OpenAI‑evals friendly):
- Store golden rows as dataset. Nightly job runs agent on dataset; compare outputs; upload traces + costs.
- Gate deploys: if parse rate or business rule pass rate drops by >2% vs last green, auto‑block and alert.
- Weekly: sample 5 outputs per agent for human rubric scoring (1–5) and trend.
Command sketch (pseudocode):
```
make eval AGENT=lead-qual DATA=golden/leadqual.csv OUT=reports/leadqual_$(date +%F).json
```
Governance defaults:
- Keep 90 days of eval artifacts.
- Tag runs by model+prompt version.
- Treat model changes like code changes (review + rollback path).
Observability + Incident SOP (retries, backoff, escalation)
Most failures aren’t “AI problems.” They’re ops: timeouts, 429s, auth expiry, brittle HTML. Put guardrails in writing.
Global error workflow (copy‑ready SOP):
- Detection: Route all n8n/Make errors to a global Error Workflow. Capture: workflow_id, node, message, execution_url, retryOf.
- Classify:
- HTTP 5xx/timeout/429 → Retry path.
- 4xx auth (401/403) → Re‑auth path.
- Schema/validation fail → Human QA path.
- Retry: 3 attempts with exponential backoff (1m, 5m, 15m). Jitter recommended.
- Escalation: Still failing after retries or confidence < threshold → open ticket, notify on‑call (Slack/PagerDuty). Attach execution_url and a redacted payload snippet.
- Post‑incident: Create a postmortem card. Add a regression case to the golden set. Tag the run with a new incident ID.
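The retry schedule above (1m, 5m, 15m with jitter) can be generated rather than hard-wired into each node. A minimal sketch; the ±20% proportional jitter is an assumption, tune it to your providers:

```python
import random

# Retry delays in seconds: roughly 1m / 5m / 15m with proportional
# jitter, matching the SOP's 3-attempt exponential backoff.
BASE_DELAYS_S = [60, 300, 900]

def backoff_delays(jitter: float = 0.2, rng=random.random):
    """Yield each retry delay with up to ±jitter proportional noise."""
    for base in BASE_DELAYS_S:
        yield base * (1 + jitter * (2 * rng() - 1))
```

Jitter keeps simultaneous failures from retrying in lockstep and re-triggering the same 429s.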
Operational safeguards:
- PII: Mask emails/phones in logs and alerts.
- Rate limits: Centralize backoff settings; never parallelize beyond provider quotas.
- Auth: Monitor token age; auto‑rotate credentials with least privilege; alert before expiry.
- Zapier caveat: Keep trigger work under 30s; push long calls to async callbacks/queues.
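The PII masking safeguard comes down to two substitutions applied to every log line and alert body before it leaves your pipeline. The regexes below are coarse, illustrative patterns — tune them to your locales; they are not exhaustive:

```python
import re

# Coarse PII masks for logs/alerts. Patterns are illustrative, not
# exhaustive — extend per locale (phone formats vary widely).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def mask_pii(text: str) -> str:
    text = EMAIL_RE.sub("[email]", text)
    return PHONE_RE.sub("[phone]", text)
```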
Expected outcomes:
- Incidents resolved within 30 minutes during business hours.
- <1 false‑positive booking per 100 auto‑booked calls.
- Weekly reporting failure rate <2% (auto‑retry recovers most).
KPI dashboard + pricing calculator (copy the math)
Track the numbers that pay the bills and price with a margin, not vibes.
KPI sheet (minimum set):
- Parse rate (schema): valid_json / total_runs
- Business rule pass rate: passes_rules / total_runs
- False‑positive rate (lead‑qual): fp / total_book_calls
- FRT (inbox/lead‑qual): p50 and p95 in minutes
- On‑time delivery (reporting): deliveries_before_0915 / total_weeks
- Cost/run: llm_cost + platform_cost + infra_cost
Pricing calculator (logic):
Inputs:
- setup_hours
- hourly_rate
- platform_monthly (n8n/Make/Zapier tier)
- avg_runs_per_month
- llm_cost_per_run (tokens × price/1k + vector/db/query costs)
- maintenance_hours_per_month
- target_gross_margin (e.g., 0.6)
Formulas:
- setup_fee = setup_hours × hourly_rate + one‑time tooling (if any)
- monthly_direct_cost = platform_monthly + (avg_runs_per_month × llm_cost_per_run) + (maintenance_hours_per_month × hourly_rate)
- recommended_monthly_price = ceil(monthly_direct_cost / (1 - target_gross_margin))
- anchor_ranges (market‑checked): entry $99–$299/mo per unit; mid €500–€3,000/mo; high‑touch $2,500–$5,000+/mo. Use your cost model to pick a lane.
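The formulas above translate directly to code. The example figures in the comment are illustrative, not market data:

```python
import math

# Direct translation of the pricing formulas; names mirror the inputs
# list. one_time_tooling defaults to zero.
def price_agent(setup_hours, hourly_rate, platform_monthly,
                avg_runs_per_month, llm_cost_per_run,
                maintenance_hours_per_month, target_gross_margin,
                one_time_tooling=0.0):
    setup_fee = setup_hours * hourly_rate + one_time_tooling
    monthly_direct_cost = (platform_monthly
                           + avg_runs_per_month * llm_cost_per_run
                           + maintenance_hours_per_month * hourly_rate)
    recommended_monthly_price = math.ceil(
        monthly_direct_cost / (1 - target_gross_margin))
    return setup_fee, monthly_direct_cost, recommended_monthly_price

# Illustrative inputs: 20h setup at $100/h, $50/mo platform, 2,000 runs
# at $0.03, 2h/mo maintenance, 60% target margin.
```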
Packaging notes:
- Productize by agent: one setup + one monthly per agent. Offer a bundle discount for 2–3 agents.
- SLOs in proposal: parse rate ≥ 98%, FP rate ≤ 1%, reporting on Mondays by 09:15, incident response < 2 hours during support window.
- Change management: 90‑day review to tune thresholds and cut human review from 20% → 10% once stable.
Implementation checklist + Lisbon Test
Run this before you turn on fully unattended mode.
Go‑live checklist:
- One job per agent; JSON Schemas validated and versioned.
- Golden set (≥50) loaded; nightly eval passing; deploy gate active.
- Retries/backoff implemented globally; rate‑limit quotas set.
- Auth rotation tested; failure alerts include re‑auth instructions.
- PII masking verified end‑to‑end (logs, alerts, data store).
- Human‑in‑the‑loop paths wired; sensitive cases always human‑owned.
- HTML validation (reporting) green for two consecutive Mondays.
- KPIs visible to client (shared dashboard or weekly footer block).
- Incident SOP rehearsed once with a forced failure.
- Lisbon Test: pull laptop power and Wi‑Fi for 10 minutes mid‑run — does it recover without you?
Minimal SLO card to paste in your client doc:
- Lead‑qual: ≥98% parse rate; FP ≤1%; median FRT ≤5 min.
- Reporting: On‑time Mondays by 09:15; validation pass ≥98%.
- Inbox: Sensitive‑case catch rate ≥95%; deflection rate target agreed per client.
What to iterate next:
- Reduce manual review from 20% → 10% once KPIs hold for 4 weeks.
- Add canary checks (1 synthetic message/report per day) to detect silent failures.
- Graduate autonomy only after three green eval cycles post‑change (model or prompt).