AI‑First SOP (Notion) — Versioned Prompts + 30‑Day Change‑Control
A vendor‑agnostic Notion template that turns your prompts into a maintainable SOP: versioned steps, pinned model tags, guardrails, monitoring, a rollback plan, and a 30‑day change‑control loop—plus three filled mini‑examples (automation, content, support).
Duplicate this page for every workflow that uses an LLM. Treat your prompts like code: pin a model+version, use SemVer for changes, add guardrails, monitor runs, and keep a rolling 30‑day review so provider deprecations don’t break delivery. Fill in [BRACKETS] and replace example text with your own. Keep this page tidy—contractors should be able to ship safely in under 10 minutes.
1) Page header — canonical metadata
Paste this block at the top of every SOP page.
- SOP Title: [SOP NAME]
- Owner: [OWNER NAME]
- Backup Owner: [BACKUP OWNER]
- Status: [DRAFT | APPROVED | DEPRECATED]
- SemVer: [MAJOR.MINOR.PATCH] (e.g., 1.3.2)
- Model Tag: [PROVIDER:MODEL:VERSION] (e.g., openai:gpt-4o:2026-01-20)
- Temperature Band: [LOW (0.0–0.2) | MED (0.3–0.6) | HIGH (0.7–1.0)]
- Dataset/Test Hash: [DATASET_SHA or N/A]
- Last Updated: [YYYY‑MM‑DD]
- Next Review (30‑day): [YYYY‑MM‑DD]
- Prompt Registry ID(s): [PROMPT_KEY(S)]
- Guardrails Policy ID+Version: [POLICY_ID@v]
- Monitoring — Trace: [TRACE_DASHBOARD_URL]
- Monitoring — Evals: [EVAL_DASHBOARD_URL]
- Cost Target: [$MAX_PER_RUN] | Latency Target: [MS]
- Rollback Target Version: [vMAJOR.MINOR.PATCH]
- Links: [DATASET LINK] · [CHANGE‑LOG DB] · [RUNBOOK]
Tip: In Notion, add database properties for each line. Use a Formula for Next Review: dateAdd(prop("Last edited time"), 30, "days").
2) Preconditions (green‑light check)
Complete before any run.
- Access: [TOOLS/ACCOUNTS NEEDED] (e.g., API keys, doc repo, CRM view)
- Data scope: [WHAT DATA CAN THE MODEL SEE?]
- Ground truth: [LINKS TO POLICIES / STYLE GUIDES / KB]
- Privacy: [PII HANDLED? YES/NO] · Anonymization: [ON/OFF]
- Guardrails attached: [POLICY_ID@v] (e.g., input/output filters, tool allow‑list)
- Test fixtures loaded: [YES/NO] · Dataset hash: [HASH]
- Failure budget: [MAX % ERROR or MAX ESCALATIONS]
- Approval path: [REVIEWER] for [RISKY STEPS]
- Rollback ready: [LAST GOOD VERSION] restored? [YES/NO]
3) Inputs — strict contracts (schemas + examples)
Define exactly what the model receives. Keep shapes stable across versions.
- Input Type: [NAME]
- Contract (JSON Schema):
```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "[INPUT_NAME]",
  "type": "object",
  "required": ["[FIELD_1]", "[FIELD_2]"],
  "properties": {
    "[FIELD_1]": {"type": "string", "description": "[DESC]"},
    "[FIELD_2]": {"type": "array", "items": {"type": "string"}},
    "[OPTIONAL]": {"type": "number", "minimum": 0}
  }
}
```
- Example payload:
{"[FIELD_1]":"[VALUE]","[FIELD_2]":["a","b"],"[OPTIONAL]":3}
4) Steps — with versioned prompts
Document every model call as a step with a versioned prompt. If a step changes output shape or safety guarantees, bump MAJOR. If instructions or constraints change but the output contract stays stable, bump MINOR. If only wording, typos, or small threshold tweaks land with no change in behavioral intent, bump PATCH.
Step Template (copy for each step):
- Step #: [N] — [STEP NAME]
- Prompt Key: [PROMPT_KEY]
- Current Version: v[MAJOR.MINOR.PATCH]
- Model Tag: [PROVIDER:MODEL:VERSION]
- Parameters: temperature=[X], top_p=[X], max_tokens=[X]
- System Message:
[CONCISE ROLE + RULES]
- Output MUST be valid JSON matching the provided schema.
- Refuse if input contains [DISALLOWED] or asks for [SENSITIVE_ACTION].
- Use source‑of‑truth links: [KB_LINKS].
- User Template:
Task: [TASK]
Inputs (JSON):
```json
[INPUT_PAYLOAD]
```
Quality gates: [BULLETS]
- Expected Output Schema: [LINK TO SECTION 5]
- Validators: [REGEX/SCHEMA/EVAL RULES]
- Guardrails attached: [POLICY_ID@v] (e.g., PII scrub, tool allow‑list)
- Observability: Trace tag(s) = [SOP_NAME], [PROMPT_KEY], [MODEL_TAG]
- Fallback: [RULES WHEN STEP FAILS]
- Tests to run before publish: [N FIXTURES] passing ≥[THRESHOLD]%
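To make the last bullet executable, here is a sketch of a pre-publish gate. It assumes a fixture directory of JSON files shaped like {"input": ..., "expected_status": ...} and a hypothetical `run_step` wrapper around your pinned model call; adapt both to your harness:
```python
# Sketch of the pre-publish gate: run every fixture through the step and block
# publishing when the pass rate falls below threshold.
import json
from pathlib import Path

PASS_THRESHOLD = 0.95  # your [THRESHOLD]% as a fraction

def run_step(payload: dict) -> dict:
    """Hypothetical: call the pinned model+prompt and return parsed JSON output."""
    raise NotImplementedError

def prepublish_gate(fixture_dir: str) -> bool:
    fixtures = sorted(Path(fixture_dir).glob("*.json"))
    passed = 0
    for path in fixtures:
        case = json.loads(path.read_text())
        try:
            out = run_step(case["input"])
            passed += out.get("status") == case["expected_status"]
        except Exception:
            pass  # any crash counts as a failed fixture
    rate = passed / len(fixtures) if fixtures else 0.0
    print(f"{passed}/{len(fixtures)} fixtures pass ({rate:.0%})")
    return rate >= PASS_THRESHOLD
```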
5) Expected outputs — schemas + quality gates
State the exact shape of successful outputs and how you’ll check them.
- Output Type: [NAME]
- Contract (JSON Schema):
```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "[OUTPUT_NAME]",
  "type": "object",
  "required": ["status", "data"],
  "properties": {
    "status": {"type": "string", "enum": ["ok", "needs_review", "refused"]},
    "data": {"type": "object"},
    "notes": {"type": "string"}
  }
}
```
- Acceptance checks:
- JSON validates against schema
- Forbidden phrases not present: [LIST]
- Score on eval suite ≥ [THRESHOLD]
- Latency ≤ [MS] and cost ≤ [$]
- Human spot‑check sample size: [N] items per [M] days
6) Known failure modes (and mitigations)
List the predictable ways this can go wrong and how you’ll catch them.
- Injection/Goal drift → Mitigation: strong system message; strip/neutralize untrusted input; attach guardrails [POLICY_ID@v]
- Data leakage (PII/keys) → Mitigation: PII scrub rule; output filter; test fixtures with red‑team prompts
- Hallucinated policy → Mitigation: cite‑required with KB link; eval that fails on missing citations
- Tool misuse (unsafe actions) → Mitigation: tool allow‑list; require human approval for [ACTIONS]
- JSON breakage → Mitigation: response_format=json; repair parser with retry and schema validation (see the sketch below)
- Model churn regressions → Mitigation: pin [MODEL_TAG]; monthly review; back‑compat fixtures
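A sketch of the repair-with-retry mitigation, assuming a hypothetical `call_model` wrapper around your pinned [MODEL_TAG] and the `jsonschema` package. The loop re-asks the model with the validator's error messages appended:
```python
# Sketch of the repair parser: parse, validate against the output schema, and
# on failure re-ask with the validator errors until retries run out.
import json
from jsonschema import Draft202012Validator

def call_model(messages: list[dict]) -> str:
    """Hypothetical: return the raw completion text for a message list."""
    raise NotImplementedError

def parse_with_repair(messages: list[dict], schema: dict, retries: int = 1) -> dict:
    validator = Draft202012Validator(schema)
    convo = list(messages)
    for _ in range(retries + 1):
        raw = call_model(convo)
        try:
            obj = json.loads(raw)
            errors = [e.message for e in validator.iter_errors(obj)]
            if not errors:
                return obj
            problem = "; ".join(errors)
        except json.JSONDecodeError as exc:
            problem = str(exc)
        convo += [
            {"role": "assistant", "content": raw},
            {"role": "user", "content": f"Invalid output ({problem}). "
                                        "Return ONLY corrected JSON matching the schema."},
        ]
    raise ValueError("output failed schema validation after retries")
```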
7) Guardrails — what’s enforced where
Attach and reference guardrails you operate.
- Input filters: [PII, secrets, URLs from unknown domains]
- Output filters: [PII removal, profanity, unsupported claims]
- Policy map: [DOC LINK] → [RULE NAME] → [ENFORCEMENT POINT]
- Tool allow‑list: [TOOLS AVAILABLE TO THE AGENT]
- Human‑in‑the‑loop: [WHEN] → [WHO APPROVES]
- Runtime IDs: Guardrail Policy [POLICY_ID@v] (record here)
Note: If you switch stacks later (e.g., different guardrail vendor), keep the same policy names and versions in this SOP so diffs stay readable.
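Two of these enforcement points run as plain code on any stack. A sketch with illustrative regexes (not a complete PII policy) and a hypothetical allow-list:
```python
# Sketch of two enforcement points: a regex scrub for obvious PII/secrets on
# input, and a hard allow-list check before any tool call.
import re

PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "api_key": re.compile(r"\b(?:sk|pk)-[A-Za-z0-9]{16,}\b"),
}
TOOL_ALLOW_LIST = {"search_kb", "create_draft_reply"}  # your [TOOLS] list

def scrub_input(text: str) -> str:
    """Replace matches with typed redaction markers before the model sees them."""
    for name, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[REDACTED_{name.upper()}]", text)
    return text

def check_tool_call(tool_name: str) -> None:
    """Raise instead of letting the agent invoke an unlisted tool."""
    if tool_name not in TOOL_ALLOW_LIST:
        raise PermissionError(f"tool '{tool_name}' is not on the allow-list")
```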
8) Monitoring hooks — traces, evals, alerts
Make regressions easy to see and cheap to fix.
- Tracing: tag every call with [SOP_NAME], [PROMPT_KEY], [MODEL_TAG]
- Evals: [WEEKLY | NIGHTLY] run on [N] fixtures; track pass‑rate trend
- Alerts: error rate > [X%], eval drop > [Y pts], latency > [Z ms], cost/run > [$]
- Dashboards: [TRACE_DASHBOARD_URL] · [EVAL_DASHBOARD_URL]
- Sampling: human review [N] items per [M] days; rotate reviewers across time zones
- Post‑incident note: use the Revision Log entry template (Section 10)
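The alert thresholds above reduce to a small comparison function. A sketch with the bracketed placeholders made concrete for illustration only:
```python
# Sketch of the alert check: compare the latest run metrics with the thresholds.
THRESHOLDS = {
    "error_rate": 0.05,    # [X%]
    "eval_drop_pts": 3.0,  # [Y pts]
    "latency_ms": 4000,    # [Z ms]
    "cost_per_run": 0.10,  # [$]
}

def check_alerts(metrics: dict, baseline_eval_score: float) -> list[str]:
    """Return one alert string per breached threshold; empty means healthy."""
    alerts = []
    if metrics["error_rate"] > THRESHOLDS["error_rate"]:
        alerts.append(f"error rate {metrics['error_rate']:.1%}")
    drop = baseline_eval_score - metrics["eval_score"]
    if drop > THRESHOLDS["eval_drop_pts"]:
        alerts.append(f"eval score dropped {drop:.1f} pts")
    if metrics["latency_ms"] > THRESHOLDS["latency_ms"]:
        alerts.append(f"latency {metrics['latency_ms']} ms")
    if metrics["cost_per_run"] > THRESHOLDS["cost_per_run"]:
        alerts.append(f"cost ${metrics['cost_per_run']:.2f}/run")
    return alerts
```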
9) Rollback plan — hotfix path
When something breaks or a provider change lands, do this in order.
- Freeze — set Status: [DEGRADED]. Route outputs to [SAFE MODE/HUMAN REVIEW].
- Identify — compare failing traces against last good [MODEL_TAG] + [PROMPT_VERSION].
- Roll back — set [PROMPT_KEY] to [ROLLBACK TARGET]. Verify on fixtures (≥[THRESHOLD]% pass). Set Status: [STABLE].
- Patch — create v[PATCH+1] with minimal fix. Rerun evals. If output shape changed, bump MINOR/MAJOR and update Section 5.
- Document — add a Revision Log entry (Section 10). Link traces/evals.
- Prevent — add a new fixture capturing the failure. Update failure modes if needed.
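A sketch of the Freeze and Roll back steps against a hypothetical in-memory prompt registry; your registry, status store, and fixture runner will differ:
```python
# Sketch: freeze, flip the active pointer back to last good, verify on fixtures.
REGISTRY = {"cs_refund_triage": {"active": "1.2.4", "last_good": "1.2.3"}}
STATUS = {}
PASS_THRESHOLD = 0.95  # your [THRESHOLD]% as a fraction

def rollback(prompt_key: str, fixture_pass_rate) -> None:
    STATUS[prompt_key] = "DEGRADED"        # Freeze: route outputs to safe mode
    entry = REGISTRY[prompt_key]
    entry["active"] = entry["last_good"]   # Roll back the active version pointer
    rate = fixture_pass_rate(prompt_key)   # Verify on fixtures before unfreezing
    if rate >= PASS_THRESHOLD:
        STATUS[prompt_key] = "STABLE"
    else:
        raise RuntimeError(f"rollback target failing too ({rate:.0%}); stay in safe mode")
```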
10) Revision log — single source of truth
Keep one row per change. If you use a Notion DB, make these columns.
- Date: [YYYY‑MM‑DD]
- Editor: [NAME]
- Prompt Key: [PROMPT_KEY]
- From → To: [vOLD] → [vNEW]
- Change Type: [MAJOR | MINOR | PATCH]
- Reason/Trigger: [BUG | PROVIDER DEPRECATION | POLICY UPDATE | COST]
- Tests Run: [N]/[TOTAL], Pass‑rate: [%]
- Model Tag: [PROVIDER:MODEL:VERSION]
- Links: [DIFF] · [TRACE VIEW] · [EVAL RUN]
- Rollback Plan Tested: [YES/NO]
- Notes: [WHAT WE LEARNED]
Example entry:
2026‑05‑08 | Kira | support_refund_triage | 1.2.3→1.2.4 | PATCH | JSON repair retry | 48/48 pass | openai:gpt‑4o:2026‑01‑20 | diff/123 · trace/q=refund · eval/456 | rollback tested | tightened refusal rule for out‑of‑scope asks
11) Naming + versioning convention (SemVer + model tag + dataset hash)
Use this everywhere names appear (page title, Prompt Key, datasets, commits).
- Prompt Key: [TEAM]_[DOMAIN]_[TASK] (lowercase, snake_case). Example: cs_refund_triage
- Version: MAJOR.MINOR.PATCH
  - MAJOR: output shape/contract or guardrail guarantees change
  - MINOR: instructions/constraints change; contract stable
  - PATCH: small fixes; no behavioral intent change
- Model Tag: [PROVIDER]:[MODEL]:[RELEASE_DATE] (from vendor docs)
- Dataset/Test Hash: first 8 hex chars of the dataset SHA‑256 (e.g., ds:7fa91b2c)
- Full label (recommended): [PROMPT_KEY]@v[SEMVER]#[MODEL_TAG]+[DATASET_HASH]
  - Example: cs_refund_triage@v1.2.4#openai:gpt‑4o:2026‑01‑20+ds:7fa91b2c
Rule: Never change prompt text without bumping a version. If you change the model, update the Model Tag and rerun evals before publish.
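A sketch of a label helper that follows this convention exactly, including the first-8-hex-chars dataset hash; standard library only:
```python
# Sketch: derive the dataset hash and assemble the full label per the convention.
import hashlib
import re

def dataset_hash(path: str) -> str:
    """First 8 hex chars of the dataset file's SHA-256, prefixed `ds:`."""
    with open(path, "rb") as f:
        return "ds:" + hashlib.sha256(f.read()).hexdigest()[:8]

def full_label(prompt_key: str, semver: str, model_tag: str, ds_hash: str) -> str:
    assert re.fullmatch(r"[a-z0-9]+(_[a-z0-9]+)*", prompt_key), "snake_case only"
    assert re.fullmatch(r"\d+\.\d+\.\d+", semver), "MAJOR.MINOR.PATCH"
    return f"{prompt_key}@v{semver}#{model_tag}+{ds_hash}"

print(full_label("cs_refund_triage", "1.2.4", "openai:gpt-4o:2026-01-20", "ds:7fa91b2c"))
# -> cs_refund_triage@v1.2.4#openai:gpt-4o:2026-01-20+ds:7fa91b2c
```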
12) Standing 30‑day change‑control — runbook
Run this every 30 days or when a provider posts a deprecation/changelog item.
- Watchlist (record links in the SOP DB): [OPENAI DEPRECATIONS] · [AZURE OPENAI MODEL RETIREMENTS] · [ANTHROPIC MODEL DEPRECATIONS] · [VERTEX AI DEPRECATIONS] · [OPENAI CHANGELOG]
- Calendar: recurring “LLM Change‑Review” event → Owner
- Before the review:
- Export current pass‑rate, latency, and cost/run
- Check deprecations for your Model Tag(s)
- If a change affects you, create a migration plan and a target date
- During the review:
- Re‑run evals on [N] fixtures
- Spot‑check [M] live outputs
- Decide: [KEEP AS‑IS] | [PATCH/MINOR] | [MAJOR MIGRATION]
- After the review:
- Update Model Tag if needed; run full tests
- Add a Revision Log entry and update Last Updated (auto)
- Confirm Next Review date populated by Formula
13) Mini‑example — Automation (invoice categorization)
A filled example you can adapt. Goal: classify invoices into categories and flag outliers.
Header quickfill:
- SOP Title: Finance — Invoice Categorization
- Owner: [OWNER]
- SemVer: 1.0.0 · Model Tag: [PROVIDER:MODEL:VERSION]
- Temperature Band: LOW (0.1) · Dataset Hash: ds:1a2b3c4d
Inputs schema:
```json
{
  "type": "object",
  "required": ["vendor", "line_items", "total", "currency"],
  "properties": {
    "vendor": {"type": "string"},
    "line_items": {
      "type": "array",
      "items": {
        "type": "object",
        "required": ["desc", "amount"],
        "properties": {"desc": {"type": "string"}, "amount": {"type": "number"}}
      }
    },
    "total": {"type": "number"},
    "currency": {"type": "string", "minLength": 3, "maxLength": 3}
  }
}
```
Step 1 — fin_invoice_categorize v1.0.0
- System:
You are a strict accounting classifier. Map each line item to one of the allowed categories. Output valid JSON only.
Allowed categories: ["software","contractor","travel","office","other"].
Refuse if currency not provided.
- User:
Task: Categorize invoice line items.
Inputs (JSON):
```json
{"vendor":"Acme SaaS","line_items":[{"desc":"Pro plan","amount":49}],"total":49,"currency":"USD"}
```
Quality gates: must sum to total; categories must be from allow‑list.
- Output schema (excerpt):
```json
{"status":"ok","data":{"categories":[{"desc":"Pro plan","category":"software","amount":49}]},"notes":""}
```
- Validators: sum(line_items.amount) == total; categories in allow‑list (see the sketch below)
- Monitoring: Trace tag fin_invoice_categorize@v1.0.0#[MODEL_TAG]
- Failure modes: currency missing → refuse; unknown vendor → still categorize
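Both validators are deterministic, so they can run as a plain post-check on the model output. A sketch:
```python
# Sketch of the two validators: amounts sum to the invoice total, and every
# category comes from the allow-list defined in the system message.
ALLOWED_CATEGORIES = {"software", "contractor", "travel", "office", "other"}

def validate_categorization(invoice: dict, output: dict) -> list[str]:
    problems = []
    categories = output["data"]["categories"]
    if round(sum(c["amount"] for c in categories), 2) != round(invoice["total"], 2):
        problems.append("categorized amounts do not sum to the invoice total")
    off_list = {c["category"] for c in categories} - ALLOWED_CATEGORIES
    if off_list:
        problems.append(f"categories outside allow-list: {sorted(off_list)}")
    return problems
```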
14) Mini‑example — Content (podcast → LinkedIn post)
Turn a transcript into a concise LinkedIn post with a safety gate for claims.
Header quickfill:
- SOP Title: Content — Episode → LinkedIn Post
- Owner: [OWNER]
- SemVer: 1.1.0 · Model Tag: [PROVIDER:MODEL:VERSION]
- Temperature Band: MED (0.5) · Dataset Hash: ds:9e8d7c6b
Inputs schema (excerpt):
{"type":"object","required":["title","transcript","url"],"properties":{"title":{"type":"string"},"transcript":{"type":"string"},"url":{"type":"string"}}}
Step 1 — content_linkedin_post v1.1.0
- System:
You write plainspoken operator posts. 120–180 words. No emojis. Include a single numeric insight if present. Link at the end.
If a sentence includes an unverified claim, add "(source in comments)".
- User:
Task: Draft a LinkedIn post from this episode.
Inputs (JSON): {"title":"[EP TITLE]","transcript":"[RAW TEXT]","url":"[LP URL]"}
Quality gates: 1 idea, 1 number, clear CTA to template link.
- Output schema (excerpt):
{"status":"ok","data":{"post":"[TEXT]","cta":"[URL]"}}
- Guardrails: profanity filter; URL allow‑list
- Monitoring: sample 5 posts/month for human edit time < 3 minutes
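The deterministic gates (word-count band, no emojis, link at the end) can be checked before a human ever sees the draft. A sketch; the emoji character class is a rough illustration, not exhaustive:
```python
# Sketch of the deterministic quality gates for the LinkedIn post step.
import re

EMOJI = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")

def check_post(post: str, cta_url: str) -> list[str]:
    problems = []
    words = len(post.split())
    if not 120 <= words <= 180:
        problems.append(f"word count {words} outside 120-180")
    if EMOJI.search(post):
        problems.append("emoji present")
    if not post.rstrip().endswith(cta_url):
        problems.append("CTA link is not at the end")
    return problems
```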
15) Mini‑example — Support (refund triage)
Draft a safe first reply and route edge cases to a human.
Header quickfill:
- SOP Title: Support — Refund Triage
- Owner: [OWNER]
- SemVer: 1.2.4 · Model Tag: [PROVIDER:MODEL:VERSION]
- Temperature Band: LOW (0.2) · Dataset Hash: ds:7fa91b2c
Inputs schema (excerpt):
{"type":"object","required":["message","order_id","customer_tier"],"properties":{"message":{"type":"string"},"order_id":{"type":"string"},"customer_tier":{"type":"string","enum":["free","pro","enterprise"]}}}
Step 1 — cs_refund_triage v1.2.4
- System:
You are a policy‑bound support agent. Obey the refund policy. If request is outside policy or order is older than [DAYS], set status to "needs_review" and explain why.
- User:
Task: Draft a first reply following policy.
Inputs (JSON): {"message":"[TEXT]","order_id":"[ID]","customer_tier":"pro"}
Quality gates: policy link cited; tone neutral; no offers beyond policy.
- Output schema (excerpt):
{"status":"ok","data":{"reply":"[TEXT]","policy_links":["[URL]"]},"notes":"route to human if needs_review"}
- Guardrails: PII scrub; forbidden offers list; refusal template
- Fallback: set status "needs_review" if policy ambiguous
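The fallback reduces to a small routing rule. A sketch, assuming the order date is available and with [DAYS] set to 30 for illustration:
```python
# Sketch of the routing rule: anything marked needs_review (or refused), or any
# order older than the policy window, goes to a human before the reply sends.
from datetime import date, timedelta

POLICY_WINDOW_DAYS = 30  # illustrative value for [DAYS]

def route(output: dict, order_date: date) -> str:
    too_old = date.today() - order_date > timedelta(days=POLICY_WINDOW_DAYS)
    if output.get("status") != "ok" or too_old:
        return "human_review"
    return "send_reply"
```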