Template

AI‑First SOP (Notion) — Versioned Prompts + 30‑Day Change‑Control

A vendor‑agnostic Notion template that turns your prompts into a maintainable SOP: versioned steps, pinned model tags, guardrails, monitoring, a rollback plan, and a 30‑day change‑control loop—plus three filled mini‑examples (automation, content, support).

Duplicate this page for every workflow that uses an LLM. Treat your prompts like code: pin a model+version, use SemVer for changes, add guardrails, monitor runs, and keep a rolling 30‑day review so provider deprecations don’t break delivery. Fill in [BRACKETS] and replace example text with your own. Keep this page tidy—contractors should be able to ship safely in under 10 minutes.

1) Page header — canonical metadata

Paste this block at the top of every SOP page.

  • SOP Title: [SOP NAME]
  • Owner: [OWNER NAME]
  • Backup Owner: [BACKUP OWNER]
  • Status: [DRAFT | APPROVED | DEPRECATED]
  • SemVer: [MAJOR.MINOR.PATCH] (e.g., 1.3.2)
  • Model Tag: [PROVIDER:MODEL:VERSION] (e.g., openai:gpt-4o:2026-01-20)
  • Temperature Band: [LOW (0.0–0.2) | MED (0.3–0.6) | HIGH (0.7–1.0)]
  • Dataset/Test Hash: [DATASET_SHA or N/A]
  • Last Updated: [YYYY‑MM‑DD]
  • Next Review (30‑day): [YYYY‑MM‑DD]
  • Prompt Registry ID(s): [PROMPT_KEY(S)]
  • Guardrails Policy ID+Version: [POLICY_ID@v]
  • Monitoring — Trace: [TRACE_DASHBOARD_URL]
  • Monitoring — Evals: [EVAL_DASHBOARD_URL]
  • Cost Target: [$MAX_PER_RUN] | Latency Target: [MS]
  • Rollback Target Version: [vMAJOR.MINOR.PATCH]
  • Links: [DATASET LINK] · [CHANGE‑LOG DB] · [RUNBOOK]

Tip: In Notion, add database properties for each line. Use a Formula for Next Review: dateAdd(prop("Last edited time"), 30, "days").

2) Preconditions (green‑light check)

Complete before any run.

  • Access: [TOOLS/ACCOUNTS NEEDED] (e.g., API keys, doc repo, CRM view)
  • Data scope: [WHAT DATA CAN THE MODEL SEE?]
  • Ground truth: [LINKS TO POLICIES / STYLE GUIDES / KB]
  • Privacy: [PII HANDLED? YES/NO] · Anonymization: [ON/OFF]
  • Guardrails attached: [POLICY_ID@v] (e.g., input/output filters, tool allow‑list)
  • Test fixtures loaded: [YES/NO] · Dataset hash: [HASH]
  • Failure budget: [MAX % ERROR or MAX ESCALATIONS]
  • Approval path: [REVIEWER] for [RISKY STEPS]
  • Rollback ready: [LAST GOOD VERSION] restored? [YES/NO]

3) Inputs — strict contracts (schemas + examples)

Define exactly what the model receives. Keep shapes stable across versions.

  • Input Type: [NAME]
  • Contract (JSON Schema):
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "[INPUT_NAME]",
  "type": "object",
  "required": ["[FIELD_1]", "[FIELD_2]"],
  "properties": {
    "[FIELD_1]": {"type": "string", "description": "[DESC]"},
    "[FIELD_2]": {"type": "array", "items": {"type": "string"}},
    "[OPTIONAL]": {"type": "number", "minimum": 0}
  }
}
  • Example payload:
{"[FIELD_1]":"[VALUE]","[FIELD_2]":["a","b"],"[OPTIONAL]":3}

4) Steps — with versioned prompts

Document every model call as a step with versioned prompts. If a step changes output shape or safety guarantees, bump MAJOR. If the step changes instructions or constraints while keeping the output contract stable, bump MINOR. If only wording, typos, or small threshold values change, with no shift in behavioral intent, bump PATCH.

Step Template (copy for each step):

  • Step #: [N] — [STEP NAME]
  • Prompt Key: [PROMPT_KEY]
  • Current Version: v[MAJOR.MINOR.PATCH]
  • Model Tag: [PROVIDER:MODEL:VERSION]
  • Parameters: temperature=[X], top_p=[X], max_tokens=[X]
  • System Message:
[CONCISE ROLE + RULES]
- Output MUST be valid JSON matching the provided schema.
- Refuse if input contains [DISALLOWED] or asks for [SENSITIVE_ACTION].
- Use source‑of‑truth links: [KB_LINKS].
  • User Template:
Task: [TASK]
Inputs (JSON):
```json
[INPUT_PAYLOAD]
```
Quality gates: [BULLETS]
  • Expected Output Schema: [LINK TO SECTION 5]
  • Validators: [REGEX/SCHEMA/EVAL RULES]
  • Guardrails attached: [POLICY_ID@v] (e.g., PII scrub, tool allow‑list)
  • Observability: Trace tag(s) = [SOP_NAME], [PROMPT_KEY], [MODEL_TAG]
  • Fallback: [RULES WHEN STEP FAILS]
  • Tests to run before publish: [N FIXTURES] passing ≥[THRESHOLD]%
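
To keep these fields machine-readable outside Notion, one option is to mirror them in code. A vendor-agnostic sketch; the `PromptStep` class is hypothetical, and `call_model` stands in for whatever provider SDK you use:

```python
import json
from dataclasses import dataclass

@dataclass(frozen=True)
class PromptStep:
    """Mirrors one SOP step: versioned prompt + pinned model + parameters."""
    prompt_key: str     # [PROMPT_KEY]
    version: str        # MAJOR.MINOR.PATCH
    model_tag: str      # [PROVIDER:MODEL:VERSION]
    system: str
    user_template: str  # must contain a {payload} slot
    temperature: float = 0.1

    def render_user(self, payload: dict) -> str:
        return self.user_template.format(payload=json.dumps(payload))

step = PromptStep(
    prompt_key="cs_refund_triage",
    version="1.2.4",
    model_tag="openai:gpt-4o:2026-01-20",
    system="[CONCISE ROLE + RULES]",
    user_template="Task: [TASK]\nInputs (JSON):\n{payload}\nQuality gates: [BULLETS]",
)
# When you call the provider (call_model(...), not shown), tag the trace with
# prompt_key, version, and model_tag so regressions stay searchable.
```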

5) Expected outputs — schemas + quality gates

State the exact shape of successful outputs and how you’ll check them.

  • Output Type: [NAME]
  • Contract (JSON Schema):
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "[OUTPUT_NAME]",
  "type": "object",
  "required": ["status", "data"],
  "properties": {
    "status": {"type": "string", "enum": ["ok", "needs_review", "refused"]},
    "data": {"type": "object"},
    "notes": {"type": "string"}
  }
}
  • Acceptance checks:
    • JSON validates against schema
    • Forbidden phrases not present: [LIST]
    • Score on eval suite ≥ [THRESHOLD]
    • Latency ≤ [MS] and cost ≤ [$]
  • Human spot‑check sample size: [N] items per [M] days
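
These checks compress into a single gate function. A minimal sketch using the `jsonschema` package, with hypothetical defaults standing in for the [MS] and [$] targets:

```python
from jsonschema import Draft202012Validator

OUTPUT_SCHEMA = {  # matches the contract above
    "type": "object",
    "required": ["status", "data"],
    "properties": {
        "status": {"type": "string", "enum": ["ok", "needs_review", "refused"]},
        "data": {"type": "object"},
        "notes": {"type": "string"},
    },
}
FORBIDDEN = ["[PHRASE_1]", "[PHRASE_2]"]  # fill in from your [LIST]

def accept(output: dict, latency_ms: float, cost_usd: float,
           max_ms: float = 2000.0, max_usd: float = 0.05) -> bool:
    """Return True only if every acceptance check passes."""
    if not Draft202012Validator(OUTPUT_SCHEMA).is_valid(output):
        return False
    text = str(output)  # crude sweep over the whole serialized output
    if any(phrase in text for phrase in FORBIDDEN):
        return False
    return latency_ms <= max_ms and cost_usd <= max_usd
```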

6) Known failure modes (and mitigations)

List the predictable ways this can go wrong and how you’ll catch them.

  • Injection/Goal drift → Mitigation: strong system message; strip/neutralize untrusted input; attach guardrails [POLICY_ID@v]
  • Data leakage (PII/keys) → Mitigation: PII scrub rule; output filter; test fixtures with red‑team prompts
  • Hallucinated policy → Mitigation: cite‑required with KB link; eval that fails on missing citations
  • Tool misuse (unsafe actions) → Mitigation: tool allow‑list; require human approval for [ACTIONS]
  • JSON breakage → Mitigation: response_format=json; repair parser with retry and schema validation (see the sketch after this list)
  • Model churn regressions → Mitigation: pin [MODEL_TAG], monthly review, back‑compat fixtures
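
For the JSON-breakage row above, a minimal repair-and-retry sketch; `call_model` is a hypothetical stand-in for your pinned-model call, and schema validation can reuse the Section 5 gate:

```python
import json

def parse_with_retry(call_model, user_msg: str, max_attempts: int = 3) -> dict:
    """Re-ask with the parse error appended until the reply is valid JSON."""
    prompt = user_msg
    for _ in range(max_attempts):
        raw = call_model(prompt)  # placeholder for your provider call
        try:
            return json.loads(raw)
        except json.JSONDecodeError as err:
            # Feed the error back so the model can repair its own output.
            prompt = (f"{user_msg}\n\nPrevious reply was invalid JSON "
                      f"({err}). Return valid JSON only.")
    raise ValueError(f"No valid JSON after {max_attempts} attempts")
```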

7) Guardrails — what’s enforced where

Attach and reference guardrails you operate.

  • Input filters: [PII, secrets, URLs from unknown domains]
  • Output filters: [PII removal, profanity, unsupported claims]
  • Policy map: [DOC LINK] → [RULE NAME] → [ENFORCEMENT POINT]
  • Tool allow‑list: [TOOLS AVAILABLE TO THE AGENT]
  • Human‑in‑the‑loop: [WHEN] → [WHO APPROVES]
  • Runtime IDs: Guardrail Policy [POLICY_ID@v] (record here)

Note: If you switch stacks later (e.g., different guardrail vendor), keep the same policy names and versions in this SOP so diffs stay readable.
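
To make the policy IDs concrete, a minimal sketch of an input filter; the `pii_basic@v3` ID and the regex patterns are illustrative assumptions, not any guardrail vendor's API, and production policies need far broader coverage:

```python
import re

POLICY_ID = "pii_basic@v3"  # record this ID in the page header metadata

# Illustrative patterns only.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "api_key": re.compile(r"\b(sk|pk)[-_][A-Za-z0-9]{16,}\b"),
}

def scrub_input(text: str) -> str:
    """Redact matches before the text reaches the model."""
    for name, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[REDACTED_{name.upper()}]", text)
    return text
```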

8) Monitoring hooks — traces, evals, alerts

Make regressions easy to see and cheap to fix.

  • Tracing: tag every call with [SOP_NAME], [PROMPT_KEY], [MODEL_TAG]
  • Evals: [WEEKLY | NIGHTLY] run on [N] fixtures; track pass‑rate trend
  • Alerts: error rate > [X%], eval drop > [Y pts], latency > [Z ms], cost/run > [$]
  • Dashboards: [TRACE_DASHBOARD_URL] · [EVAL_DASHBOARD_URL]
  • Sampling: human review [N] items per [M] days; rotate reviewers across time zones
  • Post‑incident note: use the Revision Log entry template (Section 10)
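
The alert thresholds map directly onto a post-eval check. A sketch with hypothetical numbers standing in for [X%], [Y pts], [Z ms], and [$]:

```python
def check_alerts(error_rate: float, eval_delta_pts: float,
                 p95_latency_ms: float, cost_per_run: float) -> list[str]:
    """Compare current metrics to the SOP's thresholds; return fired alerts."""
    alerts = []
    if error_rate > 0.02:      # error rate > [X%]
        alerts.append(f"error rate {error_rate:.1%} over budget")
    if eval_delta_pts < -3:    # eval drop > [Y pts]
        alerts.append(f"eval pass-rate dropped {abs(eval_delta_pts)} pts")
    if p95_latency_ms > 2000:  # latency > [Z ms]
        alerts.append(f"p95 latency {p95_latency_ms:.0f} ms over target")
    if cost_per_run > 0.05:    # cost/run > [$]
        alerts.append(f"cost/run ${cost_per_run:.3f} over target")
    return alerts
```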

9) Rollback plan — hotfix path

When something breaks or a provider change lands, do this in order.

  1. Freeze — set Status: [DEGRADED]. Route outputs to [SAFE MODE/HUMAN REVIEW].
  2. Identify — compare failing traces against last good [MODEL_TAG] + [PROMPT_VERSION].
  3. Roll back — set [PROMPT_KEY] to [ROLLBACK TARGET]. Verify on fixtures (≥[THRESHOLD]% pass). Set Status: [STABLE].
  4. Patch — create v[PATCH+1] with minimal fix. Rerun evals. If output shape changed, bump MINOR/MAJOR and update Section 5.
  5. Document — add a Revision Log entry (Section 10). Link traces/evals.
  6. Prevent — add a new fixture capturing the failure. Update failure modes if needed.

10) Revision log — single source of truth

Keep one row per change. If you use a Notion DB, make these columns.

  • Date: [YYYY‑MM‑DD]
  • Editor: [NAME]
  • Prompt Key: [PROMPT_KEY]
  • From → To: [vOLD] → [vNEW]
  • Change Type: [MAJOR | MINOR | PATCH]
  • Reason/Trigger: [BUG | PROVIDER DEPRECATION | POLICY UPDATE | COST]
  • Tests Run: [N]/[TOTAL], Pass‑rate: [%]
  • Model Tag: [PROVIDER:MODEL:VERSION]
  • Links: [DIFF] · [TRACE VIEW] · [EVAL RUN]
  • Rollback Plan Tested: [YES/NO]
  • Notes: [WHAT WE LEARNED]

Example entry:

2026-05-08 | Kira | support_refund_triage | 1.2.3→1.2.4 | PATCH | JSON repair retry | 48/48 pass | openai:gpt-4o:2026-01-20 | diff/123 · trace/q=refund · eval/456 | rollback tested | tightened refusal rule for out‑of‑scope asks

11) Naming + versioning convention (SemVer + model tag + dataset hash)

Use this everywhere names appear (page title, Prompt Key, datasets, commits).

  • Prompt Key: [TEAM]_[DOMAIN]_[TASK] (lowercase, snake_case). Example: cs_refund_triage.
  • Version: MAJOR.MINOR.PATCH
    • MAJOR: output shape/contract or guardrail guarantees change
    • MINOR: instructions/constraints change; contract stable
    • PATCH: small fixes; no behavioral intent change
  • Model Tag: [PROVIDER]:[MODEL]:[RELEASE_DATE] (from vendor docs)
  • Dataset/Test Hash: first 8 of dataset SHA‑256 (e.g., ds:7fa91b2c)
  • Full label (recommended): [PROMPT_KEY]@v[SEMVER]#[MODEL_TAG]+[DATASET_HASH]
    • Example: cs_refund_triage@v1.2.4#openai:gpt-4o:2026-01-20+ds:7fa91b2c

Rule: Never change text without bumping a version. If you change the model, update the Model Tag and rerun evals before publish.
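
A sketch of the convention in code, computing the short dataset hash and parsing the full label; the regex and helper names are illustrative:

```python
import hashlib
import re

def dataset_hash(path: str) -> str:
    """First 8 hex chars of the dataset's SHA-256, prefixed ds:."""
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    return f"ds:{digest[:8]}"

# [PROMPT_KEY]@v[SEMVER]#[MODEL_TAG]+[DATASET_HASH]
LABEL_RE = re.compile(r"^(?P<key>[a-z0-9_]+)@v(?P<semver>\d+\.\d+\.\d+)"
                      r"#(?P<model>[^+]+)\+(?P<ds>ds:[0-9a-f]{8})$")

label = "cs_refund_triage@v1.2.4#openai:gpt-4o:2026-01-20+ds:7fa91b2c"
parts = LABEL_RE.match(label)
assert parts and parts["semver"] == "1.2.4"  # fail fast on malformed labels
```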

12) Pure‑Notion setup — properties, relations, buttons

A lightweight, vendor‑agnostic path for ≤3 collaborators.

  • Create a Notion database “AI SOPs”. Add properties:
    • Owner (Person), Status (Select), SemVer (Text), Model Tag (Text), Dataset Hash (Text), Last edited time (Last edited time), Next Review (Formula: dateAdd(prop("Last edited time"), 30, "days")), Guardrails ID (Text), Trace URL (URL), Eval URL (URL)
  • Add a “Revision Log” database with the columns from Section 10.
  • Relation: link each SOP to its Revision Log entries.
  • Template: add this entire page as the DB’s default template.
  • Button (optional): “Publish new version” → duplicates the page, increments SemVer manually, creates a Revision Log stub with today’s date.
  • Page History: use Notion’s version history for visual diffs when needed.

Tip: For larger teams, mirror Prompt Key, Version, and Model Tag in your code or automation layer so traces are searchable across tools.

13) Standing 30‑day change‑control — runbook

Run this every 30 days or when a provider posts a deprecation/changelog item.

  • Watchlist (record links in the SOP DB): [OPENAI DEPRECATIONS] · [AZURE OPENAI MODEL RETIREMENTS] · [ANTHROPIC MODEL DEPRECATIONS] · [VERTEX AI DEPRECATIONS] · [OPENAI CHANGELOG]
  • Calendar: recurring “LLM Change‑Review” event → Owner
  • Before the review:
    • Export current pass‑rate, latency, and cost/run
    • Check deprecations for your Model Tag(s)
    • If a change affects you, create a migration plan and a target date
  • During the review:
    • Re‑run evals on [N] fixtures
    • Spot‑check [M] live outputs
    • Decide: [KEEP AS‑IS] | [PATCH/MINOR] | [MAJOR MIGRATION]
  • After the review:
    • Update Model Tag if needed; run full tests
    • Add a Revision Log entry and update Last Updated (auto)
    • Confirm Next Review date populated by Formula

14) Mini‑example — Automation (invoice categorization)

A filled example you can adapt. Goal: classify invoices into categories and flag outliers.

Header quickfill:

  • SOP Title: Finance — Invoice Categorization
  • Owner: [OWNER]
  • SemVer: 1.0.0 · Model Tag: [PROVIDER:MODEL:VERSION]
  • Temperature Band: LOW (0.1) · Dataset Hash: ds:1a2b3c4d

Inputs schema:

{
  "type": "object",
  "required": ["vendor","line_items","total","currency"],
  "properties": {
    "vendor": {"type":"string"},
    "line_items": {"type":"array","items":{"type":"object","required":["desc","amount"],"properties":{"desc":{"type":"string"},"amount":{"type":"number"}}}},
    "total": {"type":"number"},
    "currency": {"type":"string","minLength":3,"maxLength":3}
  }
}

Step 1 — fin_invoice_categorize v1.0.0

  • System:
You are a strict accounting classifier. Map each line item to one of the allowed categories. Output valid JSON only.
Allowed categories: ["software","contractor","travel","office","other"].
Refuse if currency not provided.
  • User:
Task: Categorize invoice line items.
Inputs (JSON):
```json
{"vendor":"Acme SaaS","line_items":[{"desc":"Pro plan","amount":49}],"total":49,"currency":"USD"}
```
Quality gates: must sum to total; categories must be from allow‑list.
  • Output schema (excerpt):
```json
{"status":"ok","data":{"categories":[{"desc":"Pro plan","category":"software","amount":49}]},"notes":""}
```
  • Validators: sum(line_items.amount) == total; categories in allow‑list
  • Monitoring: Trace tag fin_invoice_categorize@v1.0.0#[MODEL_TAG]
  • Failure modes: currency missing → refuse; unknown vendor → still categorize
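
The two validators fit in a few lines. A sketch run against the output excerpt above:

```python
ALLOWED = {"software", "contractor", "travel", "office", "other"}

def validate_categorization(output: dict, invoice_total: float) -> bool:
    """Amounts must sum to the invoice total; categories must be on the allow-list."""
    items = output["data"]["categories"]
    sums_ok = abs(sum(i["amount"] for i in items) - invoice_total) < 0.01
    cats_ok = all(i["category"] in ALLOWED for i in items)
    return sums_ok and cats_ok

out = {"status": "ok", "data": {"categories": [
    {"desc": "Pro plan", "category": "software", "amount": 49}]}, "notes": ""}
assert validate_categorization(out, 49)
```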

15) Mini‑example — Content (podcast → LinkedIn post)

Turn a transcript into a concise LinkedIn post with a safety gate for claims.

Header quickfill:

  • SOP Title: Content — Episode → LinkedIn Post
  • Owner: [OWNER]
  • SemVer: 1.1.0 · Model Tag: [PROVIDER:MODEL:VERSION]
  • Temperature Band: MED (0.5) · Dataset Hash: ds:9e8d7c6b

Inputs schema (excerpt):

{"type":"object","required":["title","transcript","url"],"properties":{"title":{"type":"string"},"transcript":{"type":"string"},"url":{"type":"string"}}}

Step 1 — content_linkedin_post v1.1.0

  • System:
You write plainspoken operator posts. 120–180 words. No emojis. Include a single numeric insight if present. Link at the end.
If a sentence includes an unverified claim, add "(source in comments)".
  • User:
Task: Draft a LinkedIn post from this episode.
Inputs (JSON): {"title":"[EP TITLE]","transcript":"[RAW TEXT]","url":"[LP URL]"}
Quality gates: 1 idea, 1 number, clear CTA to template link.
  • Output schema (excerpt):
{"status":"ok","data":{"post":"[TEXT]","cta":"[URL]"}}
  • Guardrails: profanity filter; URL allow‑list
  • Monitoring: sample 5 posts/month for human edit time < 3 minutes
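
These gates are also machine-checkable before a human sees the draft. A minimal sketch, assuming the 120–180-word band from the system message:

```python
import re

def validate_post(data: dict) -> bool:
    """Gates from the step: word band, one numeric insight, CTA link present."""
    post, cta = data["post"], data["cta"]
    words = len(post.split())
    has_number = bool(re.search(r"\d", post))
    return 120 <= words <= 180 and has_number and cta.startswith("http")
```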

16) Mini‑example — Support (refund triage)

Draft a safe first reply and route edge cases to a human.

Header quickfill:

  • SOP Title: Support — Refund Triage
  • Owner: [OWNER]
  • SemVer: 1.2.4 · Model Tag: [PROVIDER:MODEL:VERSION]
  • Temperature Band: LOW (0.2) · Dataset Hash: ds:7fa91b2c

Inputs schema (excerpt):

{"type":"object","required":["message","order_id","customer_tier"],"properties":{"message":{"type":"string"},"order_id":{"type":"string"},"customer_tier":{"type":"string","enum":["free","pro","enterprise"]}}}

Step 1 — cs_refund_triage v1.2.4

  • System:
You are a policy‑bound support agent. Obey the refund policy. If the request is outside policy or the order is older than [DAYS], set status to "needs_review" and explain why.
  • User:
Task: Draft a first reply following policy.
Inputs (JSON): {"message":"[TEXT]","order_id":"[ID]","customer_tier":"pro"}
Quality gates: policy link cited; tone neutral; no offers beyond policy.
  • Output schema (excerpt):
{"status":"ok","data":{"reply":"[TEXT]","policy_links":["[URL]"]},"notes":"route to human if needs_review"}
  • Guardrails: PII scrub; forbidden offers list; refusal template
  • Fallback: set status "needs_review" if policy ambiguous