Offline‑First AI SOP: Travel‑Day Stack for Summaries, Tags, and Drafts
A copy‑paste SOP for running summaries, tags, and short drafts fully offline on travel days. Includes model/run profiles, SQLite WAL schema, watchers, a Litestream sync worker, conflict policy, and ‘Travel Day Mode’ scripts for macOS and Windows.
When you’re boarding in 12 minutes, this SOP keeps the critical loop alive: capture audio, transcribe offline, summarize/tag/draft locally on a lightweight model, and queue everything to SQLite for safe sync when you’re back online. Follow this to ship an offline‑first pattern in a weekend and run it reliably on travel days.
Architecture at a glance:
- Local runners: LM Studio, GPT4All Desktop, or Ollama (choose one) for 7–13B instruct models; Whisper.cpp or faster‑whisper for offline STT.
- Queue: SQLite with WAL. One table for work items, one for canonical docs/notes. Triggers for updated_at.
- Watchers: Watchman (or fswatch) turns file drops into queue entries.
- Sync: Litestream to S3‑compatible storage (default). LiteFS is an alternative if you need multi‑node replication later.
- Security: Full‑file DB encryption (SQLCipher or SEE). Keys from OS keychain, not .env.
Run profiles you will pick in Step 1:
- Battery Saver: 7–8B instruct, Q4 quantization, CPU/NPU only, threads ≈ physical cores − 1, context 2–4k. Targets: summaries, tags, short drafts.
- Throughput: 13B instruct with partial GPU offload (if you have it), threads ≈ physical cores, context 4–8k. Use when plugged in or on long rides.
Conservative hardware notes (plan for margin, not the floor):
- 7–8B models: ≥8 GB RAM recommended; practical Q4 footprints ~5–8 GB depending on family/quantization.
- 13B models: ≥16 GB RAM recommended; offload GPU layers if available to keep tokens/sec decent.
- 70B+ models: not for travel day. Keep this in the cloud queue.
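To sanity-check whether a model fits before you download it, weight memory scales with parameter count times bits per weight, plus runtime overhead. A minimal estimator (the overhead figure and effective bits-per-weight are ballpark assumptions, not measurements):

```python
def model_ram_gb(params_billion: float, bits_per_weight: float, overhead_gb: float = 1.5) -> float:
    """Rough resident-memory estimate: weights plus KV-cache/runtime overhead.

    overhead_gb is an assumed ballpark, not a measured value.
    """
    weights_gb = params_billion * 1e9 * (bits_per_weight / 8) / 1e9
    return round(weights_gb + overhead_gb, 1)

# 7B at Q4 (~4.5 effective bits/weight) lands near the ~5 GB floor noted above
print(model_ram_gb(7, 4.5))   # 5.4
print(model_ram_gb(13, 4.5))  # 8.8
```

Numbers this small confirm why 7–8B fits the ≥8 GB recommendation with margin and 13B wants ≥16 GB once context and OS overhead pile on.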
Expected outcome: a portable, GUI‑friendly stack that works fully offline for 2–6 hours, queues everything safely, and resumes syncing/hand‑offs without manual cleanup.
Step 1: Decide your run profile and set guardrails
Pick Battery Saver or Throughput based on the next 3 hours of power and work.
- Battery Saver (default): offline summaries/tags/short drafts; 7–8B Q4, CPU/NPU only; context ≤4k; threads = physical cores − 1.
- Throughput: plugged‑in or power bank; 13B with partial GPU offload; context 4–8k; threads = physical cores.

Guardrails:
- If tokens/sec < 15 on summaries, fall back to Battery Saver.
- If battery < 35% with ≥60 min left, force Battery Saver until charging.

Outcome: A chosen profile with clear triggers to scale down.
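The guardrail triggers above are small enough to encode in a helper, so the decision is mechanical instead of a judgment call at the gate. A sketch (the function name and inputs are ours, not part of the stack):

```python
def pick_profile(plugged_in: bool, tokens_per_sec: float,
                 battery_pct: float, minutes_left: float) -> str:
    """Apply the Step 1 guardrails; Battery Saver is the default."""
    # Throughput is only on the table while on power
    profile = "throughput" if plugged_in else "battery_saver"
    # Guardrail 1: slow summaries mean the bigger model isn't paying for itself
    if tokens_per_sec < 15:
        profile = "battery_saver"
    # Guardrail 2: low battery with >=60 min of work left forces Battery Saver
    if battery_pct < 35 and minutes_left >= 60 and not plugged_in:
        profile = "battery_saver"
    return profile

print(pick_profile(True, 22, 80, 120))   # throughput
print(pick_profile(True, 9, 80, 120))    # battery_saver (guardrail 1)
print(pick_profile(False, 20, 30, 90))   # battery_saver (guardrail 2)
```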
Step 2: Install a local LLM runner with a local API
Choose one runner you’re comfortable operating:
- LM Studio (GUI + OpenAI‑compatible local API)
- GPT4All Desktop (GUI + optional local server)
- Ollama (CLI + Windows GUI; ships an OpenAI‑compatible /v1 endpoint)

Action:
- Start the runner and confirm a local API base URL. Set these in an .env file you’ll create in Step 4:
```
LLM_API_BASE=http://localhost:[PORT]
LLM_MODEL=[your-7-8B-or-13B-instruct]
LLM_CTX=4096
LLM_TEMPERATURE=0.2
LLM_TOP_P=0.9
```

Outcome: A reachable local LLM endpoint with a selected instruct model.
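Before going offline, it's worth one smoke test against the endpoint. The sketch below assumes an OpenAI-compatible `/chat/completions` route (true for LM Studio and Ollama's `/v1` adapter); the helper name is ours:

```python
import json
import os
import urllib.request

def chat_request(prompt: str) -> urllib.request.Request:
    """Build a chat-completion request against the local runner from .env vars."""
    base = os.getenv("LLM_API_BASE", "http://localhost:11434/v1")
    body = {
        "model": os.getenv("LLM_MODEL", "local-7b-instruct-q4"),
        "messages": [{"role": "user", "content": prompt}],
        "temperature": float(os.getenv("LLM_TEMPERATURE", "0.2")),
        "max_tokens": 32,
    }
    return urllib.request.Request(
        f"{base}/chat/completions",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
    )

req = chat_request("Reply with the single word: ready")
# With the runner up:
#   resp = urllib.request.urlopen(req, timeout=30)
#   text = json.load(resp)["choices"][0]["message"]["content"]
print(req.full_url)
```

If the call times out or 404s, fix the base URL now rather than mid-flight.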
Step 3: Install offline STT
Pick one:
- Whisper.cpp (portable binary) for zero‑Python setups.
- faster‑whisper (Python) for GPU acceleration when available.

Create an audio inbox folder (Step 4) and test:
```bash
# whisper.cpp example
./main -m ./models/ggml-medium.en.bin -f ./inbox/audio/test.m4a -otxt -of ./inbox/audio/test

# faster-whisper (Python) example
pip install faster-whisper
python - <<'PY'
from faster_whisper import WhisperModel
m = WhisperModel("medium.en", compute_type="int8")
segments, info = m.transcribe("./inbox/audio/test.m4a")
open("./inbox/audio/test.txt", "w").write("\n".join(s.text for s in segments))
print("ok")
PY
```

Outcome: You can drop audio in inbox/audio and get a local transcript .txt beside it.
Step 4: Create project folders and environment
Make a simple, portable layout:
```
~/travel-day-stack/
  .env                 # LLM vars + DB path
  db/                  # SQLite lives here (encrypted)
  inbox/
    audio/             # raw audio drops
    notes/             # .md or .txt drops
  out/
    summaries/
    tags/
    drafts/
  bin/
    enqueue.sh
    worker.py
    travel_day_on.sh
    travel_day_off.sh
    TravelDayOn.ps1
    TravelDayOff.ps1
  sync/
    litestream.yml
    docker-compose.yml
```

Example .env (don’t store secrets here in production):
```
DB_PATH=./db/ops.sqlite3
LLM_API_BASE=http://localhost:11434/v1   # set to your runner’s OpenAI-compatible endpoint
LLM_MODEL=local-7b-instruct-q4           # replace with your actual model id/name
LLM_CTX=4096
LLM_TEMPERATURE=0.2
LLM_TOP_P=0.9
SYNC_BUCKET=s3://your-bucket/sqlite
AWS_ACCESS_KEY_ID=...
AWS_SECRET_ACCESS_KEY=...
AWS_REGION=us-east-1
```

Outcome: A clean workspace and environment configuration.
Step 5: Create SQLite with WAL and the queue schema
Initialize the database with safe defaults and two tables: docs (canonical) and queue (work items).
```bash
sqlite3 $DB_PATH <<'SQL'
PRAGMA journal_mode=WAL;
PRAGMA synchronous=NORMAL;
PRAGMA busy_timeout=5000;
PRAGMA foreign_keys=ON;

CREATE TABLE IF NOT EXISTS docs (
  doc_id     TEXT PRIMARY KEY,
  title      TEXT,
  src_path   TEXT,
  text       TEXT,
  summary    TEXT,
  tags_json  TEXT,                 -- JSON array of strings
  model      TEXT,
  hash       TEXT,
  origin     TEXT DEFAULT 'local', -- local|cloud
  updated_at INTEGER NOT NULL DEFAULT (strftime('%s','now')),
  created_at INTEGER NOT NULL DEFAULT (strftime('%s','now'))
);

CREATE TABLE IF NOT EXISTS queue (
  id              INTEGER PRIMARY KEY,
  kind            TEXT NOT NULL CHECK (kind IN ('stt','summarize','tag','draft')),
  doc_id          TEXT,
  src_path        TEXT,
  payload_json    TEXT,
  priority        INTEGER NOT NULL DEFAULT 5,
  status          TEXT NOT NULL DEFAULT 'enqueued'
                  CHECK (status IN ('enqueued','processing','done','failed')),
  attempts        INTEGER NOT NULL DEFAULT 0,
  result_json     TEXT,
  error           TEXT,
  idempotency_key TEXT UNIQUE,
  updated_at      INTEGER NOT NULL DEFAULT (strftime('%s','now')),
  created_at      INTEGER NOT NULL DEFAULT (strftime('%s','now')),
  FOREIGN KEY(doc_id) REFERENCES docs(doc_id)
);

CREATE INDEX IF NOT EXISTS idx_queue_status ON queue(status, priority, created_at);

CREATE TRIGGER IF NOT EXISTS trg_docs_updated AFTER UPDATE ON docs BEGIN
  UPDATE docs SET updated_at=strftime('%s','now') WHERE doc_id=OLD.doc_id;
END;

CREATE TRIGGER IF NOT EXISTS trg_queue_updated AFTER UPDATE ON queue BEGIN
  UPDATE queue SET updated_at=strftime('%s','now') WHERE id=OLD.id;
END;
SQL
```

Outcome: A WAL‑backed queue ready for offline work.
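To see the idempotency guarantee in action, the snippet below loads a stripped-down copy of the queue table into an in-memory database and enqueues the same work item twice; the UNIQUE constraint makes the second `INSERT OR IGNORE` a no-op:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE queue (
    id INTEGER PRIMARY KEY,
    kind TEXT NOT NULL,
    doc_id TEXT,
    idempotency_key TEXT UNIQUE
)""")

# Same file hash -> same idempotency_key -> only one row survives
for _ in range(2):
    con.execute(
        "INSERT OR IGNORE INTO queue(kind,doc_id,idempotency_key) VALUES(?,?,?)",
        ("summarize", "meeting-notes", "summarize:abc123"))

count = con.execute("SELECT COUNT(*) FROM queue").fetchone()[0]
print(count)  # 1
```

This is what lets watchers fire on duplicate filesystem events without double-processing.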
Step 6: Add a cross‑platform enqueue helper
Turn file drops into queue items with a deterministic id and idempotency key.

bin/enqueue.sh:
```bash
#!/usr/bin/env bash
set -euo pipefail
DB="${DB_PATH:-./db/ops.sqlite3}"
PATH_IN="$1"   # /full/path/to/file
KIND="$2"      # stt|summarize|tag|draft
HASH=$(shasum -a 256 "$PATH_IN" | awk '{print $1}')
DOC_ID=$(basename "$PATH_IN" | sed 's/\.[^.]*$//')
TITLE="$DOC_ID"
IDEMP="$KIND:$HASH"

# Insert/ensure doc row
sqlite3 "$DB" "INSERT OR IGNORE INTO docs(doc_id,title,src_path,hash) VALUES('$DOC_ID','$TITLE','$PATH_IN','$HASH');"

# Enqueue work
sqlite3 "$DB" "INSERT OR IGNORE INTO queue(kind,doc_id,src_path,payload_json,priority,idempotency_key) VALUES('$KIND','$DOC_ID','$PATH_IN','{}',5,'$IDEMP');"

echo "enqueued $KIND for $DOC_ID"
```

Windows PowerShell variant (bin/Enqueue.ps1):
```powershell
param([string]$PathIn,[string]$Kind)
$DB=$env:DB_PATH
$hash=(Get-FileHash -Algorithm SHA256 $PathIn).Hash.ToLower()
$docId=[IO.Path]::GetFileNameWithoutExtension($PathIn)
# ${Kind} braces are required: "$Kind:$hash" would parse the colon as a scope qualifier
$idemp="${Kind}:$hash"
sqlite3 $DB "INSERT OR IGNORE INTO docs(doc_id,title,src_path,hash) VALUES('$docId','$docId','$PathIn','$hash');"
sqlite3 $DB "INSERT OR IGNORE INTO queue(kind,doc_id,src_path,payload_json,priority,idempotency_key) VALUES('$Kind','$docId','$PathIn','{}',5,'$idemp');"
Write-Output "enqueued $Kind for $docId"
```

Outcome: One command to add work reliably without duplicates.
Step 7: Wire up file watchers for audio and notes
Use Watchman (macOS/Linux/Windows) or fswatch (macOS/Linux). Examples:

Watchman config (watchman.json):
```json
{
  "inbox_audio": {
    "root": "./inbox/audio",
    "pattern": "**/*.(m4a|mp3|wav)",
    "command": ["bash","-lc","./bin/enqueue.sh ${WATCHMAN_MATCH} stt"]
  },
  "inbox_notes": {
    "root": "./inbox/notes",
    "pattern": "**/*.(md|txt)",
    "command": ["bash","-lc","./bin/enqueue.sh ${WATCHMAN_MATCH} summarize && ./bin/enqueue.sh ${WATCHMAN_MATCH} tag"]
  }
}
```

fswatch example (macOS/Linux):
```bash
fswatch -0 ./inbox/audio | xargs -0 -n1 -I{} bash -lc './bin/enqueue.sh "{}" stt'
fswatch -0 ./inbox/notes | xargs -0 -n1 -I{} bash -lc './bin/enqueue.sh "{}" summarize && ./bin/enqueue.sh "{}" tag'
```

Outcome: Dropping a file automatically creates queue items.
Step 8: Run the offline worker to drain the queue
A minimal worker loops: claim → do work → write results → ack. Python example (bin/worker.py):
```python
#!/usr/bin/env python3
import json, os, sqlite3, subprocess, time, requests

DB = os.getenv('DB_PATH', './db/ops.sqlite3')
API = os.getenv('LLM_API_BASE')
MODEL = os.getenv('LLM_MODEL')
HEADERS = {'Content-Type': 'application/json'}

PROMPT_SUMMARY = lambda text: f"Summarize in 5 bullets. Keep it factual.\n\nText:\n{text[:12000]}"
PROMPT_TAGS = lambda text: f"Return 5-8 comma-separated tags capturing topics, entities, and next actions.\n\nText:\n{text[:8000]}"
PROMPT_DRAFT = lambda text: f"Draft a 120-200 word follow-up email with a clear CTA based on this note.\n\nNote:\n{text[:8000]}"

def chat(prompt):
    body = {"model": MODEL,
            "messages": [{"role": "user", "content": prompt}],
            "temperature": float(os.getenv('LLM_TEMPERATURE', '0.2')),
            "top_p": float(os.getenv('LLM_TOP_P', '0.9')),
            "max_tokens": 500,
            "stream": False}
    r = requests.post(f"{API}/chat/completions", headers=HEADERS, data=json.dumps(body), timeout=120)
    r.raise_for_status()
    return r.json()["choices"][0]["message"]["content"].strip()

con = sqlite3.connect(DB)
con.row_factory = sqlite3.Row
con.execute('PRAGMA busy_timeout=5000')

while True:
    cur = con.execute("SELECT id,kind,doc_id,src_path FROM queue "
                      "WHERE status='enqueued' ORDER BY priority,created_at LIMIT 1")
    row = cur.fetchone()
    if not row:
        time.sleep(1)
        continue
    qid = row['id']
    con.execute("UPDATE queue SET status='processing', attempts=attempts+1 WHERE id=?", (qid,))
    con.commit()
    try:
        if row['kind'] == 'stt':
            # Prefer whisper.cpp if present
            out_txt = f"{row['src_path']}.txt"
            if os.path.exists('./main'):
                subprocess.run(['./main', '-m', './models/ggml-medium.en.bin',
                                '-f', row['src_path'], '-otxt', '-of', row['src_path']], check=True)
            else:
                # fallback: faster-whisper via python callable script or skip
                raise RuntimeError('No STT binary found')
            text = open(out_txt).read()
            con.execute("UPDATE docs SET text=? WHERE doc_id=?", (text, row['doc_id']))
            # chain: enqueue summarize+tag
            for k in ('summarize', 'tag'):
                con.execute("INSERT OR IGNORE INTO queue(kind,doc_id,src_path,payload_json,priority,idempotency_key) "
                            "VALUES(?,?,?,?,?,?)",
                            (k, row['doc_id'], out_txt, '{}', 5, f"{k}:{row['doc_id']}:chain"))
        elif row['kind'] in ('summarize', 'tag', 'draft'):
            text = con.execute("SELECT text FROM docs WHERE doc_id=?", (row['doc_id'],)).fetchone()[0]
            if not text:
                raise RuntimeError('No text for doc')
            prompt = (PROMPT_SUMMARY(text) if row['kind'] == 'summarize'
                      else PROMPT_TAGS(text) if row['kind'] == 'tag'
                      else PROMPT_DRAFT(text))
            out = chat(prompt)
            if row['kind'] == 'summarize':
                con.execute("UPDATE docs SET summary=?, model=? WHERE doc_id=?", (out, MODEL, row['doc_id']))
            elif row['kind'] == 'tag':
                tags = [t.strip() for t in out.replace('\n', ' ').split(',') if t.strip()]
                con.execute("UPDATE docs SET tags_json=?, model=? WHERE doc_id=?",
                            (json.dumps(tags), MODEL, row['doc_id']))
            else:
                open(f"out/drafts/{row['doc_id']}.md", "w").write(out)
        con.execute("UPDATE queue SET status='done', result_json=? WHERE id=?",
                    (json.dumps({"ok": True}), qid))
        con.commit()
    except Exception as e:
        con.execute("UPDATE queue SET status='failed', error=? WHERE id=?", (str(e), qid))
        con.commit()
```

Run it:
```bash
python3 ./bin/worker.py
```

Outcome: Queue drains offline; outputs land in docs + out/ folders.
Step 9: Set up Litestream sync to S3‑compatible storage
Use Litestream as the default way to ship WAL changes to S3 when online. Create sync/litestream.yml:
```yaml
dbs:
  - path: /data/ops.sqlite3
    replicas:
      - url: ${SYNC_BUCKET}
        snapshot-interval: 1h
        retention: 72h
```

Create sync/docker-compose.yml:
```yaml
version: "3.8"
services:
  litestream:
    image: litestream/litestream:0.3
    environment:
      - AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID}
      - AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY}
      - AWS_REGION=${AWS_REGION}
    volumes:
      - ../db:/data
      - ./litestream.yml:/etc/litestream.yml:ro
    command: ["replicate","-config","/etc/litestream.yml"]
    restart: unless-stopped
```

Start it when you have connectivity; it will idle gracefully offline:
```bash
(cd sync && docker compose up -d)
```

Alternative (advanced): LiteFS for multi‑node replication. Use only if you need live reads/writes across nodes; it requires FUSE privileges in Docker and a lease/primary strategy.

Outcome: DB changes stream to object storage without you thinking about it.
Step 10: Define your sync conflict policy
Conflicts are inevitable. Keep it boring and deterministic:
- Identity: docs.doc_id is the canonical key. Include docs.hash of the source to detect content changes.
- Writes: docs is last‑writer‑wins by updated_at IF origin differs. Local edits set origin='local'. Cloud transforms set origin='cloud'.
- Merges: tags_json merges as a set union; duplicates removed case‑insensitively.
- Idempotency: queue.idempotency_key = kind:hash (or kind:doc_id:chain for chained items) prevents duplicate work.
- Failures: queue.status='failed' items are retried while attempts ≤ 3, with exponential backoff; after that they are parked for manual review.
- Manual override: a one‑liner tool can set origin and bump updated_at when you must force a version:
```bash
# The sqlite3 CLI cannot bind '?' parameters; substitute the bracketed placeholders
sqlite3 $DB_PATH "UPDATE docs SET summary='[NEW_SUMMARY]', origin='local', updated_at=strftime('%s','now') WHERE doc_id='[DOC_ID]';"
```

Outcome: Everyone knows what wins and how to resolve the rare tie.
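The merge rules above are small enough to sketch in a few lines. `resolve` below implements last-writer-wins on updated_at, with the case-insensitive set union for tags (function names and record shapes are ours, for illustration):

```python
import json

def merge_tags(local_json: str, cloud_json: str) -> list:
    """Set union, deduplicated case-insensitively; keeps first-seen casing."""
    seen, merged = set(), []
    for tag in json.loads(local_json) + json.loads(cloud_json):
        if tag.lower() not in seen:
            seen.add(tag.lower())
            merged.append(tag)
    return merged

def resolve(local: dict, cloud: dict) -> dict:
    """Last-writer-wins by updated_at; tags always merge rather than clobber."""
    winner = local if local["updated_at"] >= cloud["updated_at"] else cloud
    doc = dict(winner)
    doc["tags_json"] = json.dumps(merge_tags(local["tags_json"], cloud["tags_json"]))
    return doc

local = {"summary": "local take", "origin": "local", "updated_at": 1700000100,
         "tags_json": '["Travel", "ai"]'}
cloud = {"summary": "cloud take", "origin": "cloud", "updated_at": 1700000050,
         "tags_json": '["AI", "queue"]'}
doc = resolve(local, cloud)
print(doc["summary"])                # local take (newer updated_at wins)
print(json.loads(doc["tags_json"]))  # ['Travel', 'ai', 'queue']
```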
Step 11: Add ‘Travel Day Mode’ scripts (macOS)
Toggle low‑power settings and start only what you need. bin/travel_day_on.sh:
```bash
#!/usr/bin/env bash
set -euo pipefail
source ./.env
pmset -a lowpowermode 1 || true
export LLM_TEMPERATURE=0.1
export LLM_CTX=${LLM_CTX:-4096}
# Start runner GUI/API yourself (LM Studio/GPT4All/Ollama)
# Start watchers and worker
(nohup fswatch -0 ./inbox/audio | xargs -0 -n1 -I{} bash -lc './bin/enqueue.sh "{}" stt' >/tmp/audio.watch.log 2>&1 &)
(nohup fswatch -0 ./inbox/notes | xargs -0 -n1 -I{} bash -lc './bin/enqueue.sh "{}" summarize && ./bin/enqueue.sh "{}" tag' >/tmp/notes.watch.log 2>&1 &)
(nohup python3 ./bin/worker.py >/tmp/worker.log 2>&1 &)
echo "Travel Day Mode: ON"
```

bin/travel_day_off.sh:
```bash
#!/usr/bin/env bash
pkill -f worker.py || true
pkill -f fswatch || true
pmset -a lowpowermode 0 || true
echo "Travel Day Mode: OFF"
```

Outcome: One command pre‑flight; your laptop runs lean and keeps producing.
Step 12: Add ‘Travel Day Mode’ scripts (Windows)
Switch to a power‑saving plan and start watchers/worker. bin/TravelDayOn.ps1:
```powershell
$env:LLM_TEMPERATURE="0.1"
# Choose a power plan GUID beforehand: powercfg /L
# Example: set to Power saver if available
# powercfg /S SCHEME_MAX   # replace with your saver GUID
Start-Process powershell -ArgumentList "-NoProfile -Command python .\bin\worker.py" -WindowStyle Minimized
# Use PowerShell FileSystemWatcher for notes
$fw1=New-Object IO.FileSystemWatcher (Resolve-Path .\inbox\notes), '*.*'; $fw1.EnableRaisingEvents=$true
Register-ObjectEvent $fw1 Created -Action { & .\bin\Enqueue.ps1 $Event.SourceEventArgs.FullPath 'summarize'; & .\bin\Enqueue.ps1 $Event.SourceEventArgs.FullPath 'tag' } | Out-Null
$fw2=New-Object IO.FileSystemWatcher (Resolve-Path .\inbox\audio), '*.*'; $fw2.EnableRaisingEvents=$true
Register-ObjectEvent $fw2 Created -Action { & .\bin\Enqueue.ps1 $Event.SourceEventArgs.FullPath 'stt' } | Out-Null
Write-Output "Travel Day Mode: ON"
```

bin/TravelDayOff.ps1:
```powershell
# Stop the worker by matching its command line ($_.Path only shows python.exe, not the script)
Get-CimInstance Win32_Process -Filter "Name LIKE 'python%'" |
  Where-Object { $_.CommandLine -like '*worker.py*' } |
  ForEach-Object { Stop-Process -Id $_.ProcessId -Force }
# Unregister all watcher events
Get-EventSubscriber | Unregister-Event
Write-Output "Travel Day Mode: OFF"
```

Outcome: Windows laptops behave just as well offline.
Step 13: Encrypt the database at rest
Prefer SQLCipher (open‑source) or SEE (commercial). Example with SQLCipher:
- Create encrypted DB and migrate:
```bash
# new encrypted db
sqlcipher ./db/ops.enc <<'SQL'
PRAGMA key='file:./dbkey?cipher=chacha20&kdf_iter=256000';
ATTACH DATABASE './db/ops.sqlite3' AS plaintext KEY '';
SELECT sqlcipher_export('main','plaintext');
DETACH DATABASE plaintext;
SQL
```

- Store the key reference in the OS keychain (recommended) and inject at runtime, not in .env.
- Update your scripts to use ops.enc and `sqlcipher` instead of `sqlite3`.

Outcome: A stolen laptop leaks nothing from your queue.
Step 14: Run a 2‑hour offline drill
Practice it before you need it.
- Disconnect all networks. Start Travel Day Mode.
- Drop: one 5–10 min audio, two .md notes.
- Observe: queue growth, worker throughput (tokens/sec in runner logs), CPU temps, battery drain.
- After 2 hours, reconnect and start Litestream. Confirm:
- queue has 0 enqueued/processing; failed≤1 with clear error.
- docs rows updated; out/summaries, out/drafts populated.
- latest DB snapshot exists in S3.

Outcome: Confidence the loop survives real outages.
Step 15: Operational checks and rollback
Add boring checks to avoid surprises:
- On boot: verify WAL is active: `PRAGMA journal_mode;` should return `wal`.
- Hourly: run `sqlite3 $DB_PATH "SELECT status, COUNT(*) FROM queue GROUP BY 1;"` and alert if processing stalls > 10 min.
- Corruption plan: stop writes, restore from the Litestream snapshot, then re‑enqueue from src files by hashing and inserting missing items only.
- Disk guard: keep 2–5 GB free; Litestream snapshots need headroom.

Outcome: Clear defaults when something goes sideways.
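The hourly stall check can be scripted against the same schema. This sketch (function name is ours) flags anything stuck in processing past the 10-minute threshold, demonstrated on an in-memory table with synthetic timestamps:

```python
import sqlite3
import time

STALL_SECONDS = 600  # the 10-minute threshold from the checklist

def stalled_items(con: sqlite3.Connection, now: int) -> list:
    """Return (id, kind, age_seconds) for processing rows older than the threshold."""
    return con.execute(
        "SELECT id, kind, ? - updated_at AS age FROM queue "
        "WHERE status='processing' AND ? - updated_at > ?",
        (now, now, STALL_SECONDS)).fetchall()

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE queue (id INTEGER PRIMARY KEY, kind TEXT, status TEXT, updated_at INTEGER)")
now = int(time.time())
con.execute("INSERT INTO queue(kind,status,updated_at) VALUES('stt','processing',?)", (now - 900,))
con.execute("INSERT INTO queue(kind,status,updated_at) VALUES('tag','processing',?)", (now - 60,))

print(stalled_items(con, now))  # only the 15-minute-old stt item
```

Pointing the same query at $DB_PATH from cron or Task Scheduler gives you the alert with no extra infrastructure.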
Step 16: Escalate heavy work to cloud after reconnect
Keep travel‑day work light. When back online:
- Automatically enqueue cloud tasks for anything that exceeded thresholds offline (e.g., context > 6k, slow tokens/sec):
```bash
sqlite3 $DB_PATH "INSERT OR IGNORE INTO queue(kind,doc_id,payload_json,priority,idempotency_key) \
  SELECT 'draft', doc_id, json_object('target','cloud','reason','long_context'), 7, 'cloud-draft:'||doc_id \
  FROM docs WHERE length(text) > 6000;"
```

- Your cloud worker (outside this SOP) reads from the same queue table replicated via Litestream and writes results back.

Outcome: The laptop does the first 80%; the cloud finishes the rest when you land.
Step 17: Throughput tuning (safe defaults)
Only touch these if you must, and reset after travel day.
- Threads: physical cores − 1 (Battery Saver) or = cores (Throughput).
- Context: 2–4k for summaries/tags; don’t waste RAM on 16k unless required.
- Quantization: Q4 for 7–8B; Q5 buys modest quality at higher power; avoid FP16 on battery.
- GPU offload: only when plugged in or with a large power bank; cap layers to keep temps sane.

Outcome: Predictable perf without melting your battery.
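The tuning defaults above reduce to a small lookup. The sketch below assumes `os.cpu_count()` reports logical cores and halves it as a rough physical-core count on SMT machines (a heuristic, not a guarantee):

```python
import os

def tuning(profile: str, physical_cores: int) -> dict:
    """Map this step's safe defaults to concrete runner settings."""
    if profile == "battery_saver":
        return {"threads": max(1, physical_cores - 1), "ctx": 4096,
                "quant": "Q4", "gpu_layers": 0}
    # Throughput: all cores, larger context, partial GPU offload when plugged in
    return {"threads": physical_cores, "ctx": 8192,
            "quant": "Q4", "gpu_layers": "partial"}

physical = max(1, (os.cpu_count() or 2) // 2)  # heuristic: logical cores / 2
print(tuning("battery_saver", 8))
```

Reset to your runner's defaults after travel day so desk sessions get full context and threads back.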