Episode 2

Build Your Own LLM Cost Meter Before the Next Provider Change

Intro

This episode is for nomad founders already shipping AI workflows who are flying blind on costs across multiple providers. You'll get a concrete schema, three deployment paths, and spend cap strategies to prevent billing disasters and satisfy audit requirements.

In This Episode

Santi opens with the story of a developer who woke up to $82,314 in stolen API charges, then walks through the 19-field schema that prevents these disasters. Kira adds the human review trail needed for EU AI Act compliance. They cover three implementation paths: Google Sheets with Apps Script for quick wins, Langfuse with DuckDB for data ownership, and PostHog with Metronome for hosted alerting. The episode includes spend caps, degraded-mode routing when budgets trip, and a complete starter template with webhook code, SQL queries, and Slack alerts.

Key Takeaways

  • A 19-field vendor-agnostic schema captures every LLM call with tokens, costs, and audit trails in one portable format across all providers
  • Three deployment paths let you start with Google Sheets in under an hour, then migrate to open-source or SaaS as you scale
  • Daily spend caps with degraded-mode routing prevent billing disasters while keeping your business running when budgets trip

Timestamps

Companion Resources

  • OpenAI API Pricing

    openai.com

    • OpenAI API pricing (as of May 2026) lists GPT‑5.5 at $5.00 per 1M input tokens and $30.00 per 1M output tokens; cached input is $0.50 per 1M.
  • OpenAI API Rate Limits (Developers)

    developers.openai.com

    • OpenAI documents rate limits for the API, including requests-per-minute and tokens-per-minute guidance, with enforcement via 429s and retry-after headers.
  • OpenAI Responses API Reference

    developers.openai.com

    • OpenAI’s Responses API returns a usage object with input and output token counts, enabling per-request cost estimation.
  • Anthropic Claude API Rate Limits

    platform.claude.com

    • Anthropic documents both monthly spend limits and rate limits by tier, enforced with a token-bucket algorithm; 429s include retry-after headers.
  • Gemini API Rate Limits

    ai.google.dev

    • Google’s Gemini API publishes official rate-limit and quota documentation and ties higher quotas to Cloud Billing enablement.
  • Langfuse: Token & Cost Tracking docs

    langfuse.com

    • Langfuse supports ingesting token usage and/or cost from provider responses as the most accurate way to track LLM usage costs.
    • An open-source path that natively ingests usage and cost from provider responses and can back data to a warehouse or flat files (e.g., Parquet).
  • DuckDB Parquet docs (overview)

    duckdb.org

    • DuckDB provides efficient read/write support for Apache Parquet and can push down filters and projections when scanning Parquet files.
  • Apps Script Web Apps + Quotas

    developers.google.com

    • Google Apps Script web apps can receive POST requests and write rows to Google Sheets; quotas apply to execution time and write operations.
    • The lowest-friction, no-infrastructure implementation path for indie teams to centralize events and costs and build spend digests.
  • Looker Studio: Connect to Google Sheets (Cloud docs)

    cloud.google.com

    • Looker Studio has a native Google Sheets connector for building dashboards from a Sheet data source.
  • Slack Incoming Webhooks

    docs.slack.dev

    • Slack incoming webhooks enable simple, authenticated POSTs to send alert messages into channels.
  • European Commission AI Act page

    digital-strategy.ec.europa.eu

    • The EU AI Act entered into force on August 1, 2024 and becomes generally applicable on August 2, 2026, with phased obligations.
  • Helicone: How We Calculate Cost

    docs.helicone.ai

    • Helicone documents cost calculation by mapping model identifiers to provider pricing tables, including OpenAI streaming specifics.
  • Tom's Hardware: developer’s stolen Gemini API key and $82k in charges in 48 hours

    tomshardware.com

    • Coverage of a Gemini API key compromise leading to catastrophic spend.
    • Concrete proof that daily caps and rolling-window alerts matter; motivates budget guardrails and degraded-mode fallbacks.
  • PostHog API + Metronome docs

    posthog.com

    • Lightweight SaaS implementation: event capture plus usage meters for budget enforcement.
    • A hosted path: capture events in PostHog, forward usage counts to a billing meter (e.g., Metronome) to power spend caps and alerts.

Santi: Eighty-two thousand dollars. In forty-eight hours.

Kira: Wait — on one key?

Santi: One stolen Gemini API key. A developer — this was reported by Tom's Hardware — wakes up, checks billing, eighty-two thousand three hundred fourteen dollars in charges he didn't make. Two days. And he only found out because he happened to check the dashboard. There was no alert. No cap. No circuit breaker. Just a billing page with a number that could bankrupt him.

Kira: And this is someone who was paying attention. Imagine the people who check billing once a month.

Santi: Or once a quarter. Or never — because they're on a bus in Colombia and the dashboard is on a laptop they haven't opened since Tuesday.

Kira: That's half our audience.

Santi: That's me six months ago. I was running two products across OpenAI and Anthropic, and my total LLM cost monitoring was — I'm embarrassed to say this — checking each provider's billing page whenever I remembered. Which was maybe every ten days.

Kira: And you had no idea what each customer was actually costing you.

Santi: None. Zero per-customer attribution. Zero per-job attribution. I knew my total OpenAI bill and my total Anthropic bill. That's it. I couldn't tell you which workflow was burning money, which client was profitable, or whether my margins were five percent or fifty percent on any given job.

Kira: The spreadsheet guy had no spreadsheet.

Santi: The spreadsheet guy had no spreadsheet. And the thing that finally broke me wasn't a stolen key — it was a pricing change. Anthropic adjusted their token pricing, and I didn't notice for three weeks. By the time I caught it, my margins on one product had dropped from forty percent to eleven percent. And I only found out because a client asked me to itemize costs for an audit.

Kira: That audit request — that's the moment most of us are heading toward and don't know it yet. A client asks where their money went, a provider changes pricing, or the EU AI Act kicks in on August second and suddenly you need evidence logs you never kept. And you're scrambling.

Santi: So today we're building the thing that prevents all three of those emergencies. One portable schema, three ways to deploy it — from a Google Sheet to open-source tooling to lightweight SaaS — plus spend caps and a degraded-mode fallback for when your budget trips. You'll have the whole system mapped out and a starter template to ship it this week.

Kira: So before we get into the build — why can't you just use the dashboards the providers already give you? Anthropic has spend limits. OpenAI shows you usage. Google ties quotas to your billing. Why build your own thing on top of that?

Santi: Because those dashboards only show you that provider. And the second you're calling two providers — which, if you listened to our failover episode, you should be — you've got two separate views of spend that don't talk to each other. You can't see total cost per customer. You can't see total cost per job. You definitely can't see which workflow is eating your margin across both providers combined.

Kira: And this is the important part — you can't set a single budget cap that covers everything. Anthropic will let you set a monthly spend limit on Anthropic. OpenAI will show you OpenAI usage. But nobody is watching the total.

Santi: Nobody's watching the total. And when you add Gemini or a local model into the mix, it gets worse. Three dashboards, three billing cycles, three different ways of counting tokens — and no unified view of what your business is actually spending on AI.

Kira: Okay, so the fix is a single layer that sits between your code and all of those providers. Every API call writes one row to one table, same format, regardless of which provider handled it.

Santi: Exactly. And the schema for that row is simpler than people think. Nineteen fields. I'll walk through the ones that matter most.

Kira: Go.

Santi: Every row gets an event ID — just a UUID — a timestamp in UTC, and an actor type. That actor type is either user, agent, or human reviewer. Then your provider, model, and region — all lowercase, normalized. Always "openai" not "OpenAI," always "gpt-5.4-mini" not whatever marketing name they're using this week.

Kira: Why does the normalization matter that much?

Santi: Because the moment you run a SQL query grouping by provider and you've got "OpenAI" and "openai" and "open_ai" as three separate entries, your numbers are wrong and you don't know it. Normalize once at ingestion. Never think about it again.
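A minimal sketch of that normalize-at-ingestion step in Python; the alias table and function name are illustrative, not the starter template's:

```python
# Normalize provider and model names once, at ingestion, so GROUP BY
# queries never see "OpenAI" / "openai" / "open_ai" as separate entries.
# The alias table below is illustrative, not exhaustive.
PROVIDER_ALIASES = {"open_ai": "openai", "gemini": "google"}

def normalize(provider: str, model: str) -> tuple[str, str]:
    p = provider.strip().lower()
    return PROVIDER_ALIASES.get(p, p), model.strip().lower()
```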

Kira: Fair. Keep going.

Santi: Then the cost fields — input tokens, output tokens, latency in milliseconds, and cost in US dollars. OpenAI's Responses API hands you the token counts directly in the usage object. Anthropic does the same. You multiply by the published per-token price and you've got cost per call. Right now, GPT-5.5 is five dollars per million input tokens, thirty dollars per million output. So a call with twelve hundred input tokens and three hundred output tokens costs you about a cent and a half.
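As a worked sketch of that math, assuming a hand-maintained price table keyed on the normalized names; real numbers should come from the provider's published pricing and be re-checked whenever it changes:

```python
# USD per 1M tokens, (input, output), using the GPT-5.5 prices quoted above.
# A hand-maintained table like this is exactly what breaks silently when a
# provider changes pricing -- which is why the meter exists.
PRICES = {("openai", "gpt-5.5"): (5.00, 30.00)}

def call_cost_usd(provider: str, model: str,
                  input_tokens: int, output_tokens: int) -> float:
    in_rate, out_rate = PRICES[(provider, model)]
    return input_tokens / 1_000_000 * in_rate + output_tokens / 1_000_000 * out_rate

# 1,200 in + 300 out -> $0.006 + $0.009 = $0.015, about a cent and a half
print(call_cost_usd("openai", "gpt-5.5", 1200, 300))
```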

Kira: Which sounds like nothing until you're making two hundred of those calls a day.

Santi: Two hundred calls a day at that rate is about three dollars. But swap in a bigger model, or hit a workflow that generates long outputs, and suddenly you're at eight, ten dollars a day. Per customer. And if you don't know that, you can't price correctly.

Kira: Okay, so that's the cost side. But there are fields in this schema that aren't about cost at all — and those are the ones I actually care about more.

Santi: The audit fields.

Kira: The audit fields. Job ID, customer ID — those are your attribution keys, so you can slice spend by client or by workflow. But then there's PII flag, review required, reviewer ID, and decision. And those exist because of two things. One — if you have clients who care about data handling, you need to prove which calls touched personal data and which didn't. And two — the EU AI Act.

Santi: And I'll be honest, I almost didn't include those fields when I first built this. I thought — I'm a solo founder, I'm not a high-risk AI system, why do I need a reviewer trail?

Kira: And then?

Santi: And then I read Article twenty-six. Deployers — that's us, people calling APIs — have to keep logs for at least six months. And Article fifty says if someone interacts with your AI system, you have to tell them. The Act becomes generally applicable August second, twenty twenty-six. That's three months from now.

Kira: Three months. And the fines are up to seven percent of global revenue. Not profit — revenue.

Santi: So the reviewer fields aren't overhead. They're insurance. When a call gets flagged for human review, you log who reviewed it, what they decided — approved, rejected, edited — and now you've got an evidence trail that maps directly to what the regulation asks for.

Kira: And even if you never touch EU clients — which, if you're a nomad, you probably will eventually — having that trail makes you look professional when a client asks how you handle AI governance. I've won two contracts this year partly because I could show a review log.
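Pulled together, the fields named in this conversation look roughly like the sketch below; the field names and types are assumptions, and the starter template's CSV header is the canonical 19-field layout:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional
import uuid

@dataclass
class LlmCallEvent:
    # Identity and attribution
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    ts_utc: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())
    actor_type: str = "agent"       # user | agent | human_reviewer
    provider: str = ""              # normalized, e.g. "openai"
    model: str = ""                 # normalized, e.g. "gpt-5.4-mini"
    region: str = ""
    job_id: str = ""
    customer_id: str = ""
    # Cost fields
    input_tokens: int = 0
    output_tokens: int = 0
    latency_ms: int = 0
    cost_usd: float = 0.0
    # Audit fields
    pii_flag: bool = False
    review_required: bool = False
    reviewer_id: Optional[str] = None
    decision: Optional[str] = None  # approved | rejected | edited
```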

Santi: Alright, so you've got the schema. Nineteen fields, one row per API call. Now — how do you actually capture this? Three paths, and they're not mutually exclusive. You can start with one and migrate later because the schema is the same everywhere.

Kira: Path one is the one I'd recommend for anyone who's just starting — spreadsheet first.

Santi: You're recommending the spreadsheet path?

Kira: I am. Because I've watched too many people in my community try to set up Langfuse on day one, get stuck on Docker configuration, and abandon the whole project. A Google Sheet with an Apps Script webhook gets you logging in under an hour. You deploy the script as a web app, your server POSTs a JSON payload after every LLM call, and the script appends a row. Done.
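The server side of that flow can be a single fire-and-forget POST; a minimal Python sketch, with the web-app URL as a placeholder for whatever the Apps Script "Deploy as web app" step gives you:

```python
import requests

# Placeholder: the URL Apps Script shows after "Deploy as web app".
WEBHOOK_URL = "https://script.google.com/macros/s/XXXX/exec"

def log_row(event: dict) -> None:
    try:
        # Short timeout and swallowed errors: logging must never
        # break or slow down the actual LLM call.
        requests.post(WEBHOOK_URL, json=event, timeout=3)
    except requests.RequestException:
        pass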

Santi: And the quotas on Apps Script are fine for most indie operations. You'll hit limits at a few requests per second, but if you're at that volume, you've already outgrown Sheets anyway.

Kira: Right. Then you connect Looker Studio to that Sheet — native connector, five minutes — and you've got a dashboard. Daily cost, seven-day cost, thirty-day cost. Top customers by spend. Schedule a weekly email export and you've got your spend digest.

Santi: No infrastructure. No servers. No Docker. Just a Sheet and a dashboard.

Kira: That's the Lisbon Test right there. Can you set this up from a café with sketchy wifi? Yes. In under an hour? Yes.

Santi: Path two is where I live — open-source logger plus DuckDB and Parquet. Langfuse is the tool. It's built for LLM observability, natively ingests token usage and cost from provider responses — similar to how Helicone does it — and gives you per-generation cost tracking out of the box.

Kira: Okay but what happens when you outgrow the hosted tier or want to own your data completely?

Santi: Parquet. You export your events to Parquet files — partitioned by provider and date — and query them with DuckDB. DuckDB reads Parquet natively, runs on your laptop. I run my daily cost queries from a cron job on my MacBook. Total daily spend by provider, per-customer seven-day burn, rolling averages — all in SQL, all against flat files I can back up anywhere.
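A sketch of one of those cron-job queries, assuming the column names from the schema above and Parquet files under an events/ directory:

```python
import duckdb

# Total spend per provider per day for the last seven days,
# straight off local Parquet files -- no server, no warehouse.
duckdb.sql("""
    SELECT provider,
           CAST(ts_utc AS DATE)    AS day,
           ROUND(SUM(cost_usd), 4) AS spend_usd
    FROM read_parquet('events/**/*.parquet')
    WHERE CAST(ts_utc AS TIMESTAMP) >= now() - INTERVAL 7 DAY
    GROUP BY ALL
    ORDER BY day, provider
""").show()
```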

Kira: And retention?

Santi: Thirty days hot in your primary store. Hundred eighty days warm in Parquet. And never store raw prompts or completions in the meter itself — hash them if you need text for review. The details are in the starter template.
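The hashing is one line; a sketch, assuming a SHA-256 digest is enough to match a logged row back to text held elsewhere:

```python
import hashlib

def prompt_hash(text: str) -> str:
    # The meter stores only this digest; raw prompts and completions,
    # if kept at all for review, live in a separate restricted store.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()
```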

Kira: Path three — and I want to be upfront that we couldn't verify every detail of the retention docs here — is lightweight SaaS. If you're already running PostHog for product analytics, you pipe your LLM events into the same capture endpoint and wire a billing meter like Metronome for usage-based alerts.
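A sketch of the capture half of that path, assuming PostHog's public capture endpoint, a US cloud host, and a placeholder project key; the Metronome wiring is left out:

```python
import requests

POSTHOG_HOST = "https://us.i.posthog.com"  # assumption: US cloud instance
POSTHOG_KEY = "phc_XXXX"                   # placeholder project API key

def capture_llm_event(event: dict) -> None:
    # Reuse the same 19-field event as the PostHog event properties.
    requests.post(f"{POSTHOG_HOST}/capture/", json={
        "api_key": POSTHOG_KEY,
        "event": "llm_call",
        "distinct_id": event.get("customer_id") or "anonymous",
        "properties": event,
    }, timeout=3)
```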

Santi: The advantage is you get alerting without building it. The disadvantage is you're adding another vendor dependency to a system that's supposed to reduce vendor dependency.

Kira: Which is ironic. But for teams already in that ecosystem, it's the fastest path to enforcement.

Santi: The key thing across all three paths is that the schema is identical. Same nineteen fields. Same naming conventions. So if you start on Sheets and outgrow it, you export to Parquet and keep going. The schema is the portable layer. The tools are interchangeable.

Santi: Which brings us to the part that would have saved that developer eighty-two thousand dollars. Spend caps and degraded-mode routing.

Kira: Walk me through how this actually works in practice.

Santi: Two numbers. A daily cap in dollars and a rolling seven-day cap. You set those based on your margins — for my content repurposing tool, the daily cap is fifty dollars and the seven-day cap is two hundred. Every time a request comes in, the routing layer checks current spend against those caps. If you're under, the request goes to the default model. If you're over—

Kira: You don't just stop.

Santi: You don't just stop. That's the mistake people make — they think a spend cap means a hard cutoff. But if a client's deliverable is due in two hours and your cap trips, you can't just return an error. So you degrade. Three options. If the request isn't urgent, queue it — hold it for two hours until the daily counter resets. If it can use a cheaper model, route it to a smaller model automatically. And if it absolutely needs a full-capability response, flag it for human review and let someone decide whether to override the cap.
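As a sketch, the whole decision fits in a few lines; the cap values match the numbers above, and the urgency and downgradability flags are assumptions about what the caller knows:

```python
from enum import Enum

class Route(Enum):
    DEFAULT = "default_model"
    QUEUE = "queue_for_reset"
    CHEAPER = "smaller_model"
    HUMAN_REVIEW = "flag_for_review"

# Illustrative caps; spend_today / spend_7d come from whichever
# store backs the meter (Sheet, Parquet, or SaaS).
DAILY_CAP_USD = 50.0
WEEKLY_CAP_USD = 200.0

def route_request(urgent: bool, downgradable: bool,
                  spend_today: float, spend_7d: float) -> Route:
    if spend_today < DAILY_CAP_USD and spend_7d < WEEKLY_CAP_USD:
        return Route.DEFAULT
    if not urgent:
        return Route.QUEUE        # hold until the daily counter resets
    if downgradable:
        return Route.CHEAPER      # automatic fallback to a cheaper model
    return Route.HUMAN_REVIEW     # a person decides whether to override the cap
```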

Kira: And you log that decision in the same schema. Review required equals true, reviewer ID, decision — approved, rejected, edited. So your audit trail captures not just the normal flow but the exceptions.

Santi: Exactly. And the alerting is dead simple. A Slack incoming webhook. When your daily spend crosses the cap, or your seven-day burn crosses the rolling threshold, you get a message in Slack with the numbers. I get mine on my phone. I've caught two anomalies in the last month just from those pings — one was a client workflow that started looping, and one was a model upgrade I'd forgotten I'd deployed that was three times more expensive per call.
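The alert itself is a one-field JSON POST to a Slack incoming webhook; a sketch with a placeholder webhook URL:

```python
import requests

SLACK_WEBHOOK = "https://hooks.slack.com/services/XXX/YYY/ZZZ"  # placeholder

def alert_spend(spend_today: float, cap: float) -> None:
    if spend_today >= cap:
        requests.post(SLACK_WEBHOOK, json={
            "text": f":rotating_light: Daily LLM spend ${spend_today:.2f} "
                    f"crossed the ${cap:.2f} cap."
        }, timeout=5)
```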

Kira: Three times. You forgot you deployed it?

Santi: Three times. And I caught it within four hours because the daily alert fired. Without that alert, I'd have found out on my next billing review — which, as we established, was every ten days.

Kira: Four hours versus ten days. That's the difference between a bad afternoon and a bad quarter. So the alert paid for itself immediately.

Santi: On hour four.

Kira: So let's bring this back to where we started. Santi checking provider dashboards every ten days. That developer staring at an eighty-two thousand dollar bill he didn't create. Both of those situations have the same root cause — no unified layer watching the spend. And the fix isn't some enterprise observability platform. It's nineteen fields, one row per call, and a weekly digest that lands in your inbox every Monday.

Santi: And the thing I keep coming back to is how little time this actually takes to set up. The spreadsheet path — an hour. The Langfuse path — maybe an afternoon if you're comfortable with Docker. The SaaS path — depends on what you're already running. But none of these are weekend projects. They're Tuesday afternoon projects. And once they're running, they just... run. You get a Slack ping when something's wrong, you get a digest when it's time to review, and the rest of the time you're building your actual business instead of manually checking three billing dashboards from a café in Oaxaca.

Kira: We put together the Indie LLM Cost Meter Starter — it's on the Resources page. The CSV header, the Google Sheet setup, the Apps Script webhook, the DuckDB queries, and the Slack alert recipe. Everything we talked about today in a format you can copy and deploy.

Santi: One thing to do this week. Just one. Add the schema to your next LLM call. Log one row. Event ID, timestamp, provider, model, tokens, cost. Just one row. Once you see that first row land in a Sheet or a Parquet file, you'll never go back to checking billing dashboards manually. That's the whole shift.

Kira: See you Wednesday.

Santi: See you Wednesday.

LLM cost monitoring, AI cost tracking, vendor-agnostic logging, spend caps, EU AI Act compliance, audit trails, DuckDB, Parquet, Google Sheets automation, Apps Script webhooks, Langfuse, PostHog, Metronome, degraded mode routing, API budget alerts, nomad business operations, location-independent AI, multi-provider workflows