How We Track AI Costs Per Student (and Why You Should Too)
Excerpt: Uncontrolled AI costs can sink a product. We built per-student token tracking with daily limits, monthly caps, and real-time alerts. Here's how.
Let me tell you about the morning I woke up to a $476 charge from Anthropic.
It was not a breach. It was not a runaway loop. It was something much dumber: I had my API key set as an environment variable in my shell profile instead of using the Claude Max plan for development. Every time I ran a test, every time I iterated on a prompt, every time I asked the tutor a question during debugging -- every one of those was a production API call at production prices. For weeks.
That $476 bought me the most expensive lesson in my career about AI cost management. And it is the reason our platform now tracks every single token, for every single student, with alerts that go off long before anyone gets a surprise bill.
If you are building anything that calls an LLM API, you need this. Not eventually. Now. Before your first user.
Why AI Costs Are Different
Traditional SaaS costs are predictable. A PostgreSQL query costs roughly the same whether the user asks for 10 rows or 10,000. A REST endpoint that serves JSON costs the same whether the JSON is a name or a novel. You can model your infrastructure costs as a function of user count and provision accordingly.
AI costs are not like that. They are a function of conversation. A student who asks short, focused questions might cost $0.01 per interaction. A student who pastes in 5,000 lines of code and asks for a detailed review might cost $0.30 for a single exchange. Multiply that by hundreds of students, each with different usage patterns, and you have a cost surface that is essentially unpredictable from user count alone.
This is the fundamental problem: AI costs scale with usage intensity, not just user count. And if you do not track them at the individual level, you will not understand your economics until it is too late.
Our Architecture: Three Layers of Protection
We built cost tracking with three concentric rings of protection. Each one catches what the others miss.
Layer 1: Per-Interaction Token Counting
Every API call to our tutor records the exact token count -- input and output separately -- and calculates the cost immediately.
Here is the core of it, from our actual codebase:
```python
from decimal import Decimal

# Claude Sonnet pricing
COST_PER_INPUT_TOKEN = Decimal("0.000003")   # $3 per 1M tokens
COST_PER_OUTPUT_TOKEN = Decimal("0.000015")  # $15 per 1M tokens

def calculate_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Calculate cost for a single API call."""
    input_cost = Decimal(input_tokens) * COST_PER_INPUT_TOKEN
    output_cost = Decimal(output_tokens) * COST_PER_OUTPUT_TOKEN
    total_cost = input_cost + output_cost
    return total_cost.quantize(Decimal("0.0001"))  # 4 decimal places
```
Two things worth noting. First, we use `Decimal` instead of `float`. This is not pedantry. When you are summing thousands of tiny fractions, floating-point rounding errors accumulate into real money. We learned this the hard way when our daily totals started disagreeing with our monthly totals by a few cents -- which might sound trivial until you are trying to reconcile billing for 500 students.
Second, we track input and output tokens separately. Output tokens cost 5x more than input tokens with Claude Sonnet. A system prompt that is 2,000 tokens of input costs $0.006. A detailed response that is 2,000 tokens of output costs $0.03. If you only track total tokens, you cannot optimize. When we realized our system prompt was inflating input costs, we were able to trim it and save roughly 15% on API spend without changing the tutor's behavior at all.
Layer 2: Daily and Monthly Limits
Token counting tells you what happened. Limits prevent what should not happen.
Every student has two configurable limits:
```python
DEFAULT_DAILY_LIMIT_USD = Decimal("10.00")
DEFAULT_MONTHLY_LIMIT_USD = Decimal("100.00")
```
When a student approaches these limits, the system fires alerts at three thresholds: 80%, 90%, and 100%. The alerts are deduplicated -- you get one warning at 80%, one at 90%, and one when you hit the cap. Not a flood of notifications every time you ask a question.
```python
from enum import Enum

class AlertType(str, Enum):
    DAILY_80 = "daily_80"
    DAILY_90 = "daily_90"
    DAILY_100 = "daily_100"
    MONTHLY_80 = "monthly_80"
    MONTHLY_90 = "monthly_90"
    MONTHLY_100 = "monthly_100"
```
When the daily limit is hit, the system raises a `CostLimitExceeded` exception. The tutor does not just silently stop working. It tells the student what happened, shows them their usage, and explains when the limit resets. Because a confused user who thinks the system is broken is worse than an informed user who knows they hit their cap.
The key design decision here: limits are checked after recording the cost, not before. This means the last interaction of the day might push a student slightly over the limit, and that is fine. The alternative -- pre-checking and rejecting a request before processing it -- creates a worse user experience (the student typed a thoughtful question and got nothing) and does not meaningfully save money (one extra interaction is a fraction of a cent).
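Here is a minimal sketch of that record-then-check flow with deduplicated threshold alerts. The `CostTracker` and `CostLimitExceeded` names follow the post, but the in-memory counter and `print`-based alerting are stand-ins for our actual storage and notification code:

```python
from decimal import Decimal

class CostLimitExceeded(Exception):
    """Raised after the interaction is recorded, once the cap is crossed."""

ALERT_THRESHOLDS = (Decimal("0.80"), Decimal("0.90"), Decimal("1.00"))

class CostTracker:
    def __init__(self, daily_limit_usd: Decimal = Decimal("10.00")):
        self.daily_limit = daily_limit_usd
        self.daily_total = Decimal("0")
        self.fired_alerts: set[Decimal] = set()  # dedupe: one alert per threshold

    def record(self, cost: Decimal) -> None:
        # Record first -- the student always gets the answer they already asked for.
        self.daily_total += cost
        used = self.daily_total / self.daily_limit
        for threshold in ALERT_THRESHOLDS:
            if used >= threshold and threshold not in self.fired_alerts:
                self.fired_alerts.add(threshold)
                print(f"alert: {int(threshold * 100)}% of daily limit used")
        # Then enforce: it is the *next* request that gets blocked.
        if self.daily_total >= self.daily_limit:
            raise CostLimitExceeded(
                f"Daily limit of ${self.daily_limit} reached; resets tomorrow."
            )
```

The `fired_alerts` set is what keeps the 80% warning from firing on every subsequent question -- each threshold alerts exactly once per day.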
Layer 3: Real-Time Dashboard
Students can see their costs. This is deliberate.
Our daily stats endpoint returns everything a student needs to understand their usage:
```json
{
  "date": "2026-03-21",
  "cost_usd": 1.47,
  "interaction_count": 83,
  "daily_limit_usd": 10.00,
  "percentage_used": 14.7,
  "off_topic_cost_usd": 0.12,
  "off_topic_free_remaining_usd": 0.88
}
```
Transparency changes behavior. When students can see that a 5,000-word paste costs 10x more than a focused question, they start writing better questions. Not because we told them to, but because the feedback loop is visible and immediate. This is one of those cases where good cost management is also good pedagogy -- the skill of writing clear, concise prompts is exactly what we want them to learn.
The Three-Tier Topic Boundary System
Cost tracking alone is not enough. You also need to manage what students ask about, because that directly impacts costs and product positioning.
We built a three-tier system:
Tier 1: In-Scope Questions. These are questions about the enrolled course material. Ask as many as you want, within your daily and monthly limits. This is what you are paying for.
Tier 2: Adjacent Topics. These are questions that are related to AI engineering but covered by a different course. The tutor gives a brief, helpful answer -- enough to unblock the student -- and then suggests the relevant course. The cost is tracked and counts against the student's normal limits.
Tier 3: Off-Topic Questions. These are questions that fall outside our course catalog entirely. "Explain quantum computing" when you are enrolled in an agent engineering course. For these, the tutor estimates the token cost before answering and asks for the student's consent.
The off-topic handling is where it gets interesting. We give every student a $1/week free allowance for off-topic questions. Generous enough that a curious student never feels penalized for wandering. Bounded enough that we are not subsidizing a general-purpose AI assistant.
```python
# From our cost router
off_topic_cost = _get_weekly_off_topic_cost(student_id)
free_allowance = 1.0
stats["off_topic_cost_usd"] = round(off_topic_cost, 4)
stats["off_topic_free_remaining_usd"] = round(max(0, free_allowance - off_topic_cost), 4)
```
Why $1/week? Because at our current per-interaction costs, that is roughly 30-50 off-topic questions. Enough to satisfy genuine curiosity. Not enough to replace ChatGPT. We want students to explore, but we also need the economics to work.
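As a sketch, the allowance check reduces to a few lines. The function name and signature here are hypothetical -- the real router also classifies the tier, logs the event, and prompts for consent -- but the arithmetic is the same:

```python
from decimal import Decimal

WEEKLY_FREE_ALLOWANCE = Decimal("1.00")

def off_topic_budget(weekly_off_topic_cost: Decimal,
                     estimated_cost: Decimal) -> tuple[bool, Decimal]:
    """Return (needs_consent, free_remaining) for one off-topic question.

    needs_consent flips to True once the estimated cost of answering
    would exceed the student's remaining weekly free allowance.
    """
    remaining = max(Decimal("0"), WEEKLY_FREE_ALLOWANCE - weekly_off_topic_cost)
    needs_consent = estimated_cost > remaining
    return needs_consent, remaining
```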
Every off-topic request is logged as a `topic_request` event with the tier classification and subject. This is not just cost management -- it is market research. When we see 200 students asking about "deploying agents to Kubernetes" in the same month, that tells us what course to build next.
The Real Numbers
Let me share what actual usage looks like, because I think the numbers will surprise you.
At current Claude Sonnet pricing ($3/1M input tokens, $15/1M output tokens), here is what interactions cost:
| Interaction Type | Input Tokens | Output Tokens | Cost |
|---|---|---|---|
| Short question, concise answer | ~500 | ~300 | $0.006 |
| Detailed question, thorough answer | ~1,500 | ~1,000 | $0.02 |
| Code review (pasted code + analysis) | ~4,000 | ~2,000 | $0.04 |
| Long conversation (5-turn thread) | ~8,000 | ~5,000 | $0.10 |
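Those figures fall straight out of the pricing arithmetic. Restating the pricing constants in a self-contained check:

```python
from decimal import Decimal

COST_PER_INPUT_TOKEN = Decimal("0.000003")   # $3 per 1M tokens
COST_PER_OUTPUT_TOKEN = Decimal("0.000015")  # $15 per 1M tokens

def cost(input_tokens: int, output_tokens: int) -> Decimal:
    return (Decimal(input_tokens) * COST_PER_INPUT_TOKEN
            + Decimal(output_tokens) * COST_PER_OUTPUT_TOKEN)

print(cost(500, 300))    # 0.006000 -> short question, concise answer
print(cost(8000, 5000))  # 0.099000 -> 5-turn thread, roughly $0.10
```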
A typical student doing focused coursework uses about 30-50 interactions per day. That is $0.60 to $1.00 per day. Over a four-week course, that is roughly $15-25 per student in API costs.
That is remarkably affordable. At those economics, the AI tutor is cheaper than a single hour of human tutoring, and it is available 24/7.
But here is the thing: those are average numbers. The distribution has a long tail. We have had students who ran 300 interactions in a single day while debugging a complex project. Without daily limits, that one student could have cost us $15-20 in a single day. Multiply that by a handful of power users and you are looking at real money.
The limits are not there to punish heavy users. They are there to make the business sustainable so we can keep the per-student cost low for everyone.
Implementation: Where the Tokens Come From
If you are building this yourself, here is how to get token counts from the Anthropic API. The usage data comes back in every response:
```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Explain pipeline architecture"}]
)

# Token usage is in the response
input_tokens = response.usage.input_tokens
output_tokens = response.usage.output_tokens

# Calculate and store cost
cost = calculate_cost(input_tokens, output_tokens)

# Record it
tracker.record_api_call(
    input_tokens=input_tokens,
    output_tokens=output_tokens,
    interaction_id="some-uuid"
)
```
The `usage` object is always present in the response. You do not need to estimate tokens or use a separate tokenizer. The API tells you exactly what was consumed.
Store these records with the student ID, a timestamp, and the interaction ID. You want to be able to answer three questions at any time:
1. How much has this student cost today?
2. How much has this student cost this month?
3. What was the most expensive interaction and why?
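As a sketch of what that storage can look like -- SQLite here purely for illustration, with costs kept as integer microdollars so sums stay exact -- all three questions reduce to queries over one indexed table:

```python
import sqlite3

# Costs stored as integer microdollars (1_000_000 = $1) to avoid float drift.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE api_calls (
        student_id     TEXT    NOT NULL,
        interaction_id TEXT    NOT NULL,
        called_at      TEXT    NOT NULL,  -- ISO-8601 UTC timestamp
        input_tokens   INTEGER NOT NULL,
        output_tokens  INTEGER NOT NULL,
        cost_microusd  INTEGER NOT NULL
    )
""")
conn.execute("CREATE INDEX idx_student_time ON api_calls (student_id, called_at)")

conn.executemany(
    "INSERT INTO api_calls VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("s1", "i1", "2026-03-21T09:00:00Z", 500, 300, 6000),     # $0.0060
        ("s1", "i2", "2026-03-21T10:30:00Z", 4000, 2000, 42000),  # $0.0420
        ("s1", "i3", "2026-03-02T12:00:00Z", 1500, 1000, 19500),  # $0.0195
    ],
)

# 1. How much has this student cost today?
today = conn.execute(
    "SELECT COALESCE(SUM(cost_microusd), 0) FROM api_calls"
    " WHERE student_id = ? AND called_at LIKE ?",
    ("s1", "2026-03-21%")).fetchone()[0]

# 2. How much has this student cost this month?
month = conn.execute(
    "SELECT COALESCE(SUM(cost_microusd), 0) FROM api_calls"
    " WHERE student_id = ? AND called_at LIKE ?",
    ("s1", "2026-03%")).fetchone()[0]

# 3. What was the most expensive interaction?
top = conn.execute(
    "SELECT interaction_id, cost_microusd FROM api_calls"
    " WHERE student_id = ? ORDER BY cost_microusd DESC LIMIT 1",
    ("s1",)).fetchone()
```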
That third question is the one that drives optimization. When you find a student whose average interaction costs $0.05 when the platform average is $0.02, you look at what is different. Maybe their conversation history is getting too long and inflating input tokens. Maybe the system prompt needs trimming. Maybe they are pasting entire files when they only need to paste the relevant function.
Caching: The Multiplier Nobody Talks About
One more piece that is easy to overlook: response caching.
We cache cost statistics with short TTLs -- 120 seconds for daily and monthly stats, 300 seconds for limits. This means if a student checks their cost dashboard five times in a minute (and they do, especially when they are close to a limit), we are not hitting the database five times. We are hitting it once and serving the rest from cache.
This sounds minor until you have 500 concurrent students and your stats endpoints are getting hammered. At that point, caching is the difference between a $50/month database and a $500/month one.
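A TTL cache for this can be very small. Here is a generic sketch (the real layer sits in front of our stats endpoints; the 120-second default below matches the daily-stats TTL mentioned above):

```python
import time

class TTLCache:
    def __init__(self):
        self._store = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # expired: evict and miss
            return None
        return value

    def set(self, key, value, ttl_seconds: float):
        self._store[key] = (time.monotonic() + ttl_seconds, value)

def get_daily_stats(student_id, cache, compute, ttl_seconds=120):
    """Serve daily stats from cache; fall through to the database
    at most once per TTL window."""
    key = ("daily_stats", student_id)
    stats = cache.get(key)
    if stats is None:
        stats = compute(student_id)  # the expensive database query
        cache.set(key, stats, ttl_seconds)
    return stats
```

Within any 120-second window, the first request pays for the database query and every later request is a dictionary lookup.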
Lessons Learned
After building this system and running it with real students, here is what I wish I had known from the start:
Track from day one. Do not wait until costs are a problem. Instrument your very first API call. The data you collect early -- before you have optimized anything -- is the most valuable data you will ever have, because it tells you where the biggest opportunities are.
Use Decimal, not float. I mentioned this already but it bears repeating. Financial calculations with floating-point arithmetic will make you question your sanity when the numbers do not add up.
Set limits lower than you think. You can always raise them. You cannot un-spend money. Our $10/day default sounds generous, and it is -- most students use less than $1. But we have caught runaway loops that would have hit $50 before anyone noticed.
Make costs visible to users. This is counterintuitive. You might think showing costs will scare users away. In practice, it builds trust. Students appreciate knowing they are not going to get an unexpected bill. And as I mentioned, it makes them better prompt engineers.
Log everything, alert early. The 80% threshold is the important one. By the time you are at 100%, it is too late to change behavior. The 80% alert gives students time to adjust.
And finally: check your shell profile for API keys. You would be amazed how many production tokens are sitting in `.zshrc` files, quietly running up bills every time a developer opens a terminal. That $476 lesson was the best investment in operational discipline I have ever made.
This is Part 6 of our series on AI agent engineering. The cost tracking system described here is open source as part of our platform. If you want to see the full implementation -- including the database schema, the caching layer, and the admin dashboard for monitoring platform-wide costs -- check out the repository.