05 · Tokenmaxxing · Claude Code activity tracker

Generate your Claude Code Wrapped.

Paste your stats-cache.json, discover your token usage, model taste, peak coding days, and get a shareable AI coding activity card.

Paste your stats-cache.json
Run /stats in Claude Code first to refresh, then copy from ~/.claude/stats-cache.json
This file contains aggregate usage stats — not your prompts, code, files, or conversations. Parsing happens in your browser. Your stats are only stored on the server when you click Save my score in the report.
How to get a fresh stats-cache.json
  1. In a Claude Code session, run:
    /stats
    This forces Claude Code to recompute the cache from your full session history. Without this step, the file may be stale by days (it doesn't auto-refresh).
  2. Copy the file contents (in any terminal):
    cat ~/.claude/stats-cache.json | pbcopy
    (On Linux, swap pbcopy for xclip -selection clipboard.)
  3. Paste above and hit Generate report.

We never ask for your project JSONLs or the rest of ~/.claude — only this single aggregate file.

The tokenmaxxing hierarchy

Tokens / day | Tier | Translation
0+ | Token Newcomer | Just dipping in. Low volume, mostly Sonnet, no caching pattern yet.
100,000+ | Daily Driver | Coding with AI most days. Token usage rising, model selection still default.
500,000+ | Power Engineer | Full-time AI workflow. Cache hits visible, multi-model routing emerging.
2,000,000+ | Token Whale | You are not prompting. You are moving codebases. Cache ratio matters.
5,000,000+ | Industrial Tokenmaxxer | Power-user tier. Subagents, long context, Opus only when needed.

Tokenmaxxing 101

The basics, decoded.

What tokenmaxxing means

Tokenmaxxing is the deliberate optimization of LLM token efficiency for engineers using AI coding tools — Claude Code, Cursor, Aider, Continue, GitHub Copilot, and similar. The substance is reducing waste through caching, model selection, prompt structuring, and parallel execution. The number-go-up dynamic of token dashboards is the trojan horse: once you are tracking the metric, you optimize it.

Claude Code's pricing model in plain English

Four token categories, all priced per million tokens: input tokens (everything you send: system prompt, conversation history, file context), output tokens (the model's response — usually the smallest line item by volume), cache write (input tokens stored for reuse, priced at 1.25× normal input), cache read (input tokens served from cache, priced at 0.1× normal input). Heavy users derive most of their cost savings from the gap between cache writes (paid once) and cache reads (recurring at 10%).
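
As a back-of-envelope sketch, here is that pricing structure as arithmetic. Every dollar figure below is an assumption at roughly Sonnet-class list prices; check the current pricing page before relying on the numbers. The 1.25× and 0.1× multipliers are the point.

  # All prices assumed ($/M tokens); only the multipliers matter for the structure.
  INPUT_PER_M = 3.00
  OUTPUT_PER_M = 15.00
  CACHE_WRITE_PER_M = INPUT_PER_M * 1.25   # cache writes: 1.25x normal input
  CACHE_READ_PER_M = INPUT_PER_M * 0.10    # cache reads: 0.1x normal input

  def daily_cost(input_tok, output_tok, cache_write_tok, cache_read_tok):
      return (input_tok * INPUT_PER_M
              + output_tok * OUTPUT_PER_M
              + cache_write_tok * CACHE_WRITE_PER_M
              + cache_read_tok * CACHE_READ_PER_M) / 1_000_000

  # A 2M-token day where most input is served from cache:
  print(daily_cost(100_000, 50_000, 250_000, 1_600_000))  # ~$2.47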

Cache hit ratio and why it matters

Cache hit ratio = cache_read / (cache_read + cache_creation). Above 50% is good; above 70% is excellent; above 90% means most of your input is essentially free. Low cache hits typically come from dynamic content — timestamps, random IDs, or frequently-changing data — placed before the cache breakpoint, forcing every turn to invalidate and rebuild. The fix: stable content first, dynamic content last.
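
A minimal sketch of that formula in code, assuming the cache fields use the API-style names cache_read_input_tokens and cache_creation_input_tokens; the actual stats-cache.json schema may nest or name them differently.

  import json, pathlib

  stats = json.loads(pathlib.Path("~/.claude/stats-cache.json").expanduser().read_text())
  # Field names are assumptions borrowed from the API usage object.
  read = stats.get("cache_read_input_tokens", 0)
  creation = stats.get("cache_creation_input_tokens", 0)
  ratio = read / (read + creation) if (read + creation) else 0.0
  print(f"cache hit ratio: {ratio:.0%}")   # 70%+ is excellent, 90%+ means near-free input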

The model picker: Opus vs Sonnet vs Haiku

The Claude family is structured as three tiers per generation. Opus 4.X: highest reasoning capability, slowest, most expensive — use for novel architecture, hard debugging, ambiguous spec interpretation, complex refactors. Sonnet 4.6: the workhorse — fast, cheap, competitive on most well-scoped tasks; should handle 70%+ of routine coding work. Haiku 4.5: cheapest, fastest — ideal for simple file edits, formatting, classification, short-response tasks. Most users overuse Opus by 2–3×.
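
One way to make that discipline stick is to decide the routing before the work starts. A sketch only; the tier names below stand in for exact API model IDs.

  # Illustrative routing table; substitute current Opus/Sonnet/Haiku API IDs.
  MODEL_FOR = {
      "architecture": "opus",    # novel design, hard debugging, ambiguous specs
      "refactor":     "opus",
      "feature":      "sonnet",  # default for well-scoped coding work
      "bugfix":       "sonnet",
      "formatting":   "haiku",   # trivial edits, classification, short answers
      "rename":       "haiku",
  }

  def pick_model(task_kind: str) -> str:
      # Default down, not up: unknown task types go to the workhorse, not Opus.
      return MODEL_FOR.get(task_kind, "sonnet")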

Long context for multi-file work

Modern Claude models offer 200k–1M token context windows. Loading 30 relevant files into a long-context session once and operating against them is dramatically more token-efficient than a long string of "let me see file X, now file Y" round-trips. Pay the input tokens once, cache them, then iterate; subsequent turns pay only cheap cache reads.
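
Illustrative token math, with assumed sizes: 30 files at roughly 4k tokens each and 20 follow-up turns in the same session.

  context = 30 * 4_000    # ~120k tokens of project context, loaded once
  turns = 20              # follow-up turns in the same session

  uncached = (turns + 1) * context                    # resend the context every turn
  cached = context * 1.25 + turns * context * 0.10    # write once, then read at 10%
  print(uncached, cached)                             # 2,520,000 vs 390,000 effective input tokens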

Subagents for parallel work

Searching a codebase across 20 unrelated files in serial is wasteful — both in time and in token round-trips. Subagents (the Task / Agent tool in Claude Code) let you spawn parallel readers that each return concise summaries to the main agent. Net effect: 5× wall-clock reduction and a smaller main-agent context (because the subagents absorb the file contents and only return summaries).
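
Outside Claude Code, the same pattern can be approximated with the Anthropic SDK: fan out cheap parallel readers and keep only their short summaries in the main context. A sketch under assumptions; the model ID is a placeholder and the file paths are hypothetical.

  import asyncio
  from anthropic import AsyncAnthropic   # pip install anthropic

  client = AsyncAnthropic()              # reads ANTHROPIC_API_KEY from the environment

  async def summarize(path: str) -> str:
      text = open(path, encoding="utf-8").read()[:20_000]   # truncate defensively
      msg = await client.messages.create(
          model="claude-haiku-placeholder",   # substitute a real, current Haiku ID
          max_tokens=300,
          messages=[{"role": "user",
                     "content": f"Summarize what {path} does in five bullets:\n\n{text}"}],
      )
      return msg.content[0].text

  async def main(paths):
      # Independent reads run concurrently; only short summaries come back.
      return await asyncio.gather(*(summarize(p) for p in paths))

  summaries = asyncio.run(main(["auth.py", "routes.py", "models.py"]))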

Token economics for engineers

For most professional engineers, AI tooling pays for itself at a 5–10% productivity uplift. Once that threshold is cleared, the marginal cost of an extra million tokens per week is roughly 30 minutes of engineering time, which is negligible. The optimization frame should not be "spend less" but "spend with intent." Heavy use on the right tasks beats light use on everything. Tokenmaxxing is about spending with intent, not penny-pinching.
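
The break-even arithmetic, with every input an assumption to replace with your own numbers:

  hourly_rate  = 100.0                       # fully loaded $/hour (assumption)
  uplift       = 0.05                        # 5% productivity gain (low end of the range)
  weekly_value = hourly_rate * 40 * uplift   # $200/week of recovered time

  extra_tokens_m = 1.0                       # one extra million tokens per week
  blended_price  = 15.0                      # assumed blended $/M across models
  weekly_cost    = extra_tokens_m * blended_price

  print(weekly_value, weekly_cost)           # 200.0 vs 15.0: intent beats frugality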

What does NOT work

Manually trimming whitespace from prompts. Using Haiku for hard reasoning to "save money" (you'll re-do the work). Caching dynamic content with timestamps in it (cache breaks every turn). Massive system prompts for simple tasks. Defaulting to Opus on every turn. The common pattern: optimizing the raw token number instead of the work it buys, in either direction.

How to actually tokenmaxx

  1. Cache aggressively.

    Cache reads cost 10% of normal input tokens. Place stable system prompts, large reference files, and project context before the cache breakpoint. The cache_read / cache_creation ratio is the single most important number on your dashboard.

  2. Pick the model per task.

    Opus for hard reasoning, Sonnet for normal coding work, Haiku for simple file edits and short responses. Defaulting Opus to everything is the most common token leak we see in real usage.

  3. Use long context for multi-file work.

    Modern Claude models offer 200k–1M context. Loading the right files once into a long-context window beats 5 short turns of round-trip clarification. Pay tokens once, save thinking tokens many times over.

  4. Subagents for parallel search.

    Searching a codebase across 20 files in serial is wasteful. Spawn subagents (Task tool / general-purpose agents) to parallelize independent reads; the latency saving compounds with the token saving.

  5. Don't over-prompt simple tasks.

    A 2,000-token system prompt for a single-line edit is pure waste. Calibrate prompt size to task size; trust the model on routine work.

  6. Watch the cache_creation_input_tokens ratio.

    High cache_creation relative to cache_read means you are paying full price for context every turn. Restructure prompts so the stable parts come first; anything before the cache breakpoint is reusable (see the sketch after this list).
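
With the raw Anthropic Messages API, the breakpoint is set explicitly with cache_control; Claude Code manages this for you, but the structural lesson is the same: stable content first, marked cacheable, dynamic content after it. A minimal sketch with a placeholder model ID and a hypothetical context file:

  from anthropic import Anthropic   # pip install anthropic

  client = Anthropic()
  project_context = open("PROJECT_CONTEXT.md", encoding="utf-8").read()   # hypothetical file

  response = client.messages.create(
      model="claude-sonnet-placeholder",   # substitute a real, current model ID
      max_tokens=1024,
      system=[
          # Stable, reusable context goes first and carries the cache marker.
          {"type": "text",
           "text": project_context,
           "cache_control": {"type": "ephemeral"}},
      ],
      messages=[
          # Dynamic, per-turn content stays after the breakpoint.
          {"role": "user",
           "content": "Refactor the auth middleware to use the new session store."},
      ],
  )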

FAQ

What is tokenmaxxing?

Tokenmaxxing is the deliberate optimization of LLM token usage for engineers using AI coding tools — Claude Code, Cursor, Aider, Continue, etc. The substance is reducing waste through caching, model selection, prompt structuring, and parallel execution. The tool above generates a Claude Code Wrapped from your stats-cache.json so you can see your usage patterns at a glance.

How do I generate my Claude Code Wrapped?

In Claude Code, your usage data lives in ~/.claude/stats-cache.json. Run /stats to refresh it, open the file, copy the entire JSON, and paste it into the tool above. Parsing happens entirely in your browser; nothing is uploaded unless you choose to save your score. The output is a shareable card with your token totals, model split, peak coding days, and cache efficiency.

How is Claude Code priced?

Token-based: input tokens (context you send), output tokens (the model's response), cache writes (1.25× the normal input rate), and cache reads (10% of the normal input rate). Heavy users see most of their savings come from the gap between cache writes and cache reads — the system prompt, large files, and project context get cached once and read many times.

What is a good cache hit ratio?

Above 50% on a typical day is good; above 70% is excellent. The ratio is cache_read_input_tokens / (cache_read_input_tokens + cache_creation_input_tokens). Low cache hits mean you are recomputing context every turn. Common cause: dynamic content (timestamps, random data) at the top of the prompt forcing cache invalidation.

When should I use Opus vs Sonnet vs Haiku?

Opus 4.X for hard reasoning, novel architecture, complex debugging, ambiguous spec interpretation. Sonnet 4.6 for the bulk of routine coding work — it is fast, cheap, and competitive with Opus on most well-scoped tasks. Haiku 4.5 for cheap fast edits, simple file operations, formatting tasks. Most users overuse Opus by 2–3×.

Is tokenmaxxing only for Claude?

No, the principles transfer. OpenAI, Gemini, xAI, and most major providers expose similar primitives: prompt caching, multi-tier model families, long-context windows. The concrete numbers and tool integrations differ, but the optimization patterns are the same: cache stable context, route by task complexity, parallelize independent calls, calibrate prompt size.

Where does the term come from?

A self-aware extension of the maxxing suffix to engineering culture, circa 2025. Originated in AI-engineering Twitter and Cursor / Claude Code communities as a half-ironic counterpart to looksmaxxing. The number-go-up dynamic of token dashboards proved irresistible to the same brain that follows a 12-step skincare regimen.

Do you store my stats?

Not by default. Parsing runs client-side in your browser, and your stats-cache.json never leaves the page on its own. If you choose to save a final score, only the aggregate metric is stored, not the underlying data.