LLMs are used across almost every outbound system today, but most teams still do not understand the actual cost behind each task. People run enrichment, rewriting, and personalization at scale without knowing which parts consume almost no credits and which parts quietly multiply cost. When you break the workflow down step by step, the pattern becomes clear.
For this week’s breakdown, I rebuilt a simple outbound flow and calculated the estimated cost of each task across 1,000 leads. This is a basic starter setup with short prompts, small outputs, and the same tasks most GTM teams already run. No complex prompts or chains. Just practical work.
The tasks fall into two groups. The operational layer uses GPT-4o mini. These calls are small, predictable, and low on reasoning. The writing layer uses Claude Sonnet. These calls require more context, more structured output, and more reasoning, which increases token usage.
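To make the per-task math concrete, here is a minimal sketch of the kind of estimate behind numbers like these. The per-token prices and token counts below are placeholder assumptions for illustration, not the exact figures used in this breakdown, so swap in your provider's current rates.

```python
# Back-of-the-envelope cost model for one task run once per lead.
# Prices are placeholder assumptions in USD per 1M tokens; check your
# provider's current pricing before planning a real budget.
PRICES = {
    "gpt-4o-mini":   {"input": 0.15, "output": 0.60},
    "claude-sonnet": {"input": 3.00, "output": 15.00},
}

def task_cost(model: str, input_tokens: int, output_tokens: int, leads: int = 1_000) -> float:
    """Estimated spend for running one task once per lead."""
    price = PRICES[model]
    per_lead = (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000
    return round(per_lead * leads, 4)

# Example: a short enrichment-style call vs. a full email-style call
# (token counts are rough guesses, not measurements).
print(task_cost("gpt-4o-mini", input_tokens=300, output_tokens=20))
print(task_cost("claude-sonnet", input_tokens=900, output_tokens=400))
```

That one formula, input tokens times the input rate plus output tokens times the output rate, multiplied by the lead count, is all that drives every number in the walkthrough below.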
Here is what each task represents and why the cost shifts the way it does.

Lead enrichment
This pulls simple attributes from raw inputs. The prompt is small and the output is usually a single value. Cost stays low because the model does not generate much text and does not need deep reasoning.
Data cleanup and normalization
This is even cheaper. The model only standardizes what is already provided. The input is short, the output is short, and the reasoning is basic.
Personalization variables
These cost slightly more because the model has to read a snippet, interpret context, and generate a small hook. The output is still short, but the extra reasoning adds a bit more token use.
Short prospect summary
This summary pulls multiple pieces of information into one line. The input grows because more fields are passed in, and the output grows as the model rewrites the data into a readable form. This pushes the cost a little higher than the earlier tasks.
Classification and tagging
This is one of the cheapest tasks in the flow. The model labels something as fit or not fit. The output is only a few tokens. Very little reasoning is required.
Quick grammar and tone rewrites
These involve reading a short sentence and rewriting it. The input contains the full text and the output is roughly the same length, so cost rises slightly with the size of the rewritten text.
A/B variant generation
This stays low because the variants are short. The output adds a few extra lines, but the reasoning is shallow.
The writing layer shifts the cost curve.
Cold email writing
The model processes several inputs and produces a full email. Both the input and the output are larger, so the cost rises.
Persona or industry rewrite
This requires the model to adjust phrasing while keeping the structure intact. Both the input and the output are sizable here.
Tone shifting
Reasoning increases because the model must understand the original message and produce a different delivery style.
LinkedIn DM
Shorter messages keep this cheaper than email writing.
Social content snippet
Output length increases again, which bumps the cost.
Case study rewrite
The model reads more context and produces a longer structured paragraph. This requires more reasoning and more tokens.
Long-form narrative messaging
This is the largest output in the flow. Long outputs drive the highest cost.
Once you see the cost pattern mapped across 1,000 leads, planning becomes more intentional. You know which tasks stay cheap and which tasks require more budget.
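For anyone who wants to reproduce that mapping, here is a small sketch that ranks a handful of the tasks above by estimated spend. Every token count and price in it is an illustrative assumption rather than measured data; the point is the shape of the curve, not the exact dollars.

```python
# Illustrative only: every token count and price here is an assumption.
# Prices are USD per 1M tokens (input, output); use your provider's current rates.
PRICES = {"gpt-4o-mini": (0.15, 0.60), "claude-sonnet": (3.00, 15.00)}

# task: (model, assumed input tokens per lead, assumed output tokens per lead)
FLOW = {
    "classification and tagging": ("gpt-4o-mini",    200,   5),
    "lead enrichment":            ("gpt-4o-mini",    300,  20),
    "personalization variables":  ("gpt-4o-mini",    400,  40),
    "cold email writing":         ("claude-sonnet",  900, 350),
    "long-form narrative":        ("claude-sonnet", 1200, 900),
}

LEADS = 1_000
costs = {
    task: (inp * PRICES[model][0] + out * PRICES[model][1]) / 1_000_000 * LEADS
    for task, (model, inp, out) in FLOW.items()
}

for task, cost in sorted(costs.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{task:>28}: ${cost:7.2f} per {LEADS:,} leads")
print(f"{'total':>28}: ${sum(costs.values()):7.2f}")
```

Even with these placeholder numbers, the writing-layer tasks account for nearly all of the spend while the operational layer sits in the cents, which is the same pattern described above.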
See you next Wednesday!
If you are still unsure how to price this for your own motion, feel free to reach out. I am always happy to jump on a call and walk through your setup and strategy :)


