Senior Director of Million-Dollar Regexes – O’Reil…



The following article originally appeared on Medium and is being republished here with the author’s permission.

Don’t get me wrong, I’m up all night using these tools.

But I also sense we’re heading for an expensive hangover. The other day, a colleague told me about a new proposal to route a million documents a day through a system that identifies and removes Social Security numbers.

I joked that this was going to be a “million-dollar regular expression.”

Run the math on the “naïve” implementation with full GPT-5 and it’s eye-watering: A million messages a day at ~50K characters each works out to around 12.5 billion tokens daily, or $15,000 a day at current pricing. That’s nearly $6 million a year to check for Social Security numbers. Even if you migrate to GPT-5 Nano, you still spend about $230,000 a year.
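The arithmetic above can be reproduced in a few lines. This is a back-of-the-envelope sketch, not a pricing calculator: the 4-characters-per-token ratio is an assumption implied by the figures (50 billion characters becoming 12.5 billion tokens), and the per-million-token rates are derived backward from the dollar amounts quoted here rather than taken from any price list.

```python
# Back-of-the-envelope cost model built only from the figures quoted in the article.
DOCS_PER_DAY = 1_000_000
CHARS_PER_DOC = 50_000
CHARS_PER_TOKEN = 4  # assumption: implied by 50B characters -> 12.5B tokens

tokens_per_day = DOCS_PER_DAY * CHARS_PER_DOC // CHARS_PER_TOKEN
millions_of_tokens = tokens_per_day / 1_000_000

FULL_MODEL_USD_PER_DAY = 15_000  # figure quoted in the article
NANO_USD_PER_YEAR = 230_000      # figure quoted in the article

# Rates implied by those figures (derived here, not quoted from a price list).
implied_full_rate = FULL_MODEL_USD_PER_DAY / millions_of_tokens
implied_nano_rate = NANO_USD_PER_YEAR / 365 / millions_of_tokens

full_usd_per_year = FULL_MODEL_USD_PER_DAY * 365

print(f"tokens per day:       {tokens_per_day:,}")
print(f"full model, per year: ${full_usd_per_year:,}")
print(f"implied full rate:    ${implied_full_rate:.2f} per million tokens")
print(f"implied nano rate:    ${implied_nano_rate:.3f} per million tokens")
```

At a flat $15,000 a day the annual figure comes out a little under $5.5 million, so "nearly $6 million" is a generous rounding; the order of magnitude is the point.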

That’s a success. You “saved” $5.77 million a year…

How about running this code for a million documents a day? How much would this cost:

import re; s = re.sub(r"\b\d{3}[- ]?\d{2}[- ]?\d{4}\b", "[REDACTED]", s)
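For anyone who wants to try it, here is the same one-liner wrapped in a function with a few sample inputs. The function name `redact_ssns` and the sample strings are made up for illustration; the pattern is the one above, unchanged.

```python
import re

# Same pattern as the one-liner: three digits, two digits, four digits,
# each group optionally separated by a hyphen or a space.
SSN_PATTERN = re.compile(r"\b\d{3}[- ]?\d{2}[- ]?\d{4}\b")


def redact_ssns(text: str) -> str:
    """Replace anything shaped like a Social Security number with [REDACTED]."""
    return SSN_PATTERN.sub("[REDACTED]", text)


samples = [
    "SSN: 123-45-6789.",           # hyphenated
    "SSN: 123 45 6789.",           # space-separated
    "SSN: 123456789.",             # no separators
    "Tracking number 1234567890",  # ten digits, left alone
]
for sample in samples:
    print(redact_ssns(sample))
```

The pattern is deliberately blunt: it will also redact any other nine-digit number, such as an order ID, so a real deployment would probably want tighter rules. Even then, that is a few more lines of regex, not a model call.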

A plain old EC2 instance could handle this. A single instance, something like an m1.small at 30 bucks a month, could churn through the same workload with a regex and cost you a few hundred dollars a year.

Which means that in practice, companies will be calling people like me in a year saying, “We’re burning a million dollars to do something that should cost a fraction of that—can you fix it?”

From $15,000/day to $0.96/day—I do think we’re about to see a lot of companies realize that a thinking model connected to an MCP server is way more expensive than just paying someone to write a bash script. Starting now, you’ll be able to make a career out of un-LLM-ifying applications.


