Case study · BrnzyBot

A Discord bot for my raid team that refuses to make things up.

It started as a way to stop answering the same gear questions over and over. It turned into a deterministic optimizer, a healer-evaluation engine, and a Stripe billing system I rebuilt four times until I trusted it. Here's the whole thing.

Python 3.11 · discord.py · SQLite · scipy (MIP) · Warcraft Logs API · Stripe · Cloudflare Tunnel · Docker · GCP

01

The problem: a frustrating back-and-forth with a spreadsheet I had to update by hand, and the same questions to answer every week.

World of Warcraft: TBC Anniversary Edition · A raid of 25 people · A lot of "what should I gear next?"

I play on a raid team in World of Warcraft: TBC Anniversary Edition. Twenty-five people, two nights a week, and a steady stream of the same questions: what's my next gear upgrade, what do I reserve out of this dungeon, did I play my rotation right, what's the strategy on this boss again. Best case, each of us keeps a spreadsheet of our own gear up to date, and every answer means tabbing between Warcraft Logs, that spreadsheet, and a wiki, then stitching the result together by hand.

Truth be told, that doesn't scale. So I built a Discord bot to do it for me, and realized it could do it for all of us. The first version did one thing: tell a player their best gear upgrades. That one thing turned out to be surprisingly deep, and the bot kept growing from there.

The non-negotiable from day one: it had to be right. A bot that confidently tells you to equip a downgrade is worse than no bot at all. That single requirement shaped every technical decision that followed.

02

The core is a solver, not a vibe

Gear Optimization · Mixed-Integer Programming · Deterministic by design

Here's the thing about gear in this game: it's a real optimization problem. Every item has a bag of stats, every spec values those stats differently, there are set bonuses that only kick in at 2 or 4 pieces, and there's a "hit cap" cliff where one stat goes from valuable to literally worthless the moment you cross a threshold. Eyeballing it gets you close. It doesn't get you right.

So the heart of BrnzyBot is a mixed-integer program. I model each equip slot as a decision, each set bonus as a constraint, and the hit cap as a bound, then let scipy solve for the actual best loadout. Equivalence Points (a per-spec stat-weighting system) turn "this item vs that item" into a number the solver can chew on.
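To make that modeling concrete, here is a minimal sketch of the same shape of problem using scipy.optimize.milp. The item list, the EP numbers, the hit cap of 30, the single 2-piece set bonus, and the variable layout are all made up for illustration; this is an assumption about how such a model can be written, not BrnzyBot's actual code.

```python
import numpy as np
from scipy.optimize import Bounds, LinearConstraint, milp

# Hypothetical data: (name, slot, EP excluding hit, hit rating, part of the set?)
ITEMS = [
    ("Helm A", "head", 40.0, 0.0, True),
    ("Helm B", "head", 46.0, 0.0, False),
    ("Chest A", "chest", 50.0, 0.0, True),
    ("Chest B", "chest", 52.0, 20.0, False),
    ("Ring A", "finger", 20.0, 30.0, False),
    ("Ring B", "finger", 28.0, 0.0, False),
]
HIT_CAP = 30.0       # hit beyond this is worth nothing (illustrative value)
HIT_EP = 1.0         # EP per point of hit, up to the cap
SET_BONUS_EP = 15.0  # EP value of the 2-piece bonus


def best_loadout(items=ITEMS):
    n = len(items)
    y, h = n, n + 1  # y: 2-piece bonus active (binary), h: hit that counts (continuous)

    # milp minimizes, so negate everything we want to maximize.
    c = np.zeros(n + 2)
    c[:n] = [-item[2] for item in items]
    c[y] = -SET_BONUS_EP
    c[h] = -HIT_EP

    rows, lower, upper = [], [], []

    # Exactly one item per equip slot.
    for slot in sorted({item[1] for item in items}):
        row = np.zeros(n + 2)
        for i, item in enumerate(items):
            if item[1] == slot:
                row[i] = 1.0
        rows.append(row)
        lower.append(1.0)
        upper.append(1.0)

    # h <= total hit on the chosen items (h is also capped by its upper bound).
    row = np.zeros(n + 2)
    row[:n] = [-item[3] for item in items]
    row[h] = 1.0
    rows.append(row)
    lower.append(-np.inf)
    upper.append(0.0)

    # The bonus can only switch on with two set pieces: 2*y <= set pieces worn.
    row = np.zeros(n + 2)
    row[:n] = [-1.0 if item[4] else 0.0 for item in items]
    row[y] = 2.0
    rows.append(row)
    lower.append(-np.inf)
    upper.append(0.0)

    integrality = np.ones(n + 2)
    integrality[h] = 0  # h is the only continuous variable
    ub = np.ones(n + 2)
    ub[h] = HIT_CAP     # the hit cap enters the model as a simple bound

    result = milp(
        c,
        integrality=integrality,
        bounds=Bounds(np.zeros(n + 2), ub),
        constraints=LinearConstraint(np.array(rows), lower, upper),
    )
    if not result.success:
        raise RuntimeError(result.message)

    picked = [items[i][0] for i in range(n) if result.x[i] > 0.5]
    return picked, -result.fun


if __name__ == "__main__":
    print(best_loadout())
```

With these made-up numbers the solver picks Helm A, Chest A, and Ring A for 155 EP. Ranking each slot on its own would pick Helm B, Chest B, and Ring A instead, which wastes 20 hit over the cap and misses the set bonus, for 148 EP.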

The deliberate choice that makes the whole project work: none of this touches an AI model. The math is math. There's an optional language-model layer [coming soon] that will narrate the results in plainer English, but it will never invent the numbers, only explain what the solver already decided. I'll be the first to admit that this is the boring choice. It's also the reason I can run the entire thing on a 1 GB server and trust every answer it gives.

A bot that hallucinates your gear math is a liability, not a feature. The optimizer is deterministic on purpose.

That same solver now backs three commands: a full head-to-toe gear check, a ranked "what do I upgrade next" list, and soft-reserve priority for a given dungeon (which correctly handles the hit cap and set bonuses that a naive ranking would get wrong).

26
specs supported, each with its own stat-weight model
MIP
a real solver - set bonuses and hit-cap cliffs handled, not approximated
1 GB
the whole bot runs on a single e2-micro VM
0
AI calls in any gear, strategy, or billing calculation

03

Scoring healers without lying to them

Healer Analysis · Death Attribution · Honest metrics

The feature I'm proudest of is the one I almost didn't build, because the honest answer was "this might not be possible to do fairly."

Everybody wants a healer score. The problem is that the obvious number - healing per second - is a lie. A well-played raid takes less damage, which means the healers have less to do, which makes their numbers look worse. Reward HPS and you reward healers for raids that play badly. That's backwards.

So I researched how people actually evaluate healers, and landed on a different question: not how much they healed, but what they didn't. Did someone die to damage a healer could have covered? The bot pulls the last ten seconds of damage before each death and classifies it - was this a one-shot (no heal on earth saves you from a hit bigger than your health pool), a slow bleed from standing in fire (that's a positioning problem, not a healing one), or a genuinely coverable death? Only the last kind is on the healer, and even then only if they were assigned to that target and free to act.
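The classification described above can be sketched roughly like this. The Hit record, the category names, and the 50% "mostly avoidable damage" threshold are assumptions made for illustration; the real rules are likely more nuanced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    seconds_before_death: float
    amount: int
    avoidable: bool  # e.g. damage from standing in fire (hypothetical flag)


def classify_death(hits, max_health, window=10.0, avoidable_share=0.5):
    """Classify a death from the damage taken in the last `window` seconds."""
    recent = [h for h in hits if h.seconds_before_death <= window]
    total = sum(h.amount for h in recent)
    if total <= 0:
        return "unknown"  # nothing to judge, so say nothing

    # A single hit at least as large as the health pool cannot be healed through.
    if max(h.amount for h in recent) >= max_health:
        return "one_shot"

    # Mostly avoidable damage is a positioning problem, not a healing one.
    avoidable = sum(h.amount for h in recent if h.avoidable)
    if avoidable / total >= avoidable_share:
        return "avoidable_damage"

    return "coverable"


def healer_accountable(kind, assigned, free_to_act):
    """Only a coverable death counts, and only for an assigned, available healer."""
    return kind == "coverable" and assigned and free_to_act
```

The ordering of the checks matters: a death is only labelled coverable after the two explanations that are not the healer's fault have been ruled out.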

It does the same honest accounting for dispels. A resto shaman who didn't dispel any magic isn't slacking - shamans literally can't dispel magic in this expansion. So the bot checks class capability before it ever flags a missed cleanse. It would rather say nothing than say something unfair.
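A capability check of that kind might look like the sketch below. The table is a simplified, illustrative summary of which healer classes can remove which debuff types in this expansion and should be verified against game data; the function name and message format are hypothetical.

```python
# Illustrative capability table (assumption: verify against game data).
CAN_REMOVE = {
    "priest": {"magic", "disease"},
    "paladin": {"magic", "poison", "disease"},
    "shaman": {"poison", "disease"},
    "druid": {"curse", "poison"},
}


def missed_dispel_flag(healer_class, debuff_type, dispels_cast):
    """Return a note only when the class could have removed the debuff and did not."""
    if debuff_type not in CAN_REMOVE.get(healer_class, set()):
        return None  # cannot remove it, so there is nothing fair to say
    if dispels_cast > 0:
        return None
    return f"{healer_class} cast no dispels against {debuff_type} debuffs"
```

Unknown classes fall through to an empty set, so the default is silence rather than an accusation.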

Success in healing means the raid is alive and there's mana left in the tank. That's a hard thing to score, so the bot scores coverage and survival, never throughput.

Now, I know this isn't a perfect science - healing is genuinely hard to grade, and I say so right in the output. But "here's who died and whether it was actually your fault" is a far more useful and fair conversation than "your HPS was rank 14."

04

The part that had to handle real money

Stripe Billing · Webhooks · Four security reviews

At some point a side project that people rely on needs a way to pay for itself. So I added a Patron tier - $7.95 a month per server, unlimited usage. Easy, right? Take the money, flip a flag.

It was not easy, and I'm glad it wasn't, because this is the part where being careful actually matters. Real money and real customer trust are on the line, so I treated it like an adversary was trying to get free access, and I ran the whole billing surface through repeated security reviews before a single real card touched it.

That paranoia paid for itself. The reviews caught things I would have shipped:

  • A fail-open signature check. If the webhook secret was ever misconfigured, the original code would have accepted unsigned events - anyone could forge a "this server paid" message. Now it fails closed.
  • A way to upgrade a server you don't own. The first design trusted a server ID that came back from the payment. I added a server-minted nonce so a checkout has to provably be one the bot created.
  • A refund that kept the goods. Pay, charge back, keep Patron forever. The revoke path now fails toward removing access, not keeping it.
  • A bug in my own fix. The fourth review caught that my anti-fraud nonce, done in the wrong order, could strand a paying customer if the process crashed at the wrong millisecond. Reordering three lines fixed it.

Then the live test caught two more that no amount of code review could have - because they only showed up against Stripe's actual 2026 API. Events arrived out of order, and Stripe had quietly relocated the subscription data to a new place in the payload. I found both by testing the real thing end to end, watching real webhook logs, and refusing to call it done until a real test payment flipped a real server to Patron.
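One common way to tolerate out-of-order and repeated webhook deliveries is to deduplicate on the event ID and refuse to let an older event overwrite newer state. The sketch below shows that idea with SQLite; the table names, the function names, and the use of the event's creation timestamp for ordering are assumptions, not a description of how BrnzyBot actually resolved it.

```python
import sqlite3


def open_db(path=":memory:"):
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS seen_events (event_id TEXT PRIMARY KEY)")
    db.execute(
        "CREATE TABLE IF NOT EXISTS patron_state ("
        " server_id TEXT PRIMARY KEY,"
        " active INTEGER NOT NULL,"
        " as_of INTEGER NOT NULL)"  # creation time of the event that set this state
    )
    return db


def apply_event(db, event_id, server_id, active, created):
    """Apply one billing event; returns 'applied', 'duplicate', or 'stale'."""
    with db:  # one transaction: commits on success, rolls back on an exception
        try:
            db.execute("INSERT INTO seen_events (event_id) VALUES (?)", (event_id,))
        except sqlite3.IntegrityError:
            return "duplicate"  # already processed this delivery

        row = db.execute(
            "SELECT as_of FROM patron_state WHERE server_id = ?", (server_id,)
        ).fetchone()
        if row is not None and created < row[0]:
            return "stale"  # an older event arrived late; keep the newer state

        db.execute(
            "INSERT OR REPLACE INTO patron_state (server_id, active, as_of)"
            " VALUES (?, ?, ?)",
            (server_id, int(active), created),
        )
        return "applied"
```

For example, if a cancellation created at time 200 arrives before the activation created at time 100, the late activation is reported as stale and the server stays revoked.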

Every billing decision the bot makes - granted, revoked, refused, and why - is written to a durable audit trail with the server name and buyer, so when someone says "I paid and it's not working," I can actually answer them.
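An audit trail like that can be as simple as an insert-only SQLite table. The schema and function names below are hypothetical; the sketch only shows the shape of the idea, with the decision, the reason, the server name, and the buyer stored on every row.

```python
import sqlite3
from datetime import datetime, timezone


def open_audit_log(path=":memory:"):
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS billing_audit ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " at TEXT NOT NULL,"
        " decision TEXT NOT NULL"
        "   CHECK (decision IN ('granted', 'revoked', 'refused')),"
        " reason TEXT NOT NULL,"
        " server_name TEXT NOT NULL,"
        " buyer TEXT NOT NULL)"
    )
    return db


def record_decision(db, decision, reason, server_name, buyer):
    """Insert one row per billing decision; this module never updates or deletes rows."""
    with db:
        db.execute(
            "INSERT INTO billing_audit (at, decision, reason, server_name, buyer)"
            " VALUES (?, ?, ?, ?, ?)",
            (datetime.now(timezone.utc).isoformat(), decision, reason, server_name, buyer),
        )


def history_for(db, server_name):
    """Everything that happened to one server, oldest first."""
    return db.execute(
        "SELECT decision, reason, buyer FROM billing_audit"
        " WHERE server_name = ? ORDER BY id",
        (server_name,),
    ).fetchall()
```

Answering "I paid and it's not working" then becomes a single history_for lookup on the server name.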

4
adversarial security reviews before going near a real card
3
critical "free Patron" holes caught and closed pre-launch
34
automated tests on the billing path alone
100%
of payment handling fails closed - no grant without proof of payment

05

How it actually runs

Architecture · Deployment · Boring on purpose

The architecture is deliberately plain, because plain is what survives. Commands come in through Discord, hit a thin command layer that handles all the chat I/O, and drop into pure logic that returns text and never calls Discord back. State lives in SQLite. Gear and log data come from the Warcraft Logs API. The optional AI layer sits behind a feature flag and talks over plain HTTP, so the bot's own image carries no heavy AI dependencies.
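The split between the thin command layer and the pure logic can be sketched without any Discord dependency at all. The function names and the report format here are hypothetical; the point is only that the logic takes data and returns text, while the adapter is the sole place that sends anything.

```python
import asyncio
from typing import Awaitable, Callable


def upgrade_report(player: str, upgrades: list[tuple[str, float]]) -> str:
    """Pure logic: data in, text out. No chat I/O, so it is trivial to test."""
    if not upgrades:
        return f"{player}: no upgrades found."
    lines = [f"{player}: top upgrades"]
    ranked = sorted(upgrades, key=lambda u: u[1], reverse=True)
    for rank, (item, gain) in enumerate(ranked, start=1):
        lines.append(f"{rank}. {item} (+{gain:.1f} EP)")
    return "\n".join(lines)


async def handle_upgrades_command(
    send: Callable[[str], Awaitable[None]],
    player: str,
    upgrades: list[tuple[str, float]],
) -> None:
    """Thin command layer: the only code that talks to the chat platform."""
    await send(upgrade_report(player, upgrades))


if __name__ == "__main__":
    async def print_send(text: str) -> None:
        print(text)

    asyncio.run(handle_upgrades_command(print_send, "Brnzy", [("Ring", 3.5), ("Helm", 12.0)]))
```

In a real bot, send would be whatever the chat library provides for replying to a command; in tests it can be a plain function that appends to a list.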

The billing webhook runs as its own container next to the bot, sharing one database, exposed to Stripe through a Cloudflare Tunnel rather than an open port. The repo is private; the server pulls it with a read-only deploy key. Push to the main branch and it deploys itself.

The constraint

Run the whole thing - bot, billing, optimizer - on a 1 GB e2-micro for a few dollars a month.

The discipline it forced

Every dependency earns its place. No SDK bloat, no AI in the hot path, deterministic logic that's cheap to run and easy to test.

That said, the most useful thing I did wasn't technical. When I migrated the runtime paths and renamed half the configuration, I split it into two deploys - one that changed nothing, one that flipped the switch - so there was never a moment where the code and the live data disagreed. Keep it simple, stupid. That one's from my old chemistry professor, and it's never let me down.

Python 3.11 · discord.py · SQLite · scipy · Warcraft Logs API v2 · Stripe · Cloudflare Tunnel · Docker Compose · GitHub Actions · GCP e2-micro

Most of what I build looks like this under the hood.

Deterministic where it counts, honest about its limits, and tested until I actually trust it. If you've got a messy data or automation problem, I'm happy to talk through it.