The Wallet That Couldn't Count: Scaling Transactions the Hard Way
How we went from race conditions to sub-millisecond wallet updates - a journey through mistakes, over-engineering, and finally letting the database do its job.
Our wallet system is fast, correct, and runs on a single PostgreSQL function that executes in under a millisecond. We're pretty proud of it.
But it didn't start that way. There's a special kind of dread that comes with realising your wallet system can't count. Not "off by a penny" can't count - "a client just made three purchases simultaneously and two of them went through on the same balance" can't count. You're running a cards platform. Money moves through your system. Real money. Client money.
We operate under the same constraints as a bank. If a client has £10,000 and spends £3,000, that balance had better be exactly £7,000. Not eventually. Not usually. Always. This is how we got there - and like the rest of our stack, the answer turned out to be the most boring one imaginable.
The Read-Update-Write
The simplest thing that could possibly work:
func (r *WalletRepo) Debit(ctx context.Context, walletID int64, amount decimal.Decimal) error {
tx, err := r.db.BeginTx(ctx, nil)
if err != nil {
return err
}
defer func() { _ = tx.Rollback() }()
var balance decimal.Decimal
err = tx.QueryRowContext(ctx,
"SELECT balance FROM wallets WHERE id = $1", walletID,
).Scan(&balance)
if err != nil {
return err
}
newBalance := balance.Sub(amount)
if newBalance.IsNegative() {
return errors.InsufficientFunds
}
_, err = tx.ExecContext(ctx,
"UPDATE wallets SET balance = $1 WHERE id = $2",
newBalance, walletID,
)
if err != nil {
return err
}
return tx.Commit()
}Shipped it. Worked beautifully in testing. Then traffic happened.
Two requests arrive at the same time. Both start a transaction and read the balance as £10,000. Both subtract £3,000. Both write £7,000. The client spent £6,000 but was only charged £3,000. We'd invented a money printer.
What worked: Dead simple. Fast. Good enough at low concurrency.
What didn't: Fundamentally broken under concurrent load. No atomicity between the read and write - that gap is where money disappears. No audit trail either.
We needed a better solution. FAST!
The Append-Only Ledger
So we over-engineered it. Instead of a mutable balance, we'd keep an append-only ledger - every transaction gets a new row, and the balance is computed by reading every entry. We were emulating addAndGet of atomic operations at the application layer, with a full scan on every operation.
func (r *WalletRepo) Debit(ctx context.Context, walletID int64, amount decimal.Decimal) error {
// Application-level lock - no two goroutines touch the same wallet
unlock := r.lockManager.Lock(walletID)
defer unlock()
tx, err := r.db.BeginTx(ctx, nil)
if err != nil {
return err
}
defer func() { _ = tx.Rollback() }()
// Read ALL ledger entries for this wallet
rows, err := tx.QueryContext(ctx,
"SELECT amount, type FROM wallet_ledger WHERE wallet_id = $1",
walletID,
)
if err != nil {
return err
}
defer func() { _ = rows.Close() }()
// Manually compute balance by looping through every entry
balance := decimal.Zero
for rows.Next() {
var entryAmount decimal.Decimal
var entryType string
if err := rows.Scan(&entryAmount, &entryType); err != nil {
return err
}
if entryType == "credit" {
balance = balance.Add(entryAmount)
} else {
balance = balance.Sub(entryAmount)
}
}
if balance.Sub(amount).IsNegative() {
return errors.InsufficientFunds
}
_, err = tx.ExecContext(ctx,
`INSERT INTO wallet_ledger (wallet_id, amount, type)
VALUES ($1, $2, 'debit')`,
walletID, amount,
)
if err != nil {
return err
}
return tx.Commit()
}Look at that code. We're acquiring an application-level mutex, pulling every ledger row into Go memory, iterating one by one, doing arithmetic that SQL can do natively, and holding a lock the entire time.
What worked: Race conditions gone. Complete audit trail. Auditors loved it.
What didn't: Application-level locks held connections open while Go did arithmetic. Every operation scanned the entire ledger - O(n) and growing. p99 climbed to 200-500ms. We were emulating atomicity at the wrong layer, reinventing what PostgreSQL already does natively.
The ledger wasn't wrong. Our implementation was just in the wrong place and orders of magnitude slower.
The Stored Procedure
We'll be honest - we resisted this. Stored procedures have a reputation problem. They feel like something your dad's enterprise team wrote in 2004 at a company where deployments involved a change advisory board and a three-week lead time. Thousand-line PL/SQL monsters buried in Oracle databases that nobody dares touch because the one person who understood them retired in 2017.
But biases are always worth questioning. When we looked at what we actually needed - read one row, do arithmetic, insert one row, update one row - it was embarrassingly obvious that this is what stored procedures were made for. Sometimes a tool survives for decades because it's genuinely good at what it does.
CREATE OR REPLACE FUNCTION wallet_debit(
p_wallet_id BIGINT, p_amount NUMERIC, p_reference TEXT
) RETURNS NUMERIC AS $$
DECLARE
v_balance NUMERIC;
BEGIN
-- Row-level lock on just this wallet
SELECT balance INTO v_balance
FROM wallets WHERE id = p_wallet_id
FOR UPDATE;
-- validate, compute new balance ...
-- Still keep the full audit trail
INSERT INTO wallet_ledger (wallet_id, amount, type, reference)
VALUES (p_wallet_id, p_amount, 'debit', p_reference);
-- update balance, return ...
END;
$$ LANGUAGE plpgsql;The magic is FOR UPDATE - a row-level lock on just the wallet being modified. The procedure reads the balance, validates it, appends a ledger entry, updates the wallet, and returns the new balance. All inside one transaction, entirely within the database process. No round trips.
The Go side collapsed to almost nothing:
func (r *WalletRepo) Debit(ctx context.Context, walletID int64, amount decimal.Decimal, ref string) (decimal.Decimal, error) {
tx, err := r.db.BeginTx(ctx, nil)
if err != nil {
return decimal.Zero, err
}
defer func() { _ = tx.Rollback() }()
var newBalance decimal.Decimal
err = tx.QueryRowContext(ctx,
"SELECT wallet_debit($1, $2, $3)",
walletID, amount, ref,
).Scan(&newBalance)
if err != nil {
return decimal.Zero, err
}
return newBalance, tx.Commit()
}The key insight was FOR UPDATE - a row-level lock on just the wallet being modified. Not a table lock. Not an application mutex. Wallet A never blocks wallet B. The entire operation - lock, validate, ledger insert, balance update - runs in under a millisecond, entirely inside the database.
What worked: ~1ms execution. True atomicity. Row-level locking. Full ledger audit trail. The Go code is trivial.
What didn't: Honestly? Nothing yet. We keep waiting for the catch.
The Numbers
| Approach | Correctness | p99 Latency | Concurrency |
|---|---|---|---|
| Read-Update-Write | Race conditions | ~5ms | Broken under load |
| Append-Only Ledger | Correct | ~200-500ms | Sequential per wallet |
| DB Procedure | Correct | ~1ms | Parallel across wallets |
Boring Is Still Beautiful
We wrote a whole post about choosing boring technology. This wallet journey might be the best example of that philosophy in action.
Too simple - ignored the problem. Too clever - solved it in the wrong place. The answer was a plpgsql function that would have been unremarkable in 2005. Nothing about it would get upvotes on Hacker News. But it runs in under a millisecond, it's been in production for months without a single incident, and when a new engineer asks "how do wallet transactions work?" we point them at one SQL function and one Go method.
We didn't need a distributed lock service, event sourcing, CQRS, or a saga orchestrator. We needed PostgreSQL to do what it's been doing since before most of our team started programming. We handle money - not page views, not recommendations. When a client loads £100,000 into their wallet, that number has to be sacred. A stored procedure running in ~1ms gives us not just speed, but confidence. The confidence to pass an audit without crossing our fingers.
Sometimes the best engineering is getting out of the way and letting proven, boring technology do its thing.
This is part of our engineering series. Read about how we built the platform with boring technology, or learn how digital gift cards work from the user's perspective.
License
This article is licensed under CC BY-NC-SA 4.0. You are free to:
- Share — copy and redistribute the material in any medium or format
- Adapt — remix, transform, and build upon the material
Under the following terms:
- Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
- NonCommercial — You may not use the material for commercial purposes.
- ShareAlike — If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.
Buy what this post is about
Ready to get a card?
Browse Octopus Cards on Driffle — gaming top-ups, mobile recharges, and travel eSIMs at a discount, delivered instantly.
Related posts
Breaking Our Own System on Purpose: Mocks, Chaos, and Fuzz
We integrate with dozens of external vendors. Any one of them can have a bad day. Here's how we make sure their bad day never becomes ours.
Your Data Is Yours. We Mean That.
You shouldn't have to trust a random platform with your Gamer ID. Here's exactly what we collect, how we protect it, and why we built it this way.
Around the World in 80 Days: Building a global delivery system
We built a global digital cards platform in 80 days. The secret? Absolutely no exciting technology whatsoever.

