Around the World in 80 Days: Building a global delivery system

We've all been in this situation... staring at a blank whiteboard, with a problem to solve, a new system to build. It is exciting; you want to do something new, something you have not done before, something to break away from the mundane. We agree, there's a certain romanticism to it. The thrill of an adventure.

And we've given in to that thrill before. We still remember the horror of deploying RethinkDB to production because it felt like the future. Or the time we decided to sneak in a gRPC service as an HTTP proxy using the gRPC Gateway - because "how hard could it be?" Those decisions cost us sleep. Literally. 3:00 AM pages from the SRE on-call are not how you want to remember a Tuesday.

So when we started building Octopus Cards, we made ourselves a promise: no adventures. Boring and mundane. Simple, solid foundations. The kind of stuff that lets you sleep at night.

This is the story of how we went from zero to a fully operational digital cards platform across 100+ countries in just 80 days - and how the most boring technology stack imaginable saved our ass every step of the way.

The Philosophy: Boring is Beautiful

We'll be honest - the temptation was real. We kept catching ourselves sketching out microservice boundaries, eyeing service meshes, wondering if maybe we needed an event-driven architecture from day one. We've been burned by that instinct before. We've watched teams (including ones we were on) over-engineer themselves into corners they couldn't escape.

So we fought the urge. Hard.

Our stack is deliberately boring: Go, PostgreSQL, and Valkey (the open-source Redis fork). No ORMs. No auto-migrations. No magic. Every line of SQL is written with intent. Every migration is reversible. Every dependency is explicit.

And honestly? It's the best technical decision we've ever made. The system is simple, a small team can maintain it, and it processes orders in under a second - reliably, every time. We sleep through the night now. Mostly.

Go: The Language That Gets Out of Your Way

We went with Go because we needed something we could trust not to surprise us at 2 AM. It's simple, it compiles to a single binary, and the compiler catches mistakes before they become incidents. The concurrency model doesn't hurt either.

Our entire application runs from a single main.go entry point:

go run main.go server   # HTTP server
go run main.go worker   # Background job processors
go run main.go cron     # Scheduled tasks
go run main.go run-all  # Everything together

One binary. One deployment artifact. We cannot tell you how many times this simplicity has saved us. When something goes wrong at midnight, we're not debugging which of twelve services is misbehaving. We're looking at one process, one set of logs, one thing to restart.

The CLI is built with Kong, which gives us a clean command structure without framework overhead. Server, workers, crons, migrations, seeders - all subcommands of the same binary, sharing the same initialization pipeline. It's the kind of boring that makes you grateful during an incident.
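
To make the shape concrete, here is a minimal sketch of one binary dispatching to several roles. It deliberately uses only the standard library instead of Kong, and the function name componentsFor and the mapping inside it are illustrative assumptions, not the production code.

```go
package main

import (
	"fmt"
	"os"
)

// componentsFor maps a subcommand to the long-running components it starts.
// The subcommand names mirror the ones listed above; the mapping itself is
// an illustrative assumption.
func componentsFor(cmd string) ([]string, error) {
	switch cmd {
	case "server":
		return []string{"server"}, nil
	case "worker":
		return []string{"worker"}, nil
	case "cron":
		return []string{"cron"}, nil
	case "run-all":
		return []string{"server", "worker", "cron"}, nil
	default:
		return nil, fmt.Errorf("unknown command %q", cmd)
	}
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: app <server|worker|cron|run-all>")
		return
	}
	components, err := componentsFor(os.Args[1])
	if err != nil {
		fmt.Println(err)
		return
	}
	// Shared initialization (config, database pools, telemetry) would run
	// once here, before any component starts.
	for _, c := range components {
		fmt.Println("starting", c)
	}
}
```

Whatever the CLI library, the point is the same: every role goes through one initialization path, so a fix to startup code lands everywhere at once.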

PostgreSQL: The Database That Does Everything

We use PostgreSQL for everything - and we mean everything:

Transactional data: Orders, inventory, clients, vendors
Critical logs: API responses, requests, audits
Job tracking: Execution history, cron metrics, import/export status
Vector embeddings: Product matching via pgvector (yes, PostgreSQL does AI too)

We run a primary plus a read replica and split reads from writes. Not at the query level - at the connection level. Write queries go to the primary. Read queries go to the replica. The repository layer decides which connection to use, keeping it dead simple.
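
A rough sketch of what connection-level splitting can look like in a repository, using only database/sql. The Store type, its method names, and the fallback to the primary when no replica is configured are all assumptions made for illustration.

```go
package main

import (
	"context"
	"database/sql"
	"errors"
	"fmt"
)

// Store owns one connection pool per role. All names here are illustrative.
type Store struct {
	primary *sql.DB // takes every write
	replica *sql.DB // serves reads; nil means no replica is configured
}

// NewStore requires a primary. Treating the replica as optional is an
// assumption of this sketch, so a single-node setup still works.
func NewStore(primary, replica *sql.DB) (*Store, error) {
	if primary == nil {
		return nil, errors.New("store: primary pool is required")
	}
	return &Store{primary: primary, replica: replica}, nil
}

// reader prefers the replica and falls back to the primary.
func (s *Store) reader() *sql.DB {
	if s.replica != nil {
		return s.replica
	}
	return s.primary
}

// CountInventories is a read, so it goes through the reader pool.
func (s *Store) CountInventories(ctx context.Context, productID int64) (int, error) {
	var n int
	err := s.reader().QueryRowContext(ctx,
		`SELECT COUNT(*) FROM inventories WHERE product_id = $1 AND deleted_at IS NULL`,
		productID).Scan(&n)
	return n, err
}

// SoftDeleteInventory is a write, so it always uses the primary.
func (s *Store) SoftDeleteInventory(ctx context.Context, id int64) error {
	_, err := s.primary.ExecContext(ctx,
		`UPDATE inventories SET deleted_at = NOW() WHERE id = $1`, id)
	return err
}

func main() {
	// No database is needed to see the constructor guard.
	_, err := NewStore(nil, nil)
	fmt.Println(err)
}
```

One caveat with any setup like this: a replica can lag behind the primary, so a read that must see a write made a moment ago is usually safer against the primary.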

We know this sounds like we're putting all our eggs in one basket. And we are. But it's a really, really good basket. Postgres has been around for decades, has incredible tooling, and handles every workload we've thrown at it. Every time we've been tempted to add another database to the mix, Postgres already had a feature for it.

Why Squirrel Over an ORM

Early on, we considered GORM. It would've been faster to get started. But we'd been burned by ORM magic before - the kind where everything works beautifully until you hit scale, and then you're staring at a query plan that makes no sense because the ORM decided to do six JOINs behind your back.

So we went with Squirrel, a SQL query builder for Go. Here's what a real query looks like:

query, args, err := squirrel.
    Select("id", "product_id", "denomination", "status").
    From("inventories").
    Where(squirrel.Eq{"product_id": productID}).
    Where("deleted_at IS NULL").
    Where("status = 'active'").
    Limit(50).
    PlaceholderFormat(squirrel.Dollar). // PostgreSQL expects $1, not ?
    ToSql()

It's just SQL - with placeholder-based parameter binding and no string concatenation. You can read it. You can debug it. You can copy it into psql and run it directly. And when we're at our desks at 11 PM trying to figure out why a query is slow, we can see exactly what's hitting the database. No N+1 surprises. No lazy loading foot-guns. No SELECT * hidden behind a method call.

This decision has saved us more debugging hours than we can count.

Migrations: Explicit, Reversible, Predictable

Every schema change is a Goose migration file with explicit Up and Down sections:

-- +goose Up
CREATE TABLE inventories (
    id BIGSERIAL PRIMARY KEY,
    product_id BIGINT NOT NULL REFERENCES products(id),
    denomination DECIMAL(12,2) NOT NULL,
    status VARCHAR(20) DEFAULT 'active',
    deleted_at TIMESTAMP,
    created_at TIMESTAMP DEFAULT NOW()
);
CREATE INDEX idx_inventories_product ON inventories(product_id)
    WHERE deleted_at IS NULL;

-- +goose Down
DROP TABLE IF EXISTS inventories;

Every index is intentional. Every column type is chosen. Every migration is reviewed before it touches the database.

We've had to roll back a migration at 1 AM exactly once. It took thirty seconds and worked perfectly. That one moment justified every minute we'd spent writing Down sections we thought we'd never use.

Valkey: The Cache That Thinks in Data Structures

We use Valkey (the open-source Redis fork) not just as a key-value cache - we use it as a data structure server. Sorted sets, hashes, atomic operations - Valkey gives us data structures with sub-millisecond access times, and we lean on them heavily for the hot paths in our system.

The result: our p99 allocation latency sits under 5ms. We remember the day we first hit that number. We sat there refreshing the Grafana dashboard, watching the latency line flatline, thinking "there's no way this actually works." But it did. And it kept working. Boring technology, used correctly, is a beautiful thing.

Observability: The Thing That Saved Us the Most

We'd love to say we built observability in from day one because we were disciplined. The truth is closer to: we'd been on teams where we didn't, and the experience was traumatic enough that we refused to repeat it.

OpenTelemetry Everywhere

Every significant operation creates a trace span:

ctx, span := utils.Tracer.Start(ctx, "inventory.allocate")
defer span.End()
span.SetAttributes(
    attribute.Int64("product_id", productID),
    attribute.String("client_id", clientID),
)

Trace context flows through the entire stack: HTTP request → service layer → database query → cache lookup → queue publish → background worker. One trace ID ties everything together.

When something goes wrong - and things always go wrong - we can pull up a single trace and see exactly what happened, in what order, and how long each step took. It's like having a flight recorder for every request. We've diagnosed issues in minutes that would have taken hours without this.

Structured Logging with Zap

Every log line is structured JSON with request context:

logger := middleware.Logger(ctx)
logger.Infow("Order created",
    "order_id", order.ID,
    "product_id", order.ProductID,
    "duration_ms", elapsed.Milliseconds(),
)

No fmt.Printf. No unstructured strings. Every log is queryable, filterable, and correlated with its trace. We used to think structured logging was overkill for small teams. Then we tried to grep through 200 MB of unstructured logs at 3 AM during an incident. Never again.
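
For readers wondering how a call like middleware.Logger(ctx) can work, here is a hedged sketch of the pattern: a request-scoped logger is stored in the context once, and every layer below pulls it back out. To keep the example dependency-free it swaps Zap for the standard library's log/slog, and every name in it (WithLogger, Logger, newRequestLogger, exampleLine) is an assumption rather than the real middleware.

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"io"
	"log/slog"
)

type loggerKey struct{}

// WithLogger returns a context that carries a request-scoped logger.
func WithLogger(ctx context.Context, l *slog.Logger) context.Context {
	return context.WithValue(ctx, loggerKey{}, l)
}

// Logger returns the logger stored in ctx, or the default logger if none.
func Logger(ctx context.Context) *slog.Logger {
	if l, ok := ctx.Value(loggerKey{}).(*slog.Logger); ok {
		return l
	}
	return slog.Default()
}

// newRequestLogger builds a JSON logger that stamps request_id on every line.
func newRequestLogger(w io.Writer, requestID string) *slog.Logger {
	return slog.New(slog.NewJSONHandler(w, nil)).With("request_id", requestID)
}

// createOrder stands in for a service call; it only knows about ctx.
func createOrder(ctx context.Context, orderID int64) {
	Logger(ctx).Info("Order created", "order_id", orderID)
}

// exampleLine logs once into a buffer and returns the decoded JSON fields.
func exampleLine() (map[string]any, error) {
	var buf bytes.Buffer
	ctx := WithLogger(context.Background(), newRequestLogger(&buf, "req-123"))
	createOrder(ctx, 42)
	fields := map[string]any{}
	err := json.Unmarshal(buf.Bytes(), &fields)
	return fields, err
}

func main() {
	fields, err := exampleLine()
	if err != nil {
		panic(err)
	}
	fmt.Println(fields["msg"], fields["request_id"], fields["order_id"])
}
```

The service function never mentions the request ID, yet the line it emits carries one. That is what makes logs filterable per request without threading extra parameters through every call.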

The Cron System: Simple but Serious

Our scheduler runs on gocron, with a lifecycle that goes beyond "run this every N hours":

type ScheduledTask interface {
    Identifier() string
    TimePattern() string
    Execute(ctx context.Context) error
    PreExecute(ctx context.Context)
    PostExecute(ctx context.Context)
    HandleFailure(ctx context.Context, err error)
}

Each task has pre/post hooks, failure handlers, and execution tracking. Jobs run in singleton mode - if a previous execution is still running, the next one is rescheduled instead of overlapping. We learned this the hard way at a previous gig, where two instances of the same cron overlapped and double-charged a batch of customers. That's not a mistake you make twice.
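
Here is a small sketch of how the lifecycle hooks and a no-overlap guard can fit together. It is not gocron's implementation: the singleRunner type, ErrAlreadyRunning, and the recordingTask stand-in are hypothetical, and the guard simply refuses a second run and leaves the decision to skip or reschedule to the caller.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
)

// ScheduledTask repeats the interface above so the sketch compiles alone.
type ScheduledTask interface {
	Identifier() string
	TimePattern() string
	Execute(ctx context.Context) error
	PreExecute(ctx context.Context)
	PostExecute(ctx context.Context)
	HandleFailure(ctx context.Context, err error)
}

// ErrAlreadyRunning reports that a previous run has not finished yet.
var ErrAlreadyRunning = errors.New("previous run still in progress")

// singleRunner allows at most one execution of its task at a time.
type singleRunner struct {
	mu   sync.Mutex
	task ScheduledTask
}

// Run drives one execution through the hooks: pre, execute, failure, post.
func (r *singleRunner) Run(ctx context.Context) error {
	if !r.mu.TryLock() {
		return ErrAlreadyRunning // caller may skip or reschedule
	}
	defer r.mu.Unlock()
	r.task.PreExecute(ctx)
	defer r.task.PostExecute(ctx)
	if err := r.task.Execute(ctx); err != nil {
		r.task.HandleFailure(ctx, err)
		return err
	}
	return nil
}

// recordingTask is a stand-in task that records which hooks were called.
type recordingTask struct {
	calls []string
	fail  bool
	began chan struct{} // when non-nil, closed as soon as Execute starts
	gate  chan struct{} // when non-nil, Execute blocks until it is closed
}

func (t *recordingTask) Identifier() string          { return "demo" }
func (t *recordingTask) TimePattern() string         { return "*/5 * * * *" }
func (t *recordingTask) PreExecute(context.Context)  { t.calls = append(t.calls, "pre") }
func (t *recordingTask) PostExecute(context.Context) { t.calls = append(t.calls, "post") }
func (t *recordingTask) HandleFailure(context.Context, error) {
	t.calls = append(t.calls, "failure")
}
func (t *recordingTask) Execute(context.Context) error {
	t.calls = append(t.calls, "execute")
	if t.began != nil {
		close(t.began)
	}
	if t.gate != nil {
		<-t.gate
	}
	if t.fail {
		return errors.New("boom")
	}
	return nil
}

// hookOrder runs the demo task once and returns the hooks in call order.
func hookOrder(fail bool) []string {
	t := &recordingTask{fail: fail}
	_ = (&singleRunner{task: t}).Run(context.Background())
	return t.calls
}

// overlapResult starts a slow run, then tries a second one while it runs.
func overlapResult() error {
	t := &recordingTask{began: make(chan struct{}), gate: make(chan struct{})}
	r := &singleRunner{task: t}
	done := make(chan struct{})
	go func() { defer close(done); _ = r.Run(context.Background()) }()
	<-t.began // the first run is now inside Execute
	err := r.Run(context.Background())
	close(t.gate)
	<-done
	return err
}

func main() {
	fmt.Println(hookOrder(false))
	fmt.Println(hookOrder(true))
	fmt.Println(overlapResult())
}
```

Deferring PostExecute means it runs on both the success and the failure path, which is handy for recording execution history either way.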

Schedules are defined in code but overridable from the database. An admin can disable a job or change its frequency without a deployment. Feature flags control which jobs are even registered:

if utils.FeatureFlags.IsVouchersEnabled() {
    tasks = append(tasks,
        scheduler.NewInventoryPumpTask(),
        scheduler.NewPendingOrderRetryTask(),
    )
}

No vouchers feature? No voucher cron jobs consuming resources.
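
The "defined in code, overridable from the database" rule boils down to a small resolution step at registration time. The sketch below shows one way to express it; the Override struct and the effectivePattern function are hypothetical, and how the override row is actually stored is not shown.

```go
package main

import "fmt"

// Override is a hypothetical per-task row an admin might store.
type Override struct {
	Disabled    bool
	TimePattern string // empty means "keep the code default"
}

// effectivePattern resolves what the scheduler should register.
// The second result is false when the job should not run at all.
func effectivePattern(codeDefault string, o *Override) (string, bool) {
	if o == nil {
		return codeDefault, true // no override stored
	}
	if o.Disabled {
		return "", false
	}
	if o.TimePattern != "" {
		return o.TimePattern, true
	}
	return codeDefault, true
}

func main() {
	pattern, enabled := effectivePattern("0 * * * *", &Override{TimePattern: "*/15 * * * *"})
	fmt.Println(pattern, enabled)
}
```

Keeping the default in code means a missing or empty override row can never leave a job without a schedule.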

Graceful Everything

Every process - server, worker, scheduler - handles SIGTERM gracefully:

Stop accepting new work
Wait for in-flight operations to complete
Flush logs and metrics
Exit cleanly

No orphaned transactions. No half-processed messages. No lost data. The system can be restarted at any time, on any instance, without coordination. Unlike the Death Star, our shutdown sequence actually works - and we don't need an exhaust port to trigger it.
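
For the HTTP server, those four steps map closely onto the standard library. The following is a sketch under stated assumptions, not the production shutdown code: the serve and runDemo functions are hypothetical, the grace period is arbitrary, and the demo cancels itself after 200ms so it exits without anyone sending a signal.

```go
package main

import (
	"context"
	"errors"
	"log"
	"net"
	"net/http"
	"os/signal"
	"syscall"
	"time"
)

// serve runs srv on ln until ctx is cancelled, then drains in-flight requests.
func serve(ctx context.Context, srv *http.Server, ln net.Listener, grace time.Duration) error {
	errCh := make(chan error, 1)
	go func() { errCh <- srv.Serve(ln) }()

	select {
	case err := <-errCh:
		return err // the server failed before any shutdown was requested
	case <-ctx.Done():
	}

	// Stop accepting new connections and wait for in-flight requests,
	// but never longer than the grace period.
	shutdownCtx, cancel := context.WithTimeout(context.Background(), grace)
	defer cancel()
	if err := srv.Shutdown(shutdownCtx); err != nil {
		return err
	}
	// Serve reports http.ErrServerClosed after a clean Shutdown.
	if err := <-errCh; !errors.Is(err, http.ErrServerClosed) {
		return err
	}
	return nil
}

// runDemo wires signals to serve on an ephemeral local port.
func runDemo() error {
	ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGTERM, syscall.SIGINT)
	defer stop()
	// Demo only: cancel after 200ms so the example ends on its own.
	ctx, cancel := context.WithTimeout(ctx, 200*time.Millisecond)
	defer cancel()

	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return err
	}
	srv := &http.Server{Handler: http.NotFoundHandler()}
	return serve(ctx, srv, ln, 5*time.Second)
}

func main() {
	if err := runDemo(); err != nil {
		log.Fatal(err)
	}
	// Flushing logs and metrics would happen here, just before exit.
	log.Println("shut down cleanly")
}
```

Workers and schedulers follow the same outline with a different drain step: stop pulling new jobs, wait for the running ones, then exit.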

How We Avoided Building a Death Star

We have to be honest about this section. It's not that we were wise enough to avoid these things from the start. It's that we've seen - and sometimes built - the Death Star. We know what it feels like to be six months into a microservice migration, realising you've just recreated a monolith but worse, with network calls where function calls used to be.

The Empire's fatal flaw wasn't a lack of firepower - it was over-engineering. One exhaust port. One single point of failure. One proton torpedo and the whole thing is space dust. We've seen that pattern in software more times than we'd like to admit.

Here's what we chose not to build, and why we're grateful for every one of these decisions:

No microservices. One binary serves everything. We were tempted, believe us. But with a small team, a monolith is a superpower. One thing to deploy, one thing to monitor, one thing to reason about. No Death Star - no exhaust port.
No Kubernetes. A single Go binary behind nginx doesn't need container orchestration. We don't need to manage a fleet when we're running a starfighter. K8s is great - for problems we don't have.
No GraphQL or gRPC. REST with clear route groups and pagination headers has been plenty. Every time we think "maybe we should add GraphQL," we remember the last time we debugged a deeply nested resolver and pour ourselves a coffee instead.
No NoSQL. PostgreSQL handles relational data, full-text search, and vector embeddings. We were this close to adding MongoDB for "flexible schemas" early on. Thank God we didn't. Postgres does it all, and unlike the One Ring, this one actually works in your favour.
No auto-migration. Every schema change is intentional and reviewed. The rebels won because they studied the blueprints. We review ours before they ship - because we've seen what happens when you don't.

The Result

500+ brands. 100+ countries. Sub-second end-to-end latency. 99.9% uptime. Built in 80 days. A codebase that any Go developer can read, understand, and contribute to on day one.

All built on PostgreSQL, Valkey, and Go. No magic. No framework lock-in. No thermal exhaust ports. Just battle-tested, boring technology - doing exactly what it was designed to do.

The Empire built a moon-sized superweapon and lost it twice. We built a monolith with sorted sets and honestly? We just sleep better now. That's the real win.

This is part one of our engineering series. Next up: how we scaled our wallets system to handle over a thousand requests a second.

Want to see what all this engineering powers? Read How Digital Gift Cards Work for the user-facing side of the platform, or check out our launch announcement for the full story.

License

This article is licensed under CC BY-NC-SA 4.0. You are free to:

Share — copy and redistribute the material in any medium or format
Adapt — remix, transform, and build upon the material

Under the following terms:

Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
NonCommercial — You may not use the material for commercial purposes.
ShareAlike — If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.