Guide · Distributed Systems · ~14 min read

Scalability Patterns in Distributed Systems

A practical, implementation-first tour of the patterns I reach for when a backend has to survive real load — sharding, load balancing, caching and back-pressure — with concrete notes from production Python and Go services.

1. What is a distributed system?

A distributed system is a set of independent processes — usually running on different machines — that cooperate over a network to look like a single service. The defining property is not "many servers"; it's that partial failure is normal. A node can be slow, a link can drop packets, a disk can fill up, and the system as a whole still has to make forward progress.

Every scalability pattern below is really a way to manage one of three finite resources: state (data that has to live somewhere), compute (CPU cycles to act on it) and bandwidth (the bytes moving between them).

2. Sharding — splitting state by key

Sharding (a.k.a. horizontal partitioning) splits one logical dataset across N physical stores using a routing key. Each shard owns a disjoint slice of the keyspace, so reads and writes for a given key always land on the same node. This is how Postgres-per-tenant, Kafka topic partitions, Redis Cluster and DynamoDB all scale writes past a single box.

Pick a shard key first, everything else second

The shard key decides your ceiling. A good key is high-cardinality, evenly distributed, and matches your most common access pattern — usually tenant_id, user_id or a hash of one. A bad key (a date, a status enum, a country) concentrates traffic on a few shards and you'll be back where you started.

Consistent hashing in Python

Naive modulo sharding (shard = hash(key) % N) breaks the moment you change N — almost every key remaps. Consistent hashing keeps most keys in place when nodes are added or removed:

python
import bisect import hashlib class HashRing: def __init__(self, nodes: list[str], vnodes: int = 128): self.ring: list[tuple[int, str]] = [] for node in nodes: for i in range(vnodes): h = self._hash(f"{node}#{i}") self.ring.append((h, node)) self.ring.sort() @staticmethod def _hash(key: str) -> int: return int(hashlib.blake2b(key.encode(), digest_size=8).hexdigest(), 16) def route(self, key: str) -> str: h = self._hash(key) i = bisect.bisect(self.ring, (h, "")) return self.ring[i % len(self.ring)][1] ring = HashRing(["db-a", "db-b", "db-c"]) ring.route("tenant:42") # -> "db-b"

Virtual nodes (the vnodes loop) flatten the load distribution — without them small rings get very lumpy.

Range vs hash sharding in Go

Hash sharding spreads writes evenly but kills range scans. Range sharding keeps neighbors together — great for time-series and "last 100 events for user X", terrible if the newest range is also the hottest. In Go services I usually wrap the choice behind a small interface so the rest of the code never sees it:

go
type Router interface { Shard(key string) string } type HashRouter struct{ ring *HashRing } func (r *HashRouter) Shard(key string) string { return r.ring.Route(key) } type RangeRouter struct{ ranges []Range } // sorted by Start func (r *RangeRouter) Shard(key string) string { i := sort.Search(len(r.ranges), func(i int) bool { return r.ranges[i].End > key }) return r.ranges[i].Node }

3. Load balancing — splitting work across replicas

Where sharding splits state, load balancing splits requests across interchangeable replicas. The algorithm you pick matters more than people expect:

P2C in Go

go
type Backend struct { URL string InFlight atomic.Int64 } func (p *Pool) Pick() *Backend { a := p.backends[rand.IntN(len(p.backends))] b := p.backends[rand.IntN(len(p.backends))] if a.InFlight.Load() <= b.InFlight.Load() { return a } return b }

At Vivox AI this exact pattern, plus per-replica circuit breakers, let a single Go Scraping Gateway hold 1–2k+ concurrent outbound requests without any one upstream tipping over.

Health checks beat clever algorithms

A balancer that routes to a dead node is worse than a dumb one. Pair any strategy with active health checks (periodic probe) and passive ones (mark a replica unhealthy after N consecutive errors, retry after a backoff). Without that, a single bad pod takes down your success rate.

4. Caching — buying time and money back

Caching is the cheapest scalability lever you have, and the easiest to get subtly wrong. Three strategies cover ~90% of real systems:

Cache-aside (lazy)

App reads from cache, falls back to DB on miss, writes back. Simple, tolerant of cache failures, and what you want by default.

python
async def get_user(user_id: str) -> User: key = f"user:{user_id}" if cached := await redis.get(key): return User.model_validate_json(cached) user = await db.fetch_user(user_id) await redis.set(key, user.model_dump_json(), ex=300) return user

Write-through and write-behind

Write-through updates cache and DB synchronously — stronger consistency, slower writes. Write-behind buffers writes in the cache and flushes asynchronously — fast, but you can lose data on crash and ordering becomes your problem.

The three things that actually bite

singleflight in Go

go
var group singleflight.Group func GetUser(ctx context.Context, id string) (*User, error) { v, err, _ := group.Do("user:"+id, func() (any, error) { if u, ok := cache.Get(id); ok { return u, nil } u, err := db.FetchUser(ctx, id) if err == nil { cache.Set(id, u, 5*time.Minute) } return u, err }) if err != nil { return nil, err } return v.(*User), nil }

5. Back-pressure, retries & timeouts

Sharding and load balancing buy capacity. Back-pressure decides what happens when you still run out. The default for most services should be: fail fast, fail loud, shed load. A request stuck waiting 60 seconds is worse than a 503 in 50ms — the slow one is holding a goroutine, a DB connection and a queue slot the next request needs.

6. A practical checklist

Before calling a backend "scalable", I run through this list. If any answer is "we'll see", that's the next thing to fix:

None of these patterns are exotic. The discipline is applying them before you need them, and being honest about which finite resource you're actually defending.