Sharding to Contain the Blast Radius of Data Breaches
Posted by jboutwell 3 days ago
Comments
Comment by mhitza 1 day ago
It's been on my mind for a while as an easy way to isolate customer data, make migrations easier, and build per-customer resource usage monitoring.
Haven't seen this approach used in practice yet, and I haven't had the chance to architect a new SaaS product in the last few years to try it out long term.
Comment by evanelias 20 hours ago
I primarily work on schema management (migration) software for MySQL/MariaDB, and some companies leverage my product's built-in sharding support [1] for this type of per-customer sharding scheme.
When using per-customer shards, it's particularly important to avoid shard schema drift. Once your number of shards gets relatively high, you can end up with situations where an ALTER TABLE fails on just one shard, due to a network blip for example.
For this reason, at large scale with many shards, a declarative approach to schema management often works quite well: you express the desired state purely as a set of CREATE statements, and the tooling figures out the correct DDL to run to bring each shard into that desired state. This inherently solves drift because you can use a simple reconciliation loop to retry failures, i.e. running the tool repeatedly until there are no remaining changes to apply to any shard.
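A rough sketch of that reconciliation loop, in Python for illustration; diff_schema and apply_ddl here are stand-ins for whatever diff/apply commands your schema tool actually exposes, not a real API:

    import time

    def reconcile(shards, diff_schema, apply_ddl, max_passes=5):
        """Re-run diff/apply until no shard has pending changes left."""
        pending = set(shards)
        for _ in range(max_passes):
            failed = set()
            for shard in pending:
                try:
                    # diff_schema compares the shard against the desired state
                    # (a set of CREATE statements) and returns the DDL to run.
                    for statement in diff_schema(shard):
                        apply_ddl(shard, statement)
                except ConnectionError:
                    # A network blip on one shard just means it gets retried
                    # on the next pass; the loop converges on the desired state.
                    failed.add(shard)
            if not failed:
                return
            pending = failed
            time.sleep(5)
        raise RuntimeError(f"shards still drifted after retries: {sorted(pending)}")

Run on a schedule, this doubles as drift detection: a clean pass means every shard matches the CREATE statements in version control.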
It's also important to support a flexible layout/mapping between shards and database servers. Initially, you can likely fit all the shards on the same server / DBMS daemon, but as the number of customers increases, you'll need to span multiple servers.
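Concretely, keeping the mapping indirect (customer -> shard -> server) means moving a shard to a new server is a one-row change rather than an application change. A minimal sketch, with made-up names and DSNs:

    # Many shards fit on one server at first; later shards spill onto new servers.
    SHARD_TO_SERVER = {
        "shard_0001": "mysql://db-a.internal:3306",
        "shard_0002": "mysql://db-a.internal:3306",
        "shard_0003": "mysql://db-b.internal:3306",
    }

    CUSTOMER_TO_SHARD = {
        "acme": "shard_0001",
        "globex": "shard_0002",
        "initech": "shard_0003",
    }

    def dsn_for_customer(customer: str) -> str:
        shard = CUSTOMER_TO_SHARD[customer]
        return f"{SHARD_TO_SERVER[shard]}/{shard}"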
Comment by eugenekay 1 day ago
There were a few administrative drawbacks, largely because the MS SQL Server Management Studio tools do not scale well to hundreds of active connections from a single workstation; we worked around that with lots of Azure Functions runs instead. Costs and instance sizing were a constant struggle, though other engines like Postgres or even SQLite would likely be more efficient.
I have also seen this used in other formats quite successfully - Fandom/Wikia (used to?) use a MySQL database for each sub-site.
Comment by edwhitesell 1 day ago
The biggest hurdles are in things like configuration. You'll probably want to have one code base, and maybe even one deploy/package for web servers. However, you'll need a different configuration for each customer (DB name, credentials, etc.), a way to manage them, and a way to identify which customer an HTTP request belongs to before you can process it. You can use things like host names in your web app, but you'll really end up wanting some kind of "request router" to manage everything...at that point, it's far easier to put everything in one DB and move on with revenue-generation.
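A minimal sketch of that hostname-to-customer lookup, assuming per-tenant config lives in some small control-plane store; the names and fields here are illustrative:

    from dataclasses import dataclass

    @dataclass
    class TenantConfig:
        db_name: str
        db_user: str
        db_password_ref: str   # reference into a secrets manager, not the secret itself

    # In practice this would live in a small control-plane database or config service.
    TENANTS = {
        "acme.example.com":   TenantConfig("acme_db",   "acme_app",   "secret/acme"),
        "globex.example.com": TenantConfig("globex_db", "globex_app", "secret/globex"),
    }

    def resolve_tenant(host_header: str) -> TenantConfig:
        host = host_header.split(":")[0].lower()
        try:
            return TENANTS[host]
        except KeyError:
            raise LookupError(f"unknown tenant host: {host}")

Every request has to resolve its tenant before touching a connection pool, which is exactly the per-request routing overhead being described.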
Comment by mhitza 1 day ago
If I were to implement it today, I would probably use a centralized authorization service (an "authentication gateway") with something like forward_auth in Caddy to "tag along" configuration data with the request (teams, instance landing page, etc., including encrypted database configuration), encoded as a JWT: https://caddyserver.com/docs/caddyfile/directives/forward_au...
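A minimal sketch of what that gateway endpoint could look like in Python (Flask + PyJWT); the header name, signing key, tenant table, and the assumption that the proxy forwards the original host are all illustrative choices, not anything Caddy requires:

    import jwt                      # PyJWT
    from flask import Flask, request, Response

    app = Flask(__name__)
    SIGNING_KEY = "replace-with-a-real-secret"

    # Per-tenant data the app needs; the DB config would be stored encrypted.
    TENANTS = {
        "acme.example.com": {"team": "acme", "db_config_enc": "ciphertext-goes-here"},
    }

    @app.route("/verify")
    def verify():
        # Assumes the proxy passes the original host, e.g. via X-Forwarded-Host.
        host = request.headers.get("X-Forwarded-Host", "").split(":")[0]
        tenant = TENANTS.get(host)
        if tenant is None:
            return Response(status=401)
        token = jwt.encode(tenant, SIGNING_KEY, algorithm="HS256")
        # The proxy can copy this response header onto the upstream request,
        # where the app decrypts the DB config and opens the right connection.
        return Response(status=200, headers={"X-Tenant-Context": token})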
I think the hard part is having enough discipline within a team to mostly work with backwards-compatible database changes, the automation to make that seamless, and the will to be proactive with the possibilities of this setup.
Comment by shizcakes 1 day ago
You probably don’t want to do what you are proposing except in extreme, carefully evaluated cases.