Litestream VFS
Posted by emschwartz 19 hours ago
Comments
Comment by psanford 17 hours ago
Comment by benbjohnson 17 hours ago
Comment by fragmede 17 hours ago
Comment by bencornia 17 hours ago
> Ever wanted to do a quick query against a prod dataset, but didn’t want to shell into a prod server and fumble with the sqlite3 terminal command like a hacker in an 80s movie? Or needed to do a quick sanity check against yesterday’s data, but without doing a full database restore? Litestream VFS makes that easy. I’m so psyched about how it turned out.
Man this is cool. I love the unix ethos of Litestream's design. SQLite works as normal and Litestream operates transparently on that process.
Comment by simonw 17 hours ago
export LITESTREAM_REPLICA_URL="s3://my-bucket/my.db"
export AWS_ACCESS_KEY_ID="your-access-key"
export AWS_SECRET_ACCESS_KEY="your-secret-key"
sqlite3
.load litestream.so
.open file:///my.db?vfs=litestream
PRAGMA litestream_time = '5 minutes ago';
select * from sandwich_ratings limit 3;
Comment by zackify 16 hours ago
brew install sqlite3, then change the bottom part:
/opt/homebrew/opt/sqlite/bin/sqlite3
.load litestream sqlite3_litestreamvfs_init
.open file:///my.db?vfs=litestream
you have to manually pass in the init function name
Comment by zackify 15 hours ago
import { Database } from "bun:sqlite";
Database.setCustomSQLite("/opt/homebrew/opt/sqlite/lib/libsqlite3.dylib");
// Load extension first with a temp db
const temp = new Database(":memory:");
temp.loadExtension("/path/to/litestream.dylib", "sqlite3_litestreamvfs_init");
// Now open with litestream VFS
const db = new Database("file:my.db?vfs=litestream");
const fruits = db.query("SELECT * FROM fruits;").all();
console.log(fruits);
Comment by koeng 11 hours ago
Comment by seigel 13 hours ago
Comment by ricardobeat 12 hours ago
brew list sqlite
gives you the installed path, works for any formula.
Comment by seigel 11 hours ago
Comment by ricardobeat 11 hours ago
Comment by seigel 11 hours ago
Thanks for humouring me! :D
Comment by zackify 11 hours ago
Comment by seigel 10 hours ago
* "Just need to have "LITESTREAM_REPLICA_URL" and the key id and secret env vars set when running the script"
... and that attempting to load the variables using `dotenv` will not work!!
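A minimal sketch of what "set when running the script" can look like, assuming a Bun/Node script (the script name query.ts and the bucket path are placeholders): put the variables into the environment from the shell when launching, rather than loading them with dotenv inside the script.
# Hypothetical invocation: the shell places the variables in the process
# environment before the script (and the VFS extension) starts
LITESTREAM_REPLICA_URL="s3://my-bucket/my.db" \
LITESTREAM_ACCESS_KEY_ID="your-access-key" \
LITESTREAM_SECRET_ACCESS_KEY="your-secret-key" \
bun run query.ts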
Comment by indigodaddy 18 hours ago
Currently, the Python/Flask app just refreshes the sqlite db from a Google spreadsheet as the authoritative source (via a dataframe that then gets converted to sqlite) on a daily schedule, handled within the app itself.
For reference, this is the current app (yes, the app is kinda shite, but I'm just a sysadmin trying to learn Python!): https://github.com/jgbrwn/my-upc/blob/main/app.py
Comment by benbjohnson 17 hours ago
You don't need any additional code (Python or otherwise) to use the VFS. It will work on the SQLite CLI as is.
Comment by indigodaddy 15 hours ago
Comment by Eikon 16 hours ago
Comment by indigodaddy 15 hours ago
Comment by Eikon 15 hours ago
Comment by indigodaddy 15 hours ago
Comment by ncruces 15 hours ago
Slightly different API (programmatic, no env variables, works with as many databases as you may want), but otherwise, everything should work.
Note that PRAGMA litestream_time is per connection, so some care is necessary when using a connection pool.
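A rough sketch of the per-connection behavior, reusing the CLI session from the examples above (the table name is carried over from simonw's example and is purely illustrative): the pragma only affects the connection that ran it, so with a pool each checked-out connection would need to set it before querying.
sqlite3
.load litestream.so
.open file:///my.db?vfs=litestream
-- only this connection sees the older snapshot; pooled connections
-- that never ran the pragma keep reading the latest data
PRAGMA litestream_time = '5 minutes ago';
select count(*) from sandwich_ratings;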
Comment by dkam 6 hours ago
DuckDB has a lakehouse extension called "DuckLake" which generates "snapshots" for every transaction and lets you "time travel" through your database. It feels kind of analogous to Litestream VFS PITR, but it's fascinating to see the nomenclature used for similar features: the OLTP world calls it Point In Time Recovery, while the OLAP/data lake world calls it Time Travel and treats it as a first-class feature.
In SQLite Litestream VFS, you use `PRAGMA litestream_time = '5 minutes ago'` (or a timestamp), and in DuckLake, you use `SELECT * FROM tbl AT (VERSION => 3);` (or a timestamp).
DuckDB (unlike SQLite) doesn't allow other processes to read while one process is writing to the same file: all processes get locked out during writes. DuckLake solves this by using an external catalog database (PostgreSQL, MySQL, or SQLite) to coordinate concurrent access across multiple processes, while storing the actual data as Parquet files. It's a clever architecture for "multiplayer DuckDB", deliciously dependent on an OLTP database to manage their distributed multi-user OLAP. Delta Lake uses uploaded JSON files to manage the metadata, skipping the OLTP.
Another interesting comparison is the Parquet files used in the OLAP world: they're immutable, column-oriented, and contain summaries of their content in the footers. LTX files seem analogous: they're immutable and stored on shared storage (S3), allowing multiple database readers. No doubt they're row-oriented, coming from the OLTP world.
Parquet files (in DuckLake) can be "merged" together, with DuckLake tracking this in its PostgreSQL/SQLite catalog, and in SQLite Litestream the LTX files get "compacted" by the Litestream daemon and read by the Litestream VFS client. Both use range requests on S3 to retrieve the headers so they can efficiently download only the needed pages.
Both worlds are converging on immutable files hosted on shared storage + metadata + compaction for handling versioned data.
I'd love to see more cross-pollination between these projects!
Comment by chickensong 17 hours ago
Comment by skybrian 18 hours ago
Comment by benbjohnson 18 hours ago
Comment by zackify 17 hours ago
Edit:
You need to set LITESTREAM_ACCESS_KEY_ID, LITESTREAM_SECRET_ACCESS_KEY, and LITESTREAM_REPLICA_URL; then the module works.
Comment by wim 17 hours ago
Comment by darintay 16 hours ago
Comment by andersmurphy 16 hours ago
I guess there's only one way to find out.
Comment by JaggerJo 16 hours ago
Comment by ncruces 15 hours ago
Comment by JSR_FDED 13 hours ago
Litestream continues to work as always, making continuous backups to S3.
Like always, I can restore from those backups to my local system.
But now I have the option of doing “virtual restores” where I can query a database backup directly on S3.
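A rough sketch of the difference, reusing the replica URL and CLI invocation from the examples above (flags, paths, and the table name are illustrative, not prescriptive):
# Full restore: copy the entire backup down to a local file first
litestream restore -o ./local-copy.db s3://my-bucket/my.db
sqlite3 ./local-copy.db "select count(*) from sandwich_ratings;"

# "Virtual restore": query the backup on S3 directly through the VFS
sqlite3
.load litestream.so
.open file:///my.db?vfs=litestream
select count(*) from sandwich_ratings;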
Comment by pkhuong 13 hours ago
Comment by tptacek 13 hours ago
Litestream does not require a VFS to work. It still does all the cool stuff it did before; in fact, it does those things much better now, even without the VFS.
Comment by dangoodmanUT 16 hours ago
SQLite VFS is really cool tech, and pretty easy to work with (IMO easier than FUSE).
I had made a _somewhat similar_ VFS [1] (with a totally different set of guarantees), and it felt pretty magical how it "just worked" with normal SQLite
Comment by orliesaurus 18 hours ago
ALSO I'm thinking about mixing this with object store caching... maybe combining memfs with remote metadata; would love to see more details on performance.
BUT I might be overthinking it... just excited to see SQLite exploring beyond local files...
Comment by reactordev 17 hours ago
Comment by benbjohnson 17 hours ago
Comment by reactordev 15 hours ago
Comment by hobo_mark 16 hours ago
Comment by benbjohnson 16 hours ago
Comment by joshstrange 15 hours ago
Comment by benbjohnson 14 hours ago
Comment by honktime 14 hours ago
Comment by tptacek 14 hours ago
Comment by davnicwil 14 hours ago
I think what we're getting here is a way to just spin up a local shell / app and run arbitrary queries from any point in time over the network, without having to sync the full prod database. I guess with LiteFS you would have to do this, or pre-plan to do this; it's not totally on-demand.
Or said another way, do things locally as though in prod without having to ssh to prod and do it there (if you even can, I guess if 'prod' is just s3 you can't really do this anyway so it's an entirely new capability).
@benbjohnson is this right? I humbly suggest adding a tl;dr of the main takeaway up top of the post to clarify. Love your work on litestream, thanks for what you do!
Comment by tptacek 13 hours ago
One reason you're not getting such a clear usage statement at the top of this post is that it's an early feature for a general-purpose capability. I think we'd rather get other people's takes about what it's most useful for. There are some obvious use cases, like the one you just identified.
Comment by davnicwil 12 hours ago
Comment by hintoftime 18 hours ago
Comment by benbjohnson 18 hours ago
Comment by dpedu 18 hours ago
Comment by born-jre 18 hours ago
I am going to integrate Litestream into the thing I am building[1]. I experimented with a lot of approaches, but it turns out WebDAV support was recently merged; it's just not in the docs yet.
Comment by dzonga 17 hours ago
Comment by benbjohnson 17 hours ago
Comment by nickpeterson 14 hours ago
Comment by tptacek 13 hours ago
Comment by jauntywundrkind 18 hours ago
Different use case, but makes me think of the SQLite rewrite-it-in-Rust Turso announcing AgentFS. Here the roles are flipped: SQLite is acting as a file store to back FUSE, to allow watching/transaction-managing the filesystem and what agents are doing. Turso also has a sick CDC system built in that just writes all changes to a CDC table. Which is related to this whole meta question of what is happening to my SQLite DB. https://turso.tech/blog/agentfs
Comment by tptacek 18 hours ago
Comment by jauntywundrkind 18 hours ago
To just drop the relevant paragraph that addresses my un-clarity/in-correctness (and which is super fun to read):
> Litestream v0.5 integrates LTX, our SQLite data-shipping file format. Where earlier Litestream blindly shipped whole raw SQLite pages to and from object storage, LTX ships ordered sets of pages. We built LTX for LiteFS, which uses a FUSE filesystem to do transaction-aware replication for unmodified applications, but we’ve spent this year figuring out ways to use LTX in Litestream, without all that FUSE drama.
Comment by tptacek 18 hours ago
The easiest way so far to understand the split between Litestream and LiteFS: Litestream is an operational tool, for backup and restore. LiteFS is a method for doing online leader/follower replica clusters.
Comment by rekwah 17 hours ago
Comment by sundbry 16 hours ago
Comment by itissid 17 hours ago
If you are not familiar with data systems, have a read of DDIA (Designing Data-Intensive Applications), Chapter 3, especially the part on building a database from the ground up. It almost starts with something like "What's the simplest key-value store?": `echo` (O(1) write appending to the end of a file, super fast) and `grep` (O(n) read, slow), and then builds up all the way to LSM-trees and B-trees. It will make a lot more sense why this preserves so many of those ideas.
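For anyone who hasn't read it, a minimal sketch of that echo/grep key-value store in the spirit of the DDIA chapter (the function names are just illustrative):
# Write: O(1), append a "key,value" line to the end of the file
db_set () { echo "$1,$2" >> database; }
# Read: O(n), scan the whole file and keep the most recent value for the key
db_get () { grep "^$1," database | sed -e "s/^$1,//" | tail -n 1; }

db_set user:42 '{"name":"Ada"}'
db_get user:42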
Comment by petcat 18 hours ago
Comment by ashish01 18 hours ago
Comment by mhitza 18 hours ago
Comment by andersmurphy 16 hours ago
Comment by 0xbadcafebee 18 hours ago
I think the devil's in the details though. I expect a high number of unusual bugs due to the novel code, networking, and multiple abstractions. I'd need to trial it for a year before I called it reliable.
Comment by jtbayly 18 hours ago
Comment by christophilus 18 hours ago
Comment by tptacek 18 hours ago
Comment by victorbjorklund 14 hours ago
Comment by andersmurphy 16 hours ago
Comment by jauntywundrkind 18 hours ago
Comment by andersmurphy 16 hours ago
Comment by 9rx 18 hours ago
A better question to ask is why the world needs yet another DBMS, but the reasons are no doubt valid.