πFS
Posted by helterskelter 7 days ago
Comments
Comment by jamwise 6 days ago
The conclusion being that you basically need the same amount of data to represent the address of your data as the data itself, so it's not really effective at compression, just a fun thought experiment.
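For anyone who wants to check that claim numerically, here is a minimal sketch. It is my own illustration, not code from the πfs repo: the helper names, the 20,000-digit cutoff and the use of Machin's formula are all assumptions. It finds the first position of every k-digit string in the decimal digits of π and compares the length of that position with k.

```python
import itertools


def arctan_inv(x: int, scale: int) -> int:
    """Approximate scale * arctan(1/x) with the Taylor series, in integer arithmetic."""
    term = scale // x
    total = term
    x2 = x * x
    k, sign = 3, -1
    while term:
        term //= x2                    # next odd power of 1/x
        total += sign * (term // k)    # alternating series term
        sign = -sign
        k += 2
    return total


def pi_decimal_digits(n: int) -> str:
    """First n decimal digits of pi after the decimal point (Machin's formula)."""
    guard = 10                         # extra digits to absorb truncation error
    scale = 10 ** (n + guard)
    pi_scaled = 4 * (4 * arctan_inv(5, scale) - arctan_inv(239, scale))
    frac = pi_scaled - 3 * scale       # drop the leading "3"
    return str(frac // 10 ** guard).zfill(n)


def average_index_length(digits: str, k: int) -> float:
    """Average number of decimal digits needed to write the first index of each k-digit string."""
    lengths = []
    for combo in itertools.product("0123456789", repeat=k):
        idx = digits.find("".join(combo))
        if idx < 0:
            raise ValueError("string not found; compute more digits")
        lengths.append(len(str(idx)))
    return sum(lengths) / len(lengths)


if __name__ == "__main__":
    digits = pi_decimal_digits(20_000)  # hypothetical cutoff, enough for k <= 3
    for k in (1, 2, 3):
        print(k, average_index_length(digits, k))
```

On average the index comes out at least as long as the string it points to, which is the whole problem in miniature.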
The cool part of this in modern times is that LLMs are basically a form of lossy compression that actually achieves the gist of what these tools fail at. Although it is lossy, and requires a massive substrate. This is related to the idea of AI/LLMs being a form of language compression.
Comment by ithkuil 6 days ago
Reinventing Entropy Compression is Intelligence Part 1
3blue1brown https://youtu.be/l6DKRf-fAAM?is=ne73FCJ7ErXhzZ-v
Comment by nz 6 days ago
Comment by dothraki 6 days ago
Comment by idiotsecant 6 days ago
Comment by lompad 6 days ago
Comment by cestith 6 days ago
Comment by sam_lowry_ 6 days ago
Comment by ainch 6 days ago
Comment by everforward 6 days ago
Some of that is also the domain. It's less that science is an extreme form of compression, and more that natural phenomena are highly compressible. They're a small number of kinds of interactions repeated a bajillion times. How many equations does it take to explain electricity (ignoring equations that are derivatives of ones already included)? I think it's less than 5.
On some level, you could probably reduce all of the Standard Model down to models of atoms, their motion, and the basic subatomic particles (the non-quantum ones). That would explain almost everything that happens on Earth in a very short form, though few people would be able to go from that to explaining how lightning works.
Comment by ainch 6 days ago
It's also a relevant example for AI - one paper tested the ability of Transformers to model planetary orbits: unlike Newton's Law, the implicit forces they learn are nonsense.
Comment by mdp2021 6 days ago
(Then it depends on your concern: "Aagh, the aunt fell!" // "Oh yes, that'd be Newton")
Comment by esquivalience 6 days ago
This is totally lost on me.
Comment by user_7832 6 days ago
Appears to be lossy then ;)
(Sorry, you have to admit that was too easy to not say)
Comment by mdp2021 6 days ago
Laws (scientific, philosophical etc.) as compression represent the common side of classes of events - an abstraction of said events, stripping the irrelevant - irrelevant to some perspective, or irrelevant in a potential Procrustean bed. So, laws are compression, but such an extremely lossy compression that the loss can be relevant.
Brutally, "there may be more to the story of the fall of an elderly person than just gravitation" - also in the sense that there are details behind the event.
Laws are compression - yes, with caveats.
On a more scientific, epistemological side: Einstein extended Newton covering more exceptions (reducing the abstraction - reducing the loss).
Comment by quirino 6 days ago
Comment by jamwise 6 days ago
Comment by seethishat 6 days ago
Other forms of encryption are based on assumptions and conditions being true (e.g. factoring is a hard problem, etc.) that may or may not be true. We don't know.
Comment by janalsncm 6 days ago
Back of the envelope calculation for storing valid 4-grams (sequences of four words) is around 10 billion x 14 bits per word = 17 GB for all 10 billion. There are LLMs 100x smaller which can write coherent prose.
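The arithmetic can be spelled out in a few lines. This is a rough sketch with a hypothetical helper name. The figure above corresponds to 14 bits per stored entry; if each 4-gram were stored as four separate 14-bit word IDs (my assumption, not something the comment states), the total would be four times larger.

```python
def storage_gb(n_items: int, bits_per_item: int) -> float:
    """Storage in gigabytes (10^9 bytes) for n_items entries of bits_per_item bits each."""
    return n_items * bits_per_item / 8 / 1e9


N_GRAMS = 10_000_000_000      # 10 billion 4-grams, as in the comment
BITS_PER_WORD = 14            # 2**14 = 16384 distinct word IDs

if __name__ == "__main__":
    print(storage_gb(N_GRAMS, BITS_PER_WORD))        # 14 bits per entry
    print(storage_gb(N_GRAMS, 4 * BITS_PER_WORD))    # four 14-bit word IDs per entry
```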
Comment by frotaur 6 days ago
GPT-2 for instance achieves roughly 1 bit per byte, so it can be used to compress (English) text 8-fold. Modern models are likely much better.
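To make the bits-per-byte to compression-ratio conversion concrete, here is a small sketch. It uses zlib from the Python standard library as a stand-in compressor, since running GPT-2 as a compressor is out of scope here; the function names are hypothetical.

```python
import zlib


def bits_per_byte(data: bytes, compress=zlib.compress) -> float:
    """Compressed size in bits divided by the original size in bytes."""
    return 8 * len(compress(data)) / len(data)


def compression_ratio(bpb: float) -> float:
    """An uncompressed byte costs 8 bits, so the ratio is 8 / bits-per-byte."""
    return 8 / bpb


if __name__ == "__main__":
    sample = b"the quick brown fox jumps over the lazy dog " * 50
    bpb = bits_per_byte(sample)
    print(f"zlib: {bpb:.2f} bits/byte, ratio {compression_ratio(bpb):.1f}x")
    print(f"1 bit/byte corresponds to a ratio of {compression_ratio(1.0):.0f}x")
```

The repeated sample is far more compressible than real prose, so the zlib number it prints says nothing about how zlib compares with a language model on ordinary English.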
Comment by briansm 6 days ago
Comment by jnovek 6 days ago
Comment by 47282847 5 days ago
Comment by jnovek 5 days ago
Comment by divbzero 6 days ago
Almost like the other Borges work where “the Cartographers Guilds struck a Map of the Empire whose size was that of the Empire”.
Comment by aafaqzahid 6 days ago
Comment by dang 6 days ago
πfs – A data-free filesystem - https://news.ycombinator.com/item?id=36357466 - June 2023 (107 comments)
πfs – A data-free filesystem - https://news.ycombinator.com/item?id=28699499 - Sept 2021 (30 comments)
PiFS – The Data-Free Filesystem - https://news.ycombinator.com/item?id=26208704 - Feb 2021 (1 comment)
Πfs: Never worry about data again - https://news.ycombinator.com/item?id=21359338 - Oct 2019 (1 comment)
The π Filesystem for FUSE: Store Your Data in π - https://news.ycombinator.com/item?id=19223032 - Feb 2019 (1 comment)
pifs - Avoid disk space usage by saving your files in the digits of Pi - https://news.ycombinator.com/item?id=18687275 - Dec 2018 (1 comment)
πfs – A data-free filesystem - https://news.ycombinator.com/item?id=13869691 - March 2017 (105 comments)
Πfs: Stores your data in π - https://news.ycombinator.com/item?id=10856108 - Jan 2016 (1 comment)
Πfs: Never worry about data again - https://news.ycombinator.com/item?id=10847693 - Jan 2016 (1 comment)
File system that stores location of file in Pi - https://news.ycombinator.com/item?id=8018818 - July 2014 (98 comments)
100% Compression Using Pi - https://news.ycombinator.com/item?id=6698852 - Nov 2013 (32 comments)
(Reposts are fine after a year or so; links to past threads are just to satisfy extra-curious readers)
Comment by Levitating 6 days ago
Comment by programjames 6 days ago
Comment by Levitating 6 days ago
Comment by ChrisMarshallNY 6 days ago
Comment by lukan 6 days ago
Comment by Levitating 6 days ago
I think ChrisMarshallNY is right, dang has access to eldritch powers.
Comment by khimaros 6 days ago
Comment by ChrisMarshallNY 6 days ago
I wrote a special native management app, and often use that, to implement dashboard functionality, like the kind of thing that the HN mods do.
Yeah, I could, for example, feed the logs into an LLM, and get fancy reports, but it’s a lot easier to simply hit the charts button in the navbar, and view interactive graphs, customized exactly for my workflow.
Comment by whynotmaybe 6 days ago
Comment by dang 6 days ago
Comment by emptyroads 6 days ago
Comment by dekhn 6 days ago
It didn't seem very practical.
Comment by helterskelter 6 days ago
Spy agencies would not only have to store it all in case it was something valuable, but at some point they may try to crack it because it's indistinguishable from encrypted data and waste resources on it. If enough people did it, total web surveillance could become impractical.
Comment by fc417fc802 6 days ago
I'll note that any observer already has this problem to the extent that video streams are also encrypted. However most observers presumably recognize the endpoints as well as being able to classify the traffic by means of statistical analysis.
What might be useful would be a tool to generate arbitrary user data of various forms, including HTML, video, audio, and various message formats. Then it could assemble a convincing traffic stream full of gibberish to exchange with peers at random. You wouldn't even necessarily need all that much of it to overwhelm any would-be observers when considered relative to the volume of streaming service traffic that already exists.
Comment by nairboon 6 days ago
Comment by initramfs 6 days ago
Comment by danielmeskin 6 days ago
Comment by dekhn 6 days ago
Comment by iwontberude 6 days ago
Comment by poilcn 6 days ago
Comment by agnishom 6 days ago
This is what stream ciphers are
Comment by gowld 6 days ago
Comment by adzm 6 days ago
Comment by Aloisius 6 days ago
Comment by awesome_dude 6 days ago
Comment by jastr 6 days ago
I didn’t have the compute to find my 10 digit number with the area code.
Comment by xavortm 6 days ago
Comment by mondrian 6 days ago
Comment by russfink 6 days ago
Find k candidate indices for your data, then locate each of them. If the smallest one is a significantly smaller index space, repeat.
Comment by akoboldfrying 6 days ago
Comment by jonhohle 6 days ago
Comment by Galanwe 6 days ago
Comment by 12_throw_away 6 days ago
Comment by jwpapi 6 days ago
Comment by hatthew 6 days ago
> Now, we all know that it can take a while to find a long sequence of digits in π, so for practical reasons, we should break the files up into smaller chunks that can be more readily found.
> In this implementation, to maximise performance, we consider each individual byte of the file separately, and look it up in π.
Comment by ithkuil 6 days ago
But Pi's binary expansion is not very practical for this purpose, since it's 11.0010...
OTOH, e is 10.1011...
Let's stick to fractional digits (the ones right of the binary point): at index 0 we have 1 and at index 1 we have 0.
So, to encode a stream of bytes so that each bit is encoded as the index of that bit in e, all you need to do is to XOR each byte with 0xFF.
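A quick sketch of that joke, in case anyone wants to see it run. The names are made up for illustration. It looks up each bit in the first two fractional binary digits of e and confirms the result matches XOR with 0xFF.

```python
# First two fractional binary digits of e = 10.1011...: index 0 holds 1, index 1 holds 0.
E_FRACTION_BITS = "10"


def encode_bit(bit: int) -> int:
    """Return the index in e's fractional binary digits where this bit value first appears."""
    return E_FRACTION_BITS.index(str(bit))


def encode_bytes_by_lookup(data: bytes) -> bytes:
    """Replace every bit of every byte with its index in e, most significant bit first."""
    out = bytearray()
    for byte in data:
        encoded = 0
        for shift in range(7, -1, -1):
            bit = (byte >> shift) & 1
            encoded = (encoded << 1) | encode_bit(bit)
        out.append(encoded)
    return bytes(out)


def encode_bytes_by_xor(data: bytes) -> bytes:
    """The shortcut: flipping every bit is the same as XOR with 0xFF."""
    return bytes(b ^ 0xFF for b in data)


if __name__ == "__main__":
    message = b"pifs"
    assert encode_bytes_by_lookup(message) == encode_bytes_by_xor(message)
    print(encode_bytes_by_xor(message).hex())
```

Decoding is the same operation applied again, and the "compressed" output is exactly as long as the input.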
Comment by liamYC 6 days ago
Comment by gowld 6 days ago
Comment by jerf 6 days ago
So, instead of using pi, design an optimal number to encode with.
What you'll find is that the optimal sequence ends up being exactly as efficient as listing the blocks in order and indexing by block number. There are a number of other solutions; you could use superpermutations to get "all possible subsequences" with fewer digits in your target number. But the digit sequences then no longer appear at regular positions, so you'll end up needing to give the encoder and decoder a table of where they appear, and indexing into that table will cost exactly the same as writing your number as the concatenation of all the blocks and indexing on the block rather than the digit number.
This actually has some natural overlap with the "normal numbers" in that one of the earlier normal numbers was: https://en.wikipedia.org/wiki/Champernowne_constant I'm not sure whether this is necessarily optimal for an arbitrary block size. (My quick intuitive check suggests it may be, but "my quick intuitive check" in the time of an HN post is not something I'd count on.) In this scheme, you can include the fact that the person using this constant to encode knows the nature of the constant, so they know that if you give index 0-9, it's single digit, and if you index into the two-length blocks, it must have a length of two. Since the encoder and decoder know that, they can also skip the middle of the block and just index into "the n'th number"... which degenerates into "the index of number N is N", which means this is not a compression scheme.
To put all that in a nutshell, if you want to deeply understand why this compression scheme doesn't work, I think you can attain a deep understanding of why by optimizing it.
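Here is a rough sketch of the Champernowne point, using the usual base-10 constant 0.123456789101112... with hypothetical helper names. The offset of a number N in the digit string is a closed-form function of N, so storing the offset tells you nothing that N itself doesn't.

```python
def champernowne_prefix(last: int) -> str:
    """Digits of the Champernowne constant after the point, up to and including `last`."""
    return "".join(str(n) for n in range(1, last + 1))


def offset_of(n: int) -> int:
    """Zero-based offset of the first digit of n in 123456789101112..."""
    k = len(str(n))
    # Digits used by all shorter numbers: 9 one-digit, 90 two-digit, 900 three-digit, ...
    shorter = sum(9 * 10 ** (j - 1) * j for j in range(1, k))
    # Within the k-digit block, numbers are laid out in order, k digits each.
    return shorter + (n - 10 ** (k - 1)) * k


if __name__ == "__main__":
    digits = champernowne_prefix(1200)
    for n in (7, 42, 100, 1200):
        off = offset_of(n)
        print(n, off, digits[off:off + len(str(n))])
```

Since the decoder can compute the offset from N and N from the offset, the "address" is just the data in a slightly different costume.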
Comment by account42 6 days ago
Comment by bandrami 6 days ago
Comment by MisterTea 6 days ago
Further reading: https://en.wikipedia.org/wiki/Sloot_Digital_Coding_System
Comment by ndiddy 6 days ago
Comment by MisterTea 6 days ago
Comment by ndiddy 6 days ago
One thing to note is that Sloot consistently refers to his scheme as "encryption" rather than "compression". His encoding scheme originated as a method to encrypt TV repair manuals for his previous project, RepaBase. The idea was that they'd send out a compressed and encrypted database of repair manuals for free, then whenever a technician needed one he would call up RepaBase and pay for the key for that manual. That way, a tech would only need to pay for the manuals he needed instead of for the whole database. The video encoding scheme was basically the same idea except the key was stored on a smart card. Of course the scammy part was misleading investors into believing that all the video data was somehow stored in that decryption key.
Comment by gowld 6 days ago
Comment by Levitating 6 days ago
But Pi is infinite. And thus this genius contraption will work as long as we have Moore's law on our side :)
Comment by giancarlostoro 6 days ago
Comment by beng-nl 6 days ago
Comment by windward 6 days ago
conjectured
Glad to see one of my pet points of pedantry come up. No non-constructed irrational number has ever been proven to be normal or disjunctive.
Comment by oofbey 6 days ago
Comment by vbarrielle 6 days ago
Comment by umanwizard 6 days ago
Comment by pocksuppet 6 days ago
obviously it contains every finite digit string in base 10. I can't prove the digits are uniformly distributed in every base - you'd have to be more clever but you see the idea.
Comment by umanwizard 6 days ago
So I suppose maybe OP meant we haven't proven any number to be normal (or not) that is not designed to be normal (or not) ?
Comment by pocksuppet 6 days ago
Comment by niggischiggi 6 days ago
Comment by mkesper 6 days ago
Comment by bobim 6 days ago
Comment by mike_hock 6 days ago
It also doesn't contain all past and future knowledge because it also contains all possible falsehoods about the past and future in a way that's indiscernible from the truth.
Encoding information as an offset into a pseudorandom sequence is no more storage efficient than storing the information directly.
Comment by smaudet 6 days ago
Infinities of random sequences exist that can be shown not to contain all data, 0-8 (base 10) is one such random sequence that is trivially proven to never contain 9...
There are no known patterns to pi, but, (I am legitimately curious about this), are there any known sequences e.g. of 1 million 0s and a single other digit within the decimal sequence of pi?
Given how it (pi) looks, my strong suspicion is that the answer is "no". But of course, proving that requires that some property of the randomness is provable. Which it does feel as if, given there are different infinities, there are also different randomnesses, hence the conjecture is ill-formed and probably incorrect...
Comment by gowld 6 days ago
Comment by nosioptar 6 days ago
(Fun fact: "Chrispratt" is an ancient Californian word that means "Joel McHale didn't want the role.")
Comment by arialdomartini 6 days ago
https://dn760100.eu.archive.org/0/items/TheLibraryOfBabel/ba...
Comment by teapourer 6 days ago
Comment by matheusmoreira 6 days ago
All knowledge is information. All information is sequences of bits. All sequences of bits are numbers. All numbers already exist.
All files in a computer are sequences of bits. Intellectual work creates files. Intellectual work is number discovery.
Humans are interesting number generators. Humans are anti-random number generators.
Comment by xp84 6 days ago
Comment by Yokohiii 6 days ago
Perfect crypto!
Comment by OkayPhysicist 6 days ago
Comment by thih9 6 days ago
Comment by vadansky 6 days ago
Comment by skulk 6 days ago
Comment by cadamsdotcom 6 days ago
Comment by deadbabe 6 days ago
Comment by nighthawk454 6 days ago
Comment by RetroTechie 6 days ago
Comment by koolala 6 days ago
Comment by layer8 6 days ago
Comment by Lalabadie 6 days ago
Comment by utopiah 6 days ago
Are you sure? It's been a while since you last opened it. Memory is funny like that. The file is fine — maybe take another look with fresh eyes."
from https://github.com/philipl/inferencefs/
Maybe I do not indeed remember properly. Anyway, back to watching "Eternal Sunshine of the Spotless Mind" for the first time, I think.
Comment by matneyx 5 days ago
Comment by aidenn0 6 days ago
Comment by partsch 6 days ago
Comment by nyc_pizzadev 6 days ago
https://github.com/philipl/pifs/blob/fded8bf7b8f4fc64233e37b...
Comment by layer8 6 days ago
Considering each individual bit separately would be even more performant: you only need the indexes 2 and 33, and there is an efficient mapping of those to the bits in storage.
Comment by hnlmorg 6 days ago
I’m guessing this is something that could be formally proven?
Comment by hasteg 6 days ago
Comment by partsch 6 days ago
Comment by simonreiff 6 days ago
Comment by stackghost 6 days ago
Comment by liglam 6 days ago
Comment by hnlmorg 6 days ago
Comment by mike_hock 6 days ago
Comment by pixel_popping 6 days ago
Comment by anon291 6 days ago
Comment by giancarlostoro 6 days ago
Comment by pokstad 6 days ago
Comment by thangalin 6 days ago
> Matches that occur early enough in π to attain significant compression will not be varied. That is, it isn't possible to use π to compress interesting, real-world data because real-world strings are unlikely to arise early.
Comment by Levitating 6 days ago
> Calculate the number of bits to encode that value using log2(938933556), which is ~29.8
Can someone explain these two statements to me?
Comment by csunoser 6 days ago
This is roughly same as saying: "If you rewrite 938933556 as a binary number / usize, it will need 30 bits".
Sanity check: 1101111111|0110111111|0100110100 (| delimits every 10 bigits).
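The same sanity check in Python, as a small sketch (the helper name is mine): `int.bit_length()` gives the number of bits directly, and it agrees with rounding log2 up.

```python
import math


def bits_needed(value: int) -> int:
    """Number of bits in the binary representation of a positive integer."""
    return value.bit_length()


if __name__ == "__main__":
    index = 938933556
    print(math.log2(index))         # about 29.8
    print(bits_needed(index))       # rounds up to a whole number of bits
    print(format(index, "b"))       # the binary digits themselves
```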
> Since the file is 128 bits long, one would expect this place to be around the 2^128th bit.
This statement is a bit more subtle. As a first-order approximation, we can see pi sort of as an RNG.
If we write pi (ignore the decimal point), as a binary number, we get: 11011001111111011110010101011110001010101111101101110001001100001...
You can... kind of squint and pretend this is a random sequence of 1s and 0s.
Now, suppose you had a file that is 128 bits (so lots of intermingling 0s and 1s), and each next digit of pi is effectively a coin flip. Pretend 1s are heads, and 0s are tails. You basically have to get 128 consecutive coin flips that exactly match your file to get your file back.
Imagine now, PI not as a number, but a sequence of experiments of flipping the coin 128 times.
- (11011..01000)(10000...00100)....
- ^attempt 1 ^attempt 2
You have to try, in expectation, quite a few times to win this game! Now, you could easily get lucky for sure. But on average, your chance of winning per attempt is roughly 0.5^128! So, how many times do you have to try to win this game? Something like 2^128 times - and you have to consider that each attempt uses 128 bits as well. So more like 2^135. But you don't have to start fresh in each attempt, you can see it like this:
- 11011................00100...
- ( 128 flips )
- ( another 128 )
- ( )
- ... so on and so on
That's where the 2^128 number came from.
Comment by thangalin 6 days ago
Comment by koolala 6 days ago
0x123456789ABCDEF0
use this number as a shorter nibble storage alternative...
Comment by vbarrielle 6 days ago
Comment by markcollins05 2 days ago
Comment by tptacek 6 days ago
Comment by kevinmiller452 2 days ago
Comment by adzm 6 days ago
Comment by cbm-vic-20 6 days ago
jshell> "πfs".toUpperCase()
$1 ==> "ΠFS"
Welcome to Node.js v26.3.0.
Type ".help" for more information.
> "πfs".toUpperCase()
'ΠFS'
Python 3.14.5 (main, May 10 2026, 10:21:34) [Clang 21.0.0 (clang-2100.0.123.102)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> "πfs".upper()
'ΠFS'
echo 'πfs' | awk '{print toupper($0)}'
ΠFS
Comment by noman-land 6 days ago
Comment by baalimago 6 days ago
If our universe is simulated, it must be possible to snapshot the entire state for one iteration (however time now is quantized, open question). "... From here, it is a small leap to see that if π contains all possible files, why are we wasting exabytes of space storing those files, when we could just look them up in π!" (from pifs, above)
This means that not only does a singular snapshot of our universe exist in pi, but every single one does.
The information for our entire universe's simulation is stored in pi (and every other number like it).
Comment by foxes 6 days ago
Simulation "theory" isn't a theory, it's a conjecture.
There's no meat to it.
Comment by gosub100 6 days ago
The word 'exist' is doing a lot of work here. Could any computer actually find the value in pi? Each computation takes energy, and there is a finite amount of energy in the universe. Does the value 'exist' in pi if it could never be rendered? How much different is that claim from "our simulated universe is a file on God's computer, located just past the edge of the observable universe or 18 inches away in the 4th dimension"?
A similar thought experiment I've had is with the lottery. With just a few sheets of paper, there exists a sequence of numbers that would completely shut down both major US lotteries - possibly even get you arrested - if they contained the winning numbers for the next 50, 20, or even 10 consecutive draws. Think about the consequences. You would win, and win again the next draw, and they would be certain you cheated. Then confiscate your pad, and draw again and win a third time. They would have to shut the contest down because this is "mathematically impossible". But it's not. Just like your thought experiment, it's "just numbers".
Comment by Lorin 6 days ago
Comment by sam_goody 6 days ago
Cache all the last lookups but otherwise just store the index within pi? And for larger files - split them into chunks of whatever size could be handled?
(I mean, I realize this is a joke and can't make sense - but GPUs can be really really fast, and am willing to make a fool of myself by asking.)
And if we had a quantum computer that stores all of pi on one qubit, that could make things even faster ;/
Comment by z3t4 6 days ago
Comment by charles_f 6 days ago
My favourite issue being about GDPR compliance https://github.com/philipl/pifs/issues/56
Comment by keithnz 6 days ago
Comment by chris_sn 6 days ago
Comment by notatyrannosaur 6 days ago
Meta: every single comment seems to start with some variation of "Reminds me of". Had to get mine in.
Comment by golem14 6 days ago
Not even sure if there is an interesting Collatz-like conjecture here.
Comment by woah 6 days ago
3._1_415926535897932384626433832795_0_288419716939
Comment by torh 6 days ago
Comment by outadoc 6 days ago
Comment by hnbad 6 days ago
Comment by actusual 6 days ago
Comment by dekhn 6 days ago
Comment by markcollins05 6 days ago
Comment by ctan4 6 days ago
Sing, the wrath. Rendering in LaTeX.
Comment by adamwright326 5 days ago
Comment by amelius 6 days ago
Comment by yassi_dev 6 days ago
Comment by glitchc 6 days ago
Comment by wavemode 6 days ago
> Now, we all know that it can take a while to find a long sequence of digits in π, so for practical reasons, we should break the files up into smaller chunks that can be more readily found.
> In this implementation, to maximise performance, we consider each individual byte of the file separately, and look it up in π.
Comment by bean469 6 days ago
Comment by mike_hock 6 days ago
Comment by amelius 6 days ago
And for which the index is easy to compute?
Comment by leephillips 6 days ago
Comment by yason 6 days ago
Comment by amluto 6 days ago
> Well, this is just an initial prototype, and don't worry, there's always Moore's law!
Seriously? They're only storing individual bytes in pi:
> In this implementation, to maximise performance, we consider each individual byte of the file separately, and look it up in π.
So the whole transformation should be trivially reducible to a 256-element lookup table from source byte to location in pi and a similar table used to convert back the other way. Maybe a fancy formula could be used for the (never actually encountered) case in which a byte is encoded by one of the infinite available noncanonical encodings.
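For what it's worth, the lookup-table version can be sketched in a few dozen lines. This is my own illustration rather than the repo's code: it assumes a byte is matched as two consecutive hex digits of π at any hex-digit offset, gets the digits from Machin's formula instead of whatever πfs actually does, and all the names are hypothetical.

```python
def arctan_inv(x: int, scale: int) -> int:
    """Approximate scale * arctan(1/x) with the Taylor series, in integer arithmetic."""
    term = scale // x
    total = term
    x2 = x * x
    k, sign = 3, -1
    while term:
        term //= x2
        total += sign * (term // k)
        sign = -sign
        k += 2
    return total


def pi_hex_digits(n: int) -> str:
    """First n hexadecimal digits of pi after the point (Machin's formula)."""
    guard = 10                                  # extra hex digits against truncation error
    scale = 16 ** (n + guard)
    pi_scaled = 4 * (4 * arctan_inv(5, scale) - arctan_inv(239, scale))
    frac = pi_scaled - 3 * scale
    return format(frac >> (4 * guard), "x").zfill(n)


def build_tables(n_digits: int = 4096):
    """Build byte -> first index in pi and the inverse table, extending pi if needed."""
    while True:
        digits = pi_hex_digits(n_digits)
        byte_to_index = {}
        for b in range(256):
            idx = digits.find(format(b, "02x"))
            if idx < 0:
                break                           # some byte is missing: need more digits
            byte_to_index[b] = idx
        else:
            index_to_byte = {i: b for b, i in byte_to_index.items()}
            return byte_to_index, index_to_byte
        n_digits *= 2


BYTE_TO_INDEX, INDEX_TO_BYTE = build_tables()


def encode(data: bytes) -> list:
    """Replace each byte with the index of its first occurrence in pi."""
    return [BYTE_TO_INDEX[b] for b in data]


def decode(indices: list) -> bytes:
    """Map canonical indices back to bytes."""
    return bytes(INDEX_TO_BYTE[i] for i in indices)


if __name__ == "__main__":
    encoded = encode(b"hello")
    print(encoded, decode(encoded))
```

Some of the indices are larger than 255, so each "compressed" byte needs more than one byte of metadata, which matches the joke.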
Comment by bilsbie 6 days ago
So not really a compression scheme.
Comment by j3th9n 6 days ago
Comment by psadri 6 days ago
Comment by mohsen1 6 days ago
Comment by X-Ryl669 6 days ago
Comment by liamYC 6 days ago
https://ljsimpkin.github.io/pi-compress
It really shows how inefficient such a compression would be. Haha nice idea
Comment by stogot 6 days ago
Comment by keyle 6 days ago
Comment by adamwright326 6 days ago
Comment by jklimosk 6 days ago
Comment by 0x1ceb00da 6 days ago
Comment by dofcof 6 days ago
Comment by Levitating 6 days ago
Comment by dwheeler 6 days ago
Comment by mzelling 6 days ago
I mean, I get that it's "fun" to store information within the digits of pi. But is this just amusement, or is there a value prop for production use here?
(Speaking as a math major, by the way. I'm sympathetic to the cause.)
Comment by windward 6 days ago
This project makes clear the counter-argument: the input that gets you the file out of π is a badly compressed version of the file.
Comment by tcoff91 6 days ago
Comment by mherkender 6 days ago
Comment by aafaqzahid 6 days ago
Comment by sonixaep 6 days ago
Comment by RedMagicBox 6 days ago
Comment by yamakasi007 6 days ago
Comment by RedMagicBox 6 days ago
Comment by spchampion2 6 days ago
Comment by gatestone 6 days ago
Comment by insumanth 6 days ago
Comment by Lapsa 6 days ago