Show HN: Subth.ink – write something and see how many others wrote the same

Posted by sonnig 4 days ago

92 points | 51 comments

Hey HN, this is a small Haskell learning project that I wanted to share. It's just a website where you can see how many people write the exact same text as you (thought it was a fun idea).

It's built using Scotty, SQLite, Redis and Caddy. Currently it's running on a small DigitalOcean droplet (1 GB RAM).

Using Haskell for web development (specifically with Scotty) was slightly easier than I thought, but still a relatively hard task compared to other languages. One of my main friction points was Haskell's multiple string-like types: String, Text (& lazy), ByteString (& lazy), and each library choosing to consume a different one amongst these. There is also a soft requirement to learn monad transformers (e.g. to understand what liftIO is doing) which made the initial development more difficult.

Comments

Comment by NitpickLawyer 4 days ago

Your thought's hash is: 7456a8269266134d67e9e0b2b26dbbc2227ba976add87c05e91e4cc9937b8b21 You are the first person with that thought. Congratulations!

"You are absolutely right!"

Well, at least we know Claude hasn't hit the API yet :)

Comment by lumirth 4 days ago

I said "I love my wife". Apparently, I was the first. Then I said "penis". I was the fifth.

Neat!

Comment by NewJazz 4 days ago

Hey that's my wordle opener!

Comment by away0g 4 days ago

i said penis

Comment by sonnig 4 days ago

Me too, and other 16 users

Comment by wellpast 4 days ago

95 other users*

Comment by susam 4 days ago

Some of the top items:

  hello world
  4ef69019c65909ffbb470597e3c5afe05ea8a866a0d3b9f950f0bcf057924b52

  hello
  4358f43b660389eecd435dc2a5f5cee29786245cd2cff27bd4de0b3e8fd53b79

  4ef69019c65909ffbb470597e3c5afe05ea8a866a0d3b9f950f0bcf057924b52
  406cc6dbc566bf6c672a2167868341e9853f7fbbd2a21eb1caa4d08006abae41

  hi
  661ce2e5ed28422eb8b51ec2a217c976e05e37713246166e8fcbf67be4824380

  test
  83d34c0abee918ed3edf585b6cb8ce97fe8286027b012bacdfa71b967924f9b2

  a
  beef7c4d3141c30ab4f6ebf1f724936c50f609ee1915951d802046ba1d9fa23d

  subth.ink
  3f3b05abaec959c9950d5a93a64525971c7d9fcabf6436d653edba62f29d5bea

  lol
  39567a3cc35a4c68d72d01beac88414d0ced5c20b437ff9bc6e2cb20615a47b7
Thanks to Y_Y for 4ef69019c65909ffbb470597e3c5afe05ea8a866a0d3b9f950f0bcf057924b52.

Comment by Y_Y 4 days ago

Currently number three is:

  4ef69019c65909ffbb470597e3c5afe05ea8a866a0d3b9f950f0bcf057924b52
with hash:

  406cc6dbc566bf6c672a2167868341e9853f7fbbd2a21eb1caa4d08006abae41
i.e. the hash of "hello world"

Comment by castalian 4 days ago

fedb9943d8c4c51392815a187ce4ba732c539038fd28b4bda8543e4616d767c1

the nword

Comment by sonnig 4 days ago

Apologies for that, I have removed it.

Comment by platybubsy 3 days ago

Thanks, just seeing the hash made me literally shake

Comment by sonnig 3 days ago

Yeah me too

Comment by donbale 4 days ago

This is great, but it never seems to say that it's an original thought, always defaulting to: "Including you, 1 person had that thought already! First time was less than a minute ago, last time was less than a minute ago."

Comment by sonnig 4 days ago

I've changed this now, thanks for the feedback

Comment by Apreche 4 days ago

This would be more interesting if it was generalized. Using a hash, even one character difference will result in a miss.

If I could have it analyze my blog and then find people who have similar ideas that would be incredibly useful.

Comment by Imustaskforhelp 4 days ago

To be honest, they could take a look at Bao. (I used it for an eerily similar project, though it's great that this one is getting traction! I do feel the Scuttlebutt protocol might be a good implementation for most use cases as well.)

Bao gives a common hash for the first n bytes of the content, so you could loop over each successive word and see how long the two texts' hashes stay in common; that length becomes the measure of similarity.

An issue might arise when a word changes at the start and the rest is similar, but I feel like Bao could support (or already supports) that as well. My knowledge of Bao is pretty rusty (get the pun? It's written in Rust), but I am sure this idea is technically possible, and I hope someone experienced in the field can say more about it.

https://github.com/oconnor663/bao. O'Connor's Bao videos and talks on YouTube are so good: worth a watch, and the repo is worth a star (though they do mention that it's a little less formally analyzed cryptographically, IIRC, but it's still pretty robust IMO).
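A rough sketch of the prefix-comparison idea described above, not of Bao's actual API: plain SHA-256 over successive word prefixes stands in for Bao here, and the names `prefix_hashes` and `shared_prefix_words` are hypothetical.

```python
import hashlib


def prefix_hashes(text: str) -> list[str]:
    """Digest of the first 1, 2, ..., n words of the text."""
    words = text.lower().split()
    return [
        hashlib.sha256(" ".join(words[:i]).encode("utf-8")).hexdigest()
        for i in range(1, len(words) + 1)
    ]


def shared_prefix_words(a: str, b: str) -> int:
    """Number of leading words on which both texts agree, judged by digests only."""
    count = 0
    for ha, hb in zip(prefix_hashes(a), prefix_hashes(b)):
        if ha != hb:
            break
        count += 1
    return count
```

As the comment anticipates, a change in the very first word drops the score to zero even when everything after it matches, so this only measures a shared beginning.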

Comment by sonnig 4 days ago

True! That would be a more powerful approach. Here I kept it quite basic since I was not very familiar with the tooling. I do apply lowercasing of text + some whitespace stripping in order to increase the number of collisions a bit.

Edit: any other "quick hacks" to increase the number of collisions are welcome :)
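A minimal sketch of the normalize-then-hash step described here, under stated assumptions: the exact stripping rules and the hash function are not given in the thread, so collapsing all whitespace and using plain SHA-256 are stand-ins, and digests from this sketch should not be expected to match the ones quoted in the thread. The names `normalize`, `thought_hash` and `submit` are hypothetical.

```python
import hashlib
from collections import Counter


def normalize(thought: str) -> str:
    """Lowercase and collapse whitespace (the exact rules are an assumption)."""
    return " ".join(thought.lower().split())


def thought_hash(thought: str) -> str:
    """Hex digest of the normalized thought; plain SHA-256 is a stand-in."""
    return hashlib.sha256(normalize(thought).encode("utf-8")).hexdigest()


counts = Counter()  # digest -> number of submissions


def submit(thought: str) -> int:
    """Record one submission and return how often this thought has been seen."""
    digest = thought_hash(thought)
    counts[digest] += 1
    return counts[digest]


if __name__ == "__main__":
    print(submit("Hello   World"))   # first submission of this thought
    print(submit("  hello world "))  # same thought after normalization
    print(submit("hello, world"))    # the comma makes it a different thought
```

Only the digest is stored, so any later change to `normalize` cannot be applied retroactively to earlier submissions.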

Comment by nathan_compton 4 days ago

Natural to use LM embeddings for this.

Comment by jamilton 4 days ago

Yeah, convert to an embedding, check if it's within a certain distance of an existing embedding, and if so store it with that cluster and increment? Then check further entries against an average so clusters don't increase their "reach" indefinitely.
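A small sketch of that clustering loop, assuming a toy bag-of-words vector in place of a real language-model embedding and an arbitrary cosine threshold of 0.7. `embed`, `cosine` and `Clusters` are illustrative names, not part of the site.

```python
import math
from collections import Counter


def embed(text: str) -> dict[str, float]:
    """Toy stand-in for a language-model embedding: bag-of-words counts."""
    return {w: float(c) for w, c in Counter(text.lower().split()).items()}


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors (0.0 if either is empty)."""
    dot = sum(x * v.get(k, 0.0) for k, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


class Clusters:
    def __init__(self, threshold: float = 0.7) -> None:
        self.threshold = threshold
        self.centroids: list[dict[str, float]] = []  # running mean per cluster
        self.counts: list[int] = []                  # members per cluster

    def add(self, text: str) -> tuple[int, int]:
        """Assign text to the closest cluster or open a new one.

        Returns (cluster index, cluster size after the insert)."""
        vec = embed(text)
        best, best_sim = -1, self.threshold
        for i, centroid in enumerate(self.centroids):
            sim = cosine(vec, centroid)
            if sim >= best_sim:
                best, best_sim = i, sim
        if best < 0:
            self.centroids.append(vec)
            self.counts.append(1)
            return len(self.centroids) - 1, 1
        # Keep the centroid as the mean of all members, so new entries are
        # compared against the average rather than against the latest arrival.
        n = self.counts[best]
        centroid = self.centroids[best]
        for k in set(centroid) | set(vec):
            centroid[k] = (centroid.get(k, 0.0) * n + vec.get(k, 0.0)) / (n + 1)
        self.counts[best] = n + 1
        return best, n + 1
```

Comparing against the mean is what limits the "reach": a chain of slightly different entries cannot drag the cluster arbitrarily far from where it started.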

Comment by stogot 4 days ago

That is a problem. Also, a long paragraph would likely never hash the same because of a comma or a capital letter, so the builder of this would need to cap the length of a thought and make all thoughts lowercase without punctuation.

Comment by sonnig 4 days ago

i agree removing punctuation wouldve been a good idea alas it may be a bit too late since that would modify the hash of previous inputs in the future hmm but i will think about it

Comment by pvdebbe 4 days ago

I love this. Shouting into the void with the distinct feeling, or hope, that if the idea were popular enough, it'd be brute-forced back into existence.

I noticed that the input is not being treated in any way before hashing. I'd remove all non-letter characters and then lowercase everything before hashing, to avoid some unnecessary misses.

Comment by wellpast 4 days ago

Someone might run:

curl -s https://www.cs.cmu.edu/~biglou/resources/bad-words.txt | tr -d '\r' | while read -r w; do curl -s -X POST https://subth.ink/api/thoughts -H 'Content-Type: application/json' -d "{\"contents\":\"$w\"}"; done

Comment by throwawoy 2 days ago

I have managed to crack 4 words out of the top 10:

- #5 6f18270a4ed02a134851520202a104a33721423c35b7f5421e0081ec732793b1 - sex

- #6 8ababd402810b2a412142dcc71ea3083ccbab886c48a48948d286eee161c72ad - hash of fedb9943d8c4c51392815a187ce4ba732c539038fd28b4bda8543e4616d767c1

- #7 4ef69019c65909ffbb470597e3c5afe05ea8a866a0d3b9f950f0bcf057924b52 - hello world

- #10 dad6326d44d6f94c5668be9e5a5b762415fc2a3097d47742a7cf62d37e5e8287 - hash of 8ababd402810b2a412142dcc71ea3083ccbab886c48a48948d286eee161c72ad

The first one was probably brute-forced by a single person, since it sits at a round 10,000.
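A sketch of how such a wordlist attack might look, including the "hash of a hash" entries. Plain SHA-256 is an assumption here (the site's real scheme may differ, so this will not necessarily reproduce the digests above), and `digest` and `crack` are hypothetical names.

```python
import hashlib


def digest(text: str) -> str:
    """Plain SHA-256 hex digest; a stand-in for the site's real scheme."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def crack(targets: set[str], wordlist: list[str], max_depth: int = 3) -> dict[str, str]:
    """Map each recovered target digest to a description of its preimage.

    Besides each word itself, hash-of-hash chains up to max_depth are tried,
    since people in the thread submitted digests as thoughts."""
    found: dict[str, str] = {}
    for word in wordlist:
        label, current = repr(word), digest(word)
        for _ in range(max_depth):
            if current in targets:
                found[current] = label
            # Next link in the chain: the digest itself becomes the input.
            label, current = f"hash of {label}", digest(current)
    return found
```

The cost is one hash per word per chain depth, which is why short, common thoughts are recoverable from a public leaderboard of digests.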

Comment by abnry 4 days ago

I love it!

I typed "hello".

> Your thought's hash is: 4358f43b660389eecd435dc2a5f5cee29786245cd2cff27bd4de0b3e8fd53b79

> Including you, 267 persons had that thought already!

> First time was 4 hours, 14 minutes ago, last time was less than a minute ago.

Of course, everyone else has thought of this. But what if I "type": 4358f43b660389eecd435dc2a5f5cee29786245cd2cff27bd4de0b3e8fd53b79

> Your thought's hash is: c37d0a8c512b9ec7074d3bc77c4545d58fdfcde55bad89a70ede71ac2ac0000d

> Including you, 8 persons had that thought already!

> First time was 2 hours, 1 minute ago, last time was 1 minute ago.

That's hilarious!

And also, "typing": echo "hello world" | curl -d @- https://subth.ink

>Your thought's hash is: c5ba1c7e35345dbb8c2dc6be0972d0b6ddf6c6515143b64c057296948e2ba8cd

>Including you, 10 persons had that thought already!

>First time was 1 hour, 52 minutes ago, last time was 2 minutes ago.

Comment by throwaway89201 4 days ago

> It (the MD5 hash) might be published in the future when a thought's count passes a certain threshold (TBD). This might make it possible to recover certain short thoughts that were popular.

This makes little sense. Recovering a random preimage of an MD5 hash is marginally easier [1] than a (128-bit truncated) SHA256 hash, but this won't recover any sensible message.

Recovering a sensible (short) message is equally hard for both hashes.

[1] https://link.springer.com/chapter/10.1007/978-3-642-01001-9_...
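To illustrate the point about short messages, here is a sketch that enumerates every lowercase string up to three letters against both an MD5 digest and a SHA-256 digest truncated to 128 bits. The helper names are made up for the example, and nothing here reflects how the site actually stores its hashes.

```python
import hashlib
from itertools import product
from string import ascii_lowercase


def md5_hex(text: str) -> str:
    """MD5 hex digest (128 bits)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def sha256_128_hex(text: str) -> str:
    """SHA-256 truncated to its first 128 bits (32 hex characters)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:32]


def brute_force(target: str, hash_fn, max_len: int = 3):
    """Try every lowercase string up to max_len; return (preimage, attempts)."""
    attempts = 0
    for length in range(1, max_len + 1):
        for letters in product(ascii_lowercase, repeat=length):
            candidate = "".join(letters)
            attempts += 1
            if hash_fn(candidate) == target:
                return candidate, attempts
    return None, attempts
```

Both searches walk the same candidate list in the same order, so the number of attempts is identical; the choice of hash only changes the cost per attempt, not the size of the search.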

Comment by nullchan 4 days ago

"helloworld"

Your thought's hash is: 06ad246627b5f973559a1dbcf2a6b96791d9b15ed2d8cb45c344f98b14d10f76 Including you, 1 person had that thought already! First time was less than a minute ago, last time was less than a minute ago.

haha, cool.

Comment by patapong 4 days ago

Neat idea! I love these kinds of low-stakes online interactions, a bit like 1,000,000 checkboxes as well. It makes me realize how many others there are out there and evokes a strange but nice feeling of community :)

Comment by Dilettante_ 4 days ago

  Your thought's hash is: 295c1f32c2fa180b5425c2b502e1d3968a7639c8ec398d66ec2e4ff73c05a1ea
  Including you, 2 persons had that thought already!
You guys know who you are

Comment by g105b 4 days ago

I think a few more than 2 people have said that now.

Comment by jjpones 3 days ago

I'm the 18th meow :3 Honestly, thank you so much for posting this. I love small and fun projects.

Comment by internet_points 1 day ago

For strings, just add string-conversions to your .cabal and `import Data.String.Conversions` and you can use `cs` to convert between the five, it'll typically figure out from context what type you need. E.g.

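    {-# LANGUAGE ImportQualifiedPost #-}
    import Data.String.Conversions (cs)
    import Data.Text.IO qualified as T

    -- fetchFromInterwebs is a placeholder for whatever HTTP call you use.
    main :: IO ()
    main = do
      url <- getLine                        -- a String
      thing <- fetchFromInterwebs (cs url)  -- cs converts to the type the callee expects
      T.putStrLn (cs thing)                 -- cs converts the result to Text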
I don't think you need to really understand monad transformers to use them, just know that sometimes you're in "SomethingThatBuildsOnIO" and so you need to liftIO when using IO things.

Comment by vlfig 4 days ago

Next step: embeddings and similarity.

Comment by Paracompact 4 days ago

"I am happy"

Including you, 7 persons had that thought already!

"I am sad"

Including you, 9 persons had that thought already!

Comment by SwiftyBug 4 days ago

I was very surprised to not be the first to enter a quotation by Nelson Rodrigues. Nice.

Comment by eigenblake 1 day ago

bd35a7f69b28c97fb3ebe489a4fba26a5f423522276d5ff5b5a8bb6441806ad2

Comment by purrcat259 4 days ago

wow seven people have the same password as me

Comment by JoshTriplett 4 days ago

Fun idea. One potential issue: the same person writing the same text repeatedly will count more than once, so it'd be pretty trivial to spoil the rankings. (This is why we can't have nice things.)
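One possible mitigation, sketched under the assumption that the server has some stable per-submitter identifier (a session token or a salted hash of the IP address, for instance; the thread does not say the site tracks either). `ThoughtCounter` is a hypothetical name.

```python
from collections import defaultdict


class ThoughtCounter:
    """Counts distinct submitters per thought instead of raw submissions."""

    def __init__(self) -> None:
        self._submitters: dict[str, set[str]] = defaultdict(set)

    def submit(self, thought_hash: str, submitter_id: str) -> int:
        """Record a submission and return the number of distinct submitters.

        Repeats from the same submitter do not increase the count."""
        self._submitters[thought_hash].add(submitter_id)
        return len(self._submitters[thought_hash])
```

This only raises the bar: anyone who can rotate identifiers can still inflate a count, and storing identifiers has privacy costs of its own.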

Comment by Imustaskforhelp 4 days ago

Another comment here, as I got way too excited in the other one, but this is genuinely so good, man!!! KUDOS!

It actually provides a simple curl command. Oh boy, this opens up a few more ideas. I feel like my wall of text -> link shortener / blog, with all other comments on that wall of text being comments themselves, could be implemented, and this opens up a lot of possibilities.

I actually got a VPS (8 gigs, 4 cores, 500 GB SSD) prepaid for 3 months; I snatched it during a recurring deal.

If you want, I can transfer it to you, or share half the resources, if this project ever needs it.

One of the most interesting things is that this (unlike my idea, which was just a "proof" that it was possible in a more complex environment) actually makes it simple for normal devs to build upon.

You mention Scotty, and I am not sure whether you mean Scuttlebutt the protocol or whether Scotty is some Haskell web framework (sorry, I don't know Haskell).

What are your thoughts on Scuttlebutt or nanotimestamps? I have the latter open source under the MIT license for anyone to build on.

Your project is really polished and I admire it, but I hope you can look more at the decentralization side of things. One idea I had that never came to fruition was that, building on top of it, we could have a social network similar to Nostr but without the relay mess that Nostr has in many instances (or so I have heard).

I am curious what use cases you have in mind; I'd love to know your opinion!

Have a nice day man!

Comment by joeframbach 3 days ago

This concept is a duplicate :) we already had r9k

Comment by sonnig 2 days ago

Yep, it's similar in that way, but not in an imageboard/discussion context

Comment by moontear 4 days ago

I was the first one with "here be dragons"? come on

Comment by Rygian 4 days ago

I was the first to state that, contrary to popular belief, "the quick brown fox did not jump over the fence"

Comment by flufluflufluffy 3 days ago

Popular belief holds that the quick brown fox jumped over the lazy dog, not the fence.

Comment by IamDaedalus 4 days ago

Someone declared bankruptcy before me, The Office style

Comment by metatronzero 3 days ago

"67" Your thought's hash is: 098754435bbbe041e9beb5d99e28d8256ad1d064f768332a976ffa6083b535c2 Including you, 31 persons had that thought already! First time was 20 hours, 43 minutes ago, last time was 2 minutes ago.

lol

Comment by silcoon 4 days ago

80f9d25eb732197e10d71597dca181e7a454eeda3cc484b1c3e129109b41db23

Comment by ta988 4 days ago

this one is going up fast, no wonder

Comment by Imustaskforhelp 4 days ago

[flagged]

Comment by poly2it 4 days ago

Maybe you should submit it?

Comment by Imustaskforhelp 4 days ago

Done. thanks for the suggestion.

https://news.ycombinator.com/item?id=46684789 [Nanotimetamps: Time-Stamped Data on Nano Block Lattice]
