Functional Quadtrees

Posted by lbj 6 days ago

Comments

Comment by lemonwaterlime 6 days ago

I like to do data-oriented programming, and was just thinking about how I want to organize (and search through) the primary data structures/concepts for a project I'm working on. Part of that involved thinking about things like what information I might cache and what representations data might take. That lead me to looking into the nuances of things like B-Trees, AVL Trees, Quadtrees, k-d trees and so forth.

I've found the book "Foundations of Multidimensional and Metric Data Structures" by Hanan Samet to be an excellent resource when looking for a slightly deeper dive than a more introductory algorithms course. It goes in depth on the nuances of these approaches, many of which are highly similar at a cursory glance.

Comment by andoando 6 days ago

Dammit why these books have to be $60

Comment by incognito124 6 days ago

Just ask anna if she has it in her archive

Comment by dswalter 6 days ago

I see USD $36 on ebay, used. It's a smaller barrier to entry.

Comment by KPGv2 5 days ago

because they have low demand, meaning they would lose money if the price were lower

Comment by marvinborner 6 days ago

Quadtrees are also quite useful for generating fractals. A very related project of mine, Lambda Screen [0], explores this by encoding these functional quadtrees directly in lambda calculus and rendering the structure based on Church booleans being true (white) or false (black).

With fixed point recursion, this allows for very tiny definitions of IFS fractals. For example, fractals like the Sierpinski triangle/carpet only require ~50 bit of binary lambda calculus [1] [2]!

[0]: https://text.marvinborner.de/2024-03-25-02.html

[1]: https://lambda-screen.marvinborner.de/?term=ERoc0CrYLYA%3D

[2]: https://lambda-screen.marvinborner.de/?term=QcCqqttsFtsI0OaA

Comment by torusle 6 days ago

"I could only find a couple tutorials/guides and both were imperative"

Aren't Quadtrees covered by almost all basic data-structure books? It is the most simple form of taking the binary tree into the next (2D) dimension.

Comment by zelphirkalt 6 days ago

The problem is rather, that most data structure tutorials and books don't even get the idea, to introduce a purely functional version, but merely state the imperative versions. Coming up with the functional versions of data structures can be difficult. Papers about it can be hard to understand and often require one to already know some niche language, that a researcher used for the paper. Even if you can find a functional implementation of a data structure, there is often not a good explanation and you need to, sort of, reverse engineer it and translate it to the language you are using.

In short, it seems relatively few people have the skills to implement them and even fewer have the skills to come up with functional versions of ordinary data structures. They are almost completely absent from university lectures as well, as far as I am aware. For example for AVL trees I could only find a document from ETH from a lecture, that no longer exists or is taught. The language is Isabel and I need to understand its syntax first, before being able to translate it to Scheme.

If anyone has an obscure source for implementations one can learn from and implement oneself in another language, please share.

Comment by nanomonkey 5 days ago

I've been looking for the same for scheme and clojure. Here are a few I've found:

Functional Data Structures and Algorithms, A Proof Assistant Approach by Tobias Nipkow (Ed.) [https://fdsa-book.net/functional_data_structures_algorithms....]

Purely Functional Data Structures thesis by Chris Okasaki [https://www.cs.cmu.edu/~rwh/students/okasaki.pdf]

https://en.wikipedia.org/wiki/Purely_functional_data_structu...

Comment by zelphirkalt 5 days ago

[1] are the exercises with solutions (!) of a lecture [2] at ETH Zurich, which include AVL trees in Isabel. The whole list of exercise PDFs is at [3]. I started porting the AVL trees to Scheme [4], but lately have not worked more on this. I also try to make it easy to understand the implementation by providing explanatory comments.

[1]: https://archiv.infsec.ethz.ch/education/permanent/csmr/exerc...

[2]: https://archiv.infsec.ethz.ch/education/permanent/csmr.html

[3]: https://archiv.infsec.ethz.ch/education/permanent/csmr/exerc...

[4]: https://codeberg.org/ZelphirKaltstahl/guile-data-structures/...

Comment by KPGv2 5 days ago

Okasaki's is basically the Bible for this stuff. Anyone writing data structure libraries in a functional language will have read this or have it on their to-read list.

Comment by zelphirkalt 5 days ago

I have the book, but it also doesn't contain that many data structures. Some of the maybe most used (AVL tree?) are not in there.

Not all exercises have solutions. I get stuck on some exercise and have a hard time finding solutions to them, that I can compare with my implementations in Scheme. Not being a Haskell user (yet), I also have some issues translating Haskell to Scheme. In languages like Haskell one often controls flow by pattern matching on type. In Scheme one needs to make structures or records explicitly and use their predicates to explicitly check the type. It's all possible, but not exactly great for learning. Sort of the road has many sticks and stones to stumble.

Comment by pixelpoet 6 days ago

You can even build them with basically one line of code by sorting points using Morton / Z-curve order. It's linear time if you use a counting/radix sort.

Edit: lol, downvoted for this post. Never change, HN.

Comment by quibono 6 days ago

I'd love to see that. Could you link me to an implementation or explain this in more detail please?

Comment by _jackdk_ 6 days ago

Here's a 3D version used in the creation of sparse voxel octrees:

https://forceflow.be/2013/10/07/morton-encodingdecoding-thro...

Here's an example from AWS, where lat/long pairs are put into a Z-index, which is used as a DynamoDB sort key, letting you efficiently query for items near a point.

https://aws.amazon.com/blogs/database/z-order-indexing-for-m...

Comment by johnisgood 6 days ago

I would like to know about this more, too. Is there a code anywhere, ideally with comments? But I am fine without comments, too, I would just like to see the code and possibly with an example usage.

Comment by quibono 6 days ago

Okay, I was intrigued and I did some digging. Morton / Z-order is all about interleaving the individual bits of the x and y coordinates. You end up grouping by quadrants. Python one liner:

    points.sort(key=lambda p: sum(((p[0]>>i&1)<<(2*i))|((p[1]>>i&1)<<(2*i+1)) for i in range(16)))

Comment by rdtsc 6 days ago

Yeah good point, they are downsides for sure but it's a simple enough approach and most of all it can be shoved in a database (or b-tree or any 1d-sorted data structure).

And for a z-curve, the order is basically a depth-first traversal of a quadtree.

Comment by pavlov 6 days ago

It’s super easy to click the downvote button by accident on mobile when you meant to upvote. And this UI will never be fixed because this is HN after all.

Comment by proc0 6 days ago

I just implemented an HN UI. This is good feedback as I'm aiming to have a mobile friendly web version.

https://proc0.github.io/HackerZen (it's also open source)

Comment by 5 days ago

Comment by craftkiller 6 days ago

This is precisely the reason that I do not log in to HN on my phone. My phone is read-only and if I want to upvote or comment then I have to switch to my laptop. Pretty easy with firefox because I can send tabs to other devices.

Comment by jasonjmcghee 6 days ago

Zoom in or unvote/revote

Comment by kristianp 5 days ago

This is true, however you can see if you downvoted by the "undown" link displayed instead of "unvote".

Comment by acters 6 days ago

That is why I like harmonic app, there is an invite button separating the upvote and downvote. Never going to have this kind of issue

Comment by OisinMoran 6 days ago

Neat! Weirdly sending this article from my phone (Pixel 8) to my browser (Arc) via Pushbullet resulted in an incredibly strange bug that it loads this site instead:

https://www.lindelystables.dk/en/posts/functional-quadtree-c...

Got very confused! I challenge the HN hivemind to figure out what's going on.

Comment by lbj 6 days ago

Apologies! I think I might have a found an eager redirect on the server. I haven't been able to reproduce, but you're not the first to report it. I hope it's fixed now.

Comment by mutkach 6 days ago

I remember Arc randomly rewriting my bookmarks using some kind of summarization model or something like that, it also sometimes changed the name of downloaded files reinterpreting their names. Maybe it is related somehow.

Well, I guess it was the first AI-first browser, hence all this bs. I uninstalled it months ago...

Comment by OisinMoran 6 days ago

Yeah that feature was nice at the start then got very annoying. Still like the browser though. Weirdly enough chrome on my phone has been reporting the wrong URLs, usually one I've just been on.

Comment by runemadsen 6 days ago

We just did a whole visual identity around the quadtree concept. Take a scroll on this one! https://trace.systems/

Comment by wenc 6 days ago

That is a cool visualization of a quad tree. I use quadtrees in geospatial applications (to partition lat longs) but this is the time I’ve seen it used to render a photo.

Quad trees are abstract until you see what they look like. It’s a clever method to partition 2D points.

(Kd trees are even better)

Comment by CyberDildonics 6 days ago

a whole visual identity around the quadtree concept

What does that mean?

Comment by dwb 6 days ago

That all sounds incredibly dystopian, ugh.

Comment by Waterluvian 6 days ago

I love the visualization, which gave me an idea: what if we numbered every "Looking at" step in the visualization? Then it's obvious just how many search steps it takes.

And then maybe even juxtapose that with a linear search example, which also numbers every step. I bet this would make it really click for some people. And for free the user can also play with how a linear search can sometimes be faster when they just want the first element!

As a bonus: allow the user to change the cell count so they can really feel just how each method scales!

Comment by catapart 6 days ago

I appreciate that this writeup takes care to call out use cases when they help with understanding!

I do have a semi-unrelated question though: does using the recursive approach prevent it from being calculated efficiently on the GPU/compute shaders? Not that it matters; plenty of value in a CPU-bound version of a solution and especially one that is easy to understand when recursive. I was just wondering why the prominent examples used a non-recursive approach, but then I was like "oh, because they expect you to use them on the GPU". ...and then I was like "wait, is that why?"

Comment by cpgxiii 5 days ago

> I do have a semi-unrelated question though: does using the recursive approach prevent it from being calculated efficiently on the GPU/compute shaders?

Historically speaking, the use of recursion in shaders and GPGPU kernels (e.g. OpenCL "C" prior to 1.2) was effectively prohibited. The shader/kernel compiler would attempt to inline function calls, since traditional GPU models had little in the way of supporting call stacks like normal CPU programs have, and thus recursion would be simply banned in the shader/kernel language.

Comment by djmips 6 days ago

Non recursive approaches can help on the CPU as well. It's just easy and elegant to do function recusion but not necessarily faster.

Comment by willvarfar 6 days ago

A general quadtree implementation question that puzzled me when I was implementing it myself for hobby games was: do you store a rectangle in the smallest node that completely contains it?

Most code that I saw that used quadtrees were treating things as points and storing them only at the lowest level.

I also made mine auto-divide by counting items that are entirely in a quadrant as they are added to the node, with allocate and split triggered if a count went above a certain threshold.

Anything novel or oopsie?

Comment by Karliss 6 days ago

One of the reasons you mostly saw point operations and very few rectangle operations is because quadtrees aren't great for range operations.

Quadtrees might look like natural generalization of binary trees, but some things that work very efficiently in binary trees don't work for naive generalization of binary trees into quadtrees. For a full binary tree with W leaves any segment can be described using log(W) nodes. Almost every use of binary tree more interesting than maintaining sorted list of numbers depends on this property. What happens in 2d with Quadtree, how many of quadtree nodes are necessary to exactly describe arbitrary rectangular area? Turn's out you get O(W+H) nodes, not log(N), not log(W)*log(H).

If you stored a rectangle in smallest node that completely contains that avoids the W+H problem during insertion operations, but during read operations you may end up with situation where all the information is stored at the root. This is no better having no quadtree at all and storing everything in plain list that has no special order. Such worse case scenario can be created very easily if every rectangle contains centre of area described by quadtree, then all rectangles will be placed at root.

Dynamically subdividing and or allocating tree nodes on demand isn't anything particularity novel. If anything the cases where you can use fixed sized static trees outside textbook examples is somewhat a minority. Not saying there are no such cases there are enough of them, but large fraction of practical systems need to be capable of scaling for arbitrary amount of data which get's added and removed over the time and total size is not known ahead of time and the systems need to be able to deal arbitrary user data that has maliciously crafted worst case distribution. Every B-tree will split leaves into smaller nodes when necessary. Almost every self balancing binary tree can be considered of doing automatic division just with threshold of no more than 1 item per node. For 2d many uses cases of k-d tree will also subdivide on demand.

Comment by pengaru 6 days ago

When I made a quadtree for some simple 2d games I handled AABB areas, since it was aimed at broad-phase collision detection of what were essentially sprites just rendered w/GL.

Similar to yours I split the leaf nodes when they became too full, with some simple fixed threshold defining "full". Only leaf nodes contained references to the indexed objects, and all overlapping leaf nodes would reference the objects they overlapped. Search queries were done also using an AABB, and would iteratively invoke a callback for all overlapping object AABBs found in the overlapping leaf nodes. IIRC the object references hanging off the leaf nodes had a fixed number of linked list slots to be put on a results list during a search, to deduplicate the results before iterating that list with the provided candidate-found callback. Since any given indexed object could be on many leaf nodes in the index, if they spanned a large area shared with a high density of other indexed objects for instance.

It was up to the callback to do the narrow-phase collision detection / control the results list iterating stop vs. continue via return value.

I recall one of the annoyances of sticking the search results linked list entry in the object references hanging off the leaf nodes was it set a limit to the number of simultaneous searches one could perform against the index. Basically the game would initialize the index with a fixed maximum concurrent number of searches to handle, and that set the number of results-linked-list slots the object references would be allocated to accommodate. As long as that was never exceeded it worked great.

It's been a while so I may have gotten it wrong, but that sounds right to me.

Not sure what the most common approaches are...

The C source is @ https://git.pengaru.com/cgit/libix2/.git/tree/src/ix2.c

Comment by CyberDildonics 6 days ago

In 3D this has to be dealt with in the form of polygons and I think it was common when people were using acceleration grids and kd-trees to split the polygons so they fit neatly.

That being said most ray tracing seems to do bounding volume hierarchies now, so maybe a bvh is the best way to deal with things that have volume.

Comment by wiz21c 6 days ago

I think it is weird to have two cells divided downto their smallest size when my cursor clearly occupies only one of them, not two.

Comment by senderista 6 days ago

I wish the article had made it clearer that quadtree positions are encoded as strings over the alphabet on 2 bits (similarly, octrees use the alphabet over 3 bits). This makes storing keys and lexicographically comparing them very simple.

Comment by almostgotcaught 6 days ago

> positions are encoded as strings over the alphabet on 2 bits

This is the most pedantic way of saying "binary 2-tuples" I've ever seen. Also for quadtrees this is inferior to base 4 because you can assume clockwise (or counter) ordering.

Comment by mkehrt 6 days ago

I don't think that's what they meant. It's the case you can use literal strings of bits to encode a (2^n)-tree node, so you use actual bitstring comparisons and operations to manipulate them. Rightshift gives you the parent and things like that.

I don't think this is something the article cares about, though.

Comment by senderista 5 days ago

Thank you, that is exactly what I meant.

Comment by senderista 5 days ago

Uh, base 4 is exactly what I meant. I guess I wasn't very clear that I mean positions are encoded as bitstrings, with one pair of bits for each level (and triples of bits for octrees). Is that clear enough for you?