Async DNS
Posted by todsacerdoti 2 days ago
Comments
Comment by albertzeyer 2 days ago
In that discussion, most of the same points as in this article were already discussed, specifically some async DNS alternatives.
See also the discussion here: https://github.com/crystal-lang/crystal/issues/13619
Comment by frumplestlatz 2 days ago
We knew it was a bad idea at the time it was standardized in the 1990s, but politics — and the inevitable allure of a very convenient sounding (but very bad) idea — meant that the bad idea won.
Funny enough, while Java has deprecated their version of thread cancellation for the same reasons, Haskell still has theirs. When you’re writing code in IO, you have to be prepared for async cancellation anywhere, at any time.
This leads to common bugs in the standard library that you really wouldn’t expect from a language like Haskell; e.g. https://github.com/haskell/process/issues/183 (withCreateProcess async exception safety)
Comment by AndyKelley 2 days ago
Musl has an undocumented extension that does exactly this: PTHREAD_CANCEL_MASKED passed to pthread_setcancelstate.
It's great and it should be standardized.
Comment by frumplestlatz 2 days ago
Looking at some of my shipping code, there's a fair bit that triggers a runtime `assert()` if `pthread_mutex_lock()` fails, as that should never occur outside of a locking bug of my own making.
Comment by gpderetta 2 days ago
Comment by AndyKelley 2 days ago
Musl solves this problem by inspecting the program counter in the interrupt handler and checking if it falls specifically in that range, and if so, modifying registers such that when it returns from the signal, it returns to instructions that cause ECANCELED to be returned.
Blew my mind when I learned this last month.
Comment by Veserv 1 day ago
Comment by cryptonector 2 days ago
Comment by cryptonector 2 days ago
Comment by themafia 2 days ago
The initialization of these objects should have been separate and then used as a parameter to the functions that operate on them. Then you could load the /etc/gai.conf configuration, parse it, then pass that to getaddrinfo(). The fact that multiple cancellation points are discreetly buried in the paths of these functions is an element of unfortunate design.
Comment by kccqzy 2 days ago
Comment by cryptonector 2 days ago
Comment by nextaccountic 8 hours ago
this is not possible if you are calling third party code that you can't modify. in this case it's probably a better idea to run it on another process and use shared memory to communicate back results. this can even be done in an airtight sandboxed manner (browsers do this for example), something that can't really be done with threads
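A rough sketch of the separate-process idea, assuming Python and a plain pipe instead of shared memory to keep it short. `CHILD_CODE` and `resolve_in_child` are made-up names for illustration, and the child script only stands in for third-party code you can't modify; there is no sandboxing here.

```python
import json
import subprocess
import sys

# Code run in the child; it stands in for third-party code we cannot modify.
CHILD_CODE = """
import json, socket, sys
infos = socket.getaddrinfo(sys.argv[1], None, type=socket.SOCK_STREAM)
print(json.dumps(sorted({info[4][0] for info in infos})))
"""

def resolve_in_child(host, timeout=5.0):
    """Resolve host in a separate process; return None if it times out."""
    try:
        done = subprocess.run(
            [sys.executable, "-c", CHILD_CODE, host],
            capture_output=True, text=True, timeout=timeout, check=True,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing leaks.
        return None
    # The child reports its result as one JSON line on stdout.
    return json.loads(done.stdout)
```

The point of the design is that killing a process is always safe for the parent: no locks or half-updated state are left behind in the caller's address space, which is exactly what thread cancellation can't guarantee.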
Comment by paulddraper 2 days ago
Comment by marcosdumay 2 days ago
Java has that same issue.
Comment by dweekly 2 days ago
Yes, there is separate work to discern what DNS server the system is currently using: on macOS this requires a call to an undocumented function in libSystem - that both Chromium and Tailscale use!
Comment by AaronFriel 2 days ago
The golang team also thought DNS clients were simple, and it led to almost ten years of difficult-to-debug panics in Docker, Mesos, Terraform, Consul, Heroku, Weave and countless other services and CLI tools written in Go. (Search "cannot unmarshal DNS message" and marvel at the thousands of forum threads and GitHub issues that all bottom out at Go implementing the original DNS spec and not following later updates.)
Comment by formerly_proven 2 days ago
Comment by frumplestlatz 2 days ago
Since you're not using the system resolver, you won't benefit from mDNSResponder's built-in DNS caching and mDNS resolution/caching/service registration, so you're going to need to reimplement all of that, too. And don't forget about nsswitch on BSD/Linux/Solaris/etc -- there's no generic API that lets you plug into that cleanly, so for a complete implementation there, you need to:
- Reimplement built-in modules like `hosts` (for `/etc/hosts`), `cache` (query a local `nscd` cache, etc), and more.
- Parse the nsswitch.conf configuration file, including the rule syntax for defining whether to continue/return on different status codes.
- Reimplement rule-based dispatch to both the built-in modules and custom, dynamically loaded modules (like `nss_mdns` for mDNS resolution).
Each OS has its own set of built-ins, and private/incompatible interfaces for interacting with things like the `nscd` cache daemon. Plus, the nsswitch APIs and config files themselves differ across operating systems. And we haven't even discussed Windows yet.
Re-implementing all of this correctly, thoroughly, and keeping it working across OS changes is extremely non-trivial.
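To give a feel for just the config-parsing step, here is a minimal sketch in Python that handles a single `hosts:` line, assuming glibc-style syntax (service names followed by optional `[STATUS=action]` rules). `parse_hosts_line` is a hypothetical helper, not part of any library. It does not interpret negated statuses (`!STATUS`) and does none of the module dispatch described above, which is where most of the work is.

```python
import re

def parse_hosts_line(line):
    """Parse an nsswitch.conf 'hosts:' line into [(service, {status: action})]."""
    database, _, rest = line.partition(":")
    if database.strip() != "hosts":
        raise ValueError("not a hosts line")
    rest = rest.split("#", 1)[0]  # drop a trailing comment
    entries = []
    # Tokens are either a service name or a bracketed [STATUS=action ...] rule.
    for token in re.findall(r"\[[^\]]*\]|\S+", rest):
        if token.startswith("["):
            if not entries:
                raise ValueError("rule with no preceding service")
            # A rule applies to the service that precedes it.
            for criterion in token[1:-1].split():
                status, _, action = criterion.partition("=")
                entries[-1][1][status.upper()] = action.lower()
        else:
            entries.append((token, {}))
    return entries
```

For example, `parse_hosts_line("hosts: files mdns4_minimal [NOTFOUND=return] dns")` yields three services, with the `NOTFOUND` rule attached to `mdns4_minimal`.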
The simplest and most correct solution is to just:
- Use OS-specific async APIs when available; e.g. `CFHostStartInfoResolution()` on macOS, `DnsQueryEx()` on Windows, `getaddrinfo_a()` on glibc (although that spawns a thread, too), etc.
- If you have a special use-case where you absolutely need better performance, and do not need to support all the system resolver functionality above (i.e. server-side, controlled deployment environment), use an event-based async resolver library.
- Otherwise, issue a blocking call to `getaddrinfo()` on a new thread. If you're very worried about unbounded resource consumption, use a size-limited thread pool.
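The last option can be sketched roughly like this in Python. `RESOLVER_POOL`, `resolve`, and the pool size of 4 are arbitrary choices for illustration; the same shape applies in C with pthreads. (asyncio's own `loop.getaddrinfo()` does essentially this on the loop's default executor.)

```python
import asyncio
import socket
from concurrent.futures import ThreadPoolExecutor

# Hypothetical pool size; it bounds how many blocking lookups run at once.
RESOLVER_POOL = ThreadPoolExecutor(max_workers=4, thread_name_prefix="resolver")

async def resolve(host, port):
    """Run the blocking system resolver on a pool thread and await the result."""
    loop = asyncio.get_running_loop()
    infos = await loop.run_in_executor(
        RESOLVER_POOL,
        lambda: socket.getaddrinfo(host, port, type=socket.SOCK_STREAM),
    )
    # Each entry is (family, type, proto, canonname, sockaddr).
    return [info[4] for info in infos]
```

Usage is `await resolve("example.com", 443)` from a coroutine, or `asyncio.run(resolve(...))` from synchronous code. Lookups beyond the pool size queue up instead of spawning more threads, and since the call goes through the system resolver, nsswitch, `/etc/hosts`, and VPN-specific configuration all keep working.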
Comment by dweekly 2 days ago
CFHostStartInfoResolution is deprecated, no? https://developer.apple.com/documentation/cfnetwork/cfhostst...:)
That leaves us with DNSServiceGetAddrInfo? https://developer.apple.com/documentation/dnssd/dnsservicege...:) or some kinda convoluted use of Network and NWEndpoint/NWConnection with continuations could do the same?
Comment by frumplestlatz 2 days ago
https://developer.apple.com/documentation/technotes/tn3151-c...
Comment by cryptonector 1 day ago
Comment by GoblinSlayer 1 day ago
Comment by btown 2 days ago
Comment by petcat 2 days ago
Comment by btown 2 days ago
Comment by 01HNNWZ0MV43FF 2 days ago
Comment by frumplestlatz 2 days ago
If you don’t use the system resolver, you have to glue into the system’s configuration mechanism for resolvers somehow … which isn’t simple — for example, there’s a lot of complex logic on macOS around handling which resolver to use based on what connections, VPNs, etc, are present.
And then there’s nsswitch and other plugin systems that are meant to allow globally configured hooks to plug into the name resolution path.
Comment by AndyKelley 2 days ago
It's really not.
Just because some systems took something fundamentally simple and wrapped a bunch of unnecessary complexity around it does not make it hard.
At its core, it's an elegant, minimal protocol.
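For a sense of scale, here is a minimal sketch of encoding a single-question query in the RFC 1035 wire format, written in Python. `build_query` and the default `query_id` are made up for illustration. It covers only the packet itself: no EDNS, no TCP fallback, no response parsing, and no decision about which server to send it to.

```python
import struct

def build_query(name, qtype=1, query_id=0x1234):
    """Encode a single-question DNS query (RFC 1035 wire format)."""
    # Header: ID, flags (only RD set), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is length-prefixed; a zero byte terminates the name.
    qname = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) <= 63:
            raise ValueError("label must be 1..63 bytes")
        qname += bytes([len(raw)]) + raw
    qname += b"\x00"
    # QTYPE (1 = A) and QCLASS (1 = IN).
    return header + qname + struct.pack("!HH", qtype, 1)
```

An A query for `example.com` comes out at 29 bytes: a 12-byte header, 13 bytes of name, and 4 bytes of type and class.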
Comment by bwblabs 2 days ago
But maybe DNSSEC is the 'unnecessary complexity' for you (I think it's kind of fundamental to secure DNS). Also without DNSSEC they needed RFCs like https://datatracker.ietf.org/doc/html/rfc8020 to clarify fundamentals (same goes for https://datatracker.ietf.org/doc/html/rfc8482 to fix stuff).
Comment by cryptonector 1 day ago
Comment by kccqzy 2 days ago
We have complexity like different kinds of VPNs, from network-level VPNs to app-based VPNs to MDM-managed VPNs possibly coexisting. We have on-demand VPNs that only start when a particular domain is being visited: yes, a VPN starting because of DNS. We have user-provided or admin-provided hardcoded responses in /etc/hosts. We have user-specified resolver overrides (for example the user wants to use 8.8.8.8, not the ISP resolver). We have multiple sources of network-provided resolvers from RDNSS to DHCPv6 O mode.
It is non-trivial to determine which resolver to even start sending datagrams with that elegant minimal protocol.
Comment by tptacek 2 days ago
Comment by citrin_ru 16 hours ago
Comment by leshow 1 day ago
Comment by cryptonector 1 day ago
Comment by marcusb 15 hours ago
https://github.com/hickory-dns/hickory-dns is our Git repo
Documentation for the resolver including an example: https://docs.rs/hickory-resolver/latest/hickory_resolver/ind...
Comment by cryptonector 10 hours ago
Comment by javantanna 2 days ago
Comment by cryptonector 2 days ago
Comment by benatkin 2 days ago
Comment by brcmthrowaway 2 days ago
Comment by AndyKelley 2 days ago
POSIX can specify a new version of DNS resolution.
libcs can add extensions, allowing applications to detect when they are targeting those systems and use them.
Applications on Linux and Windows can bypass libc.
Comment by brcmthrowaway 2 days ago
Comment by AndyKelley 2 days ago
Comment by jupp0r 2 days ago