It is incorrect to "normalize" // in HTTP URL paths
Posted by pabs3 4 hours ago
Comments
Comment by echoangle 58 minutes ago
> nginx with merge_slashes
How can it be wrong if it is server-side? If the server wants to treat those paths as equivalent, it can.
It would only be wrong if a client does it and requests a different URL than the user entered, right?
Comment by leni536 23 minutes ago
It matters where the normalization happens, and server-side behavior is out-of-scope of these identifier RFCs.
Comment by OoooooooO 23 minutes ago
> Therefore, collapsing // to / in HTTP URL path segments is not correct normalization. It produces a different, non-equivalent identifier unless the origin explicitly defines those two paths as equivalent.
Comment by MattJ100 2 hours ago
It gets worse if you are mapping URLs to a filesystem (e.g. for serving files). Even though they look similar, URL paths have different capabilities and rules than filesystems, and different filesystems also vary. This is also an example of that (I don't think most filesystems support empty directory names).
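To make the mismatch concrete, here is a minimal sketch. The path `/files/a//b.txt` is made up for illustration, and `posixpath.normpath` stands in for whatever path cleanup a file-serving layer might do. Splitting the URL path keeps the empty segment, while POSIX-style normalization silently drops it:

```python
import posixpath

# Hypothetical request path containing an empty segment.
url_path = "/files/a//b.txt"

# URL view: splitting on "/" keeps the empty segment between the two slashes.
segments = url_path.split("/")[1:]

# Filesystem view: POSIX path normalization collapses the repeated slash.
fs_path = posixpath.normpath(url_path)

print(segments)  # ['files', 'a', '', 'b.txt']
print(fs_path)   # /files/a/b.txt
```

The two views disagree on how many segments the path has, which is exactly where URL-to-file mappings go wrong.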
Comment by dale_glass 1 hour ago
Because maybe you use S3, which treats `foo/bar.txt` and `foo//bar.txt` as entirely separate things. Because to S3, directories don't exist and those are literally the exact names of the keys under which data is stored.
So you have script A concatenate "foo" + "/bar" and script B concatenate "foo/" + "/bar", and suddenly you have a weird problem.
I can't imagine a real use case where you'd think this is desirable.
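A rough sketch of that failure mode, using a plain dict as a stand-in for a bucket (an assumption for illustration, this is not the S3 API; the key names follow the `foo/bar.txt` example above):

```python
# Stand-in for an S3 bucket: a flat mapping from key to data, no directories.
bucket = {}

key_a = "foo" + "/bar.txt"    # script A
key_b = "foo/" + "/bar.txt"   # script B

bucket[key_a] = b"written by A"
bucket[key_b] = b"written by B"

print(key_a)        # foo/bar.txt
print(key_b)        # foo//bar.txt
print(len(bucket))  # 2
```

Both writes succeed, and you end up with two distinct objects where you expected one.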
Comment by Mordisquitos 16 minutes ago
Not S3, but here's a literal real use case: the entry for the Iraqw word /ameeni (woman) in Wiktionary.
https://en.wiktionary.org/wiki//ameeni
If for whatever reason your S3 keys contained English words and their translations separated by a slash, you would have a real problem if one of your scripts were to concatenate woman, / and /ameeni as woman/ameeni instead of woman//ameeni in the English/Iraqw case.
Comment by secondcoming 1 hour ago
Comment by PunchyHamster 1 hour ago
Nothing on web is "correct", deal with it
Comment by leni536 33 minutes ago
Of course you shouldn't assume that in a client. If you are implementing against an API don't deviate regarding // and trailing / from the API documentation.
Comment by sfeng 1 hour ago
Comment by renewiltord 1 hour ago
Comment by mjs01 2 hours ago
Comment by PunchyHamster 1 hour ago
Comment by janmarsal 1 hour ago
Comment by leni536 1 hour ago
Comment by stanac 54 minutes ago
Comment by WesolyKubeczek 2 hours ago
Not doing it is like punishing people for not using Oxford commas, or entering an hour long debate each time someone writes “would of” instead of “would have”. It grinds my gears too, but I have different hills to die on.
Comment by bazoom42 1 hour ago
Comment by PunchyHamster 1 hour ago
Comment by jeroenhd 1 hour ago
Plenty of websites rewrite paths like /a/b/c/d into a backend service call like /?w=a&x=b&y=c&z=d. In that scheme, /a//c/d would rewrite to /?w=a&x=&y=c&z=d, something entirely distinct from /a/c/d, which works out to /?w=a&x=c&y=d.
It's not the application's fault that the people attempting to configure web server URLs don't know how web server URLs work.
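A minimal sketch of such a positional rewrite, assuming the parameter names w, x, y, z from the example (the `rewrite` helper is hypothetical, not any particular server's rewrite engine):

```python
from urllib.parse import urlencode

def rewrite(path, names=("w", "x", "y", "z")):
    """Map each path segment, by position, to a query parameter."""
    segments = path.split("/")[1:]  # drop the empty string before the leading "/"
    return "/?" + urlencode(list(zip(names, segments)))

print(rewrite("/a/b/c/d"))  # /?w=a&x=b&y=c&z=d
print(rewrite("/a//c/d"))   # /?w=a&x=&y=c&z=d
print(rewrite("/a/c/d"))    # /?w=a&x=c&y=d
```

Collapsing the double slash before the rewrite shifts every later segment one position to the left, so the backend sees a different request.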
Comment by bazoom42 1 hour ago
Comment by Etheryte 2 hours ago
Comment by j16sdiz 2 hours ago
Comment by jeroenhd 1 hour ago
Note that you can include custom normalization rules (like collapsing slashes, tolower()ing the entire path, removing the query part of the URL), but that's not part of the standard. If you're doing anything extra, the risk of breaking stuff is on you.
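Roughly, the split looks like this. It is a sketch, not a full RFC 3986 normalizer: `normalize_basic` only lowercases the scheme and host and fills in an empty path, and it assumes the URL has no userinfo part (which is case-sensitive). `collapse_slashes` is the custom, non-standard extra. Both function names and the example URL are made up:

```python
import re
from urllib.parse import urlsplit, urlunsplit

def normalize_basic(url):
    """Lowercase scheme and host; leave the path segments untouched."""
    parts = urlsplit(url)
    return urlunsplit((
        parts.scheme.lower(),
        parts.netloc.lower(),  # assumes no userinfo in the authority
        parts.path or "/",
        parts.query,
        parts.fragment,
    ))

def collapse_slashes(url):
    """Custom rule, not part of the standard: merge repeated slashes in the path."""
    parts = urlsplit(url)
    return urlunsplit(parts._replace(path=re.sub(r"/{2,}", "/", parts.path)))

url = "HTTP://Example.COM/a//B?q=1"
print(normalize_basic(url))                    # http://example.com/a//B?q=1
print(collapse_slashes(normalize_basic(url)))  # http://example.com/a/B?q=1
```

The first result is still the same identifier; the second is only equivalent if the origin says so.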
Comment by LeonTing8090 1 hour ago