Various locale mismatch scenarios in Windows clipboard text format synthesis

Posted by ibobev 5 days ago

Comments

Comment by jey 7 hours ago

> When I ran this program, I expected the `CF_OEMTEXT` string to have the byte 44, but it didn’t. It had the byte 90. We will start unraveling this mystery next time.

Whoa, there exists something Raymond Chen didn’t know about Windows core APIs?

Comment by akersten 10 hours ago

I don't know what it would take to remove all this OEM LCID 1252 ANSI nonsense from computing (well, just Windows), but if I were in charge of "make sure developers ever willingly choose to work on Win32 instead of any other sane Unicode-only platform", I would make it my top priority.

whatever imagined problem is solved by marking clipboard text with some magical locale indicator is surely not as important as being able to interop literally just unicode characters between programs without having to read a 2-part blog post

Comment by magicalhippo 5 hours ago

> whatever imagined problem is solved by marking clipboard text with some magical locale indicator is surely not as important as being able to interop literally just unicode characters between programs without having to read a 2-part blog post

Unicode-enabled Win32 applications can already do this, as described in the article: the program copying to the clipboard adds the CF_UNICODETEXT format, and the program reading from the clipboard checks whether CF_UNICODETEXT is available and prefers it over CF_TEXT.

CF_LOCALE is used by the system to convert[1] CF_TEXT to CF_UNICODETEXT, so a Unicode-enabled application can get the right contents from a non-Unicode-enabled application.

[1]: https://learn.microsoft.com/en-us/windows/win32/dataxchg/sta...
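A rough sketch of that prefer-Unicode-then-fall-back logic, in Python with a plain dict standing in for the clipboard. None of this calls the real Win32 API: `read_clipboard_text`, the `fallback_lcid` parameter, and the small LCID-to-code-page table are made up for illustration, and the real system looks up the code page from the locale itself.

```python
# Simulated clipboard: format name -> payload. These are NOT real Win32 calls.
# Hypothetical lookup table: LCID -> default ANSI code page (as a Python codec).
ANSI_CODEPAGE_BY_LCID = {
    0x0409: "cp1252",  # en-US
    0x0419: "cp1251",  # ru-RU
    0x0411: "cp932",   # ja-JP
}


def read_clipboard_text(clipboard, fallback_lcid=0x0409):
    """Return clipboard text, preferring CF_UNICODETEXT over CF_TEXT."""
    if "CF_UNICODETEXT" in clipboard:
        # UTF-16LE payload, terminated by a NUL character.
        text = clipboard["CF_UNICODETEXT"].decode("utf-16-le")
        return text.split("\0", 1)[0]
    if "CF_TEXT" in clipboard:
        # 8-bit payload: the locale decides which code page the bytes are in.
        # fallback_lcid is an assumption of this sketch, used when no
        # CF_LOCALE entry is present.
        lcid = clipboard.get("CF_LOCALE", fallback_lcid)
        codec = ANSI_CODEPAGE_BY_LCID[lcid]
        raw = clipboard["CF_TEXT"].split(b"\0", 1)[0]
        return raw.decode(codec)
    return None


# A Unicode-aware writer: no locale needed at all.
modern = {"CF_UNICODETEXT": "Привет".encode("utf-16-le") + b"\0\0"}

# A legacy writer: the same bytes only make sense together with the locale.
legacy = {"CF_TEXT": "Привет".encode("cp1251") + b"\0", "CF_LOCALE": 0x0419}

# The same legacy bytes with the locale information lost.
legacy_no_locale = {"CF_TEXT": legacy["CF_TEXT"]}
```

Reading `modern` and `legacy` both give back the original string, while `legacy_no_locale` gets decoded with the wrong code page and comes out as mojibake. That mismatch is the kind of problem the locale marker exists to avoid.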

Comment by ElectricalUnion 5 hours ago

If both programs support Unicode, they should just work. This entire post exists because legacy programs do not. And you are using Win32 because of those legacy programs.

That is also why Win32 seems to be the most stable API for userland programs, while constant recompiles of the entire userland are very much the norm and required so your desktop and apps can keep working on other *NIX.

Comment by jack1243star 7 hours ago

> marking clipboard text with some magical locale indicator

The geniuses behind Unicode managed to make it mandatory anyways, at least if you want correct CJK text rendering :)

Comment by ElectricalUnion 5 hours ago

I know that Unicode- and locale-aware systems were previously supposed to use Unicode tag characters (U+E0000..U+E007F) to invisibly, and "for all plaintext purposes", mark text for such Han unification handling, but that use is now deprecated.

What am I supposed to use these days? HTML encoded in UTF-8 with lang attributes, so text infested with <span lang="ja-JP"> and <bdi lang="zh-Hans">?
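For reference, the deprecated tag-character scheme is easy to sketch: a language tag is U+E0001 (LANGUAGE TAG) followed by tag characters that mirror printable ASCII at an offset of 0xE0000. The helper names below are made up for illustration, and this only shows the encoding, not how any renderer treated it.

```python
TAG_BASE = 0xE0000          # tag characters mirror ASCII at this offset
LANGUAGE_TAG = "\U000E0001"  # U+E0001 LANGUAGE TAG (deprecated)


def to_tag_sequence(lang):
    """Encode an ASCII language code such as 'ja' as invisible tag characters."""
    for ch in lang:
        if not 0x20 <= ord(ch) <= 0x7E:
            raise ValueError("language code must be printable ASCII")
    return LANGUAGE_TAG + "".join(chr(TAG_BASE + ord(ch)) for ch in lang)


def from_tag_sequence(seq):
    """Decode a sequence produced by to_tag_sequence back to its ASCII code."""
    if not seq.startswith(LANGUAGE_TAG):
        raise ValueError("sequence does not start with U+E0001")
    return "".join(chr(ord(ch) - TAG_BASE) for ch in seq[1:])


# Plain text with an invisible "ja" marker in front of it.
marked = to_tag_sequence("ja") + "直"
```

The marked string is still plain text, which was the appeal, but the marker is invisible and easy to lose when text is copied or trimmed.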

Comment by charcircuit 9 hours ago

They already did with C#.

  string text = Clipboard.GetText();