I hate RTF, but it's one of those things that won't go away.
I'm trying to copy and paste Chinese characters from charmap on Windows 7 into my app using RTF. Some of them work, some don't. Using the Microsoft YaHei font, this is what I get for one that works:
Code:
{\rtf1\ansi\ansicpg1252\deff0\deflang2057{\fonttbl{\f0\fnil\fcharset134 Microsoft YaHei;} {\f1\fnil\fcharset0 MS Shell Dlg 2;}} {\*\generator Msftedit 5.41.21.2509;}\viewkind4\uc1\pard\f0\fs20\u15431?\f1\fs17\par }
The important bit is \u15431?. The \u control word sends any Unicode character as a decimal value, and 15431 (0x3C47) is indeed the correct code, so I get the correct character out.
If I try a character a bit higher up, in this case 0x7100, this is what I get:
Code:
{\rtf1\ansi\ansicpg1252\deff0\deflang2057{\fonttbl{\f0\fnil\fcharset134 Microsoft YaHei;}{\f1\fnil\fcharset0 MS Shell Dlg 2;}} {\*\generator Msftedit 5.41.21.2509;}\viewkind4\uc1\pard\f0\fs20\'9f\'57\f1\fs17\par }
This time, rather than sending me a \u, it's sending \'9f\'57. A \' sends a two-digit hex value to cover the range 128-255, which you should then translate according to the code page. But that's two characters, not the one I was expecting, and the code page is 1252, which is normal ANSI, and the language is 2057, which is UK English, so still a Latin script.
I don't understand how I'm meant to get from the two characters 0x9f and 0x57 to 0x7100, and it's not UTF-8 (which would be 3 bytes, and I don't think RTF uses UTF-8 anyway). The only other thing is the charset on the font (134 = Chinese), but I'm not sure how I get from that to a code page, and it would still give me two characters out, not the one I'm expecting.
Does anybody understand all this? Does anybody speak Chinese?
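For reference, here is a minimal Python sketch of how the two forms might be decoded. It assumes that \fcharset134 (GB2312) on the current font corresponds to Windows code page 936, and that consecutive \' bytes are interpreted in the font's code page rather than in \ansicpg. The lookup table and the helper names (CHARSET_TO_CODEC, decode_unicode_escape, decode_hex_escapes) are made up for illustration and are not part of any RTF library. Under those assumptions, 0x9f and 0x57 would be the lead and trail bytes of one double-byte character rather than two separate characters.

```python
# Hypothetical mapping from RTF \fcharset values to Python codec names.
# Only a few entries are shown; 134 -> code page 936 is the one used here.
CHARSET_TO_CODEC = {
    0: "cp1252",   # ANSI (Western)
    128: "cp932",  # Shift-JIS (Japanese)
    134: "cp936",  # GB2312 / GBK (Simplified Chinese)
    136: "cp950",  # Big5 (Traditional Chinese)
}


def decode_unicode_escape(value: int) -> str:
    """Decode the N in \\uN, which RTF writes as a signed 16-bit decimal."""
    if value < 0:
        value += 65536  # negative values stand for code units above 32767
    return chr(value)


def decode_hex_escapes(hex_pairs: list[str], fcharset: int) -> str:
    """Join the bytes from consecutive \\'xx escapes and decode them together."""
    raw = bytes(int(pair, 16) for pair in hex_pairs)
    return raw.decode(CHARSET_TO_CODEC[fcharset])


working = decode_unicode_escape(15431)           # from \u15431?
failing = decode_hex_escapes(["9f", "57"], 134)  # from \'9f\'57 under \fcharset134

# Print the code points so the result can be compared with charmap.
print(f"U+{ord(working):04X}", f"U+{ord(failing):04X}")
```

If the clipboard data above is accurate, the second code point printed should match the 0x7100 picked in charmap. The signed handling in decode_unicode_escape only matters for \u values above 32767, which RTF expresses as negative numbers.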