
Displaying non-Latin characters in a web page



    Just had a "conversation" with my team leader about this. He's convinced he's right & I'm pretty sure I'm right but I'm just the contractor so nobody listens to me.

    If, say, we have a web page with some Chinese characters in it (for argument's sake), it would not be displayed properly at the client unless the client machine had the right code page / language packs installed, because UTF-8 is just a character encoding that needs to map to the right code points defined in the code page on the client (my argument)

    OR

    the Chinese characters are returned verbatim to the client, so the setup of the machine is entirely irrelevant and they will be displayed regardless (his argument !!!)

    Who's right? If it's him I might as well jack it all in now because this is so basic.

    #2
    Surely it goes without saying that the client must have the font glyph available representing a given character code before it can display a visual representation of that character within a web page. I'd therefore suggest that you're correct and he's wrong.

    Your team leader could however be suggesting that the characters should be sent to the client as GIF/JPG/PNG images - which would be a pretty stupid thing to do.


      #3
      Originally posted by chicane
      Surely it goes without saying that the client must have the font glyph available representing a given character code before it can display a visual representation of that character within a web page. I'd therefore suggest that you're correct and he's wrong.
      Unbelievable, isn't it? Now I'm going to have to put a test app together to prove my point to all concerned.


        #4
        If you're displaying Chinese characters, is it not a reasonable expectation that your intended audience will already have Chinese character sets installed?

        It might look a bit odd to a UK user, but your Chinese users will be laughing...
        ‎"See, you think I give a tulip. Wrong. In fact, while you talk, I'm thinking; How can I give less of a tulip? That's why I look interested."


          #5
          UTF-8 is the "codepage" (a way of encoding and decoding text data). However, unless you have fonts that can display Chinese characters, you will not have visual representations for those characters. Even if you use a code page native to Chinese, a lack of suitable fonts will still prevent the characters from being displayed correctly.


            #6
            Can UTF-8 even be usefully used for Chinese? You only get 100-odd (128 perhaps) characters that are mapped, and that may not be enough.

            If you run charmap on Windows, you can see what's in the standard fonts. Arial and the like have Arabic characters, Greek characters etc., but nothing that looks Chinese. So if you use UTF-16 you should be able to mix and match all of those without the end user requiring anything. How you use UTF-16, and how creaky old HTML handles code pages, I've no idea.

            So perhaps he's not entirely wrong, but he's wrong about Chinese.

            Alternatively, use Flash. Or a picture of the Chinese text.
            Will work inside IR35. Or for food.


              #7
              Not necessarily.

              Take this phrase from the BBC News front page: 'Gunmen fire on a bus carrying Togo's national football team to the Africa Cup of Nations in Angola, injuring several players.'

              Go to any PC, Mac, etc. and paste it into Google Translate (http://translate.google.co.uk).

              Select translate English to Chinese and observe the results. Copy those results and translate them into Russian. Then try Russian into Arabic, and after that into Japanese.

              Keep going until you convince yourself that none of these characters are dependent on anything you have on your PC.

              PZZ
              Last edited by pzz76077; 8 January 2010, 19:20.


                #8
                Originally posted by VectraMan
                Can UTF-8 even be usefully used for Chinese? You only get 100 - odd (128 perhaps) characters that are mapped, and that may not be enough.
                Umm... completely incorrect

                UTF-8 is capable of representing any Unicode character - well, "code point" to be precise. The "8" merely refers to the fact that it uses 8-bit blocks to represent a code point, where a code point may require between 1 and 4 blocks.

                UTF-16 can also represent any Unicode code point, and uses 16-bit blocks as its basic unit, requiring either one or two such blocks.
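
A minimal Python sketch of the unit counts described above, using only the built-in `str.encode`. The four sample characters are picked purely for illustration and are not from the original question. The helper name `code_units` is made up for this sketch.

```python
# Sample code points chosen for illustration (an assumption, not from the thread):
# Latin A, e-acute, a common Han character, and an emoji outside the BMP.
samples = ["A", "\u00e9", "\u6c34", "\U0001F600"]

def code_units(ch):
    """Return (UTF-8 units, UTF-16 units) needed to encode one code point."""
    utf8_units = len(ch.encode("utf-8"))            # one unit = 8 bits
    utf16_units = len(ch.encode("utf-16-be")) // 2  # one unit = 16 bits
    return utf8_units, utf16_units

for ch in samples:
    u8, u16 = code_units(ch)
    print(f"U+{ord(ch):04X}: {u8} UTF-8 unit(s), {u16} UTF-16 unit(s)")
```

The four samples need 1, 2, 3 and 4 UTF-8 units respectively, and only the last one needs two UTF-16 units.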

                As a rule, anything going over the Net should probably use UTF-8, as this is the most compact representation for anything containing characters in the range of code points from 0x00 to 0x7f, each of which takes a single byte. Such characters often occur even in pages predominantly written in languages such as Chinese, not least in the markup. If the content is almost exclusively Chinese (strictly speaking, Han) then it may make sense to use UTF-16, since most Han characters take three bytes in UTF-8 but two in UTF-16. Even so, UTF-8 can still represent any legal character, and probably has the widest level of support of any encoding.
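
To put rough numbers on the size trade-off, here is a small Python sketch. Both sample strings are hypothetical, made up for the comparison, and `encoded_sizes` is just a helper name for this example.

```python
def encoded_sizes(text):
    """Byte counts for the same text in UTF-8 and UTF-16 (no byte-order mark)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
    }

# Hypothetical samples, invented for this sketch.
markup_heavy = '<p class="note">\u6c34</p>'  # mostly ASCII markup, one Han character
han_only = "\u6c34" * 10                      # ten Han characters, no ASCII at all

print(encoded_sizes(markup_heavy))
print(encoded_sizes(han_only))
```

The markup-heavy string comes out smaller in UTF-8, while the Han-only string comes out smaller in UTF-16. Real pages usually carry enough ASCII markup that UTF-8 still tends to win.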

                Example: representations of Chinese character ⽔, KANGXI RADICAL WATER, Unicode code point U+2F54:

                Code:
                 UTF-8: 0xe2 0xbd 0x94
                UTF-16: 0x2f54
                UTF-32: 0x00002f54

                wurzel is correct and his team leader is an idiot. An encoding transforms code points to bytes, so it stands to reason that those bytes must then be decoded, according to the specified encoding, to recover the code point. That code point must then be matched to a corresponding glyph in one of the fonts on the system, and that glyph defines the visual representation of the character.
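
The byte values quoted above can be checked with a short Python sketch, assuming only the standard library. It covers the encode and decode steps only. Glyph lookup happens later in the browser's font machinery and is not modelled here.

```python
import unicodedata

ch = "\u2f54"  # the code point from the example above

print(unicodedata.name(ch))            # name from Python's bundled Unicode database
print(ch.encode("utf-8").hex())        # bytes as they would travel in UTF-8
print(ch.encode("utf-16-be").hex())    # big-endian, no byte-order mark
print(ch.encode("utf-32-be").hex())

# Decoding reverses the step: bytes -> code point. Nothing here touches fonts,
# so this succeeds even on a machine that cannot draw the character.
decoded = bytes([0xE2, 0xBD, 0x94]).decode("utf-8")
print(f"U+{ord(decoded):04X}")
```

The round trip works whether or not any installed font has a glyph for U+2F54. The font only matters at the final step of drawing the character.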


                  #9
                  Originally posted by NickFitz
                  Example: representations of Chinese character ⽔, KANGXI RADICAL WATER, Unicode code point U+2F54:
                  In case anyone is interested, when I try to view that post (and using this test page) I get:

                  Windows ME, IE6... a square box
                  Windows ME, Firefox v2... a question mark

                  Windows XP, IE v8... a square box
                  Windows XP, Firefox v3... a pretty little box with 2F at the top and 54 at the bottom.


                  Edit: that test page also says: "You need a font that supports this character to even have a hope of seeing it correctly in the browser."
                  Last edited by RichardCranium; 8 January 2010, 19:35.


                    #10
                    Originally posted by RichardCranium
                    In case anyone is interested, when I try to view that post (and using this test page) I get:

                    Windows ME, IE6... a square box
                    Windows ME, Firefox v2... a question mark

                    Windows XP, IE v8... a square box
                    Windows XP, Firefox v3... a pretty little box with 2F at the top and 54 at the bottom.


                    Edit: that test page also says: "You need a font that supports this character to even have a hope of seeing it correctly in the browser."
                    I'm assuming neither of those machines has MS Word installed (or at least not a 200x version), and therefore neither has the Arial Unicode MS font? That contains (in its latest versions) 51,180 glyphs covering 38,911 characters. I believe it can handle any writing system one is likely to find in the wild.
