Previously on "Firefox 'funny' characters ?"

  • NickFitz
    replied
    Originally posted by Platypus View Post
    I think that ' does show ok in HTML, but ‘ and ’ (the curly varieties) do not.
    It's not HTML, it's a separate issue. Browsers will display curly quotes in HTML perfectly well (whether as entities or just the raw characters like “”) - in fact, they can happily manage things like umbrellas ☂ and sunshine ☀ if you have a suitable font installed. It's just down to the fact that the ‘’ in the XML feed I'm grabbing are encoded as 0x91 and 0x92 respectively, which is the Windows-1252 encoding (in strict ISO-8859-1 those two bytes are control characters, not quotes). They are then being converted to UTF-8 as though they were the code points U+0091 and U+0092, which turns (e.g.) ‘ into the multibyte representation 0xc2 0x91, and that is what gets stored in the database. Then, when it's spat out by the forum software, the browser is being told that it's receiving ISO-8859-1 - and in that character encoding, 0xc2 is Â, so you see that character followed by the left single curly quote you were supposed to be getting all along.
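
    The chain described above can be reproduced in a few lines. This is only a sketch of the suspected sequence, not the actual CMS or plugin code; it assumes the feed byte is read as strict ISO-8859-1 (Python's "latin-1") and that the browser treats an ISO-8859-1 label as Windows-1252 ("cp1252"), which is how browsers usually behave.

```python
# Hypothetical reconstruction of the suspected chain; variable names are illustrative.
raw = b"\x91"                     # left curly quote as a single Windows-1252 byte
misread = raw.decode("latin-1")   # strict ISO-8859-1 maps 0x91 to the control character U+0091
stored = misread.encode("utf-8")  # U+0091 becomes the two bytes 0xC2 0x91
shown = stored.decode("cp1252")   # what a browser told "ISO-8859-1" would display

print(stored.hex())  # c291
print(shown)         # Â‘
```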


  • Platypus
    replied
    Originally posted by xoggoth View Post
    Before 8 IE was very forgiving of all sorts of things that were not in the "standards" (actually the way browsers should be in my opinion unless there's some important security consideration). You could even get away with .Width instead of .width in jscript. All the browsers can seem inconsistent, why does " in HTML show ok but ' doesn't?
    I think that ' does show ok in HTML, but ‘ and ’ (the curly varieties) do not.


  • xoggoth
    replied
    But I've been seeing this for years on FF.
    And I just had a quick peek using IE8 - same thing!
    Before 8 IE was very forgiving of all sorts of things that were not in the "standards" (actually the way browsers should be in my opinion unless there's some important security consideration). You could even get away with .Width instead of .width in jscript. All the browsers can seem inconsistent, why does " in HTML show ok but ' doesn't?


  • Sysman
    replied
    Originally posted by Platypus View Post


    That's more than I could stomach
    I've long been aware of what might happen to MySQL under Oracle's ownership, but to see it in OOo is still a shock.

    I wish Apple would give their Numbers spreadsheet a serious boost. It's fine for the occasional user, but really doesn't cut the mustard for serious business style number crunching.


  • Sysman
    replied
    Originally posted by OwlHoot View Post
    On many web sites that host news articles these will have trundled through several steps, being parsed and converted at each hop. So there's a fair chance some developer along the line will assume text is UTF-8 when it isn't, or vice versa. One often sees munged characters even on sites like the BBC and the Telegraph. (Well, no surprise with the last, as they've probably sacked most of their developers, but you'd expect the BBC to be a bit more savvy.)
    Yep, saw some weirdness the other day on the Beeb's iPlayer "Play" page.


  • Platypus
    replied
    Originally posted by Sysman View Post
    * I still haven't got used to seeing Oracle on the startup splash screen.


    That's more than I could stomach


  • Sysman
    replied
    Originally posted by bogeyman View Post
    The funny accented A's are just fancy curly opening/closing single or double quotes in this case.

    It's CUK's content management editor at fault I think. It should translate non-standard characters into HTML entities.
    A copy and paste from an OpenOffice document (yes, even a spreadsheet!) will do that. OpenOffice* will silently convert quotes and dashes to the fancy typographical versions by default. That might be OK in a word processing document but it's bloody criminal in a spreadsheet whose contents may be heading for a database.

    I wouldn't be surprised if Word does the same, but I don't think Excel does.


    * I still haven't got used to seeing Oracle on the startup splash screen.
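
    For text pasted from an office document and headed for a database, one defensive option is to map the typographical characters back to plain ASCII first. The sketch below is an assumption about how that might be done; the table and the function name plain_punctuation are made up for illustration, and the list of characters is not exhaustive.

```python
# Illustrative mapping of common "smart" punctuation to ASCII; not exhaustive.
SMART_TO_PLAIN = str.maketrans({
    "\u2018": "'",   # left single curly quote
    "\u2019": "'",   # right single curly quote / apostrophe
    "\u201c": '"',   # left double curly quote
    "\u201d": '"',   # right double curly quote
    "\u2013": "-",   # en dash
    "\u2014": "-",   # em dash
})


def plain_punctuation(text: str) -> str:
    """Return text with curly quotes and dashes replaced by ASCII equivalents."""
    return text.translate(SMART_TO_PLAIN)


print(plain_punctuation("\u2018quoted\u2019 \u2013 \u201ctext\u201d"))
```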


  • NickFitz
    replied
    Originally posted by bogeyman View Post
    Good on yer Nick, but shouldn't these characters be converted to HTML entities (&ldquo; etc.) at some point, before they hit the browser? The character encoding and code-page wouldn't matter then, would it?
    Unfortunately, it's too late by the time it gets to the point where it makes sense to use HTML entities. The way it's set up at the moment is that the news is entered into the main site CMS, which saves a copy of the headlines as an XML file on the forum server (as well as shoving the stories into the main site database, of course). My vBulletin plugin checks that file's last modification date as and when, and if it's been updated it parses the XML and shoves the headlines into the forum database, ready to be displayed in the sidebar.

    It's only at display time that it makes sense to replace oddball characters with entities, and by then it's too late, as the characters got screwed up either when the file was created, when it was parsed, or when the forum database was updated - my current best guess is the parsing, but I need to confirm that.

    The good news is that the main site CMS is soon to be upgraded to a system that's UTF-8 from end to end, so that should make it easier to sort things out.
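
    If the parsing step does turn out to be where things go wrong, one possible approach is to decode the feed bytes explicitly before handing them to the XML parser. This is a sketch under assumptions, not the vBulletin plugin's code: decode_feed is a hypothetical helper, and the fallback assumes that anything that is not valid UTF-8 is Windows-1252.

```python
def decode_feed(raw: bytes) -> str:
    """Decode feed bytes, trying UTF-8 first and falling back to Windows-1252."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: non-UTF-8 input is Windows-1252. The few byte values that
        # Windows-1252 leaves undefined are replaced rather than raising.
        return raw.decode("cp1252", errors="replace")


print(decode_feed(b"It\x92s a headline"))                        # Windows-1252 input
print(decode_feed("It\u2019s a headline".encode("utf-8")))       # UTF-8 input
```

    Both calls give the same string with a proper right curly quote, so whatever is stored afterwards starts from correct Unicode text.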


  • OwlHoot
    replied
    On many web sites that host news articles these will have trundled through several steps, being parsed and converted at each hop. So there's a fair chance some developer along the line will assume text is UTF-8 when it isn't, or vice versa. One often sees munged characters even on sites like the BBC and the Telegraph. (Well, no surprise with the last, as they've probably sacked most of their developers, but you'd expect the BBC to be a bit more savvy.)


  • bogeyman
    replied
    Originally posted by NickFitz View Post
    The headlines in the sidebar come from the content management system for the main site, but the character encoding is getting mucked up for things like curly quotes: I think it's coming from over there as ISO-8859-1 but with curly quotes thrown in, then being parsed as UTF-8, then being stuck in a database configured to use ISO-8859-1

    I'll see about getting it fixed
    Good on yer Nick, but shouldn't these characters be converted to HTML entities (&ldquo; etc.) at some point, before they hit the browser? The character encoding and code-page wouldn't matter then, would it?


  • NickFitz
    replied
    The headlines in the sidebar come from the content management system for the main site, but the character encoding is getting mucked up for things like curly quotes: I think it's coming from over there as ISO-8859-1 but with curly quotes thrown in, then being parsed as UTF-8, then being stuck in a database configured to use ISO-8859-1

    I'll see about getting it fixed
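
    For headlines that are already stored in mangled form, the damage can sometimes be undone by running the suspected steps backwards. The following is a rough sketch that assumes the corruption happened exactly as guessed above; repair is a hypothetical helper, and it returns the text unchanged whenever the reversal does not apply cleanly.

```python
def repair(text: str) -> str:
    """Try to undo 'Windows-1252 byte read as Latin-1, stored as UTF-8, shown as Windows-1252'."""
    try:
        # Undo the display step, then the UTF-8 storage step.
        step = text.encode("cp1252").decode("utf-8")
        # Reinterpret the resulting code points as the Windows-1252 bytes they started as.
        return step.encode("latin-1").decode("cp1252")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The text does not fit the assumed pattern, so leave it alone.
        return text


print(repair("It\u00c2\u2018s"))  # the stray Â disappears and the curly quote survives
```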


  • bogeyman
    replied
    Originally posted by Platypus View Post
    I'm on Win XP SP3, native (not VM) running FF 3.6.12
    But I've been seeing this for years on FF.
    And I just had a quick peek using IE8 - same thing!

    I tried to chase this down once, and read lots of forum posts about character sets, but the replying geeks were so busy trying to out-geek each other with what-ifs and wherefores that any useful information (i.e. a simple fix) was completely obscured
    What it basically comes down to is that the text content has characters that are not part of the common character set.

    The funny accented A's are just fancy curly opening/closing single or double quotes in this case.

    It's CUK's content management editor at fault I think. It should translate non-standard characters into HTML entities.

    That doesn't seem to be happening for some reason.

    It's not a fault with your browser or anything.


  • Platypus
    replied
    Originally posted by bogeyman View Post
    Could be because the page is declared as charset=ISO-8859-1 (ISO LATIN 1) instead of charset=UTF-8 (Unicode).

    The main problem is that non-ascii characters should be escaped or represented as entities (e.g. &rdquo; &lsquo; etc.).
    ... so does this mean that the webpage is in error?

    EDIT: and furthermore, if it is, why don't the people who create such pages immediately see the error?


    This very page is indeed ISO-8859-1
    Last edited by Platypus; 22 November 2010, 20:42.


  • Platypus
    replied
    Originally posted by bogeyman View Post
    You on a Mac Platypus?

    I see the same thing on FF and Chrome (OS X 10.6.4).

    I see the same thing in FF on Win XP under VMWare too.
    I'm on Win XP SP3, native (not VM) running FF 3.6.12
    But I've been seeing this for years on FF.
    And I just had a quick peek using IE8 - same thing!

    I tried to chase this down once, and read lots of forum posts about character sets, but the replying geeks were so busy trying to out-geek each other with what-ifs and wherefores that any useful information (i.e. a simple fix) was completely obscured


  • bogeyman
    replied
    You on a Mac Platypus?

    I see the same thing on FF and Chrome (OS X 10.6.4).

    I see the same thing in FF on Win XP under VMWare too.

    Could be because the page is declared as charset=ISO-8859-1 (ISO LATIN 1) instead of charset=UTF-8 (Unicode).

    The main problem is that non-ascii characters should be escaped or represented as entities (e.g. &rdquo; &lsquo; etc.).
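
    As a rough illustration of that kind of escaping, here is a sketch using Python's standard html.entities table. The function name to_entities is made up for the example, and it only handles non-ASCII characters; it does not escape &, < or >. It also only helps if the text is correct Unicode to begin with.

```python
from html.entities import codepoint2name


def to_entities(text: str) -> str:
    """Replace every non-ASCII character with a named or numeric HTML entity."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)                         # plain ASCII passes through
        elif cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")  # named entity, e.g. &lsquo;
        else:
            out.append(f"&#{cp};")                 # numeric character reference
    return "".join(out)


print(to_entities("\u2018hi\u201d"))  # &lsquo;hi&rdquo;
print(to_entities("\u2602"))          # &#9730;
```

    Output like this is pure ASCII, so it displays the same whether the page is declared as ISO-8859-1 or UTF-8.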
    Last edited by bogeyman; 22 November 2010, 20:36.
