Reply to: Firefox 'funny' characters ?
Previously on "Firefox 'funny' characters ?"
-
It's not HTML, it's a separate issue. Browsers will display curly quotes in HTML perfectly well (whether as entities or just the raw characters like “”) - in fact, they can happily manage things like umbrellas ☂ and sunshine ☀ if you have a suitable font installed. It's just down to the fact that the ‘’ in the XML feed I'm grabbing are encoded as 0x91 and 0x92 respectively, which is strictly the Windows-1252 encoding (in ISO-8859-1 proper those bytes are control characters, though the two labels are routinely treated as the same thing), but are being parsed as ISO-8859-1 and converted into UTF-8, which turns (e.g.) ‘ into the multibyte representation 0xc2 0x91, which is what gets stored in the database. Then, when it's spat out by the forum software, the browser is being told that it's receiving ISO-8859-1 - and in that character encoding, 0xc2 is Â and the 0x91 that follows is shown as ‘, so you see Â followed by the left single curly quote you were supposed to be getting all along.
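The chain described above can be replayed byte by byte. This is only a minimal sketch in Python (the forum plugin itself is not written in Python), and it assumes the damage happened in exactly these three steps; the `repair` helper is hypothetical and only undoes text that went through that exact chain.

```python
# Replay the mojibake chain one step at a time.

raw = b"\x91"  # left single curly quote as it arrives in the feed

# Step 1: the parser reads the byte as ISO-8859-1, where 0x91 is the
# control character U+0091 rather than a curly quote.
misparsed = raw.decode("iso-8859-1")

# Step 2: that code point is written out as UTF-8, giving two bytes.
stored = misparsed.encode("utf-8")

# Step 3: browsers treat a page labelled ISO-8859-1 as Windows-1252, so the
# two stored bytes are shown as two separate characters.
displayed = stored.decode("cp1252")


def repair(text: str) -> str:
    """Hypothetical helper: walk the three steps backwards, then decode the
    original byte as Windows-1252, which is what it really was."""
    original_bytes = text.encode("cp1252").decode("utf-8").encode("iso-8859-1")
    return original_bytes.decode("cp1252")


print(stored.hex())       # c291
print(displayed)          # Â‘
print(repair(displayed))  # ‘
```

Decoding the feed as Windows-1252 at step 1 would have avoided the problem altogether, which is why the parsing stage is the likeliest place to fix it.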
-
Originally posted by xoggoth
Before 8 IE was very forgiving of all sorts of things that were not in the "standards" (actually the way browsers should be in my opinion unless there's some important security consideration). You could even get away with .Width instead of .width in jscript. All the browsers can seem inconsistent, why does " in HTML show ok but ' doesn't?

I think that ' does show ok in HTML, but ‘ and ’ (the curly varieties) do not.
-
Originally posted by Platypus
But I've been seeing this for years on FF.
And I just had a quick peek using IE8 - same thing!

Before 8 IE was very forgiving of all sorts of things that were not in the "standards" (actually the way browsers should be in my opinion unless there's some important security consideration). You could even get away with .Width instead of .width in jscript. All the browsers can seem inconsistent, why does " in HTML show ok but ' doesn't?
-
Originally posted by Platypus
That's more than I could stomach

I've long been aware of what might happen to MySQL under Oracle's ownership, but to see it in OOo is still a shock.
I wish Apple would give their Numbers spreadsheet a serious boost. It's fine for the occasional user, but really doesn't cut the mustard for serious business style number crunching.
-
Originally posted by OwlHoot
On many web sites that host news articles these will have trundled through several steps, being parsed and converted at each hop. So there's a fair chance some developer along the line will assume text is UTF-8 when it isn't, or vice versa. One often sees munged characters even on sites like the BBC and the Telegraph. (Well, no surprise with the last, as they've probably sacked most of their developers, but you'd expect the BBC to be a bit more savvy.)

Yep, saw some weirdness the other day on the Beeb's iPlayer "Play" page.
-
Originally posted by bogeyman
The funny accented A's are just fancy curly opening/closing single or double quotes in this case.
It's CUK's content management editor at fault I think. It should translate non-standard characters into HTML entities.

A copy and paste from an OpenOffice document (yes, even a spreadsheet!) will do that. OpenOffice* will silently convert quotes and dashes to the fancy typographical versions by default. That might be OK in a word processing document but it's bloody criminal in a spreadsheet whose contents may be heading for a database.
I wouldn't be surprised if Word does the same, but I don't think Excel does.
* I still haven't got used to seeing Oracle on the startup splash screen.
-
Originally posted by bogeyman
Good on yer Nick, but shouldn't these characters be converted to HTML entities (&ldquo; etc.) at some point, before they hit the browser? The character encoding and code-page wouldn't matter then, would it?

Unfortunately, it's too late by the time it gets to the point where it makes sense to use HTML entities. The way it's set up at the moment is that the news is entered into the main site CMS, which saves a copy of the headlines as an XML file on the forum server (as well as shoving the stories into the main site database, of course). My vBulletin plugin checks that file's last modification date as and when, and if it's been updated it parses the XML and shoves the headlines into the forum database, ready to be displayed in the sidebar.
It's only at display time that it makes sense to replace oddball characters with entities, and by then it's too late, as the characters got screwed up either when the file was created, when it was parsed, or when the forum database was updated - my current best guess is the parsing, but I need to confirm that.
The good news is that the main site CMS is soon to be upgraded to a system that's UTF-8 from end to end, so that should make it easier to sort things out.
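For anyone wanting to picture the "check the modification date, then parse" step, here is a rough sketch in Python. It is not the actual vBulletin plugin: the file path, the `<headline>` element name and the `source_encoding` default are all assumptions made for illustration, since the thread doesn't describe the real feed layout.

```python
import os
import xml.etree.ElementTree as ET


def load_headlines_if_updated(path, last_seen_mtime, source_encoding="cp1252"):
    """Return (headlines, mtime); headlines is None if the file is unchanged.

    `path`, the <headline> element name and `source_encoding` are all
    hypothetical stand-ins for whatever the real feed uses.
    """
    mtime = os.path.getmtime(path)
    if mtime <= last_seen_mtime:
        return None, last_seen_mtime

    with open(path, "rb") as feed:
        raw = feed.read()

    # Decode with the encoding the bytes are really in, rather than letting
    # a later stage guess. From here on the text is Unicode and can be
    # stored as UTF-8 without turning 0x91 into 0xc2 0x91.
    text = raw.decode(source_encoding)

    root = ET.fromstring(text)
    headlines = [node.text or "" for node in root.iter("headline")]
    return headlines, mtime
```

The caller keeps the returned modification time and passes it back in on the next check, so an unchanged file is never parsed twice.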
-
On many web sites that host news articles these will have trundled through several steps, being parsed and converted at each hop. So there's a fair chance some developer along the line will assume text is UTF-8 when it isn't, or vice versa. One often sees munged characters even on sites like the BBC and the Telegraph. (Well, no surprise with the last, as they've probably sacked most of their developers, but you'd expect the BBC to be a bit more savvy.)
-
Originally posted by NickFitz
The headlines in the sidebar come from the content management system for the main site, but the character encoding is getting mucked up for things like curly quotes: I think it's coming from over there as ISO-8859-1 but with curly quotes thrown in, then being parsed as UTF-8, then being stuck in a database configured to use ISO-8859-1
I'll see about getting it fixed

Good on yer Nick, but shouldn't these characters be converted to HTML entities (&ldquo; etc.) at some point, before they hit the browser? The character encoding and code-page wouldn't matter then, would it?
-
The headlines in the sidebar come from the content management system for the main site, but the character encoding is getting mucked up for things like curly quotes: I think it's coming from over there as ISO-8859-1 but with curly quotes thrown in, then being parsed as UTF-8, then being stuck in a database configured to use ISO-8859-1
I'll see about getting it fixed
-
Originally posted by Platypus
I'm on Win XP SP3, native (not VM) running FF 3.6.12
But I've been seeing this for years on FF.
And I just had a quick peek using IE8 - same thing!
I tried to chase this down once, and read lots of forum posts about character sets, but the replying geeks were so busy trying to out-geek each other with what-ifs and wherefores that any useful information (i.e. a simple fix) was completely obscured

What it basically comes down to is that the text content has characters that are not part of the common character set.
The funny accented A's are just fancy curly opening/closing single or double quotes in this case.
It's CUK's content management editor at fault I think. It should translate non-standard characters into HTML entities.
That doesn't seem to be happening for some reason.
It's not a fault with your browser or anything.
-
Originally posted by bogeyman
Could be because the page is declared as charset=ISO-8859-1 (ISO LATIN 1) instead of charset=UTF-8 (Unicode).
The main problem is that non-ascii characters should be escaped or represented as entities (e.g. &rdquo; &lsquo; etc.).

... so does this mean that the webpage is in error?
EDIT: and furthermore, if it is, why don't the people who create such pages immediately see the error?
This very page is indeed ISO-8859-1
Last edited by Platypus; 22 November 2010, 20:42.
-
Originally posted by bogeyman
You on a Mac Platypus?
I see the same thing on FF and Chrome (OS X 10.6.4).
See the same thing in FF on Win XP under VMWare too.

I'm on Win XP SP3, native (not VM) running FF 3.6.12
But I've been seeing this for years on FF.
And I just had a quick peek using IE8 - same thing!
I tried to chase this down once, and read lots of forum posts about character sets, but the replying geeks were so busy trying to out-geek each other with what-ifs and wherefores that any useful information (i.e. a simple fix) was completely obscured
-
You on a Mac Platypus?
I see the same thing on FF and Chrome (OS X 10.6.4).
I see the same thing in FF on Win XP under VMWare too.
Could be because the page is declared as charset=ISO-8859-1 (ISO LATIN 1) instead of charset=UTF-8 (Unicode).
The main problem is that non-ascii characters should be escaped or represented as entities (e.g. &rdquo; &lsquo; etc.).
Last edited by bogeyman; 22 November 2010, 20:36.
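As a rough illustration of the "represent them as entities" idea, this Python sketch escapes a headline so that only ASCII reaches the browser. The `to_ascii_html` name and the sample headline are made up for the example, and it only helps if the text is still intact at the point where it gets escaped.

```python
import html


def to_ascii_html(text: str) -> str:
    """Escape markup characters, then turn every non-ASCII character into a
    numeric character reference such as &#8216;."""
    escaped = html.escape(text, quote=True)
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


headline = "\u2018Umbrella\u2019 firms & IR35"  # made-up sample headline
safe = to_ascii_html(headline)
print(safe)  # &#8216;Umbrella&#8217; firms &amp; IR35

# The output is pure ASCII, so it reads the same whether the page is
# declared as ISO-8859-1 or UTF-8.
as_bytes = safe.encode("ascii")
print(as_bytes.decode("iso-8859-1") == as_bytes.decode("utf-8"))  # True

# Named entities such as &lsquo; stand for the same characters.
print(html.unescape("&lsquo;Umbrella&rsquo;") == "\u2018Umbrella\u2019")  # True
```

Numeric references are used here rather than named ones because they can be generated for any character without a lookup table.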