It’s been a while since I wrote about Encoding & and HTML Entities in PHP, and that was just one part of solving the problem of special characters (such as ™, ©, ») and saving and retrieving them successfully through a database/CMS interface.
So now, 3 months and a week later, I bring you the solution to those tricky copy-and-paste-from-Word characters for your custom forms and CMSs!
I call it: Nothing more than a well thought out, tested and executed function, complete with required content-type.
Sounds less fancy when I call it that but that’s all there is to it!
Again to describe the problem we had:
- We’d enter the HTML from our text editor, where our « looked like &laquo;, and paste that into our simple text area/text field
- The special character would be stored into the database and would display on the web page as well as the CMS editor (when the page was open to edit)
- We’d pull the information from the database to the simple HTML text area/text field where the special characters would display as the special character, not the HTML code.
- When the data was saved back into the database it would save as the special character, not the intended HTML character code.
- Now, depending on the character set of the document or sometimes the browser, this symbol would not display and would often be replaced by a box or question mark.
We also needed to account for cases where the first two steps were omitted because of the dreaded “copy and paste from Word”, where characters would be entered into the text area directly as special characters.
Updates to the Form Add/Edit Page
So here’s how to fix that problem: on the pages that display your forms (be they add or edit), you need to make sure your content-type charset encoding is utf-8.
HTML
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
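Why the charset matters: the function further down searches for UTF-8 byte sequences, so it can only find a pasted character if the form was submitted as UTF-8. Here’s a minimal sketch of that idea. It isn’t part of the original fix; the variable names and sample strings are made up for illustration, and the Windows-1252 byte is just one example of what a non-UTF-8 page might send instead.

```php
<?php
// A right single quote (U+2019) as it arrives from a UTF-8 page: three bytes.
$utf8Quote = "\xe2\x80\x99";
// The same character as Windows-1252 encodes it: one byte.
$cp1252Quote = "\x92";

echo bin2hex($utf8Quote), "\n"; // e28099
echo strlen($utf8Quote), "\n";  // 3

// Hypothetical form input, once from a UTF-8 page and once from a Windows-1252 page.
$fromUtf8Page  = "It" . $utf8Quote . "s";
$fromOtherPage = "It" . $cp1252Quote . "s";

// Searching for the UTF-8 sequence finds the first form but not the second.
var_dump(strpos($fromUtf8Page, $utf8Quote) !== false);  // bool(true)
var_dump(strpos($fromOtherPage, $utf8Quote) !== false); // bool(false)
```

So if the page isn’t served as utf-8, the replacement can silently miss the very characters it is meant to catch.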
Update to The Server Side – Saving to The Database Page
Add the following function to your included custom functions file. (The bottom of the function should look familiar from the old Encoding & and HTML Entities in PHP article.)
PHP
function convertSpecialCharacters($text){
    // UTF-8 byte sequences for characters commonly pasted from Word
    $badwordchars=array(
        "\xe2\x80\x98", // left single quote
        "\xe2\x80\x99", // right single quote
        "\xe2\x80\x9c", // left double quote
        "\xe2\x80\x9d", // right double quote
        "\xe2\x80\x94", // em dash
        "\xe2\x80\x93", // en dash
        "\xc2\xbb", // right arrow quote
        "\xc2\xab", // left arrow quote
        "\xc2\xa9", // copyright
        "\xc2\xae", // registered
        "\xe2\x84\xa2", // trademark
        "\xe2\x82\xac", // euro
        "\xe2\x80\xa2", // bullet
        "\xe2\x80\xa6" // ellipsis
    );
    // HTML character codes, in the same order as $badwordchars
    $fixedwordchars=array(
        '&#8216;',
        '&#8217;',
        '&#8220;',
        '&#8221;',
        '&#8212;',
        '&#8211;',
        '&#187;',
        '&#171;',
        '&#169;',
        '&#174;',
        '&#8482;',
        '&#8364;',
        '&#8226;',
        '&#8230;'
    );
    $text = str_replace($badwordchars,$fixedwordchars,$text);
    // Encode any & that is not already the start of an HTML entity
    $text = preg_replace('/&(?!#?[xX]?(?:[0-9a-fA-F]+|\w+);)/', '&amp;', $text);
    return $text;
}
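To show what that last preg_replace line is doing, here’s a small sketch that pulls it out on its own. The helper name encodeBareAmpersands and the sample strings are hypothetical, and I’m assuming the replacement is meant to be &amp;amp;. The negative lookahead means “match an & only if it is not followed by something shaped like an entity”, so entities that are already encoded are left alone.

```php
<?php
// Hypothetical helper isolating the ampersand step of convertSpecialCharacters().
function encodeBareAmpersands($text) {
    // Match "&" only when it does NOT start something shaped like an entity:
    // &name;  &#123;  &#x1F;
    return preg_replace('/&(?!#?[xX]?(?:[0-9a-fA-F]+|\w+);)/', '&amp;', $text);
}

echo encodeBareAmpersands('Tom & Jerry'), "\n";     // Tom &amp; Jerry
echo encodeBareAmpersands('&copy; Acme R&D'), "\n"; // &copy; Acme R&amp;D
echo encodeBareAmpersands('It&#8217;s'), "\n";      // It&#8217;s
```

Note that the pattern only checks the shape, not whether the entity name is real, so text like AT&T; would be left untouched.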
And then, of course, call the function, pass it the variable or text that needs to be encoded, and keep the value it returns.
PHP
$title = convertSpecialCharacters($_POST['title']);
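If a form has several fields, the same call can be applied to all of them in one go. This is only a sketch under a few assumptions: convertSpecialCharactersDemo is a cut-down stand-in for the real function (two characters only, so the block runs on its own), $post stands in for $_POST, and the field names and values are made up.

```php
<?php
// Abbreviated stand-in for convertSpecialCharacters(): two characters only.
function convertSpecialCharactersDemo($text) {
    $search  = array("\xe2\x80\x99", "\xe2\x84\xa2"); // right single quote, trademark
    $replace = array('&#8217;', '&#8482;');
    $text = str_replace($search, $replace, $text);
    // Runs second, so the entities inserted above are not double-encoded.
    return preg_replace('/&(?!#?[xX]?(?:[0-9a-fA-F]+|\w+);)/', '&amp;', $text);
}

// Stand-in for $_POST on a submitted form (hypothetical field names).
$post = array(
    'title' => "Bob\xe2\x80\x99s Widgets\xe2\x84\xa2",
    'body'  => 'Parts & labour',
);

// array_map keeps the field names as keys and converts each value.
$clean = array_map('convertSpecialCharactersDemo', $post);

echo $clean['title'], "\n"; // Bob&#8217;s Widgets&#8482;
echo $clean['body'], "\n";  // Parts &amp; labour
```

The converted values in $clean are what you would then save to the database.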
It really is a very easy solution to a problem that appeared so complicated the first time we encountered it at work.
{ 1 comment }
Great stuff, your replacement function seems more complete than others I have found on the Internet. Thanks!
But:
- please try to optimise this page for SE (the phrase “microsoft smart quotes” for instance would help to find this page)
- why the preg_replace? What does it do?
Thanks again!