Special Characters in Custom CMSs and Forms


It's been a while since I wrote about Encoding & and HTML Entities in PHP, and that was only one part of solving the problem of special characters (such as ™, ©, ») and saving them to, and retrieving them from, a database/CMS interface.

So now, three months and a week later, I bring you the solution to those tricky characters that get copied and pasted from Word into your custom forms and CMSs!

I call it: nothing more than a well-thought-out, tested and executed function, complete with the required content-type.

It sounds less fancy when I put it that way, but that's all there is to it!

Again, to describe the problem we had:

  1. We'd enter the HTML from our text editor, where our « looked like &laquo;, and paste it into our simple text area/text field.
  2. The special character would be stored in the database and would display correctly on the web page as well as in the CMS editor (when the page was opened for editing).
  3. We'd pull the information from the database into the simple HTML text area/text field, where the special characters would display as the characters themselves, not as the HTML code.
  4. When the data was saved back into the database, it would be saved as the special character, not the intended HTML character code.
  5. Now, depending on the character encoding of the document, or sometimes the browser, the symbol would not display and was often replaced by a box or a question mark.
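Steps 3 and 4 can be reproduced in a few lines. This is a minimal sketch, not part of the original solution: html_entity_decode() stands in for the browser, which decodes character references inside a textarea before the form is submitted again, and the sample string is made up.

```php
<?php
// What was saved to the database on the first pass.
$stored = 'Next &laquo; Previous';

// What the textarea shows and posts back on the next save: the browser
// turns &laquo; into the literal « (html_entity_decode() plays its role here).
$posted = html_entity_decode($stored, ENT_QUOTES, 'UTF-8');

echo $posted, "\n";                         // Next « Previous
echo bin2hex(substr($posted, 5, 2)), "\n";  // c2ab: raw UTF-8 bytes for «, not the entity
```

Unless something converts those two bytes back into `&laquo;`, that raw character is what ends up in the database on the second save.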

We also needed to account for cases where the first two steps were skipped because of the dreaded "copy and paste from Word", where characters would be entered into the text area directly as special characters.
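To see what those pasted characters actually look like on the server, here is a small sketch (the character list is just a sample) that dumps their UTF-8 bytes. The hex values match the escape sequences used in the convertSpecialCharacters() function later in this post, which is why that function can find them with a plain string replace.

```php
<?php
// Characters that typically arrive when text is pasted from Word into
// a form on a UTF-8 page.
$pasted = ['‘', '’', '“', '”', '…', '©'];

foreach ($pasted as $char) {
    // strlen() counts bytes, not characters.
    printf("%s is %d bytes: %s\n", $char, strlen($char), bin2hex($char));
}
// ‘ is 3 bytes: e28098
// ’ is 3 bytes: e28099
// “ is 3 bytes: e2809c
// ” is 3 bytes: e2809d
// … is 3 bytes: e280a6
// © is 2 bytes: c2a9
```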

Updates to the Form Add/Edit Page

So here's how to fix the problem: on the pages that display your forms (add or edit), make sure your content-type charset is UTF-8.

HTML

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

Updates to the Server Side: Saving to the Database Page

Add the following function to your included custom functions file. (The bottom of the function should look familiar from the old Encoding & and HTML Entities in PHP article.)

PHP

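    // Turn the special characters that tend to break a custom CMS (smart
    // quotes, dashes, ©, ®, ™ and friends) into HTML entities, then escape
    // any bare & so the text can be saved and redisplayed safely.
    function convertSpecialCharacters($text){

       // UTF-8 byte sequences of the characters to replace
       $badwordchars=array(
           "\xe2\x80\x98", // left single quote
           "\xe2\x80\x99", // right single quote
           "\xe2\x80\x9c", // left double quote
           "\xe2\x80\x9d", // right double quote
           "\xe2\x80\x94", // em dash
           "\xe2\x80\x93", // en dash
           "\xc2\xbb",     // right double angle quote (»)
           "\xc2\xab",     // left double angle quote («)
           "\xc2\xa9",     // copyright
           "\xc2\xae",     // registered
           "\xe2\x84\xa2", // trademark
           "\xe2\x82\xac", // euro
           "\xe2\x80\xa2", // bullet
           "\xe2\x80\xa6"  // ellipsis
       );
       // HTML entities, in the same order as $badwordchars
       $fixedwordchars=array(
           '&#8216;',
           '&#8217;',
           '&#8220;',
           '&#8221;',
           '&mdash;',
           '&ndash;',
           '&raquo;',
           '&laquo;',
           '&copy;',
           '&reg;',
           '&trade;',
           '&euro;',
           '&bull;',
           '&#8230;'
       );
       // Swap each special character for its entity.
       $text = str_replace($badwordchars,$fixedwordchars,$text);
       // Change any & that does not already start an entity
       // (&copy;, &#8220;, &#x2014;) into &amp;.
       $text = preg_replace('/&(?!#?[xX]?(?:[0-9a-fA-F]+|\w+);)/', '&amp;', $text);
       return $text;
    }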
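Here is a self-contained check of what the function produces. It restates convertSpecialCharacters() in condensed form for the demo: one associative array passed to strtr() instead of two parallel arrays passed to str_replace(). Because no replacement entity contains any of the searched byte sequences, the result is the same. The sample title is made up. Running the output through a second time changes nothing, so text that has already been converted is not double-encoded if it passes through the function again.

```php
<?php
// Condensed restatement of convertSpecialCharacters() for this demo only.
function convertSpecialCharacters(string $text): string
{
    // UTF-8 byte sequence => HTML entity
    $map = [
        "\xe2\x80\x98" => '&#8216;', // left single quote
        "\xe2\x80\x99" => '&#8217;', // right single quote
        "\xe2\x80\x9c" => '&#8220;', // left double quote
        "\xe2\x80\x9d" => '&#8221;', // right double quote
        "\xe2\x80\x94" => '&mdash;', // em dash
        "\xe2\x80\x93" => '&ndash;', // en dash
        "\xc2\xbb"     => '&raquo;', // right double angle quote
        "\xc2\xab"     => '&laquo;', // left double angle quote
        "\xc2\xa9"     => '&copy;',  // copyright
        "\xc2\xae"     => '&reg;',   // registered
        "\xe2\x84\xa2" => '&trade;', // trademark
        "\xe2\x82\xac" => '&euro;',  // euro
        "\xe2\x80\xa2" => '&bull;',  // bullet
        "\xe2\x80\xa6" => '&#8230;', // ellipsis
    ];
    $text = strtr($text, $map);
    // Escape any & that does not already start an entity.
    return preg_replace('/&(?!#?[xX]?(?:[0-9a-fA-F]+|\w+);)/', '&amp;', $text);
}

$title = '“Fish & Chips” © 2008 – Ninedays »';
$once  = convertSpecialCharacters($title);
$twice = convertSpecialCharacters($once);

echo $once, "\n";
// &#8220;Fish &amp; Chips&#8221; &copy; 2008 &ndash; Ninedays &raquo;
var_dump($once === $twice); // bool(true)
```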

And then, of course, call the function with the variable or text that needs to be encoded. It returns the converted string rather than changing the variable in place, so keep the result:

PHP

    $title = convertSpecialCharacters($_POST['title']);

It really is a very easy solution to a problem that appeared so complicated the first time we encountered it at work.
