Encoding & and HTML Entities in PHP


We ran into an issue at work with a CMS a while back: characters such as & were being input into the DB correctly as &amp;, but when they were output back into the CMS for further editing they showed up in the textarea as & and not &amp; as they should have. On the next save, the actual character was being sent into the database, not the character code any more!
This issue was also occurring with » (&raquo;), which was being used extensively, and also ™ (&#8482;).
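Here's a rough sketch of that round trip, just to make the failure concrete. It leans on a few assumptions: html_entity_decode() stands in for what the browser does to textarea contents, the sample string is made up, and htmlspecialchars() is PHP's built-in escaper, not the regular expression approach this post goes on to build.

```php
<?php
// What the database holds: the character code, not the literal character.
$stored = 'Fish &amp; Chips &raquo; Menu';

// Printed into a <textarea> as-is, the browser decodes the entities before
// the editor sees (and resubmits) them. html_entity_decode() simulates that.
$resubmitted = html_entity_decode($stored, ENT_QUOTES | ENT_HTML5, 'UTF-8');

// Escaping once more on the way into the textarea keeps the round trip stable.
$escaped = htmlspecialchars($stored, ENT_QUOTES, 'UTF-8');
$resubmittedEscaped = html_entity_decode($escaped, ENT_QUOTES | ENT_HTML5, 'UTF-8');

var_dump($resubmitted === $stored);        // bool(false): literal & and » come back
var_dump($resubmittedEscaped === $stored); // bool(true): the codes survive
```

The first comparison is the bug described above; the second is the behavior we want from textarea -> DB -> textarea.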

When I say issue, I mean that this went just shy of out of control with our already tight deadline. I vowed to find a more efficient solution for our future "internal only" CMSs (the ones that we use to create static pages to ship off to clients).

My solutions:

  1. Never use a CMS to create static pages to ship out
  2. Write a function to handle HTML entities correctly.

Well, I'm still working on #1, and in the meantime I figured I ought to work on #2.

There are a lot of questions that need to be answered before this problem is solved.

  1. Do you transform characters before they enter the database or after they leave the database (edit vs render)?
  2. How do you transform literal characters into something? Can you search and replace?
  3. How do you make sure that every one is the right character code?
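To give question 2 something concrete to chew on, here is a minimal sketch of a plain search and replace for literal characters. The map and the encodeLiterals() helper are hypothetical names made up for illustration, the characters are written as \u{...} escapes (which assumes PHP 7 or later), and this is only one possible way to do it, not the solution this series ends up with.

```php
<?php
// Hypothetical map for illustration: literal character => numeric character code.
$entityMap = [
    "\u{00BB}" => '&#187;',  // right-pointing double angle quote
    "\u{2122}" => '&#8482;', // trademark sign
    "\u{201C}" => '&#8220;', // left curly double quote
    "\u{201D}" => '&#8221;', // right curly double quote
];

// strtr() swaps every key for its value in one pass and never
// rescans text it has already replaced.
function encodeLiterals(string $text, array $map): string
{
    return strtr($text, $map);
}

$encoded = encodeLiterals("Widgets\u{2122} \u{00BB} \u{201C}new\u{201D}", $entityMap);
echo $encoded, "\n"; // Widgets&#8482; &#187; &#8220;new&#8221;
```

Keeping every code in a single map is also one way to approach question 3: there is exactly one place to check that each character gets the right code.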

So I decided to start simple. I just want to make sure my &'s are encoded properly. I know, that's kinda simple and boring but it's a good start to get into the swing of things.

Regular Expression: Search and Replace

We'll start with the PHP function:

PHP

<?php
    $text = preg_replace($pattern, $replace, $haystack);
?>

We know what replace is going to be:

PHP

<?php
    $text = preg_replace($pattern, '&amp;', $haystack);
?>

We'll create a simple haystack to search through for this sample:

PHP

<?php
    $haystack = "<p>Amber &amp; and I know that some &#8220;things&#8221; are a problem and it will be a give & take when she moves in&hellip;</p>";
    $text = preg_replace($pattern, '&amp;', $haystack);
?>

We need this haystack to end up looking like this when it's done so that what is sent from the textarea -> DB -> textarea -> DB is consistent and logical.

HTML

<p>Amber &amp;amp; and I know that some &amp;#8220;things&amp;#8221; are a problem and it will be a give &amp; take when she moves in&amp;hellip;</p>

Simple enough?
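If you want to try it right away, the simplest pattern I can think of is one that matches every literal ampersand. Treat the '/&/' below as an assumption for illustration; the pattern built on page 2 may well be different. html_entity_decode() again stands in for the browser decoding the textarea contents.

```php
<?php
$haystack = "<p>Amber &amp; and I know that some &#8220;things&#8221; are a problem and it will be a give & take when she moves in&hellip;</p>";

// Assumed pattern: match every literal ampersand, whether it is a bare
// character or the start of an existing character code.
$pattern = '/&/';
$text = preg_replace($pattern, '&amp;', $haystack);

echo $text, "\n";

// One round of decoding (what the browser does inside a textarea)
// gives back exactly what we started with.
var_dump(html_entity_decode($text) === $haystack); // bool(true)
```

Every & picks up exactly one extra level of encoding, which is what makes the textarea -> DB -> textarea -> DB loop consistent. For a fixed string like this, str_replace('&', '&amp;', $haystack) would do the same job; a regular expression only earns its keep once the match needs to be selective.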

Continue on to page 2: Make that regular expression to replace all those pesky &'s
