Encoding & and HTML Entities in PHP
We ran into an issue at work a while back with a CMS where characters such as & were stored in the database correctly, but when the content was output back into the CMS for further editing, it reached the textarea as & rather than &amp; as it should have. From then on, the literal character was being saved to the database instead of the character code!
This issue also occurred with » (&raquo;), which we used extensively, and with ™ (&trade;).
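For reference, PHP's built-in htmlentities() already knows the named entities for all three characters. A minimal sketch; the sample string is illustrative, and » and ™ are written as Unicode escapes so the example doesn't depend on the file's own encoding:

```php
<?php
// Sample text containing &, » (U+00BB) and ™ (U+2122).
$raw = "Give & take \u{00BB} Brand\u{2122}";

// ENT_HTML401 selects the HTML 4.01 named-entity table, which includes
// &amp;, &raquo; and &trade;. ENT_QUOTES also encodes ' and ".
$encoded = htmlentities($raw, ENT_QUOTES | ENT_HTML401, 'UTF-8');

echo $encoded, "\n"; // Give &amp; take &raquo; Brand&trade;
```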
When I say issue, I mean this got just shy of out of control on an already tight deadline. I vowed to find a more efficient solution for our future "internal only" CMSs (the ones we use to create static pages that we then ship off to clients).
My solutions:
- Never use a CMS to create static pages to ship out
- Write a function to handle HTML entities correctly.
Well, I'm still working on #1, and in the meantime I figured I ought to work on #2.
There are a lot of steps to address before this problem is solved:
- Do you transform characters before they enter the database, or after they leave it (edit vs. render)?
- How do you transform literal characters into something else? Can you just search and replace?
- How do you make sure that every character ends up as the right character code?
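The edit vs. render question can be sketched with PHP's built-in htmlspecialchars(). This is one common approach, not necessarily what our CMS did: it assumes the database already holds entity-encoded HTML, and $stored is a hypothetical value standing in for a database row.

```php
<?php
// Hypothetical value as it might sit in the database: HTML that already
// uses entities (&amp;, &raquo;).
$stored = 'Give &amp; take &raquo;';

// Edit path: escape once more before writing into a <textarea>. With
// double_encode left at its default (true), &amp; becomes &amp;amp;, which
// the browser decodes back to the stored &amp; when it parses the page.
$forTextarea = htmlspecialchars($stored, ENT_QUOTES, 'UTF-8');
echo '<textarea name="body">', $forTextarea, "</textarea>\n";

// htmlspecialchars_decode() approximates that browser-side decoding step
// and shows the round trip returns exactly what was stored.
$afterBrowser = htmlspecialchars_decode($forTextarea, ENT_QUOTES);

// Render path: the stored HTML is output unchanged, so &amp; displays as &.
echo '<p>', $stored, "</p>\n";
```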
So I decided to start simple: I just want to make sure my &'s are encoded properly. I know, that's kinda simple and boring, but it's a good way to get into the swing of things.
Regular Expression: Search and Replace
We'll start with the PHP function:
PHP
<?php
$text = preg_replace($pattern, $replace, $haystack);
?>
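To see what the three arguments do, here is a throwaway example with a deliberately naive pattern ($pattern, $replace and $haystack hold illustrative values, not the ones we'll end up with):

```php
<?php
// preg_replace() returns a copy of $haystack in which every match of the
// PCRE $pattern is replaced by $replace.
$pattern  = '/&/';      // deliberately naive: matches every ampersand
$replace  = '&amp;';
$haystack = 'salt & pepper';

$text = preg_replace($pattern, $replace, $haystack);
echo $text, "\n"; // salt &amp; pepper

// The naive pattern also hits ampersands that already start an entity.
$again = preg_replace($pattern, $replace, $text);
echo $again, "\n"; // salt &amp;amp; pepper
```

That second result is the double-encoding problem: every trip through the CMS would add another "amp;".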
We already know what $replace is going to be:
PHP
<?php
$text = preg_replace($pattern, '&amp;', $haystack);
?>
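One way to write $pattern is a negative lookahead that matches an & only when it does not already begin an entity. This is a sketch, not necessarily the pattern the author ends up with, and the entity grammar it accepts (a letter followed by letters/digits, #digits, or #x plus hex digits, then a semicolon) is a simplifying assumption:

```php
<?php
// Match "&" only when it is NOT followed by something that looks like an
// entity: a named one (&amp;, &hellip;), a decimal one (&#8220;) or a hex
// one (&#x201C;). This entity grammar is a simplifying assumption.
$pattern = '/&(?!(?:[a-zA-Z][a-zA-Z0-9]*|#[0-9]+|#[xX][0-9a-fA-F]+);)/';

$haystack = 'Tom & Jerry &amp; friends &#8220;hi&#8221;';
$text = preg_replace($pattern, '&amp;', $haystack);
echo $text, "\n"; // Tom &amp; Jerry &amp; friends &#8220;hi&#8221;

// Running it again changes nothing, so repeated saves stay stable.
$again = preg_replace($pattern, '&amp;', $text);
```

The trade-off: a literal & that happens to be followed by something entity-shaped (say, "copy;") is left alone, which is the price of never double-encoding.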
We'll create a simple haystack to search through for this sample:
PHP
<?php
$haystack = "<p>Amber & and I know that some “things” are a problem and it will be a give & take when she moves in…</p>";
$text = preg_replace($pattern, '&amp;', $haystack);
?>
We need this haystack to end up looking like this when it's done, so that whatever travels textarea -> DB -> textarea -> DB stays consistent and logical:
HTML
<p>Amber &amp; and I know that some &#8220;things&#8221; are a problem and it will be a give &amp; take when she moves in&hellip;</p>
Simple enough?
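Getting from the sample haystack to that target takes two passes: encode the bare ampersands, then swap the curly quotes and ellipsis for entity references. A sketch under assumptions: encodeForCms() is a hypothetical name, and the entity choices (&#8220;, &#8221;, &hellip;) simply mirror the target output above.

```php
<?php
// Hypothetical helper: encode bare ampersands, then replace the curly
// quotes and ellipsis with the entity references used in the target markup.
function encodeForCms(string $html): string
{
    // Step 1: "&" -> "&amp;", skipping ampersands that already start an entity.
    $html = preg_replace(
        '/&(?!(?:[a-zA-Z][a-zA-Z0-9]*|#[0-9]+|#[xX][0-9a-fA-F]+);)/',
        '&amp;',
        $html
    );

    // Step 2: literal characters -> entity references.
    return strtr($html, [
        "\u{201C}" => '&#8220;',  // left double quotation mark
        "\u{201D}" => '&#8221;',  // right double quotation mark
        "\u{2026}" => '&hellip;', // horizontal ellipsis
    ]);
}

// The sample haystack, with the curly quotes and ellipsis written as escapes.
$haystack = "<p>Amber & and I know that some \u{201C}things\u{201D} are a problem and it will be a give & take when she moves in\u{2026}</p>";

$text = encodeForCms($haystack);
echo $text, "\n";
```

Because the ampersand pass skips existing entities, running encodeForCms() on its own output leaves it unchanged, which is the property you want when the same content is saved over and over.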


