Remove invalid chars from displayed string per XML specification
Strict XHTML requires that data comply with the XML 1.0 specification [1],
which only allows a subset of Unicode characters.

Function string_html_specialchars() now removes from the string to be
printed any character outside the range defined by the specification:

Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] |
         [#x10000-#x10FFFF]

Fixes #14744

[1] http://www.w3.org/TR/xml/
dregad committed Sep 26, 2012
1 parent a93121b commit 2b5d662
Showing 1 changed file with 4 additions and 0 deletions.
4 changes: 4 additions & 0 deletions core/string_api.php
@@ -910,6 +910,10 @@ function string_html_entities( $p_string ) {
* @return string
*/
function string_html_specialchars( $p_string ) {
# Remove any invalid character from the string per XML 1.0 specification
# http://www.w3.org/TR/2008/REC-xml-20081126/#NT-Char
$p_string = preg_replace( '/[^\x9\xA\xD\x20-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u', '', $p_string );

# achumakov: @ added to avoid warning output in unsupported codepages
# e.g. 8859-2, windows-1257, Korean, which are treated as 8859-1.
# This is VERY important for Eastern European, Baltic and Korean languages
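
The filtering step can be sketched on its own as follows. This is a minimal illustration, not the MantisBT implementation: the helper name strip_invalid_xml_chars() is hypothetical, and the fallback to the unmodified input on a PCRE error is an assumption of this sketch. In PCRE, `\x` without braces reads at most two hex digits, so code points above 0xFF are written here in the `\x{...}` form, which requires the `/u` modifier.

```php
<?php
# Hypothetical helper, not part of MantisBT: removes every character that
# falls outside the XML 1.0 "Char" production from a UTF-8 string.
function strip_invalid_xml_chars( $p_string ) {
	# \x{...} is needed for code points above 0xFF; /u makes both the
	# pattern and the subject UTF-8 aware.
	$t_result = preg_replace(
		'/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u',
		'',
		$p_string
	);

	# preg_replace() returns null on error, for example when the subject
	# is not valid UTF-8. Returning the input unchanged in that case is an
	# assumption of this sketch, not documented MantisBT behaviour.
	return $t_result === null ? $p_string : $t_result;
}

# Control characters U+0001 and U+0008 are dropped; the rest is kept.
echo strip_invalid_xml_chars( "abc\x01\x08def" ), "\n"; # prints "abcdef"
```

Tab, line feed and carriage return survive the filter, as does text outside Latin-1 such as Cyrillic or CJK, since those code points fall inside the allowed ranges.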