
Commit 2b5d662

committed Sep 26, 2012
Remove invalid chars from displayed string per XML specification
Strict XHTML requires that data comply with the XML 1.0 specification [1], which only allows a subset of the Unicode character set. Function string_html_specialchars() has been modified to remove from the string to be printed any character that is not in the defined range:

Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]

Fixes #14744

[1] http://www.w3.org/TR/xml/
1 parent a93121b commit 2b5d662
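The Char production in the commit message can be read as a plain test on integer code points. A minimal PHP sketch of that reading, assuming the input is valid UTF-8; the functions is_xml_char(), utf8_code_point() and strip_non_xml_chars() are hypothetical illustrations, not part of MantisBT:

```php
<?php
// Hypothetical helper (not MantisBT API): true if code point $cp is allowed by
// Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
function is_xml_char(int $cp): bool
{
    return $cp === 0x9 || $cp === 0xA || $cp === 0xD
        || ($cp >= 0x20 && $cp <= 0xD7FF)
        || ($cp >= 0xE000 && $cp <= 0xFFFD)
        || ($cp >= 0x10000 && $cp <= 0x10FFFF);
}

// Hypothetical helper: decode one UTF-8 encoded character into its code point.
function utf8_code_point(string $ch): int
{
    $b = array_values(unpack('C*', $ch)); // byte values, re-indexed from 0
    switch (count($b)) {
        case 1:
            return $b[0];
        case 2:
            return (($b[0] & 0x1F) << 6) | ($b[1] & 0x3F);
        case 3:
            return (($b[0] & 0x0F) << 12) | (($b[1] & 0x3F) << 6) | ($b[2] & 0x3F);
        default:
            return (($b[0] & 0x07) << 18) | (($b[1] & 0x3F) << 12)
                | (($b[2] & 0x3F) << 6) | ($b[3] & 0x3F);
    }
}

// Hypothetical filter: keep only characters that satisfy the Char production.
// Assumes valid UTF-8; preg_split() with /u returns false otherwise, and this
// sketch then returns an empty string.
function strip_non_xml_chars(string $s): string
{
    $chars = preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return '';
    }
    $out = '';
    foreach ($chars as $ch) {
        if (is_xml_char(utf8_code_point($ch))) {
            $out .= $ch;
        }
    }
    return $out;
}
```

For example, strip_non_xml_chars("a\x01b\tc") keeps the tab but drops the U+0001 control character, giving "ab\tc". Looping over decoded code points keeps the ranges readable as integers; a single preg_replace(), as in the commit, avoids the per-character PHP loop.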


1 file changed: +4 -0 lines


core/string_api.php (+4)

@@ -910,6 +910,10 @@ function string_html_entities( $p_string ) {
  * @return string
  */
 function string_html_specialchars( $p_string ) {
+	# Remove any invalid character from the string per XML 1.0 specification
+	# http://www.w3.org/TR/2008/REC-xml-20081126/#NT-Char
+	$p_string = preg_replace( '/[^\x9\xA\xD\x20-\xD7FF\xE000-\xFFFD\x{10000}-\x{10FFFF}]/u', '', $p_string );
+
 	# achumakov: @ added to avoid warning output in unsupported codepages
 	# e.g. 8859-2, windows-1257, Korean, which are treated as 8859-1.
 	# This is VERY important for Eastern European, Baltic and Korean languages
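One detail of the preg_replace() pattern in this diff is worth checking: in PCRE an unbraced \xhh escape reads at most two hexadecimal digits. So \xD7FF is parsed as \xD7 followed by the literals F and F, and \xE000-\xFFFD as \xE0, the literal 0, the range 0-\xFF, then F and D. Read that way, the class admits only #x9, #xA, #xD, [#x20-#xFF] and [#x10000-#x10FFFF], which would strip every character from U+0100 to U+FFFF (Greek, Cyrillic, CJK, Hangul, and so on). A sketch of the same filter with the braced \x{...} form used for every code point; the wrapper name xml_strip_invalid_chars() is hypothetical:

```php
<?php
// Hypothetical wrapper (not MantisBT API). Every code point is written as
// \x{...}, because unbraced \xhh consumes at most two hex digits in PCRE.
function xml_strip_invalid_chars(string $s): ?string
{
    // Under the /u modifier, preg_replace() returns null if $s is not
    // valid UTF-8.
    return preg_replace(
        '/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u',
        '',
        $s
    );
}
```

Usage: xml_strip_invalid_chars("a\x0Bb") returns "ab" (U+000B is outside the Char production), while Cyrillic or Hangul text such as "\u{416}\u{AC00}" passes through unchanged.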
