Commit ff2e650

committed Nov 15, 2012
Fix regex to remove UTF-8 chars invalid in XML 1.0
The regex introduced in the string_html_specialchars() function by commit 2b5d662 caused problems with multibyte UTF-8 characters: PCRE requires code points above 0xFF to be written as '\x{NNNN}', because the syntax without braces, '\xNN', only supports up to 2 hex digits [1].

Fixes #14744

[1] http://php.net/regexp.reference.escape
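To illustrate the escape-syntax difference described in the commit message, here is a minimal sketch. The variable names are hypothetical and chosen only for this example. It assumes a PHP build with PCRE UTF-8 support, and shows that '\xD7FF' is parsed as U+00D7 followed by the literal text "FF", while '\x{D7FF}' is parsed as the single code point U+D7FF.

```php
<?php
// Without braces, PCRE reads at most two hex digits after \x,
// so this pattern means: U+00D7 (multiplication sign), then "F", then "F".
$two_digit = '/^\xD7FF$/u';

// With braces, the whole hex number is taken as one code point: U+D7FF.
$braced = '/^\x{D7FF}$/u';

$times_ff = "\xC3\x97FF";   // UTF-8 bytes for U+00D7, followed by "FF"
$u_d7ff   = "\xED\x9F\xBF"; // UTF-8 bytes for U+D7FF

$results = array(
	preg_match( $two_digit, $times_ff ), // 1: matches the three-character string
	preg_match( $two_digit, $u_d7ff ),   // 0: does not match U+D7FF
	preg_match( $braced, $times_ff ),    // 0
	preg_match( $braced, $u_d7ff ),      // 1: matches the single code point
);
```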
1 parent 7ae2d9a commit ff2e650

1 file changed, +1 -1 lines changed

‎core/string_api.php

@@ -912,7 +912,7 @@ function string_html_entities( $p_string ) {
 function string_html_specialchars( $p_string ) {
 	# Remove any invalid character from the string per XML 1.0 specification
 	# http://www.w3.org/TR/2008/REC-xml-20081126/#NT-Char
-	$p_string = preg_replace( '/[^\x9\xA\xD\x20-\xD7FF\xE000-\xFFFD\x{10000}-\x{10FFFF}]/u', '', $p_string );
+	$p_string = preg_replace( '/[^\x9\xA\xD\x20-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]+/u', '', $p_string );
 
 	# achumakov: @ added to avoid warning output in unsupported codepages
 	# e.g. 8859-2, windows-1257, Korean, which are treated as 8859-1.
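The effect of the change can be sketched by running both patterns on the same input. This is an illustration only, assuming a PHP build with PCRE UTF-8 support: the helper `strip_with()` and the sample string are hypothetical and not part of the codebase. Because the old pattern's two-digit escapes collapse its ranges, it effectively allows only U+0009, U+000A, U+000D, U+0020 to U+00FF and U+10000 to U+10FFFF, so it strips every character from U+0100 to U+FFFF.

```php
<?php
// Patterns copied from the diff: old (removed) and new (added) line.
$old_pattern = '/[^\x9\xA\xD\x20-\xD7FF\xE000-\xFFFD\x{10000}-\x{10FFFF}]/u';
$new_pattern = '/[^\x9\xA\xD\x20-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]+/u';

// Hypothetical helper for this illustration only.
function strip_with( $p_pattern, $p_string ) {
	return preg_replace( $p_pattern, '', $p_string );
}

// "Za" + U+017C + U+00F3 + U+0142 + U+0107, then the control
// character U+0001 (invalid in XML 1.0), then "!".
$sample = "Za\xC5\xBC\xC3\xB3\xC5\x82\xC4\x87\x01!";

// Old pattern: drops U+0001, but also the three letters at or above U+0100.
// Only U+00F3 survives among the accented letters.
$old_result = strip_with( $old_pattern, $sample );

// New pattern: drops only U+0001 and keeps all the letters.
$new_result = strip_with( $new_pattern, $sample );
```

Note that with the `u` modifier, preg_replace() returns NULL when the subject is not valid UTF-8, so callers may want to check for that case separately.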
