New TextBreakIterator's default behavior breaks after space run #25269
Before this change, we considered breaking opportunities before
space runs. This approach allowed us to avoid re-shaping in many cases,
which is an important performance advantage.
However, the Unicode spec (UAX#14) states that breaking before a space
character is not allowed [1], so we had to implement this logic after
our TextBreakIterator had already determined the best breaking
opportunity. This approach has been working fine so far for regular
spaces (white-space, tabs, ...), but it doesn't work correctly for
other characters of the BA [2] class; in the CSS Text specification,
these are known as "other space separators" [3].
In order to implement the correct behavior for any kind of space, we
need to change our TextBreakIterator implementation so that it
matches the Unicode rules, considering breaking opportunities after
space runs. This change should also consider the performance impact
of the extra re-shaping operations required to deal with trailing
spaces.
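The difference between the two behaviors can be sketched with a toy model. This is not Blink's or ICU's code: the function names are hypothetical, and only U+0020 is treated as a space, whereas the real iterator follows the full UAX#14 rules.

```python
def breaks_before_space_runs(text: str) -> list[int]:
    """Old behavior (toy model): one opportunity at the start of each space run."""
    return [i for i in range(1, len(text))
            if text[i] == " " and text[i - 1] != " "]


def breaks_after_space_runs(text: str) -> list[int]:
    """UAX#14-style behavior (toy model): one opportunity at the end of each space run."""
    return [i for i in range(1, len(text))
            if text[i] != " " and text[i - 1] == " "]


sample = "ab  cd e"
old_offsets = breaks_before_space_runs(sample)  # offsets where a space run starts
new_offsets = breaks_after_space_runs(sample)   # offsets where a space run ends
```

With the second function, the spaces stay at the end of the line being closed, so they are the trailing spaces that the line breaker then has to handle.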
In order to prevent performance regressions, we'll store the position
of the 'end of non-hangable run', which will be used for items whose
style dictates that trailing spaces are collapsed.
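A minimal sketch of that idea, assuming a simplified iterator that scans plain strings. `BreakOpportunity`, `non_hangable_end` and `next_break_opportunity` are hypothetical names for illustration, not the identifiers used in the actual change, and only U+0020 counts as a space here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BreakOpportunity:
    offset: int            # break position, after the trailing space run
    non_hangable_end: int  # end of the content that precedes those spaces


def next_break_opportunity(text: str, start: int) -> BreakOpportunity:
    """Find the next break after a space run, remembering where the spaces began."""
    i, n = start, len(text)
    # Advance over the non-space content.
    while i < n and text[i] != " ":
        i += 1
    non_hangable_end = i
    # Advance over the space run; the break opportunity is after it.
    while i < n and text[i] == " ":
        i += 1
    return BreakOpportunity(offset=i, non_hangable_end=non_hangable_end)


first = next_break_opportunity("ab  cd e", 0)
# When the style collapses trailing spaces, the line content is
# text[0:first.non_hangable_end]; no backwards scan from first.offset is needed.
```

Recording both offsets in a single pass means the caller does not have to rediscover where the trailing spaces start, which is presumably what keeps the extra work small.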
[1] https://unicode-org.atlassian.net/browse/ICU-20843
[2] https://www.unicode.org/reports/tr14/tr14-39.html#BA
[3] https://drafts.csswg.org/css-text-3/#other-space-separators
Change-Id: Ie4a3890c75a3faff1a0155d4a40bcaa85bc6ac06
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/2047943
Commit-Queue: Javier Fernandez <jfernandez@igalia.com>
Reviewed-by: Koji Ishii <kojii@chromium.org>
Cr-Commit-Position: refs/heads/master@{#806928}