New TextBreakIterator's default behavior breaks after space run #25269

Merged
merged 1 commit into from Sep 15, 2020

@chromium-wpt-export-bot chromium-wpt-export-bot commented Aug 28, 2020

Before this change, we were considering breaking opportunities before
space runs. This approach allowed us to avoid re-shaping in many cases,
which is an important performance advantage.

However, the Unicode spec (UAX#14) states that breaking before a space
character is not allowed [1], so we had to implement this logic after
our TextBreakIterator had already determined the best breaking
opportunity. This approach has been working fine so far for regular
spaces (white-space, tabs, ...), but it doesn't work correctly for
other BA [2] class characters; in the CSS Text specification, these are
known as "other space separators" [3].

In order to implement the correct behavior for any kind of space, we
would need to change our TextBreakIterator implementation so that it
matches the Unicode rules, placing breaking opportunities after
space runs. This change should also consider the performance impact
of the extra re-shaping operations required to deal with trailing
spaces.
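
To make the difference concrete, here is a minimal sketch of the two policies. It is not Chromium's TextBreakIterator (which is C++ and backed by ICU); the function name `break_offsets` and its parameter are hypothetical, and it only treats U+0020 as a space, ignoring every other UAX#14 rule.

```python
from typing import List


def break_offsets(text: str, after_space_run: bool) -> List[int]:
    """Return break offsets for each interior run of U+0020 spaces.

    Hypothetical helper for illustration only: real line breaking
    follows the full UAX#14 rule set, not just space runs.
    """
    offsets = []
    i = 0
    n = len(text)
    while i < n:
        if text[i] != " ":
            i += 1
            continue
        start = i  # first space of the run
        while i < n and text[i] == " ":
            i += 1
        end = i  # first non-space after the run
        # Runs touching the start or end of the text offer no
        # break between two words, so they are skipped here.
        if start == 0 or end == n:
            continue
        offsets.append(end if after_space_run else start)
    return offsets
```

With `after_space_run=False` the offsets sit in front of each space run (the old behavior described above); with `True` they sit after it, which is where UAX#14 places the opportunity.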

In order to prevent performance regressions, we'll store the position
of the 'end of non-hangable run', which will be used for items whose
style collapses trailing spaces.
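
A rough sketch of the idea, under stated assumptions: the names `BreakResult`, `make_break_result` and `line_text` are hypothetical and do not come from the Chromium source, and only U+0020 is treated as hangable. The point is that the break offset (after the space run) and the end of the non-hangable run (before it) are recorded together, so a caller that collapses trailing spaces can pick the second offset without scanning the text again.

```python
from dataclasses import dataclass


@dataclass
class BreakResult:
    # Offset of the break opportunity, located after the space run.
    break_offset: int
    # Offset where the trailing space run begins, i.e. the end of
    # the content that cannot hang.
    non_hangable_run_end: int


def make_break_result(text: str, break_offset: int) -> BreakResult:
    """Record both offsets once, when the break is found."""
    end = break_offset
    while end > 0 and text[end - 1] == " ":
        end -= 1
    return BreakResult(break_offset, end)


def line_text(text: str, start: int, result: BreakResult,
              collapse_trailing_spaces: bool) -> str:
    """Pick the line end according to the (assumed) style flag."""
    if collapse_trailing_spaces:
        end = result.non_hangable_run_end
    else:
        end = result.break_offset
    return text[start:end]
```

For example, with the text `"ab  cd e"` and a break at offset 4, the stored end of the non-hangable run is 2, so a collapsing style yields the line `"ab"` while a preserving style yields `"ab  "`.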

[1] https://unicode-org.atlassian.net/browse/ICU-20843
[2] https://www.unicode.org/reports/tr14/tr14-39.html#BA
[3] https://drafts.csswg.org/css-text-3/#other-space-separators

Change-Id: Ie4a3890c75a3faff1a0155d4a40bcaa85bc6ac06
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/2047943
Commit-Queue: Javier Fernandez <jfernandez@igalia.com>
Reviewed-by: Koji Ishii <kojii@chromium.org>
Cr-Commit-Position: refs/heads/master@{#806928}


@wpt-pr-bot wpt-pr-bot left a comment


The review process for this patch is being conducted in the Chromium project.

@chromium-wpt-export-bot changed the title from "[WIP] New trailing space handling" to "New trailing spaces handling" on Aug 31, 2020
@chromium-wpt-export-bot force-pushed the chromium-export-cl-2047943 branch 7 times, most recently from eb37fc8 to e19aab4, on September 3, 2020 19:02
@chromium-wpt-export-bot changed the title from "New trailing spaces handling" to "New TextBreakIterator's default behavior breaks after space run" on Sep 5, 2020
@chromium-wpt-export-bot force-pushed the chromium-export-cl-2047943 branch 10 times, most recently from 342941f to 25da957, on September 11, 2020 11:38
@chromium-wpt-export-bot force-pushed the chromium-export-cl-2047943 branch 4 times, most recently from dcdf651 to 9185dac, on September 15, 2020 06:43