Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prefetch img, scripts and stylesheets during parsing #24148

Merged
merged 4 commits into from Sep 11, 2019

Conversation

asajeffrey
Copy link
Member

@asajeffrey asajeffrey commented Sep 5, 2019

Eagerly tokenize html input, and scan for script and img tags which can be prefetched into the cache. This allows resources to be loaded concurrently, which would otherwise be loaded sequentially due to being blocked by script execution. On the cloth webgl demo, this takes the time-to-paint down from 732ms to 560ms.



This change is Reviewable

@asajeffrey asajeffrey added I-perf-slow Unnecessary performance degredation. A-content/script Related to the script thread labels Sep 5, 2019
@highfive
Copy link

highfive commented Sep 5, 2019

Heads up! This PR modifies the following files:

  • @KiChjang: components/script/dom/servoparser/mod.rs, components/script/dom/servoparser/prefetch.rs, components/net/resource_thread.rs, components/net_traits/lib.rs

@highfive highfive added the S-awaiting-review There is new code that needs to be reviewed. label Sep 5, 2019
@highfive
Copy link

highfive commented Sep 5, 2019

warning Warning warning

  • These commits modify net and script code, but no tests are modified. Please consider adding a test!

@asajeffrey
Copy link
Member Author

@asajeffrey
Copy link
Member Author

r? @jdm

@highfive highfive assigned jdm and unassigned nox Sep 6, 2019
@asajeffrey
Copy link
Member Author

One thing I'm not sure about is that we're issuing a GET when prefetching, even for content which isn't cacheable. We should probably first issue a HEAD to see if the content is cacheable. Perhaps this should be a follow-up issue?

Copy link
Member

@jdm jdm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the simplicity of this model. We just need to make sure that we don't unintentionally bypass security features of the platform by prefetching more liberally than the actual request the page intends to make.

components/script/dom/servoparser/prefetch.rs Outdated Show resolved Hide resolved
inner: HtmlTokenizer<PrefetchSink>,
}

unsafe_no_jsmanaged_fields!(Tokenizer);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to derive JSTraceable on Tokenizer instead to avoid accidents in the future if we update that struct.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh this is weird, I replied to this comment but github appears to have lost the reply...

Anyway, the problem is that HtmlTokenizer doesn't implement JSTraceable, and it's difficult to do so as its generic, and the tokenizer API provides no way to access the sink. If it did, the implementation would be pretty easy:

    impl<Sink: JSTraceable> for HtmlTokenizer<Sink> {
        fn trace(&self) { self.sink().trace() } 
   } 

but there's no sink() method, so we can't really do that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a JSTraceable implementation for the HTML tokenizer in script/dom/servoparser/html.rs. I recommend following its example.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Er, but stripping out the unnecessary code leaves just:

#[allow(unsafe_code)]
unsafe impl JSTraceable for HtmlTokenizer<PrefetchSink>> {
    unsafe fn trace(&self, trc: *mut JSTracer) {
    }
}

which doesn't have obvious advantages over unsafe_no_jsmanaged_fields!.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we still derive JStraceable on Sink and call self.sink.trace() at least?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is we don't have access to the sink, since the tokenizer doesn't expose an API to access it. We couuuld use an Rc, but that seems a bit round-the-houses.

Copy link
Member

@jdm jdm Sep 6, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the tokenizer this type? It has a public sink field. What am I missing?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Argh, I was looking for an accessor method, I didn't spot that the field was public. D'oh!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, fixed.

components/script/dom/servoparser/prefetch.rs Outdated Show resolved Hide resolved
components/script/dom/servoparser/prefetch.rs Outdated Show resolved Hide resolved
components/script/dom/servoparser/prefetch.rs Outdated Show resolved Hide resolved
@jdm jdm added S-needs-code-changes Changes have not yet been made that were requested by a reviewer. and removed S-awaiting-review There is new code that needs to be reviewed. labels Sep 6, 2019
@asajeffrey
Copy link
Member Author

I don't think there's security issues here, since the data requested by the prefetch is only used to populate the cache, and isn't exposed to script. If we do set the headers according to htmlscriptelement and htmlimageelement, then we do need to make sure we're not sending cookies we shouldn't.

@asajeffrey
Copy link
Member Author

@asajeffrey asajeffrey added S-needs-squash Some (or all) of the commits in the PR should be combined. S-awaiting-review There is new code that needs to be reviewed. and removed S-needs-code-changes Changes have not yet been made that were requested by a reviewer. labels Sep 6, 2019
@asajeffrey asajeffrey changed the title Prefetch img and scripts during parsing Prefetch img, scripts and stylesheets during parsing Sep 6, 2019
@asajeffrey
Copy link
Member Author

One thing I realized doing this is that fetch uses the origin, which is mutable, but for speculative prefetching we don't get much choice but to use the original origin.

@jdm jdm added S-needs-code-changes Changes have not yet been made that were requested by a reviewer. and removed S-awaiting-review There is new code that needs to be reviewed. labels Sep 6, 2019
@highfive highfive added S-awaiting-review There is new code that needs to be reviewed. and removed S-needs-code-changes Changes have not yet been made that were requested by a reviewer. labels Sep 6, 2019
@jdm jdm removed the S-awaiting-review There is new code that needs to be reviewed. label Sep 6, 2019
@jdm
Copy link
Member

jdm commented Sep 6, 2019

@bors-servo try=wpt

@bors-servo
Copy link
Contributor

⌛ Trying commit 330b940 with merge 419002d...

bors-servo pushed a commit that referenced this pull request Sep 6, 2019
Prefetch img, scripts and stylesheets during parsing

<!-- Please describe your changes on the following line: -->

Eagerly tokenize html input, and scan for `script` and `img` tags which can be prefetched into the cache. This allows resources to be loaded concurrently, which would otherwise be loaded sequentially due to being blocked by script execution. On the cloth webgl demo, this takes the time-to-paint down from 732ms to 560ms.

---
<!-- Thank you for contributing to Servo! Please replace each `[ ]` by `[X]` when the step is complete, and replace `___` with appropriate data: -->
- [x] `./mach build -d` does not report any errors
- [x] `./mach test-tidy` does not report any errors
- [x] These changes fix #3847
- [x] These changes do not require tests because it's a perf improvement

<!-- Also, please make sure that "Allow edits from maintainers" checkbox is checked, so that we can help you if you get stuck somewhere along the way.-->

<!-- Pull requests that do not address these steps are welcome, but they will require additional verification as part of the review process. -->

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/24148)
<!-- Reviewable:end -->
@bors-servo
Copy link
Contributor

⌛ Trying commit c6e1b8a with merge 8a20090...

bors-servo pushed a commit that referenced this pull request Sep 10, 2019
Prefetch img, scripts and stylesheets during parsing

<!-- Please describe your changes on the following line: -->

Eagerly tokenize html input, and scan for `script` and `img` tags which can be prefetched into the cache. This allows resources to be loaded concurrently, which would otherwise be loaded sequentially due to being blocked by script execution. On the cloth webgl demo, this takes the time-to-paint down from 732ms to 560ms.

---
<!-- Thank you for contributing to Servo! Please replace each `[ ]` by `[X]` when the step is complete, and replace `___` with appropriate data: -->
- [x] `./mach build -d` does not report any errors
- [x] `./mach test-tidy` does not report any errors
- [x] These changes fix #3847
- [x] These changes do not require tests because it's a perf improvement

<!-- Also, please make sure that "Allow edits from maintainers" checkbox is checked, so that we can help you if you get stuck somewhere along the way.-->

<!-- Pull requests that do not address these steps are welcome, but they will require additional verification as part of the review process. -->

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/24148)
<!-- Reviewable:end -->
@bors-servo
Copy link
Contributor

💔 Test failed - status-taskcluster

@highfive highfive added the S-tests-failed The changes caused existing tests to fail. label Sep 10, 2019
@asajeffrey
Copy link
Member Author

OK, that was a CI failure, the WPT test suite passed!

@asajeffrey
Copy link
Member Author

Now waiting for a review from @nox.

@asajeffrey
Copy link
Member Author

@highfive highfive removed the S-tests-failed The changes caused existing tests to fail. label Sep 10, 2019
@asajeffrey
Copy link
Member Author

Squashed.

Copy link
Contributor

@nox nox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit anxious about doing that tbh, but it looks ok apart from some nits.

@@ -237,6 +237,18 @@ impl FetchTaskTarget for IpcSender<FetchResponseMsg> {
}
}

impl FetchTaskTarget for () {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make a new struct type, don't piggy-back the unit type. It's cheap to make a new one anyway.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -101,6 +102,10 @@ pub struct ServoParser {
aborted: Cell<bool>,
/// <https://html.spec.whatwg.org/multipage/#script-created-parser>
script_created_parser: bool,
/// We do a quick-and-dirty parse of the input looking for resources to prefetch
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's write a big TODO that we should be carrying the speculative parsing PR to completion and remove that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. The PR you mean is #19203?

@@ -425,20 +432,50 @@ impl ServoParser {
)
}

fn push_tendril_input_chunk(&self, chunk: StrTendril) {
if !chunk.is_empty() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Early return if the chunk is empty instead of drifting towards the right.

fn push_bytes_input_chunk(&self, chunk: Vec<u8>) {
// For byte input, we convert it to text using the network decoder.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Obvious comment is obvious. :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but the matching comment on push_string_input_chunk wasn't (at least not to me!).

prefetch_input.push_back(chunk.clone());
let _ = self.prefetch_tokenizer
.borrow_mut()
.feed(&mut *prefetch_input);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tokenizer here will be very wrong whenever someone uses document.write, I don't like that but I don't have any suggestion apart from "let's not land this PR and let's land speculative parsing instead".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed. Speculative parsing will also be wrong in that case, not much you can do apart from add this to the list of reasons not to use document.write().

}

impl TokenSink for PrefetchSink {
type Handle = ();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use a new unit struct type.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are very keen on unit structs. Done.

},
(TagKind::EndTag, local_name!("script")) => {
// After the first script tag, the main parser is blocked, so it's worth prefetching.
self.prefetching = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like that's the place where we should not prefetch ever, because the script blocking the parser may very well set window.domain or do stuff that makes this end tag's script not load or whatever.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, but if we waited to find out if script did any of that, we wouldn't be able to prefetch at all.

(TagKind::StartTag, local_name!("base")) => {
if let Some(url) = self.get_url(tag, local_name!("href")) {
debug!("Setting base {}", url);
self.base = url;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is extremely wrong in presence of multiple base tags in an document.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to https://developer.mozilla.org/en-US/docs/Web/HTML/Element/base "There can be only one element in a document." But yes, I can imagine this causing resources to be prefetched erroneously.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, the spec has a note about this (https://html.spec.whatwg.org/multipage/semantics.html#the-base-element). Fixed.

impl TokenSink for PrefetchSink {
type Handle = ();
fn process_token(&mut self, token: Token, _line_number: u64) -> TokenSinkResult<()> {
if let Token::TagToken(ref tag) = token {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please early return out of the function if token isn't a TagToken, that will help with rightwards drift.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

.send(CoreResourceMsg::Fetch(request, FetchChannels::Prefetch));
}
// Don't prefetch inside script
self.prefetching = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing but script data will be ever return from parsing a script, please use the proper TokenSinkResult return value to tell the tokenizer to switch modes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Member Author

@asajeffrey asajeffrey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I think I addressed everything.

@@ -237,6 +237,18 @@ impl FetchTaskTarget for IpcSender<FetchResponseMsg> {
}
}

impl FetchTaskTarget for () {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -101,6 +102,10 @@ pub struct ServoParser {
aborted: Cell<bool>,
/// <https://html.spec.whatwg.org/multipage/#script-created-parser>
script_created_parser: bool,
/// We do a quick-and-dirty parse of the input looking for resources to prefetch
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. The PR you mean is #19203?

prefetch_input.push_back(chunk.clone());
let _ = self.prefetch_tokenizer
.borrow_mut()
.feed(&mut *prefetch_input);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed. Speculative parsing will also be wrong in that case, not much you can do apart from add this to the list of reasons not to use document.write().

fn push_bytes_input_chunk(&self, chunk: Vec<u8>) {
// For byte input, we convert it to text using the network decoder.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but the matching comment on push_string_input_chunk wasn't (at least not to me!).

self.network_input.borrow_mut().push_back(chunk.into());
// Convert the chunk to a tendril so cloning it isn't expensive.
// The input has already been decoded as a string, so doesn't need
// to be decoded by the network decoder again.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not mentioning it was confusing to me though, I spent a while debugging before I realized I was re-encoding UTF-8 string data using the network decoder. I'll remove the comment about tendrils being cheap though, that's more me talking to myself than anything.

}

impl TokenSink for PrefetchSink {
type Handle = ();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are very keen on unit structs. Done.

impl TokenSink for PrefetchSink {
type Handle = ();
fn process_token(&mut self, token: Token, _line_number: u64) -> TokenSinkResult<()> {
if let Token::TagToken(ref tag) = token {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

.send(CoreResourceMsg::Fetch(request, FetchChannels::Prefetch));
}
// Don't prefetch inside script
self.prefetching = false;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

},
(TagKind::EndTag, local_name!("script")) => {
// After the first script tag, the main parser is blocked, so it's worth prefetching.
self.prefetching = true;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, but if we waited to find out if script did any of that, we wouldn't be able to prefetch at all.

(TagKind::StartTag, local_name!("base")) => {
if let Some(url) = self.get_url(tag, local_name!("href")) {
debug!("Setting base {}", url);
self.base = url;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to https://developer.mozilla.org/en-US/docs/Web/HTML/Element/base "There can be only one element in a document." But yes, I can imagine this causing resources to be prefetched erroneously.

@asajeffrey asajeffrey added the S-needs-squash Some (or all) of the commits in the PR should be combined. label Sep 11, 2019
@asajeffrey
Copy link
Member Author

@nox, you okay with the changes?

@nox
Copy link
Contributor

nox commented Sep 11, 2019

@bors-servo r=jdm,nox

@bors-servo
Copy link
Contributor

📌 Commit 2e6f14f has been approved by jdm,nox

@highfive highfive added S-awaiting-merge The PR is in the process of compiling and running tests on the automated CI. and removed S-awaiting-review There is new code that needs to be reviewed. S-needs-squash Some (or all) of the commits in the PR should be combined. labels Sep 11, 2019
bors-servo pushed a commit that referenced this pull request Sep 11, 2019
Prefetch img, scripts and stylesheets during parsing

<!-- Please describe your changes on the following line: -->

Eagerly tokenize html input, and scan for `script` and `img` tags which can be prefetched into the cache. This allows resources to be loaded concurrently, which would otherwise be loaded sequentially due to being blocked by script execution. On the cloth webgl demo, this takes the time-to-paint down from 732ms to 560ms.

---
<!-- Thank you for contributing to Servo! Please replace each `[ ]` by `[X]` when the step is complete, and replace `___` with appropriate data: -->
- [x] `./mach build -d` does not report any errors
- [x] `./mach test-tidy` does not report any errors
- [x] These changes fix #3847
- [x] These changes do not require tests because it's a perf improvement

<!-- Also, please make sure that "Allow edits from maintainers" checkbox is checked, so that we can help you if you get stuck somewhere along the way.-->

<!-- Pull requests that do not address these steps are welcome, but they will require additional verification as part of the review process. -->

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/24148)
<!-- Reviewable:end -->
@bors-servo
Copy link
Contributor

⌛ Testing commit 2e6f14f with merge 4e9967b...

@bors-servo
Copy link
Contributor

☀️ Test successful - linux-rel-css, linux-rel-wpt, status-taskcluster
Approved by: jdm,nox
Pushing 4e9967b to master...

@bors-servo bors-servo merged commit 2e6f14f into servo:master Sep 11, 2019
@highfive highfive removed the S-awaiting-merge The PR is in the process of compiling and running tests on the automated CI. label Sep 11, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-content/script Related to the script thread I-perf-slow Unnecessary performance degredation.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Off-main-thread, speculative HTML parsing
5 participants