New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prefetch img, scripts and stylesheets during parsing #24148
Conversation
Heads up! This PR modifies the following files:
|
r? @jdm |
One thing I'm not sure about is that we're issuing a GET when prefetching, even for content which isn't cacheable. We should probably first issue a HEAD to see if the content is cacheable. Perhaps this should be a follow-up issue? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the simplicity of this model. We just need to make sure that we don't unintentionally bypass security features of the platform by prefetching more liberally than the actual request the page intends to make.
inner: HtmlTokenizer<PrefetchSink>, | ||
} | ||
|
||
unsafe_no_jsmanaged_fields!(Tokenizer); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to derive JSTraceable on Tokenizer instead to avoid accidents in the future if we update that struct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh this is weird, I replied to this comment but github appears to have lost the reply...
Anyway, the problem is that HtmlTokenizer doesn't implement JSTraceable
, and it's difficult to do so as its generic, and the tokenizer API provides no way to access the sink. If it did, the implementation would be pretty easy:
impl<Sink: JSTraceable> for HtmlTokenizer<Sink> {
fn trace(&self) { self.sink().trace() }
}
but there's no sink()
method, so we can't really do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have a JSTraceable implementation for the HTML tokenizer in script/dom/servoparser/html.rs. I recommend following its example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Er, but stripping out the unnecessary code leaves just:
#[allow(unsafe_code)]
unsafe impl JSTraceable for HtmlTokenizer<PrefetchSink>> {
unsafe fn trace(&self, trc: *mut JSTracer) {
}
}
which doesn't have obvious advantages over unsafe_no_jsmanaged_fields!
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we still derive JStraceable on Sink and call self.sink.trace()
at least?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is we don't have access to the sink, since the tokenizer doesn't expose an API to access it. We couuuld use an Rc, but that seems a bit round-the-houses.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't the tokenizer this type? It has a public sink
field. What am I missing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Argh, I was looking for an accessor method, I didn't spot that the field was public. D'oh!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, fixed.
I don't think there's security issues here, since the data requested by the prefetch is only used to populate the cache, and isn't exposed to script. If we do set the headers according to htmlscriptelement and htmlimageelement, then we do need to make sure we're not sending cookies we shouldn't. |
One thing I realized doing this is that fetch uses the origin, which is mutable, but for speculative prefetching we don't get much choice but to use the original origin. |
@bors-servo try=wpt |
Prefetch img, scripts and stylesheets during parsing <!-- Please describe your changes on the following line: --> Eagerly tokenize html input, and scan for `script` and `img` tags which can be prefetched into the cache. This allows resources to be loaded concurrently, which would otherwise be loaded sequentially due to being blocked by script execution. On the cloth webgl demo, this takes the time-to-paint down from 732ms to 560ms. --- <!-- Thank you for contributing to Servo! Please replace each `[ ]` by `[X]` when the step is complete, and replace `___` with appropriate data: --> - [x] `./mach build -d` does not report any errors - [x] `./mach test-tidy` does not report any errors - [x] These changes fix #3847 - [x] These changes do not require tests because it's a perf improvement <!-- Also, please make sure that "Allow edits from maintainers" checkbox is checked, so that we can help you if you get stuck somewhere along the way.--> <!-- Pull requests that do not address these steps are welcome, but they will require additional verification as part of the review process. --> <!-- Reviewable:start --> --- This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/24148) <!-- Reviewable:end -->
Prefetch img, scripts and stylesheets during parsing <!-- Please describe your changes on the following line: --> Eagerly tokenize html input, and scan for `script` and `img` tags which can be prefetched into the cache. This allows resources to be loaded concurrently, which would otherwise be loaded sequentially due to being blocked by script execution. On the cloth webgl demo, this takes the time-to-paint down from 732ms to 560ms. --- <!-- Thank you for contributing to Servo! Please replace each `[ ]` by `[X]` when the step is complete, and replace `___` with appropriate data: --> - [x] `./mach build -d` does not report any errors - [x] `./mach test-tidy` does not report any errors - [x] These changes fix #3847 - [x] These changes do not require tests because it's a perf improvement <!-- Also, please make sure that "Allow edits from maintainers" checkbox is checked, so that we can help you if you get stuck somewhere along the way.--> <!-- Pull requests that do not address these steps are welcome, but they will require additional verification as part of the review process. --> <!-- Reviewable:start --> --- This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/24148) <!-- Reviewable:end -->
💔 Test failed - status-taskcluster |
OK, that was a CI failure, the WPT test suite passed! |
Now waiting for a review from @nox. |
c6e1b8a
to
aeac382
Compare
Squashed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit anxious about doing that tbh, but it looks ok apart from some nits.
components/net_traits/lib.rs
Outdated
@@ -237,6 +237,18 @@ impl FetchTaskTarget for IpcSender<FetchResponseMsg> { | |||
} | |||
} | |||
|
|||
impl FetchTaskTarget for () { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make a new struct type, don't piggy-back the unit type. It's cheap to make a new one anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -101,6 +102,10 @@ pub struct ServoParser { | |||
aborted: Cell<bool>, | |||
/// <https://html.spec.whatwg.org/multipage/#script-created-parser> | |||
script_created_parser: bool, | |||
/// We do a quick-and-dirty parse of the input looking for resources to prefetch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's write a big TODO that we should be carrying the speculative parsing PR to completion and remove that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. The PR you mean is #19203?
@@ -425,20 +432,50 @@ impl ServoParser { | |||
) | |||
} | |||
|
|||
fn push_tendril_input_chunk(&self, chunk: StrTendril) { | |||
if !chunk.is_empty() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Early return if the chunk is empty instead of drifting towards the right.
fn push_bytes_input_chunk(&self, chunk: Vec<u8>) { | ||
// For byte input, we convert it to text using the network decoder. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Obvious comment is obvious. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but the matching comment on push_string_input_chunk
wasn't (at least not to me!).
prefetch_input.push_back(chunk.clone()); | ||
let _ = self.prefetch_tokenizer | ||
.borrow_mut() | ||
.feed(&mut *prefetch_input); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tokenizer here will be very wrong whenever someone uses document.write
, I don't like that but I don't have any suggestion apart from "let's not land this PR and let's land speculative parsing instead".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed. Speculative parsing will also be wrong in that case, not much you can do apart from add this to the list of reasons not to use document.write()
.
} | ||
|
||
impl TokenSink for PrefetchSink { | ||
type Handle = (); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use a new unit struct type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are very keen on unit structs. Done.
}, | ||
(TagKind::EndTag, local_name!("script")) => { | ||
// After the first script tag, the main parser is blocked, so it's worth prefetching. | ||
self.prefetching = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like that's the place where we should not prefetch ever, because the script blocking the parser may very well set window.domain
or do stuff that makes this end tag's script not load or whatever.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, but if we waited to find out if script did any of that, we wouldn't be able to prefetch at all.
(TagKind::StartTag, local_name!("base")) => { | ||
if let Some(url) = self.get_url(tag, local_name!("href")) { | ||
debug!("Setting base {}", url); | ||
self.base = url; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is extremely wrong in presence of multiple base tags in an document.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to https://developer.mozilla.org/en-US/docs/Web/HTML/Element/base "There can be only one element in a document." But yes, I can imagine this causing resources to be prefetched erroneously.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, the spec has a note about this (https://html.spec.whatwg.org/multipage/semantics.html#the-base-element). Fixed.
impl TokenSink for PrefetchSink { | ||
type Handle = (); | ||
fn process_token(&mut self, token: Token, _line_number: u64) -> TokenSinkResult<()> { | ||
if let Token::TagToken(ref tag) = token { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please early return out of the function if token
isn't a TagToken
, that will help with rightwards drift.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
.send(CoreResourceMsg::Fetch(request, FetchChannels::Prefetch)); | ||
} | ||
// Don't prefetch inside script | ||
self.prefetching = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nothing but script data will be ever return from parsing a script, please use the proper TokenSinkResult
return value to tell the tokenizer to switch modes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I think I addressed everything.
components/net_traits/lib.rs
Outdated
@@ -237,6 +237,18 @@ impl FetchTaskTarget for IpcSender<FetchResponseMsg> { | |||
} | |||
} | |||
|
|||
impl FetchTaskTarget for () { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -101,6 +102,10 @@ pub struct ServoParser { | |||
aborted: Cell<bool>, | |||
/// <https://html.spec.whatwg.org/multipage/#script-created-parser> | |||
script_created_parser: bool, | |||
/// We do a quick-and-dirty parse of the input looking for resources to prefetch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. The PR you mean is #19203?
prefetch_input.push_back(chunk.clone()); | ||
let _ = self.prefetch_tokenizer | ||
.borrow_mut() | ||
.feed(&mut *prefetch_input); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed. Speculative parsing will also be wrong in that case, not much you can do apart from add this to the list of reasons not to use document.write()
.
fn push_bytes_input_chunk(&self, chunk: Vec<u8>) { | ||
// For byte input, we convert it to text using the network decoder. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but the matching comment on push_string_input_chunk
wasn't (at least not to me!).
self.network_input.borrow_mut().push_back(chunk.into()); | ||
// Convert the chunk to a tendril so cloning it isn't expensive. | ||
// The input has already been decoded as a string, so doesn't need | ||
// to be decoded by the network decoder again. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not mentioning it was confusing to me though, I spent a while debugging before I realized I was re-encoding UTF-8 string data using the network decoder. I'll remove the comment about tendrils being cheap though, that's more me talking to myself than anything.
} | ||
|
||
impl TokenSink for PrefetchSink { | ||
type Handle = (); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are very keen on unit structs. Done.
impl TokenSink for PrefetchSink { | ||
type Handle = (); | ||
fn process_token(&mut self, token: Token, _line_number: u64) -> TokenSinkResult<()> { | ||
if let Token::TagToken(ref tag) = token { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
.send(CoreResourceMsg::Fetch(request, FetchChannels::Prefetch)); | ||
} | ||
// Don't prefetch inside script | ||
self.prefetching = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
}, | ||
(TagKind::EndTag, local_name!("script")) => { | ||
// After the first script tag, the main parser is blocked, so it's worth prefetching. | ||
self.prefetching = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, but if we waited to find out if script did any of that, we wouldn't be able to prefetch at all.
(TagKind::StartTag, local_name!("base")) => { | ||
if let Some(url) = self.get_url(tag, local_name!("href")) { | ||
debug!("Setting base {}", url); | ||
self.base = url; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to https://developer.mozilla.org/en-US/docs/Web/HTML/Element/base "There can be only one element in a document." But yes, I can imagine this causing resources to be prefetched erroneously.
@nox, you okay with the changes? |
@bors-servo r=jdm,nox |
📌 Commit 2e6f14f has been approved by |
Prefetch img, scripts and stylesheets during parsing <!-- Please describe your changes on the following line: --> Eagerly tokenize html input, and scan for `script` and `img` tags which can be prefetched into the cache. This allows resources to be loaded concurrently, which would otherwise be loaded sequentially due to being blocked by script execution. On the cloth webgl demo, this takes the time-to-paint down from 732ms to 560ms. --- <!-- Thank you for contributing to Servo! Please replace each `[ ]` by `[X]` when the step is complete, and replace `___` with appropriate data: --> - [x] `./mach build -d` does not report any errors - [x] `./mach test-tidy` does not report any errors - [x] These changes fix #3847 - [x] These changes do not require tests because it's a perf improvement <!-- Also, please make sure that "Allow edits from maintainers" checkbox is checked, so that we can help you if you get stuck somewhere along the way.--> <!-- Pull requests that do not address these steps are welcome, but they will require additional verification as part of the review process. --> <!-- Reviewable:start --> --- This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/24148) <!-- Reviewable:end -->
☀️ Test successful - linux-rel-css, linux-rel-wpt, status-taskcluster |
Eagerly tokenize html input, and scan for
script
andimg
tags which can be prefetched into the cache. This allows resources to be loaded concurrently, which would otherwise be loaded sequentially due to being blocked by script execution. On the cloth webgl demo, this takes the time-to-paint down from 732ms to 560ms../mach build -d
does not report any errors./mach test-tidy
does not report any errorsThis change is