Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Off-main-thread, speculative HTML parsing #3847

Open
mbrubeck opened this issue Oct 29, 2014 · 24 comments · Fixed by #24148
Open

Off-main-thread, speculative HTML parsing #3847

mbrubeck opened this issue Oct 29, 2014 · 24 comments · Fixed by #24148
Labels
A-content/parsers Related to parsing HTML and XML B-interesting-project Represents work that is expected to be interesting in some fashion I-perf-slow Unnecessary performance degredation.

Comments

@mbrubeck
Copy link
Contributor

By running the HTML parser in a separate task from script, we could allow tokenization and speculative tree building to continue while content scripts are fetched and executed.

References on HTML parser threading in Gecko and WebKit:

@mbrubeck mbrubeck added I-perf-slow Unnecessary performance degredation. A-content/parsers Related to parsing HTML and XML B-interesting-project Represents work that is expected to be interesting in some fashion labels Oct 29, 2014
@kmcallister
Copy link
Contributor

This would basically entail writing an implementation of TreeSink which implements every method by sending a message on a channel. The node handles can just be sequentially assigned IDs. We would also keep the TrustedNodeAddress implementation so we don't pay for off-main-thread overhead when we don't use it.

There is also the possibility of doing tokenization off-thread and tree construction on thread. I don't how that would perform in comparison, but it's an easy experiment to try.

@psdh
Copy link
Contributor

psdh commented Feb 19, 2015

I am trying to implement this!

@psdh
Copy link
Contributor

psdh commented Feb 19, 2015

^^ Should I implement TreeSink on top of existing implementation, implementing the missing functions and adding the sending a message part.
Can you also, elaborate on the sending a message part

@jdm
Copy link
Member

jdm commented Feb 19, 2015

You can learn more about message passing in Rust at http://doc.rust-lang.org/book/concurrency.html

@psdh
Copy link
Contributor

psdh commented Feb 28, 2015

Okay, After irc discussions with kmc, and jdm, I have understood that I need to
Add one more implementation of TreeSink that would implement each of these methods by sending a enum to a channel(say message), this enum would have information about
i) target
ii) arguments
And then I queue these dom manipulations as enum vairants( in a vec?) and flush them from time to time
When do we make the actual changes(to the dom) though?
Is it when we flush them, we can check which enum variant has been stored and make changes according to that?

@jdm
Copy link
Member

jdm commented Feb 28, 2015

As you say, the DOM changes would occur when flushing them in the main script task. There would need to be a synchronization point - when the script task is going to execute a newly-parsed script element, it needs to operate on the up-to-date DOM tree. Does that make sense?

@kmcallister
Copy link
Contributor

I queue these dom manipulations as enum vairants(in a vec?)

And/or in the message channel itself.

You'll need a cross-thread way for tree ops to refer to nodes that haven't been constructed yet, but will be constructed by a previous tree op. Gecko uses nsIContent** for this: a pointer to a slot which will eventually hold a pointer to the node, when the main thread gets around to constructing it.

This approach seems pretty unsafe, and anyway will be impossible without unsafe code in Rust. So for Servo's parser, I'd like to try using simple integer IDs, and then figure out what the performance cost is.

When do we make the actual changes(to the dom) though?

I think this depends on why you're doing the parse.

If you're speculatively parsing characters after </script> while the main thread is running the script, then you can't commit any DOM manipulations until the script finishes and you check whether it did anything to invalidate your speculative assumptions. The Gecko link above has more to say about this.

You can speculatively fetch any URLs you've discovered in the parse, even if the DOM manipulations are discarded. I think there's no need to coordinate that with the main thread at all.

We can also do the initial page load parse off-main-thread, which allows parallelism between the parser and the actual DOM manipulations (a bottleneck today for whatever reason). In that case I think you'd execute the tree ops as soon as possible, until you see a </script>.

Naturally, <script defer> and <script async> don't block parsing in the same way. I'm not sure about the details; you'll have to consult the spec. It's not necessary to implement these right away, but we should keep them in mind.

@LifeIsStrange
Copy link

Just for say that some blockers addons like uBlock Origin or noscript will need to disable this feature.

@mbrubeck
Copy link
Contributor Author

Just for say that some blockers addons like uBlock Origin or noscript will need to disable this feature.

Can you explain why? This feature does not cause any additional fetches or script executions. uBlock and NoScript work fine in Firefox and other browsers that do off-thread parsing.

@tetsuharuohzeki
Copy link
Contributor

@LifeIsStrange

uBlock blocks the network traffic which is in a blacklist, and NoScript just disable to execution script without in the whitelist. This would not prevent them

@LifeIsStrange
Copy link

Sorry, I expressed myself badly,I Just speaked about prefetching.

UBlock Origin do this: " Disable "Prefetch resources to load pages more quickly"

This will ensure no TCP connection is opened at all for blocked requests: It's for your own protection privacy-wise.[1]
For pages with lots for blocked requests, this will actually remove overhead from page load (if you did not have the setting already disabled).
When uBlock blocks a network request, the expectation is that it blocks completely the connection, hence the new permission is necessary for uBlock to do truthfully what it says it does."

@Syrcon
Copy link

Syrcon commented Jan 18, 2016

I'll try to implement this as my master's thesis.

As @jdm said, there is some timeline needed.
So the main period of work is February - April 2016.

@frewsxcv
Copy link
Contributor

frewsxcv commented May 5, 2016

Did you make any progress on this @Syrcon?

@Syrcon
Copy link

Syrcon commented May 5, 2016

Well, @jdm helped with making a parser work asynchronously, and I'm working right now on executing parser in its own thread. So there will be some commits by the weekend.

You'd like to work on this?

@frewsxcv
Copy link
Contributor

frewsxcv commented May 5, 2016

No, no rush at all! If it was the case that nothing had been done for this, I was going to remove the C-assigned label so someone else could work on it. Keep up the great work :)

@jdm jdm removed the C-assigned There is someone working on resolving the issue label Jan 5, 2017
@jdm
Copy link
Member

jdm commented Jan 30, 2017

My experimental branch that tried to make the parser asynchronous is https://github.com/jdm/servo/tree/asyncparse.

@mrmiywj
Copy link
Contributor

mrmiywj commented Mar 3, 2017

@jdm, hi jdm, I'm going to use this idea as my GSoC 2017 project idea. Is there anyone working on this? I will try to explore the working flows about this idea recently. Is there any related suggested issue to work on?

@jdm
Copy link
Member

jdm commented Mar 3, 2017

I don't think there are any other issues related to this.

@LifeIsStrange
Copy link

LifeIsStrange commented Mar 4, 2017

@mrmiywj actually there is this
servo/html5ever#25
And this one
#1009

@dudelson
Copy link

dudelson commented Mar 4, 2017

@jdm I'd also like to take a cursory look into this project, if you don't mind multiple people investigating it at once.

@jdm
Copy link
Member

jdm commented Mar 5, 2017

It's intended as a GSoC project, so it's probably best suited for people intending to apply for that.

@dudelson
Copy link

dudelson commented Mar 5, 2017

Sorry for not being clear. I am also a prospective GSoC student.

@nox
Copy link
Contributor

nox commented Sep 30, 2017

Cc @cynicaldevil

bors-servo pushed a commit that referenced this issue Sep 6, 2019
Prefetch img, scripts and stylesheets during parsing

<!-- Please describe your changes on the following line: -->

Eagerly tokenize html input, and scan for `script` and `img` tags which can be prefetched into the cache. This allows resources to be loaded concurrently, which would otherwise be loaded sequentially due to being blocked by script execution. On the cloth webgl demo, this takes the time-to-paint down from 732ms to 560ms.

---
<!-- Thank you for contributing to Servo! Please replace each `[ ]` by `[X]` when the step is complete, and replace `___` with appropriate data: -->
- [x] `./mach build -d` does not report any errors
- [x] `./mach test-tidy` does not report any errors
- [x] These changes fix #3847
- [x] These changes do not require tests because it's a perf improvement

<!-- Also, please make sure that "Allow edits from maintainers" checkbox is checked, so that we can help you if you get stuck somewhere along the way.-->

<!-- Pull requests that do not address these steps are welcome, but they will require additional verification as part of the review process. -->

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/24148)
<!-- Reviewable:end -->
bors-servo pushed a commit that referenced this issue Sep 7, 2019
Prefetch img, scripts and stylesheets during parsing

<!-- Please describe your changes on the following line: -->

Eagerly tokenize html input, and scan for `script` and `img` tags which can be prefetched into the cache. This allows resources to be loaded concurrently, which would otherwise be loaded sequentially due to being blocked by script execution. On the cloth webgl demo, this takes the time-to-paint down from 732ms to 560ms.

---
<!-- Thank you for contributing to Servo! Please replace each `[ ]` by `[X]` when the step is complete, and replace `___` with appropriate data: -->
- [x] `./mach build -d` does not report any errors
- [x] `./mach test-tidy` does not report any errors
- [x] These changes fix #3847
- [x] These changes do not require tests because it's a perf improvement

<!-- Also, please make sure that "Allow edits from maintainers" checkbox is checked, so that we can help you if you get stuck somewhere along the way.-->

<!-- Pull requests that do not address these steps are welcome, but they will require additional verification as part of the review process. -->

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/24148)
<!-- Reviewable:end -->
bors-servo pushed a commit that referenced this issue Sep 9, 2019
Prefetch img, scripts and stylesheets during parsing

<!-- Please describe your changes on the following line: -->

Eagerly tokenize html input, and scan for `script` and `img` tags which can be prefetched into the cache. This allows resources to be loaded concurrently, which would otherwise be loaded sequentially due to being blocked by script execution. On the cloth webgl demo, this takes the time-to-paint down from 732ms to 560ms.

---
<!-- Thank you for contributing to Servo! Please replace each `[ ]` by `[X]` when the step is complete, and replace `___` with appropriate data: -->
- [x] `./mach build -d` does not report any errors
- [x] `./mach test-tidy` does not report any errors
- [x] These changes fix #3847
- [x] These changes do not require tests because it's a perf improvement

<!-- Also, please make sure that "Allow edits from maintainers" checkbox is checked, so that we can help you if you get stuck somewhere along the way.-->

<!-- Pull requests that do not address these steps are welcome, but they will require additional verification as part of the review process. -->

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/24148)
<!-- Reviewable:end -->
bors-servo pushed a commit that referenced this issue Sep 9, 2019
Prefetch img, scripts and stylesheets during parsing

<!-- Please describe your changes on the following line: -->

Eagerly tokenize html input, and scan for `script` and `img` tags which can be prefetched into the cache. This allows resources to be loaded concurrently, which would otherwise be loaded sequentially due to being blocked by script execution. On the cloth webgl demo, this takes the time-to-paint down from 732ms to 560ms.

---
<!-- Thank you for contributing to Servo! Please replace each `[ ]` by `[X]` when the step is complete, and replace `___` with appropriate data: -->
- [x] `./mach build -d` does not report any errors
- [x] `./mach test-tidy` does not report any errors
- [x] These changes fix #3847
- [x] These changes do not require tests because it's a perf improvement

<!-- Also, please make sure that "Allow edits from maintainers" checkbox is checked, so that we can help you if you get stuck somewhere along the way.-->

<!-- Pull requests that do not address these steps are welcome, but they will require additional verification as part of the review process. -->

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/24148)
<!-- Reviewable:end -->
bors-servo pushed a commit that referenced this issue Sep 9, 2019
Prefetch img, scripts and stylesheets during parsing

<!-- Please describe your changes on the following line: -->

Eagerly tokenize html input, and scan for `script` and `img` tags which can be prefetched into the cache. This allows resources to be loaded concurrently, which would otherwise be loaded sequentially due to being blocked by script execution. On the cloth webgl demo, this takes the time-to-paint down from 732ms to 560ms.

---
<!-- Thank you for contributing to Servo! Please replace each `[ ]` by `[X]` when the step is complete, and replace `___` with appropriate data: -->
- [x] `./mach build -d` does not report any errors
- [x] `./mach test-tidy` does not report any errors
- [x] These changes fix #3847
- [x] These changes do not require tests because it's a perf improvement

<!-- Also, please make sure that "Allow edits from maintainers" checkbox is checked, so that we can help you if you get stuck somewhere along the way.-->

<!-- Pull requests that do not address these steps are welcome, but they will require additional verification as part of the review process. -->

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/24148)
<!-- Reviewable:end -->
bors-servo pushed a commit that referenced this issue Sep 10, 2019
Prefetch img, scripts and stylesheets during parsing

<!-- Please describe your changes on the following line: -->

Eagerly tokenize html input, and scan for `script` and `img` tags which can be prefetched into the cache. This allows resources to be loaded concurrently, which would otherwise be loaded sequentially due to being blocked by script execution. On the cloth webgl demo, this takes the time-to-paint down from 732ms to 560ms.

---
<!-- Thank you for contributing to Servo! Please replace each `[ ]` by `[X]` when the step is complete, and replace `___` with appropriate data: -->
- [x] `./mach build -d` does not report any errors
- [x] `./mach test-tidy` does not report any errors
- [x] These changes fix #3847
- [x] These changes do not require tests because it's a perf improvement

<!-- Also, please make sure that "Allow edits from maintainers" checkbox is checked, so that we can help you if you get stuck somewhere along the way.-->

<!-- Pull requests that do not address these steps are welcome, but they will require additional verification as part of the review process. -->

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/24148)
<!-- Reviewable:end -->
bors-servo pushed a commit that referenced this issue Sep 10, 2019
Prefetch img, scripts and stylesheets during parsing

<!-- Please describe your changes on the following line: -->

Eagerly tokenize html input, and scan for `script` and `img` tags which can be prefetched into the cache. This allows resources to be loaded concurrently, which would otherwise be loaded sequentially due to being blocked by script execution. On the cloth webgl demo, this takes the time-to-paint down from 732ms to 560ms.

---
<!-- Thank you for contributing to Servo! Please replace each `[ ]` by `[X]` when the step is complete, and replace `___` with appropriate data: -->
- [x] `./mach build -d` does not report any errors
- [x] `./mach test-tidy` does not report any errors
- [x] These changes fix #3847
- [x] These changes do not require tests because it's a perf improvement

<!-- Also, please make sure that "Allow edits from maintainers" checkbox is checked, so that we can help you if you get stuck somewhere along the way.-->

<!-- Pull requests that do not address these steps are welcome, but they will require additional verification as part of the review process. -->

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/24148)
<!-- Reviewable:end -->
bors-servo pushed a commit that referenced this issue Sep 10, 2019
Prefetch img, scripts and stylesheets during parsing

<!-- Please describe your changes on the following line: -->

Eagerly tokenize html input, and scan for `script` and `img` tags which can be prefetched into the cache. This allows resources to be loaded concurrently, which would otherwise be loaded sequentially due to being blocked by script execution. On the cloth webgl demo, this takes the time-to-paint down from 732ms to 560ms.

---
<!-- Thank you for contributing to Servo! Please replace each `[ ]` by `[X]` when the step is complete, and replace `___` with appropriate data: -->
- [x] `./mach build -d` does not report any errors
- [x] `./mach test-tidy` does not report any errors
- [x] These changes fix #3847
- [x] These changes do not require tests because it's a perf improvement

<!-- Also, please make sure that "Allow edits from maintainers" checkbox is checked, so that we can help you if you get stuck somewhere along the way.-->

<!-- Pull requests that do not address these steps are welcome, but they will require additional verification as part of the review process. -->

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/24148)
<!-- Reviewable:end -->
bors-servo pushed a commit that referenced this issue Sep 10, 2019
Prefetch img, scripts and stylesheets during parsing

<!-- Please describe your changes on the following line: -->

Eagerly tokenize html input, and scan for `script` and `img` tags which can be prefetched into the cache. This allows resources to be loaded concurrently, which would otherwise be loaded sequentially due to being blocked by script execution. On the cloth webgl demo, this takes the time-to-paint down from 732ms to 560ms.

---
<!-- Thank you for contributing to Servo! Please replace each `[ ]` by `[X]` when the step is complete, and replace `___` with appropriate data: -->
- [x] `./mach build -d` does not report any errors
- [x] `./mach test-tidy` does not report any errors
- [x] These changes fix #3847
- [x] These changes do not require tests because it's a perf improvement

<!-- Also, please make sure that "Allow edits from maintainers" checkbox is checked, so that we can help you if you get stuck somewhere along the way.-->

<!-- Pull requests that do not address these steps are welcome, but they will require additional verification as part of the review process. -->

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/24148)
<!-- Reviewable:end -->
@nox
Copy link
Contributor

nox commented Sep 12, 2019

#24148 doesn't fix this.

@nox nox reopened this Sep 12, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-content/parsers Related to parsing HTML and XML B-interesting-project Represents work that is expected to be interesting in some fashion I-perf-slow Unnecessary performance degredation.
Projects
None yet
Development

Successfully merging a pull request may close this issue.