Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Automating browsers using Chrome DevTools Protocol #5

Closed
jugglinmike opened this issue Feb 1, 2019 · 29 comments
Closed

Automating browsers using Chrome DevTools Protocol #5

jugglinmike opened this issue Feb 1, 2019 · 29 comments

Comments

@jugglinmike
Copy link
Contributor

Summary

Introduction of an integration with the Chrome browser built on the proprietary Chrome DevTools Protocol ("CDP"). This will be an alternative to the existing WebDriver integration, enabled via explicit opt-in.

Motivation

Although the WebDriver standard is defined to empower web developers to automate browsers, its unidirectional processing model limits the kind of interaction it can simulate. In 2018, the Browser Tools and Testing Working Group acknowledged that a bi-directional protocol extension is desirable and in-scope.

CDP is a functional implementation of such a protocol. By adopting it today, WPT can promote discussion around standardization and attract more implementers to the process. In the longer term, WPT will enjoy the same benefits that motivate the bidirectional protocol: more expressive tests.

In more immediate terms, this will give WPT more control when testing experimental releases of Chrome. That's because Chromedriver occasionally lags behind Chrome; WPT can't test such releases.

Risks

This new code will increase WPT's maintenance burden in two ways:

  • concurrently supporting two sets of abstractions for browser automation (WebDriver and CDP)
  • integrating with an interface whose stability is unknown

Separately, it may encourage a major consumer/contributor (Chromium) to bypass the standardization process for test automation features.

Details

This was first proposed by @gsnedders in February of 2018.

The new code will be maintained within the WPT repository itself. It will be subject to the same criteria for code quality and correctness, and the automated verification procedures (as provided by, e.g. TravisCI) will form part of the acceptance criteria for subsequent patches. It will necessarily involve some additional third-party code. Whatever libraries are selected will be compatible with WPT's BSD-style license.

A proof-of-concept was built in the fall of 2018. The source code is available online as is a presentation delivered to review technical considerations.

We expect the support of the Chrome DevTools Protocol maintainers in finalizing the implementation.

Regarding the second risk (bypassing testing standards), WPT already supports such practices today. We maintain tests which reference the Chromium-developed "Mojo" JavaScript interface for standards like Web Bluetooth, Generic Sensors, WebUSB, and WebXR.

@jugglinmike
Copy link
Contributor Author

As active participants in the Browser Tools and Testing Working Group, the following folks may have more to add on the topic: @AutomatedTester, @andreastt, @burg, @shs96c, and @whimboo.

@jgraham
Copy link
Contributor

jgraham commented Feb 4, 2019

In general I think adding additional executors to wptrunner is fine, so at a high level I don't have any objections. It is of course the responsibility of the people adding the executor to maintain it over the long term.

Having said that I think there are some considerations that should be made by anyone planning to add an executor. In terms of API surface the addition of testdriver means we are exposing more, and more complex, interaction APIs to tests, following the rule that any such interaction must be defined in a web standard. At present the only web standard which defines these generic actions is WebDriver, so any non-WebDriver implementation needs to replicate WebDriver semantics. For the marionette executor this is trivial because it implments WebDriver command semantics directly. For CDP that isn't true, so there is more complexity required in the executor. In the current implementation that complexity goes in the ChromeDriver layer, so in effect the proposal here is to rewrite a subset of ChromeDriver in Python so that changes no longer need to be coordinated with the ChromeDriver maintainers. Again, I have no objection if it's easier to take on the extra work, but you should not assume it will be trivial.

I also think the benefits in terms of the possible standardisation of CDP are overstated. To make my position clear; I believe that allowing developers to test websites in multiple browsers is an essential part of facilitating a healthy browser ecosystem, and the recent trend to test using Chrome-specific tooling is harmful and must be reversed. I think the superior usability of a duplex remote control protocol is understood to be one of the primary drivers of that change, and standardising CDP looks like the most plausible way of filling that technical gap. But until the standard exists we can't let people writing tests use CDP-only features. Given how long the process of standardisation typically takes, I suspect we are more than a year away from having a standards-track document of sufficent quality to use as the basis for any testdriver features.

Of course it is also possible that the political benefits you allude to are a real short term gain from having this code. It is less clear to me why having a CDP implemenation in wpt specifically changes the dynamic there, but it is definitely possible that you're correct.

So, again, I have no objections to this change. If people want to add and maintain an executor based on any remote protocol or other way of interacting with the browser, that's fine, as long as they maintain it and as long as it isn't later used as an argument for changing the rules around what's permitted in the standardised testdriver API.

@gsnedders
Copy link
Member

By adopting it today, WPT can promote discussion around standardization and attract more implementers to the process.

I'm unconvinced that this is true. As long as only Chrome is using it, it doesn't really promote standardisation.

There's been enough concerns about about the willingness of the Chrome team to standardise the protocol (which seems to have been marred for years in internal Google politics), as well as a fair number of concerns around the design of the protocol.

While it seems increasingly likely we're going to have ever more desire to standardise CDP with it slowly getting adopted across browsers, as long as it is unclear whether the Chrome team (and the devtools team especially) will be in favour of standardisation and be willing to make any changes from the current protocol in their implementation.

Aside from this, I essentially agree with @jgraham's response above, though to add:

In the current implementation that complexity goes in the ChromeDriver layer, so in effect the proposal here is to rewrite a subset of ChromeDriver in Python so that changes no longer need to be coordinated with the ChromeDriver maintainers.

Given both ExecutorCDP and ChromeDriver will be/are maintained by the Chrome team, is this not something that can be dealt with at an organisational level within the Chrome team rather than (essentially) multiple parts of the Chrome team maintaining near identical code? That seems very wasteful, but ultimately that's a concern for the Chrome team and not for WPT.

@youennf
Copy link

youennf commented Feb 4, 2019

It also seems premature to me to use CDP in WPT.
I would wait for CDP standardization and cross-browser CDP adoption/commitment before starting to use it in WPT.

@jugglinmike
Copy link
Contributor Author

It is of course the responsibility of the people adding the executor to maintain it over the long term.

This distinction reduces the cohesiveness of the group maintaining WPT infrastructure. It's at odds with a "team" mentality which I am assuming is desirable. It also risks code rot given that not all contributors are employed to maintain WPT.

Given the choice, I would much rather see commitment from the entire group on these RFCs, even if that delays consensus.

as long as it is unclear whether the Chrome team (and the devtools team especially) will be in favour of standardisation and be willing to make any changes from the current protocol in their implementation.

Concerns around standardization seem to apply to Chrome's Mojo API as well, but as the proposal probably makes clear, I wasn't able to reconcile that. The first Mojo references were accepted before this group formed, so maybe it's simply grandfathered in from a time with different standards. Or is there some difference between the technologies that I'm missing?

@burg
Copy link

burg commented Feb 4, 2019

CDP is a functional implementation of such a protocol. By adopting it today, WPT can promote discussion around standardization and attract more implementers to the process. In the longer term, WPT will enjoy the same benefits that motivate the bidirectional protocol: more expressive tests.

Is there an important use case for WPT that WebDriver does not address? I want to hear about it. I don't know what you mean by "more expressive tests".

In more immediate terms, this will give WPT more control when testing experimental releases of Chrome. That's because Chromedriver occasionally lags behind Chrome; WPT can't test such releases.

It sounds like the easiest way to fix this situation is for Chromium or its contributors to implement WebDriver support properly. After over a year of talk, there are still no spec drafts nor independent test suites for CDP, so I have no reason to believe there is any upside to this proposal.

@jgraham
Copy link
Contributor

jgraham commented Feb 4, 2019

This distinction reduces the cohesiveness of the group maintaining WPT infrastructure. It's at odds with a "team" mentality which I am assuming is desirable. It also risks code rot given that not all contributors are employed to maintain WPT.

I am not claiming that if this lands you will personally be on the hook for all future patches. But throwing code over the wall and expecting other people to block changes on maintaining it whilst not contributing towards that maintainance is also problematic, even if there is initial agreement that the code can land. To solve this I think such a feature should come with a clear ownership story, which in this case would probably mean that Google's ecosystem-infra team would agree to be primarily responsible for keeping it working. I consider the current situation with executormarionette.py to be analogous, with Mozilla committing to keeping it working.

Concretely, to define the scope of that commitment, I would expect that people making small changes would fix all executors, and would be held to that in code review. On the other hand, I would not expect that large changes requiring substantial per-executor work would necessarily be blocked on fixing every executor (although such changes will need an RFC so there will be some oppertunity to object to things that are hard to implement for whatever reason), and I would not expect third parties to scramble to fix problems caused by external changes breaking an executor (e.g. if the underlying protocol changes). For marionette we avoid the latter problem by running it in mozilla-central CI, so we are confident that kind of breakage is as rare as possible.

Concerns around standardization seem to apply to Chrome's Mojo API as well

I thought I was told that Mojo was an unfortunate implementation detail required to enable spec-compliant testing APIs for some standards in Blink. If that isn't the case I agree we should consider ripping out tests that depend on it. Practically this has primarilly affected testing of Blink-only features, so the test compatibility issue hasn't arisen in practice yet. That's very different to the testdriver situation where most APIs are general purpose.

@jugglinmike
Copy link
Contributor Author

Is there an important use case for WPT that WebDriver does not address?

In those meeting minutes, @shs96c (credited as "simonstewart") said:

there is a desire to register with a browser and find out when things change slate

@reillyeon (credited as "reillyg") had a slightly more concrete use case:

[..] need permissions for device access, bi-directional API would certainly help [...] we tried a less bi-directional way of doing this that had edge case problems, migrated toward using events, which works better

@jugglinmike
Copy link
Contributor Author

Google's ecosystem-infra team would agree to be primarily responsible for keeping it working

Maybe we could ask for commitment from Chrome to use this CDPExecutor in their infrastructure. That would give us a lot more confidence in the continued integration.

@LukeZielinski recently gave a presentation about their efforts to reduce the amount of internal infrastructure Google uses to run WPT; this could be a part of that effort.

Do folks at Mozilla use ExecutorMarionette to test Firefox?

@gsnedders
Copy link
Member

This distinction reduces the cohesiveness of the group maintaining WPT infrastructure. It's at odds with a "team" mentality which I am assuming is desirable. It also risks code rot given that not all contributors are employed to maintain WPT.

Given part of the motivation is being able to test current master of Chromium without waiting for ChromeDriver, I don't think it's reasonable to expect this group, collectively, to update ExecutorCDP as needed when CDP changes in backwards incompatible ways (as it is want to do and is, AFAIK, the primary cause of ChromeDriver breaking on master); we shouldn't be expected to keep up with what an unstable Chromium-defined protocol does.

It's already the case that application-specific executors are essentially expected to be maintained by those with an interest in the application; the Marionette executor is essentially maintained by @jgraham, the Servo executor is essentially maintained by @jdm, etc.

As @jgraham says, we can and do land code that doesn't work cross-executor from day one (an obvious recent example is testdriver.js when originally landed only worked with the Selenium and WebDriver executors, and it was left down to those maintaining the other executors to implement support).

Perhaps we should make it possible to use the WebDriver executor for all browsers (which support it!) and then we have that as a baseline, and it should, for the foreseeable future, be possible to run all tests against all browsers regardless of executor up-to-dateness?

Do folks at Mozilla use ExecutorMarionette to test Firefox?

Yes.

@jgraham
Copy link
Contributor

jgraham commented Feb 5, 2019

@LukeZielinski recently gave a presentation about their efforts to reduce the amount of internal infrastructure Google uses to run WPT; this could be a part of that effort.

If CDP is easier to run against content_shell than WebDriver, such that Chrome could switch to using wptrunner internally with the former and not the latter, that would be an excellent reason to have the CDP executor, independent of any other considerations.

Perhaps we should make it possible to use the WebDriver executor for all browsers (which support it!) and then we have that as a baseline

This is sort of scope creep, but without actually running that configuration in CI it might well break, but we certainly want to keep using the preferred executors for master runs. So there are at least tradeoffs in doing this (though it may nevertheless be a good idea).

@jgraham
Copy link
Contributor

jgraham commented Feb 5, 2019

Is there an important use case for WPT that WebDriver does not address?

In those meeting minutes, @shs96c (credited as "simonstewart") said:

there is a desire to register with a browser and find out when things change slate

@reillyeon (credited as "reillyg") had a slightly more concrete use case:

[..] need permissions for device access, bi-directional API would certainly help [...] we tried a less bi-directional way of doing this that had edge case problems, migrated toward using events, which works better

Although there certainly are things that work better in a bidirectional protocol like CDP than a command-response protocol like WebDriver, note again that it wouldn't be possible to add such features to the testing API until such a time as they were covered by a web standard (in practice, a web standard with vendor buy-in since the RFC process would presumably reject any extensions that couldn't be implemented in multiple browsers).

@foolip
Copy link
Member

foolip commented Feb 8, 2019

With #1 (the process itself) resolved yesterday, let's act as if this RFC was filed yesterday and give at least until Feb 14 before making a decision on this.

It is my team (Ecosystem Infra) which has asked for this prototype and now its integration into WPT, so I am of course very supportive of this.

What is the point of this? We have identified the difficulty of adding test automation capabilities as a problem for increased use of web-platform-tests in Chromium, affecting a large fraction of new features which are "hard to test" for one reason or another. @LukeZielinski has some diagrams he might be able to share of just how many layers and steps are involved.

We're looking at many options for decreasing the overall complexity, perhaps most importantly using wptrunner in Chromium to avoid implementing testdriver.js internals twice. This RFC is another prong of investigation, to see what happens if we remove the WebDriver layer from the puzzle and use CDP directly. Is it faster or slower? Less work to add new features? We don't know yet, and this is an experiment towards getting some answers. It's unknown which of ExecutorWebDriver and ExecutorCDP we'll want to use in which contexts long term.

For any capabilities we add to testdriver.js, the behavior has to be well defined and have tests, so that anyone can implement it without reverse engineering. This is the same bar as for web-exposed features. Currently the only way to define vendor-neutral remote debugging commands is via the WebDriver spec or WebDriver extension specs like https://w3c.github.io/permissions/#automation. This will remain the case until there's an alternative.

To address some of the specific feedback:

Is there an important use case for WPT that WebDriver does not address? I want to hear about it. I don't know what you mean by "more expressive tests".

@burg, the type:untestable label tracks things we can't currently test, plenty of which could be solved by new WebDriver endpoints. Some cases like reading console output might be awkward to express using WebDriver, but none impossible because polling can always emulate bidirectional communication.

I think such a feature should come with a clear ownership story, which in this case would probably mean that Google's ecosystem-infra team would agree to be primarily responsible for keeping it working.

@jgraham, yes, our team and in particular @LukeZielinski will be responsible for the future of ExecutorCDP.

Maybe we could ask for commitment from Chrome to use this CDPExecutor in their infrastructure. That would give us a lot more confidence in the continued integration.

@jugglinmike, I think we'd like to try full Chrome runs with both ExecutorWebDriver and ExecutorCDP in both Taskcluster (one-off) and Chrome's infrastructure, to see how they compare.

It also seems premature to me to use CDP in WPT

@youennf, given the above, and the precedence of ExecutorMarionette, do you still feel this is premature?

@LukeZielinski
Copy link
Contributor

Thanks @foolip for the great explanation.

What is the point of this? We have identified the difficulty of adding test automation capabilities as a problem for increased use of web-platform-tests in Chromium, affecting a large fraction of new features which are "hard to test" for one reason or another. @LukeZielinski has some diagrams he might be able to share of just how many layers and steps are involved.

The diagrams Philip refers to can be found near the top of the design doc to use WPT in Chromium CI

@jgraham
Copy link
Contributor

jgraham commented Feb 8, 2019

The diagrams Philip refers to can be found near the top of the design doc to use WPT in Chromium CI

That doc isn't public.

@LukeZielinski
Copy link
Contributor

The diagrams Philip refers to can be found near the top of the design doc to use WPT in Chromium CI

That doc isn't public.

Apologies, should be fixed now.

@shs96c
Copy link

shs96c commented Feb 8, 2019

The statement "By adopting it today, WPT can promote discussion around standardization and attract more implementers to the process." has the process of standardisation exactly backwards. We only landed support for the webdriver protocol in WPT once the standardisation process was effectively complete, and there was support implemented in multiple browsers. If the primary driver for this effort is to support standardisation of the protocol, then I'd suggest a better approach would be to put forward an initial draft spec for discussion.

If the problem is that the chromedriver lags behind the implementation of chrome, I fail to see how that is a problem the W3C needs to overcome, and it would seem to support the risk @jugglinmike highlighted: "it may encourage a major consumer/contributor (Chromium) to bypass the standardization process for test automation features."

If the problem is "more expressive tests", then I would be interested in seeing those cases where the webdriver protocol is incapable of expressing the desired behaviour, and understanding which specs are being prevented from adding tests by this.

I acknowledge that a bi-directional protocol would be beneficial, and I'm very much in favour of standardising such a protocol. As it stands, the CDP makes a good starting point, but, to my eyes, the proper starting point would be an ED of that spec.

@foolip
Copy link
Member

foolip commented Feb 8, 2019

I fear that the framing of generally "automating browsers" and mention of CDP standardization has made it seem like this proposal goes much further than it actually does. The proposal is to add a product-specific executor similar to executormarionette or executorservo and see if we find it provides any upside in terms of speed or ease of extensibility. At this point we're not suggesting using it by default anywhere, ceasing to maintain ExecutorWebDriver for Chrome, or striving to make ExecutorCDP for any browser other than Chrome.

I certainly also agree that having a shared bi-directional protocol would be good, but I don't consider promoting that discussion part of the motivation for this RFC. Before we go there we should see if there are any benefits for even a single browsers. For the time being, WebDriver and its extension specs are the only mechanism we have for defining cross-browser remote control capabilities. Any addition to testdriver.js will go through the RFC process now, and I trust it would not go unnoticed if capabilities not backed by WebDriver are proposed.

@gsnedders
Copy link
Member

I think to be accepted, the current RFC draft needs to be heavily rewritten if the goal is just to add a product-specific executor, and not positioning it as a longer-term project to supplant at least the the WebDriver executor.

@foolip
Copy link
Member

foolip commented Feb 11, 2019

Maybe striking some bits would suffice, but I'll leave it to @jugglinmike how to proceed.

@gsnedders

This comment has been minimized.

@foolip

This comment has been minimized.

@jgraham

This comment has been minimized.

@foolip
Copy link
Member

foolip commented Feb 19, 2019

@jugglinmike following #9, would you mind doing this as a PR?

@jgraham
Copy link
Contributor

jgraham commented Feb 19, 2019

FTR, my opinion is that if this is an internal wptrunner change that the Chrome team want and will maintain there isn't a need to write an RFC because it becomes an implementation detail rather than a change in an externally-facing API. Of course I don't want to stop you writing up an RFC if you believe it's helpful to do so.

@foolip
Copy link
Member

foolip commented Feb 20, 2019

@jgraham there would have to be either a new chrome_cdp product or a command line argument to change the executor, do you think that's internal enough to skip the RFC process?

There might also be new third-party dependencies, which haven't been spelled out completely in this issue but should be.

@gsnedders
Copy link
Member

@jgraham is OOO for the rest of this week, FWIW.

I think we should move towards being able to use command line args to change the executor, and given in the Chrome case we're likely to have both executors around for a long time we should definitely use an argument for this.

Third-party dependencies I think should only be a concern for the RFC process if it affects what's needed for {manifest, lint, run chrome, run firefox, run safari, run edge}, which therefore depends on which the default executor is?

@jgraham
Copy link
Contributor

jgraham commented Jun 11, 2019

@jgraham there would have to be either a new chrome_cdp product or a command line argument to change the executor, do you think that's internal enough to skip the RFC process?

Yes. I think everything proposed here counts as internal in the sense that it doesn't produce new requirements or obligations on people not using the proposed code. I think this RFC can be dropped and worked on as a normal issue.

However any proposal to add test-accessible features which depend on features of CDP would require an RFC and would almost certainly be rejected at this time, on the basis that there isn't a clear multi-vendor standard for those features. I know no one is proposing such features at present, but I just want to make it very clear that accepting this proposal (or, rather closing it as not requiring an RFC) doesn't create precedent for any future proposals that may depend on it.

@foolip
Copy link
Member

foolip commented Jun 11, 2019

Thanks @jgraham, that all sounds good and in line with my thinking. @jugglinmike, I'll go ahead and close this issue, and the rest can be handled in regular code review.

@foolip foolip closed this as completed Jun 11, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants