Chrome crashes on Community-TC #20133
Comments
cc @jgraham
We are certain that Chrome is failing to start in Docker:
which is a rather common problem; cf. the Puppeteer troubleshooting doc. We have a few workarounds on the Docker side, as seen here:
Our understanding is that they are ordered by increasing security. The first one is easy to do on Community-TC, but particularly bad security-wise, because it gives Docker containers RW access to [...]

Alternatively, we could disable Chrome sandboxing. This is in fact better security-wise: we have password-less sudo inside Docker, so anyone can already get root in a PR. Disabling Chrome sandboxing does not loosen the container isolation at all, so it opens no new vulnerabilities. However, it is unclear whether it would affect web-observable behaviours (I'm checking with the Chrome team). If not, I think this might be the best tradeoff.
None of the aforementioned workarounds were used previously. We are not sure why it used to work, but one theory is that a different sandboxing mechanism was used because of an older kernel version. On the old taskcluster.net instance, github-worker runs a 14.04-based image. According to the Chromium docs, the user namespace sandbox is only used on kernels >= 3.8. Yet 14.04 shipped 3.13, so we are not sure...
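To make the kernel comparison above concrete, here is a minimal sketch that checks a kernel release string against the 3.8 minimum cited from the Chromium docs. The helper names (`parse_kernel_release`, `kernel_meets_userns_minimum`) are hypothetical and not part of any wpt tooling, and passing this check does not by itself show that user namespaces are enabled or usable on a given worker.

```python
import platform
import re

# Minimum kernel version for the user namespace sandbox, as cited above.
MIN_USERNS_KERNEL = (3, 8)


def parse_kernel_release(release):
    """Extract (major, minor) from a release string such as '3.13.0-24-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    return int(match.group(1)), int(match.group(2))


def kernel_meets_userns_minimum(release=None):
    """Return True if the kernel version is at least MIN_USERNS_KERNEL.

    With no argument, the running kernel's release string is used.
    """
    if release is None:
        release = platform.release()
    return parse_kernel_release(release) >= MIN_USERNS_KERNEL
```

Under this check a 3.13 kernel is above the minimum, which is why the kernel version alone does not seem to explain the earlier behaviour.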
We have two immediate fix candidates:
Whenever we see TASKCLUSTER_ROOT_URL in the entry point (runl.py), we disable Chrome sandboxing with --no-sandbox. Fix #20133
Whenever we see TASKCLUSTER_ROOT_URL in the entry point, we disable Chrome sandboxing with --no-sandbox. This is required to work around the lack of capabilities in the new community TaskCluster instances. Fix #20133
We believe that bc83451 should resolve this by disabling the Chrome sandbox when running under the new taskcluster instances. Long term, we should investigate adding support to TaskCluster for custom seccomp profiles, but hopefully this puts out the fire for now. 🔥 <-- 🚒
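For readers who want to see the shape of the workaround, the following is a rough sketch of the idea described in the commit messages above, not the actual change in bc83451. It assumes that the presence of `TASKCLUSTER_ROOT_URL` in the environment is what identifies the new instance; the function names `chrome_sandbox_args` and `build_chrome_command` are hypothetical.

```python
import os


def chrome_sandbox_args(environ=None):
    """Return extra Chrome flags needed for the current CI environment.

    Assumption: TASKCLUSTER_ROOT_URL is only set on the new Community-TC
    instance, so its presence is used as the switch.
    """
    env = os.environ if environ is None else environ
    if "TASKCLUSTER_ROOT_URL" in env:
        # Disable the Chrome sandbox; container isolation is unchanged.
        return ["--no-sandbox"]
    return []


def build_chrome_command(binary, extra_args=(), environ=None):
    """Assemble a Chrome command line, adding the sandbox flag if needed."""
    return [binary, *chrome_sandbox_args(environ), *extra_args]
```

Keeping the check in one small function means the flag is only ever added in that environment, so jobs elsewhere continue to run with the sandbox enabled.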
We started Taskcluster migration (https://bugzilla.mozilla.org/show_bug.cgi?id=1574668) and now we are seeing widespread Chrome crashes on the new Community-TC instance.
Example: https://community-tc.services.mozilla.com/tasks/aMcaEBP4RIKa_EbrE30xJw/runs/0/logs/https%3A%2F%2Fcommunity-tc.services.mozilla.com%2Fapi%2Fqueue%2Fv1%2Ftask%2FaMcaEBP4RIKa_EbrE30xJw%2Fruns%2F0%2Fartifacts%2Fpublic%2Flogs%2Flive.log#L550
This PR has both Taskcluster and Community-TC jobs. Note that the crash only happens on Community-TC. After checking a few other jobs, it seems the crash is widespread on Community-TC.