CI is broken #1768

Closed
HackerFoo opened this issue Nov 11, 2020 · 18 comments

@HackerFoo
Contributor

It looks like the build has been broken since #1760

@HackerFoo
Contributor Author

Simple designs such as counter and buttons are unroutable. It could be a problem with VPR.

@mithro
Contributor

mithro commented Nov 11, 2020

@litghost / @acomodi - Can you please look into this?

@acomodi
Contributor

acomodi commented Nov 11, 2020

@HackerFoo you have seen this for the vtr+Symbiflow CI, correct? In that case this PR solves the issue: #1711.
I have forced a Kokoro run as there were some infrastructure failures.

@litghost
Contributor

litghost commented Nov 11, 2020

If you are speaking about the vtr+Symbiflow CI, that is a result of in-progress master+wip updates. To be clear, there are 3 CIs on arch-defs: Travis, kokoro vtr+Symbiflow, and kokoro arch-defs. Travis and kokoro arch-defs should both be green right now.

@HackerFoo
Contributor Author

Then VtR+SymbiFlow doesn't need to pass?

The top level "Architecture Definitions (Presubmit)" is also failing on #1735, but I think it's because of the QuickLogic test.

Another strange thing is that several of the tests have not yet completed for the last 2 PRs.

@litghost
Contributor

litghost commented Nov 11, 2020

The top level "Architecture Definitions (Presubmit)" is also failing on #1735, but I think it's because of the QuickLogic test.

If you rebase onto master, then the QuickLogic test will pass trivially. Kokoro should be running as "presubmit as merged", but because it is part of the kokoro configuration it may be more restrictive.

Then VtR+SymbiFlow doesn't need to pass?

No. VtR + SymbiFlow is a CI that is intended to show when the current master+wip on https://github.com/symbiflow/vtr-verilog-to-routing or master on https://github.com/verilog-to-routing/vtr-verilog-to-routing/ doesn't work with the particular revision of https://github.com/SymbiFlow/symbiflow-arch-defs/ . Given that upstream master VTR and master+wip VTR can make changes that manifest issues with arch-defs (e.g. XML changes or verilog-to-routing/vtr-verilog-to-routing#1571 ), the VtR + SymbiFlow CI can be red, and that is expected. A red result is the signal to investigate why the newer version of VTR no longer works. In this particular case, changes in how the virtual RR graph is generated (because of upstream PR verilog-to-routing/vtr-verilog-to-routing#1448) are why the CI is red. @acomodi has filed issues about this and has a PR in progress with the fix.

This is why arch-defs uses environment.yml to decouple updates to yosys and VTR.

The key here is what the various CIs are for.

  1. Travis and kokoro arch-defs test whether the current revision of arch-defs works. These are the CIs to pay attention to and keep green for the purposes of PR review and checking the health of master.
  2. kokoro VtR master+wip + SymbiFlow and kokoro VtR master + SymbiFlow test whether the upcoming versions of VtR work with master arch-defs. Ideally these stay green, but they will go red when an incompatible change arises upstream. They provide early signals of integration issues. I believe @acomodi, for example, discovered "Unroutable connection on SymbiFlow tests" (verilog-to-routing/vtr-verilog-to-routing#1571) from the VtR master+wip + SymbiFlow CI.

@litghost
Contributor

Another strange thing is that several of the tests have not yet completed for the last 2 PRs.

Do you mean that you don't see the kokoro arch-defs CI on the latest master commits? That is because the previous continuous job is still running. I investigated it, and it is due to a routing job taking a while to complete.

@HackerFoo
Contributor Author

When I check the merge details for #1762 or #1760, there are several either failed or "Tool failed" tests.

@litghost
Contributor

litghost commented Nov 11, 2020

When I check the merge details for #1762 or #1760, there are several either failed or "Tool failed" tests.

I see that too, but if you look at the failing tests, they are passing at the script level. For example:

https://source.cloud.google.com/results/invocations/8ecd20df-54dc-4d60-820f-20a9b7224a24/log

[ID: 7673017] Build finished after 17563 secs, exit value: 0


Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
[14:24:56] Collecting build artifacts from build VM
[14:27:05] Kokoro builder finished

We've been seeing some issues around artifact collection, so I suspect a Kokoro hiccup on those.
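
As a rough illustration of the "passing at the script level" check, the exit value can be pulled out of the Kokoro log line quoted above. This is only a sketch: the regular expression is modeled on that single log line, and the function name `build_script_result` is hypothetical, not part of any Kokoro tooling.

```python
import re

# Pattern modeled on the one log line quoted in this thread (an assumption
# about the general format, not a documented Kokoro log specification).
BUILD_FINISHED_RE = re.compile(
    r"Build finished after (\d+) secs, exit value: (-?\d+)"
)


def build_script_result(log_text):
    """Return (duration_secs, exit_value) from a log, or None if absent."""
    match = BUILD_FINISHED_RE.search(log_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    line = "[ID: 7673017] Build finished after 17563 secs, exit value: 0"
    print(build_script_result(line))
```

An exit value of 0 combined with a "Tool failed" status would point at a step after the build script, such as artifact collection.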

@litghost
Contributor

litghost commented Nov 11, 2020

For reference, the master xc7 build for #1760 is green here: https://source.cloud.google.com/results/invocations/ba1043e9-80b6-4508-ac34-3bc488cc96f6/targets

@litghost
Contributor

litghost commented Nov 11, 2020

When I check the merge details for #1762 or #1760, there are several either failed or "Tool failed" tests.

I see that too, but if you look at the failing tests, they are passing at the script level. For example:

https://source.cloud.google.com/results/invocations/8ecd20df-54dc-4d60-820f-20a9b7224a24/log

[ID: 7673017] Build finished after 17563 secs, exit value: 0


Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
[14:24:56] Collecting build artifacts from build VM
[14:27:05] Kokoro builder finished

We've been seeing some issues around artifact collection, so I suspect a Kokoro hiccup on those.

So the issue is likely that Kokoro ran out of disk space when moving the workspace for artifact collection. Because Kokoro applies the collection filtering after copying all files from the workspace (don't ask me), and with the addition of RapidWright and some more third_party submodules, we were likely scraping the upper disk limit. This neatly explains why it is intermittent, because the working directory size may vary.
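
One way to check how close the workspace is to such a limit is to total the file sizes under it and list the largest top-level directories. The sketch below is a minimal, assumption-laden example: the helper names are hypothetical, and the actual Kokoro disk limit is not stated in this thread, so no threshold is hard-coded.

```python
import os


def directory_size_bytes(root):
    """Sum the sizes of regular files under root, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue
            try:
                total += os.path.getsize(path)
            except OSError:
                # File vanished or is unreadable; ignore it in the total.
                pass
    return total


def largest_subdirs(root, top=5):
    """Return up to `top` (size_bytes, name) pairs for direct subdirectories."""
    sizes = []
    with os.scandir(root) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                sizes.append((directory_size_bytes(entry.path), entry.name))
    return sorted(sizes, reverse=True)[:top]


if __name__ == "__main__":
    for size, name in largest_subdirs("."):
        print(f"{size / 1e6:10.1f} MB  {name}")
```

Running this against the checkout before and after adding submodules would show whether third_party growth is what pushes the copy step over the edge.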

@HackerFoo
Contributor Author

The "Xilinx Series 7 - Install (Presubmit)" test is now failing with:

ChecksumMismatchError: Conda detected a mismatch between the expected content and downloaded content
for url 'https://conda.anaconda.org/litex-hub/linux-64/gcc-riscv64-elf-nostdc-9.2.0-20200923_200922.tar.bz2'.
  download saved to: /tmpfs/src/github/symbiflow-arch-defs-presubmit-install/env/downloads/conda-pkgs/gcc-riscv64-elf-nostdc-9.2.0-20200923_200922.tar.bz2
  expected md5: 98bd7a66867ab138fdb104c53df1db44
  actual md5: 1e303a894909bad44b81fb1910adff22

@pgielda

pgielda commented Nov 12, 2020

I've just downloaded this file manually

# md5sum gcc-riscv64-elf-nostdc-9.2.0-20200923_200922.tar.bz2 
98bd7a66867ab138fdb104c53df1db44  gcc-riscv64-elf-nostdc-9.2.0-20200923_200922.tar.bz2

It seems to have the expected md5
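
The same check can be scripted so it is easy to repeat on the CI machine, where a cached or truncated download may differ from a fresh one. This is a minimal sketch using only the standard library; the function names are hypothetical, and the file name and expected digest in the usage lines are the ones quoted in the error above.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_md5):
    """Compare a file's MD5 with an expected hex digest, ignoring case."""
    return md5_of_file(path) == expected_md5.strip().lower()


if __name__ == "__main__":
    import sys

    # Usage: python check_md5.py <file> <expected md5>
    print(matches_expected(sys.argv[1], sys.argv[2]))
```

For example, `python check_md5.py gcc-riscv64-elf-nostdc-9.2.0-20200923_200922.tar.bz2 98bd7a66867ab138fdb104c53df1db44` would be run against the copy in the CI download cache.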

@HackerFoo
Contributor Author

Now I get this from architecture presubmit tests:

Traceback (most recent call last):
  File "/tmpfs/src/github/symbiflow-arch-defs-presubmit-install/xc/common/utils/prjxray_routing_import.py", line 1566, in <module>
    main()
  File "/tmpfs/src/github/symbiflow-arch-defs-presubmit-install/xc/common/utils/prjxray_routing_import.py", line 1534, in main
    node_remap = create_node_remap(capnp_graph.graph.nodes, channels_obj)
  File "/tmpfs/src/github/symbiflow-arch-defs-presubmit-install/xc/common/utils/prjxray_routing_import.py", line 1333, in create_node_remap
    coord = tuple(hilbert_curve.coordinates_from_distance(h))
AttributeError: 'HilbertCurve' object has no attribute 'coordinates_from_distance'

@HackerFoo
Contributor Author

HackerFoo commented Nov 13, 2020

It looks like that dependency just updated, breaking compatibility: https://pypi.org/project/hilbertcurve/#history
Ugh.
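
Until the dependency is pinned to a release that still has the old method, a small shim could tolerate both API versions. This is a sketch, not the fix that was applied in arch-defs: the wrapper name is hypothetical, and `point_from_distance` is my understanding of the renamed method in the newer hilbertcurve release, which should be verified against the release history linked above.

```python
def coordinates_from_distance_compat(curve, distance):
    """Map a Hilbert distance to coordinates across hilbertcurve versions.

    Tries the old method name first, then the name that newer releases are
    assumed to use (an assumption to verify, see the lead-in).
    """
    for name in ("coordinates_from_distance", "point_from_distance"):
        method = getattr(curve, name, None)
        if callable(method):
            return tuple(method(distance))
    raise AttributeError(
        "curve object has neither coordinates_from_distance "
        "nor point_from_distance"
    )
```

In `create_node_remap`, the failing line would then read `coord = coordinates_from_distance_compat(hilbert_curve, h)`.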

@HackerFoo
Contributor Author

The vendor tool test also fails with:

/tmpfs/src/github/symbiflow-arch-defs-presubmit-xc7-vendor/env/RapidWright/bin/rapidwright_classpath.sh was not found, check if RapidWright has been built.

link

@litghost
Contributor

The vendor tool test also fails with:

/tmpfs/src/github/symbiflow-arch-defs-presubmit-xc7-vendor/env/RapidWright/bin/rapidwright_classpath.sh was not found, check if RapidWright has been built.

link

I've been on the lookout for this cropping up. The relevant PR on the upstream RapidWright repo was merged before the closed portion of RapidWright was updated. I've asked Xilinx to cut a new release of the closed portion to fix this, but that won't be for a bit. I've proposed tracking a fork of RapidWright for now: #1772

@litghost
Contributor

litghost commented Dec 8, 2020

Continuous CI has been green for the last 6 runs, closing.

@litghost litghost closed this as completed Dec 8, 2020