Arty A7-35 Litex bootloader seems not reacting correctly to ARP response ? #41

Closed
Ruinland opened this issue Aug 9, 2020 · 20 comments

@Ruinland

Ruinland commented Aug 9, 2020

Hi,

Thanks for the great project.

I just built the top.bit from examples/xc7/linux_litex_demo with the pre-generated files.
Then I tried Ethernet boot with the setup from the LiteX wiki, except that I changed my TFTP server port from 69 to 6069 because the bootloader reports "Fetching from: UDP/6069".

Yet the bootloader complains that it fails to fetch anything from my host.
The tcpdump log shows that it sends a lot of ARP requests to find out where my host is:

16:53:11.817485 ARP, Request who-has 192.168.100.100 tell 192.168.100.50, length 46
16:53:11.970841 ARP, Request who-has 192.168.100.100 tell 192.168.100.50, length 46
16:53:12.124201 ARP, Request who-has 192.168.100.100 tell 192.168.100.50, length 46
16:53:12.277749 ARP, Request who-has 192.168.100.100 tell 192.168.100.50, length 46
16:53:12.431063 ARP, Request who-has 192.168.100.100 tell 192.168.100.50, length 46
16:53:12.584230 ARP, Request who-has 192.168.100.100 tell 192.168.100.50, length 46
16:53:12.737583 ARP, Request who-has 192.168.100.100 tell 192.168.100.50, length 46
16:53:12.890927 ARP, Request who-has 192.168.100.100 tell 192.168.100.50, length 46
16:53:13.044465 ARP, Request who-has 192.168.100.100 tell 192.168.100.50, length 46
16:53:13.197815 ARP, Request who-has 192.168.100.100 tell 192.168.100.50, length 46
16:53:13.350960 ARP, Request who-has 192.168.100.100 tell 192.168.100.50, length 46
16:53:13.504306 ARP, Request who-has 192.168.100.100 tell 192.168.100.50, length 46
16:53:13.657864 ARP, Request who-has 192.168.100.100 tell 192.168.100.50, length 46
16:53:13.811210 ARP, Request who-has 192.168.100.100 tell 192.168.100.50, length 46
16:53:13.964525 ARP, Request who-has 192.168.100.100 tell 192.168.100.50, length 46
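
A quick way to see whether the host ever answered in a capture like this is to count requests against replies. The following is a minimal sketch that only assumes tcpdump's default one-line ARP text format, as in the log above; the function name `summarize_arp` is made up for illustration.

```python
import re
from collections import Counter

# Matches tcpdump's default one-line ARP output, e.g.
# "16:53:11.817485 ARP, Request who-has 192.168.100.100 tell 192.168.100.50, length 46"
ARP_LINE = re.compile(r"ARP, (Request|Reply)\b")


def summarize_arp(lines):
    """Count ARP requests and replies in an iterable of tcpdump text lines."""
    counts = Counter()
    for line in lines:
        match = ARP_LINE.search(line)
        if match:
            counts[match.group(1)] += 1
    return {"Request": counts["Request"], "Reply": counts["Reply"]}


# Two lines copied from the log above.
sample = [
    "16:53:11.817485 ARP, Request who-has 192.168.100.100 tell 192.168.100.50, length 46",
    "16:53:11.970841 ARP, Request who-has 192.168.100.100 tell 192.168.100.50, length 46",
]
print(summarize_arp(sample))
```

In the excerpt above only requests are visible, so a reply count of zero would point at the host side not answering rather than at the board ignoring an answer.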

I also tried serial boot with the default image JSON file from Linux-on-LiteX.

That doesn't work either: nothing shows up after the bootloader's "Liftoff" message.

As a cross-check, I used the bitstream from Linux-on-LiteX's prebuilt repo and Ethernet-booted the rootfs/kernel/dtb and emulator.bin provided by this repo (I mean symbiflow-examples). It boots.

So I'm kinda head-spinning right now with no clues about what went wrong.

The bitstream I generated is uploaded to Google Drive since GitHub prevents me from doing so.

Thanks again for any kind of help :-)

= = = = = = = =

BTW, I just went to COSCUP and told my fellow Taiwanese about how great the FOSS FPGA toolchain could be.
After the presentation, someone pinged me that SymbiFlow could go through the whole-nine-yards with VPR now.
So I'm trying to reproduce the results and thus to update the slides.

@mithro
Contributor

mithro commented Aug 10, 2020

Hi @Ruinland,

You might want to join us on IRC at #symbiflow on irc.freenode.net or on our Slack channel and we can help debug in real time.

@tcal-x and @kgugala will hopefully be able to help debug what is going on if you can't make it there.

@tcal-x
Contributor

tcal-x commented Aug 10, 2020

Hi @Ruinland ,

I kind of recall the opposite condition -- if you just had the LiteX BIOS on Arty, and you tried to ping it, it wouldn't reply, and with Wireshark I saw Arty ignoring the ARP requests.

But here, Arty is looking for the host, to connect via TFTP. The first thing I would do is check your host's firewall setting. On Ubuntu, you would use "ufw".

Not booting after serial boot (I assume that took quite a while!) and getting stuck at "Liftoff" is probably a separate issue. The expected memory segment offsets might not be matching between LiteX and the Linux bundle json. I'd need to do some investigating to find what Litex is expecting and whether they match the offsets in the json that you linked.

As I'm looking at the instructions here, it doesn't really tell you how to load the Linux image firmware, either via netboot or serial boot, right? So you had to figure it out from other places?

In the case that DID work for you, did you use serialboot or netboot?

@tcal-x
Contributor

tcal-x commented Aug 10, 2020

P.S. I'm assuming you've updated the IP address of your host's Ethernet port to be 192.168.100.100 correctly. It occurred to me that you might also need to bind your tftpd to this interface and not, for example, your Wi-Fi interface.

I've found debugging tftp to be quite difficult -- even trying to figure out the right version of tftpd to install. The instructions here build their own tftpd to make sure they have the version they want! Note that overall those instructions might not be compatible with the ones you are following. Anyway, the tftpd built in the timvideos instructions is nice since it prints some debugging messages on the host side, so you can see if any connection is being made, or if you're experiencing timeouts.
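
One way to sanity-check the interface point above before starting tftpd is to try binding a UDP socket to the host address yourself. This is only a sketch: `can_bind_udp` is a hypothetical helper, and the address and port are simply the values mentioned in this thread.

```python
import socket


def can_bind_udp(ip, port):
    """Return True if a UDP socket can be bound to (ip, port) on this machine."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.bind((ip, port))
        return True
    except OSError:
        # Raised when the address is not configured on any local interface,
        # when the port is already taken (for example by a running tftpd),
        # or when binding a low port such as 69 without privileges.
        return False
    finally:
        sock.close()


if __name__ == "__main__":
    # Values taken from this thread; adjust to your own setup.
    print(can_bind_udp("192.168.100.100", 6069))
```

Run it while tftpd is stopped: a `False` result then suggests the address is not assigned to any interface that is up, whereas `False` with tftpd running may just mean the server already holds the port.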

@Ruinland
Author

Ruinland commented Aug 11, 2020

Hi, thanks for the quick reply and sorry for my late reply :-)

The first thing I would do is check your host's firewall setting. On Ubuntu, you would use "ufw".

AFAIK, I disabled my firewall completely. And I can netboot Linux on the bitstream from Linux-on-LiteX with the same network configuration.

what Litex is expecting and whether they match the offsets in the json that you linked.

That's what I suspect, too.
If memory serves me correctly, LiteX changed the offsets at some point between July 2019 and May 2020, which made me look into what was happening back then.

Could you tell me when "VexRiscv_Linux.v" was generated? Then I can track down whether I need to change the offsets.

As I'm looking at the instructions here, it doesn't really tell you how to load the Linux image firmware, either via netboot or serial boot, right? So you had to figure it out from other places?
In the case that DID work for you, did you use serialboot or netboot?

Sorry if I misunderstand you (my English is not very good).
If you're asking whether I had figured out how to load the bitstream and Linux image from Linux-on-LiteX-VexRiscv onto the Arty A7: yes.
I've tried both netboot and serialboot successfully.

For netboot, basically I have a wiring setup like this :
https://imgur.com/a/fKqsOVA
( I know. The board in the photo is a Lattice ECP-5. I've tried Arty A7 with the same wiring as well. I didn't take a picture for A7. )

I've disabled the firewall. I brought down every NIC except the USB dongle shown in the picture, whose IP is set to 192.168.100.100 (basically the same as this note I've written before).

And I put the contents of these two directories (1 and 2) into the TFTP server root.

I typed :
xc3sprog -c nexys4 top.bit && picocom -b 1000000 /dev/ttyUSB1
(The UART baud rate for their bitstream is a bit unusual.)

Then the LiteX bootloader loads everything it needs to the address offsets it wants.
Linux boots.

The problem I'm encountering is:
I tried the same procedure with the top.bit generated from this repo's "linux_litex_demo"
and the Linux kernel image, rootfs.cpio, emulator.bin and rv32.dtb provided in this repo.
I changed the TFTP port to 6069 and the baud rate to 115200.

It doesn't work. (See the ARP log above.)

If I'm not doing it correctly, could you ELI5 it for me step by step?

@Ruinland
Author

I dug into the wiki you pointed to; it brings up atftpd on the host.
I'll try it later (I'm still at my day job's office now). Maybe the bitstream generated here is more sensitive to something I don't know about (timing, I guess?).

@Ruinland
Author

Ruinland commented Aug 11, 2020

@mithro

You might want to join us on IRC at #symbiflow on irc.freenode.net or on our Slack channel and we can help debug in real time.

Surely.
Could you provide me the URL for the slack you're using?

@kgugala
Member

kgugala commented Aug 11, 2020

@Ruinland I just checked the Linux example on two different Arty boards and two different PCs - it works fine. I'm pretty sure the network issues you're seeing are related to your local network configuration.

As for uploading over UART - I added the images.json file (https://github.com/SymbiFlow/symbiflow-examples/blob/master/examples/xc7/linux_litex_demo/images.json). You can use it to load the binaries with litex_term. You can use the following command (run it from the linux_litex_demo directory):

litex_term --speed 115200 --images images.json --no-crc /dev/ttyUSBX
# where X is the number of the ttyUSB interface; if no USB <-> serial converters other than the one on the Arty board are connected to your PC, this should be /dev/ttyUSB1

Note that UART loading is very slow.

@kgugala
Member

kgugala commented Aug 11, 2020

@mithro

You might want to join us on IRC at #symbiflow on irc.freenode.net or on our Slack channel and we can help debug in real time.

Surely.
Could you provide me the URL for the slack you're using?

@Ruinland you can use this Slack invite to sign in to the SymbiFlow Slack channel https://join.slack.com/t/symbiflow/shared_invite/enQtNTkyMjcyNTkzOTY4LTU0MzhmYWNjOGMyMTkyNjA0MmEyMWM5OWY3ZDg5MWQ3ODlmOWQwZjk2YzBmMDBjMzkzMzNjYjkwYjAxZTMyNjQ

@Ruinland
Author

Sadly, the serialboot still failed (the log is appended below).
I'll buy a new Arty A7 later; maybe my board is broken.
Sorry for taking up your precious time. Feel free to close this issue.

Could you do one more thing for me, though?
I really want to try a VPR-generated bitstream that is known to work on my board.
It would be nice if you could provide me with a top.bit so I can try one last time.

Serial boot failed log :

$ xc3sprog -c nexys4 ./build/top.bit && litex_term --speed 115200 --images images.json --no-crc /dev/ttyUSB1
XC3SPROG (c) 2004-2011 xc3sprog project $Rev: 795 $ OS: Linux
Free software: If you contribute nothing, expect nothing!
Feedback on success/failure/enhancement requests:
	http://sourceforge.net/mail/?group_id=170565 
Check Sourceforge for updates:
	http://sourceforge.net/projects/xc3sprog/develop

Using Libftdi, 
[LXTERM] Starting....
Initializing SDRAM...
SDRAM now under software control
Read leveling:
m0, b0: |00000000000000000000001111111111| delays: 27+-05
m0, b1: |00000000000000000000000000000000| delays: -
m0, b2: |00000000000000000000000000000000| delays: -
m0, b3: |00000000000000000000000000000000| delays: -
m0, b4: |00000000000000000000000000000000| delays: -
m0, b5: |00000000000000000000000000000000| delays: -
m0, b6: |00000000000000000000000000000000| delays: -
m0, b7: |00000000000000000000000000000000| delays: -
best: m0, b0 delays: 27+-05
m1, b0: |00000000000000000000001111111111| delays: 27+-05
m1, b1: |00000000000000000000000000000000| delays: -
m1, b2: |00000000000000000000000000000000| delays: -
m1, b3: |00000000000000000000000000000000| delays: -
m1, b4: |00000000000000000000000000000000| delays: -
m1, b5: |00000000000000000000000000000000| delays: -
m1, b6: |00000000000000000000000000000000| delays: -
m1, b7: |00000000000000000000000000000000| delays: -
best: m1, b0 delays: 27+-05
SDRAM now under hardware control
Memtest OK

--============== Boot ==================--
Booting from serial...
Press Q or ESC to abort boot completely.
sL5DdSMmkekro
[LXTERM] Received firmware download request from the device.
[LXTERM] Uploading buildroot/Image to 0xc0000000 (4652188 bytes)...
[LXTERM] Upload complete (9.9KB/s).
[LXTERM] Uploading buildroot/rootfs.cpio to 0xc0800000 (4062720 bytes)...
[LXTERM] Upload complete (9.9KB/s).
[LXTERM] Uploading buildroot/rv32.dtb to 0xc1000000 (1838 bytes)...
[LXTERM] Upload complete (10.4KB/s).
[LXTERM] Uploading emulator/emulator.bin to 0xc1100000 (2992 bytes)...
[LXTERM] Upload complete (9.6KB/s).
[LXTERM] Booting the device.
[LXTERM] Done.
KExecuting booted program at 0xc1100000

--============= Liftoff! ===============--
**
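
To put the upload speed in the log above into perspective, the total transfer time can be estimated from the reported sizes and rate. This is a rough sketch: the sizes are copied from the [LXTERM] lines, and reading "9.9KB/s" as 9.9 * 1000 bytes per second is an assumption (it could equally mean 1024-byte kilobytes, which changes the result only slightly).

```python
# Sizes in bytes, copied from the [LXTERM] upload lines in the log above.
UPLOAD_SIZES = {
    "buildroot/Image": 4652188,
    "buildroot/rootfs.cpio": 4062720,
    "buildroot/rv32.dtb": 1838,
    "emulator/emulator.bin": 2992,
}


def estimate_upload_seconds(sizes, rate_bytes_per_s):
    """Estimate total serial upload time for the given files at a fixed rate."""
    return sum(sizes.values()) / rate_bytes_per_s


# Assumption: "9.9KB/s" is read as 9.9 * 1000 bytes per second.
seconds = estimate_upload_seconds(UPLOAD_SIZES, 9.9 * 1000)
print(f"{sum(UPLOAD_SIZES.values())} bytes, about {seconds / 60:.1f} minutes")
```

Under either reading of "KB" this comes out at roughly a quarter of an hour per attempt at 115200 baud.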

@tcal-x
Contributor

tcal-x commented Aug 12, 2020

Hi @Ruinland ,

I don't see any evidence that your hardware is faulty. This case is much more complex than just building a bitstream. You have both the bitstream and the Linux images, and they need to be in 100% agreement about "where things are" in the address space, or you get the hang that you saw. The images.json file must specify the correct offsets that agree with what was built into the LiteX bitstream. Also, the device tree information must agree (addresses of control registers). If the bitstream and the Linux image were not built together, there is a risk of mismatch. If you're using the 3 files in the example directory, then we should probably assume that they are correct with respect to the LiteX configuration, so that leaves the images.json file as the prime suspect (for the serialboot hang at Liftoff).
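
One cheap check on the load addresses is whether any two images would overwrite each other in memory. The sketch below uses the addresses and sizes printed in the serial boot log earlier in this thread; `find_overlaps` is a hypothetical helper, not part of LiteX.

```python
from itertools import combinations


def find_overlaps(regions):
    """Return pairs of names whose [base, base + size) ranges overlap.

    `regions` maps a file name to a (base_address, size_in_bytes) tuple.
    """
    overlaps = []
    for (name_a, (base_a, size_a)), (name_b, (base_b, size_b)) in combinations(
        sorted(regions.items()), 2
    ):
        if base_a < base_b + size_b and base_b < base_a + size_a:
            overlaps.append((name_a, name_b))
    return overlaps


# Load addresses and sizes as printed by litex_term in the log in this thread.
LOG_REGIONS = {
    "buildroot/Image": (0xC0000000, 4652188),
    "buildroot/rootfs.cpio": (0xC0800000, 4062720),
    "buildroot/rv32.dtb": (0xC1000000, 1838),
    "emulator/emulator.bin": (0xC1100000, 2992),
}
print(find_overlaps(LOG_REGIONS))
```

An empty list only rules out images clobbering each other. It cannot tell whether an address matches what the bitstream and device tree expect, which is the agreement described above.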

But netboot would be much better if you could get that working. Do you have another laptop that you can plug into the ethernet switch to test if the tftp server is working?

I'm sorry I missed your VexRiscV question earlier -- if you look at the .tcl file used to build top.bit (it should be in the same directory as top.bit, probably called top.tcl), there will be a line where it loads the VexRiscV Verilog. I don't think any offset information is built into that file, though.

If you just want to run a VPR-generated bitstream and it doesn't need to be Linux, there are other examples that I can provide.

Hopefully I can try running through this tonight, and see if I can get netboot to work. Sorry about the delay.

@tcal-x
Contributor

tcal-x commented Aug 13, 2020

Hi @Ruinland , some of the things I wrote above are wrong -- I was assuming this was like the usual LiteX build, but it's not.

I see the VexRiscv Verilog is right there in the top directory, so you have no clue how it was generated -- that is true. But it is also true that I don't think it is related to the issues you're having.

Also, I see Karol added an images.json file. I'll try that with serial boot and see if it works for me.

@tcal-x
Contributor

tcal-x commented Aug 13, 2020

@Ruinland , @kgugala -- I see exactly the same stall at Liftoff with serial boot, as in @Ruinland 's previous comment. I got everything from the symbiflow-examples/examples/xc7/linux_litex_demo/ directory --- I built top.bit using the Makefile, and I used the images.json and the files that it points to.

@Ruinland
Author

Ruinland commented Aug 13, 2020

Hi @tcal-x ,
Thanks for the help.

If you're using the 3 files in the example directory

Yes, I do use the files in the example directories (buildroot and emulator) in this repo.

if you look at the .tcl file used to build top.bit (it should be in the same directory as top.bit, probably called top.tcl), there will be a line where it loads the VexRiscV Verilog

Thanks for the tip. I generated top.v and the top.tcl from GitHub litex-hub/linux-on-litex-vexriscv repo (commit 4f913ce9).
And I tried to modify the Makefile in linux_litex_demo in this repo to use that top.v and a VexRiscv_Linux.v which I've generated [1] and booted before. Unfortunately, the top.v breaks VPR during "Load circuit" stage.

Message: Failed to find matching architecture model for 'ICAPE2'
The entire flow of VPR took 1.28 seconds (max_rss 78.7 MiB)
make: *** [Makefile.mine:26: build/top.net] Error 1

======================

I see exactly the same stall at Liftoff with serial boot

Does LiteX/VexRiscv implement the RISC-V debug spec with OpenOCD support?
If so, would you kindly tell me how to set it up?
Even though I'm not an EDA dev, I'm still capable of helping with debugging :-)

[1] From GitHub litex-hub/litex-data-cpu-vexriscv @ commit e2a818a

@tcal-x
Contributor

tcal-x commented Aug 13, 2020

Hi @Ruinland , I'm afraid this is getting beyond my expertise. I do know VexRiscV has GDB debug support, but you need to have a Debug variant, as in https://github.com/litex-hub/pythondata-cpu-vexriscv/blob/654057b2f5cec0f9fc99487dff67861f76fcbe7e/pythondata_cpu_vexriscv/verilog/VexRiscv_LinuxDebug.v.

There's more info at https://github.com/SpinalHDL/VexRiscv.

The Litex configuration would also need to be updated to allow actual debugging on the board. I don't know how that works in detail.

@kgugala
Member

kgugala commented Aug 13, 2020

The binaries are fine; there was a bug in images.json - emulator.bin was loaded at an incorrect address. 2bc55db fixes the issue.

@tcal-x
Contributor

tcal-x commented Aug 14, 2020

@Ruinland , it works for me now, so give it another try.

@Ruinland
Author

[Screenshot: 2020-08-14-194428_1920x1080_scrot]

I have no idea how this happened.

@kgugala
Member

kgugala commented Aug 14, 2020

Are you sure you started UART loading? Please reset the CPU in LiteX system after you open a serial connection with litex_term.
(The CPU can be reset with a red pushbutton labeled reset - should be near PMOD JD and JP1 jumper)

@kgugala
Member

kgugala commented Aug 14, 2020

and remember to use the --no-crc option

@Ruinland
Author

Thanks for all kinds of help.

Though it feels like voodoo, I managed to sort things out.
Just like @kgugala said, the binaries (Linux image, rootfs, dtb and the rv32.dtb) are fine and the bitstream is fine as well.
The netboot failure I encountered really was a configuration issue, though one I cannot explain.

The only thing I did to make serialboot work was to use an openocd built from the latest source instead of xc3sprog or the openocd from the Arch Linux repo.

And for netboot, I set my host (192.168.100.100) to be the default gateway and it suddenly worked.
Yet I swear to God that I didn't have to do that before. And I don't even know why I thought this might help ......

All in all, I really, really thank everyone here for helping me out.
Long live SymbiFlow 💯
