Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

phaser2: kernels hang #647

Closed
jbqubit opened this issue Dec 18, 2016 · 9 comments
Closed

phaser2: kernels hang #647

jbqubit opened this issue Dec 18, 2016 · 9 comments
Assignees
Milestone

Comments

@jbqubit
Copy link
Contributor

jbqubit commented Dec 18, 2016

I can't test phaser2 due to kernel hanging.

phaser2 blink_forever.py hangs

(phaser2) britton@britton1:~/artiq-dev-phaser2/artiq/artiq/examples/master$
artiq_run ./repository/coredevice_examples/simple/blink_forever.py
--verbose --verbose
DEBUG:artiq.compiler.import_cache:hook installed
DEBUG:artiq.compiler.import_cache:added
'./repository/coredevice_examples/simple/blink_forever.py' to cache
DEBUG:artiq.compiler.import_cache:added
'/home/britton/artiq-dev-phaser2/artiq/artiq/coredevice/ttl.py' to
cache
DEBUG:artiq.coredevice.comm_tcp:connected to host 192.168.1.71 on port 1381
DEBUG:artiq.coredevice.comm_generic:sending message:
type=<_H2DMsgType.IDENT_REQUEST: 3>
DEBUG:artiq.coredevice.comm_generic:receiving message:
type=<_D2HMsgType.IDENT_REPLY: 2>
DEBUG:artiq.coredevice.comm_generic:sending message:
type=<_H2DMsgType.SWITCH_CLOCK: 4>
DEBUG:artiq.coredevice.comm_generic:receiving message:
type=<_D2HMsgType.CLOCK_SWITCH_COMPLETED: 3>
DEBUG:artiq.coredevice.comm_generic:sending message:
type=<_H2DMsgType.LOAD_KERNEL: 5>



(phaser2) britton@britton1:~/artiq-dev-phaser2/artiq/artiq/examples/phaser$
git branch
* (detached from d6f6ebf)
  phaser2


phaser2) britton@britton1:~/artiq-dev-phaser2/artiq/artiq/examples/master$
artiq_corelog
[           9us]  INFO(runtime): booting ARTIQ
[         194us]  INFO(runtime): software version 3.0.dev+411.gd6f6ebf
[         405us]  INFO(runtime): gateware version 3.0.dev+411.gd6f6ebf.dirty
[        4649us]  INFO(runtime): press 'e' to erase startup and idle kernels...
[     1004008us]  INFO(runtime): continuing boot
[     1004222us]  INFO(runtime::rtio_crg): startup RTIO clock: internal
[     1005228us]  INFO(runtime::session): running startup kernel
[     1019362us]  INFO(runtime::session): startup kernel finished
[     1019625us]  INFO(runtime::session): accepting network sessions in Rust
[     1020254us]  INFO(runtime::session): no connection, starting idle kernel
[     1027217us]  INFO(runtime::session): no idle kernel found
[     9769607us]  INFO(runtime::session): new connection from *:0
[     9769838us]  INFO(runtime::session): terminating running kernel
[    27597647us]  INFO(runtime::session): connection closed
[    27597943us]  INFO(runtime::session): no connection, starting idle kernel
[    27598342us]  INFO(runtime::session): no idle kernel found
[   143430837us]  INFO(runtime::session): new connection from *:0
[   143431081us]  INFO(runtime::session): terminating running kernel
[   166430411us]  INFO(runtime::session): connection closed
[   166430731us]  INFO(runtime::session): no connection, starting idle kernel
[   166431155us]  INFO(runtime::session): no idle kernel found
[   182545757us]  INFO(runtime::session): new connection from *:0
[   182546001us]  INFO(runtime::session): terminating running kernel
[   194078593us]  INFO(runtime::session): connection closed
[   194078912us]  INFO(runtime::session): no connection, starting idle kernel
[   194079344us]  INFO(runtime::session): no idle kernel found
[   195023130us]  INFO(runtime::session): new connection from *:0
[   195023373us]  INFO(runtime::session): terminating running kernel
[   203455211us]  INFO(runtime::session): connection closed
[   203455531us]  INFO(runtime::session): no connection, starting idle kernel
[   203455955us]  INFO(runtime::session): no idle kernel found
[   417193925us]  INFO(runtime::session): new connection from *:0
[   417194168us]  INFO(runtime::session): terminating running kernel
[   424656214us]  INFO(runtime::session): connection closed
[   424656533us]  INFO(runtime::session): no connection, starting idle kernel
[   424656957us]  INFO(runtime::session): no idle kernel found
[   453159679us]  INFO(runtime::session): new connection from *:0
[   453159923us]  INFO(runtime::session): terminating running kernel
[   455934818us]  INFO(runtime::session): connection closed

roll back to v2.1, blink_forever.py OK

Roll back to Artiq 2.1 all is well..... I can flash LED.

(artiq2) britton@britton1:~/artiq2/artiq/artiq/examples/master$ conda list artiq
# packages in environment at /home/britton/anaconda3/envs/artiq2:
#
artiq                     2.1             py_0+git453b29a    m-labs/label/main
artiq-kc705-nist_clock    2.1             py_0+git453b29a    m-labs/label/main
llvmlite-artiq            0.10.0.dev              py35_24    m-labs/label/main

(artiq2) britton@britton1:~/artiq2/artiq/artiq/examples/master$
artiq_run ./repository/coredevice_examples/simple/blink_forever.py
@jbqubit
Copy link
Contributor Author

jbqubit commented Dec 18, 2016

@jordens says this is due to lwip #598 and #637. Removing switch between Master and KC705 doesn't resolve. This Issue is blocking testing of phaser2.

@whitequark whitequark self-assigned this Dec 18, 2016
@whitequark
Copy link
Contributor

I believe this issue has been resolved, @jbqubit can you test with ARTIQ master please?

@jbqubit
Copy link
Contributor Author

jbqubit commented Jan 25, 2017

Using conda dev here's what I see.

$ artiq_flash -t kc705 -m nist_clock

Flashes my kc705 fine. I can ping board at 192.168.1.50.

$ artiq_corelog
[           8us]  INFO(runtime): ARTIQ runtime starting...
[         238us]  INFO(runtime): software version false
[         449us]  INFO(runtime): gateware version 3.0.dev+725.g591268a
[        3873us]  INFO(runtime): press 'e' to erase startup and idle kernels...
[     1003007us]  INFO(runtime): continuing boot
[     1003492us]  INFO(runtime::rtio_mgt): startup RTIO clock: internal
[     1010886us]  INFO(runtime::session): running startup kernel
[     1019324us]  INFO(runtime::session): resetting RTIO
[     1019965us]  INFO(runtime::session): startup kernel finished
[     1026783us]  INFO(runtime::session): accepting network sessions
[     1027043us]  INFO(runtime::session): no connection, starting idle kernel
[     1030722us]  INFO(runtime::session): no idle kernel found
[   114523123us]  INFO(runtime::session): new connection from 192.168.1.68:34527
[   114523499us]  INFO(runtime::session): terminating running kernel

Run:

$ artiq_run repository/coredevice_examples/simple/blink_forever.py

Took > 60 seconds to return. But the LED flashing did eventually start.

Reboot board while monitoring serial interface. Run:

$ artiq_run repository/coredevice_examples/simple/blink_forever.py

$ flterm /dev/ttyUSB3

MiSoC BIOS
(c) Copyright 2007-2016 M-Labs Limited
Built Jan 26 2017 06:31:03

BIOS CRC passed (a7d1dadd)
Initializing SDRAM...
Write leveling: 15* 16* 13  13   8   7   3   6  completed
Read bitslip: 7 6 5 4 3 
Read delays: 7:01-11  6:00-11  5:03-12  4:03-14  3:10-21  2:00-06  1:00-08  0:00-09  completed
Memtest OK
Booting from serial...
Press Q or ESC to abort boot completely.
sL5DdSMmkekro
Timeout
Booting from flash...
Loading 360488 bytes from flash...
Executing booted program.
[         112us]  INFO(runtime): ARTIQ runtime starting...
[         322us]  INFO(runtime): software version false
[         540us]  INFO(runtime): gateware version 3.0.dev+725.g591268a
[        3965us]  INFO(runtime): press 'e' to erase startup and idle kernels...
[     1003078us]  INFO(runtime): continuing boot
[     1003586us]  INFO(runtime::rtio_mgt): startup RTIO clock: internal
[     1011011us]  INFO(runtime::session): running startup kernel
[     1019425us]  INFO(runtime::session): resetting RTIO
[     1020060us]  INFO(runtime::session): startup kernel finished
[     1026900us]  INFO(runtime::session): accepting network sessions
[     1027140us]  INFO(runtime::session): no connection, starting idle kernel
[     1030820us]  INFO(runtime::session): no idle kernel found
[     7465408us]  INFO(runtime::session): new connection from 192.168.1.68:34550
[     7465699us]  INFO(runtime::session): terminating running kernel
[     7544906us]  INFO(runtime::session): no connection, starting idle kernel
[     7545430us]  INFO(runtime::session): no idle kernel found
[    14770779us]  INFO(runtime::session): new connection from 192.168.1.68:34551
[    14771062us]  INFO(runtime::session): terminating running kernel
[    14976271us]  WARN(runtime): network error: dropped by socket
[    14976800us]  WARN(runtime): network error: dropped by socket
[    15384457us]  WARN(runtime): network error: dropped by socket
[    15384982us]  WARN(runtime): network error: dropped by socket
[    16200400us]  WARN(runtime): network error: dropped by socket
[    16200926us]  WARN(runtime): network error: dropped by socket
[    17836539us]  WARN(runtime): network error: dropped by socket
[    17837065us]  WARN(runtime): network error: dropped by socket
[    21112579us]  WARN(runtime): network error: dropped by socket
[    21113034us]  WARN(runtime): network error: dropped by socket
[    27657740us]  WARN(runtime): network error: dropped by socket
[    27658256us]  WARN(runtime): network error: dropped by socket
[    40761641us]  WARN(runtime): network error: dropped by socket
[    40762184us]  WARN(runtime): network error: dropped by socket
[   119294807us]  INFO(runtime::session): resetting RTIO
[   122296111us]  WARN(runtime): network error: dropped by socket
[   125296090us]  WARN(runtime): network error: dropped by socket
[   128296153us]  WARN(runtime): network error: dropped by socket
[   131296140us]  WARN(runtime): network error: dropped by socket
[   134296006us]  WARN(runtime): network error: dropped by socket
[   137296013us]  WARN(runtime): network error: dropped by socket
[   140295960us]  WARN(runtime): network error: dropped by socket
[   143296176us]  WARN(runtime): network error: dropped by socket
[   146296167us]  WARN(runtime): network error: dropped by socket
[   149296168us]  WARN(runtime): network error: dropped by socket
[   152296206us]  WARN(runtime): network error: dropped by socket
[    40761641us]  WARN(runtime): network error: dropped by socket
[    40762184us]  WARN(runtime): network error: dropped by socket

No LED flashing observed until after "[ 119294807us] INFO(runtime::session): resetting RTIO". Then LED flashes continuously.

@jbqubit
Copy link
Contributor Author

jbqubit commented Jan 25, 2017

Repeat of previous test. This time with wireshark logging following reboot.

$ flterm /dev/ttyUSB3

MiSoC BIOS
(c) Copyright 2007-2016 M-Labs Limited
Built Jan 26 2017 06:31:03

BIOS CRC passed (a7d1dadd)
Initializing SDRAM...
Write leveling: 15* 16* 13  13   8   7   3   6  completed
Read bitslip: 7 6 5 4 3 
Read delays: 7:01-11  6:00-12  5:03-13  4:03-14  3:10-21  2:00-06  1:00-09  0:00-09  completed
Memtest OK
Booting from serial...
Press Q or ESC to abort boot completely.
sL5DdSMmkekro
Timeout
Booting from flash...
Loading 360488 bytes from flash...
Executing booted program.
[         112us]  INFO(runtime): ARTIQ runtime starting...
[         322us]  INFO(runtime): software version false
[         540us]  INFO(runtime): gateware version 3.0.dev+725.g591268a
[        3965us]  INFO(runtime): press 'e' to erase startup and idle kernels...
[     1003078us]  INFO(runtime): continuing boot
[     1003586us]  INFO(runtime::rtio_mgt): startup RTIO clock: internal
[     1011062us]  INFO(runtime::session): running startup kernel
[     1019480us]  INFO(runtime::session): resetting RTIO
[     1020115us]  INFO(runtime::session): startup kernel finished
[     1026955us]  INFO(runtime::session): accepting network sessions
[     1027195us]  INFO(runtime::session): no connection, starting idle kernel
[     1030875us]  INFO(runtime::session): no idle kernel found
[     9193487us]  INFO(runtime::session): new connection from 192.168.1.68:34636
[     9193777us]  INFO(runtime::session): terminating running kernel
[     9399804us]  WARN(runtime): network error: dropped by socket
[     9400265us]  WARN(runtime): network error: dropped by socket
[     9807910us]  WARN(runtime): network error: dropped by socket
[     9808368us]  WARN(runtime): network error: dropped by socket
[    10623741us]  WARN(runtime): network error: dropped by socket
[    10624200us]  WARN(runtime): network error: dropped by socket
[    12259960us]  WARN(runtime): network error: dropped by socket
[    12260421us]  WARN(runtime): network error: dropped by socket
[    15531924us]  WARN(runtime): network error: dropped by socket
[    15532332us]  WARN(runtime): network error: dropped by socket
[    22069250us]  WARN(runtime): network error: dropped by socket
[    22069754us]  WARN(runtime): network error: dropped by socket
[    35157417us]  WARN(runtime): network error: dropped by socket
[    35157897us]  WARN(runtime): network error: dropped by socket
[    61306993us]  INFO(runtime::session): resetting RTIO
[    64303286us]  WARN(runtime): network error: dropped by socket
[    67303337us]  WARN(runtime): network error: dropped by socket
[    70303368us]  WARN(runtime): network error: dropped by socket
[    73303369us]  WARN(runtime): network error: dropped by socket
[    76303365us]  WARN(runtime): network error: dropped by socket
[    79303283us]  WARN(runtime): network error: dropped by socket
[    82303251us]  WARN(runtime): network error: dropped by socket
[    85303354us]  WARN(runtime): network error: dropped by socket
[    88303305us]  WARN(runtime): network error: dropped by socket
[    91303258us]  WARN(runtime): network error: dropped by socket
[    94303446us]  WARN(runtime): network error: dropped by socket
[    97303527us]  WARN(runtime): network error: dropped by socket
[   100303320us]  WARN(runtime): network error: dropped by socket
[   103303497us]  WARN(runtime): network error: dropped by socket
[   106303527us]  WARN(runtime): network error: dropped by socket
[   109303540us]  WARN(runtime): network error: dropped by socket
[   112303574us]  WARN(runtime): network error: dropped by socket
[   115303547us]  WARN(runtime): network error: dropped by socket
[   118303517us]  WARN(runtime): network error: dropped by socket
[   121303517us]  WARN(runtime): network error: dropped by socket
[   124303481us]  WARN(runtime): network error: dropped by socket
[   127303452us]  WARN(runtime): network error: dropped by socket
[   130303462us]  WARN(runtime): network error: dropped by socket
[   133303651us]  WARN(runtime): network error: dropped by socket
[   136303696us]  WARN(runtime): network error: dropped by socket
[   139303717us]  WARN(runtime): network error: dropped by socket
[   142303650us]  WARN(runtime): network error: dropped by socket
[   145303669us]  WARN(runtime): network error: dropped by socket
[   148303752us]  WARN(runtime): network error: dropped by socket
[   151303543us]  WARN(runtime): network error: dropped by socket
[   154303758us]  WARN(runtime): network error: dropped by socket
[   157303778us]  WARN(runtime): network error: dropped by socket
[   160303772us]  WARN(runtime): network error: dropped by socket
[   163303819us]  WARN(runtime): network error: dropped by socket

$ artiq_run repository/coredevice_examples/simple/blink_forever.py

After a delay flashing commences.

wireshark_log_20160125a.pcapng.gz

@jordens
Copy link
Member

jordens commented Jan 25, 2017

Could you please attach the dump data?

@jbqubit
Copy link
Contributor Author

jbqubit commented Jan 26, 2017

$ artiq_coreanalyzer --write-dump bigdump.txt

bigdump.txt

Sorry, something went wrong.

@whitequark
Copy link
Contributor

@jbqubit Can you recheck with the latest master?

Sorry, something went wrong.

@jordens jordens modified the milestone: 3.0 Jan 27, 2017
@jbqubit
Copy link
Contributor Author

jbqubit commented Jan 27, 2017

Using latest build.

$ conda list artiq
# packages in environment at /home/britton/anaconda3/envs/artiq3test2:
#
artiq                     3.0.dev         py_749+git55ce40e    m-labs/label/dev
artiq-kc705-nist_clock    3.0.dev         py_749+git55ce40e    m-labs/label/dev
llvmlite-artiq            0.12.0.dev              py35_30    m-labs/label/main

$ artiq_flash -t kc705 -m nist_clock

$ ping 192.168.1.71
PING 192.168.1.71 (192.168.1.71) 56(84) bytes of data.
64 bytes from 192.168.1.71: icmp_seq=1 ttl=64 time=0.241 ms
64 bytes from 192.168.1.71: icmp_seq=2 ttl=64 time=0.271 ms

$ artiq_run repository/coredevice_examples/simple/blink_forever.py

LED starts flashing after delay << 1 sec. After a while of LED-watching I terminated with ctrl-C.

MiSoC BIOS
(c) Copyright 2007-2016 M-Labs Limited
Built Jan 27 2017 22:47:20

BIOS CRC passed (c7dbd19f)
Initializing SDRAM...
Write leveling: 15* 16* 13  13   8   7   3   6  completed
Read bitslip: 7 6 5 4 3 
Read delays: 7:01-11  6:00-11  5:03-12  4:03-14  3:10-21  2:00-06  1:00-08  0:00-09  completed
Memtest OK
Booting from serial...
Press Q or ESC to abort boot completely.
sL5DdSMmkekro
Timeout
Booting from flash...
Loading 367960 bytes from flash...
Executing booted program.
[         112us]  INFO(runtime): ARTIQ runtime starting...
[         320us]  INFO(runtime): software version 3.0.dev+749.g55ce40e
[         540us]  INFO(runtime): gateware version 3.0.dev+749.g55ce40e
[        5267us]  INFO(runtime): press 'e' to erase startup and idle kernels...
[     1005079us]  INFO(runtime): continuing boot
[     1005687us]  INFO(runtime): using MAC address 00-0a-35-03-1e-53
[     1006113us]  INFO(runtime): using IP address 192.168.1.71
[     1008575us]  INFO(runtime::rtio_mgt): startup RTIO clock: internal
[     1015226us]  WARN(runtime): network error: truncated packet
[     1033629us]  INFO(runtime::session): accepting network sessions
[     1033984us]  INFO(runtime::session): running startup kernel
[     1042639us]  INFO(runtime::session): resetting RTIO
[     1043380us]  INFO(runtime::session): startup kernel finished
[     1043698us]  INFO(runtime::session): no connection, starting idle kernel
[     1050618us]  INFO(runtime::session): no idle kernel found
[    17476484us]  INFO(runtime::session): new connection from 192.168.1.68:47335
[    17476746us]  INFO(runtime::session): terminating running kernel
[    17746180us]  WARN(runtime): network error: dropped by socket
[    18279111us]  INFO(runtime::session): resetting RTIO
[    21275395us]  WARN(runtime): network error: dropped by socket
[    24275405us]  WARN(runtime): network error: dropped by socket
[    25771921us]  INFO(runtime::session): no connection, starting idle kernel
[    25772438us]  INFO(runtime::session): no idle kernel found

wireshark_log_20160127a.pcapng.gz
bigdump.txt

Looking better.

@whitequark
Copy link
Contributor

whitequark commented Jan 30, 2017

It now takes 200ms total to start a 8KB kernel. There is a number of ways to improve on this number, e.g.:

  1. add more liteeth buffers (which take up space in FPGA SRAM, and don't scale anyhow);
  2. limit receive window to the amount of liteeth buffers (a hack which breaks down under contention and with concurrent connections);
  3. a proper solution, using an ISR to copy packets from the liteeth buffer into RAM and then returning to the main loop.

I am going to implement 3) in the normal course of moving the runtime further towards Rust, as I will implement the necessary support code (an atomic ringbuffer) anyway. For now, I understand the kernel load time is acceptable. Please test and reopen if it is not for your experiments.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants