In-Use Glasgows can be interrupted by firmware reload #241

Closed
attie opened this issue Dec 5, 2020 · 13 comments · Fixed by #362

@attie

attie commented Dec 5, 2020

I had previously started a Glasgow collecting data using the SCD30 applet.

I've just connected a second Glasgow to my system and ran glasgow list (to get the serial for use with --serial), and noticed that the SCD30 Glasgow was interrupted. I also think I saw the following line twice (once for each board, I've lost the output unfortunately).

I: g.device.hardware: found revC1 device with API level 0 (supported API level is 1)

It seems that running glasgow list reloaded the firmware on the currently busy board, interrupting the applet and producing the errors below (the last one several times).

I don't know if we can "lock" a device, or determine whether it is already in use, but this may be worth investigating.

The API level message was introduced in ab21295, and may or may not be related.

I'll look into this further when I have time.


Traceback (most recent call last):
  File "/home/attie/proj_local/glasgow/venv/bin/glasgow", line 11, in <module>
    load_entry_point('glasgow', 'console_scripts', 'glasgow')()
  File "/home/attie/proj_local/glasgow/glasgow/software/glasgow/cli.py", line 830, in main

  File "/home/attie/.bin/python3.8.2/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
    return future.result()
  File "/home/attie/proj_local/glasgow/glasgow/software/glasgow/cli.py", line 639, in _main
    await asyncio.wait(tasks, return_when=asyncio.ALL_COMPLETED)
  File "/home/attie/proj_local/glasgow/glasgow/software/glasgow/cli.py", line 600, in run_applet
    if args.action == "run":
  File "/home/attie/proj_local/glasgow/glasgow/software/glasgow/applet/sensor/scd30/__init__.py", line 290, in interact
    await asyncio.wait_for(report(), meas_interval * 3)
  File "/home/attie/.bin/python3.8.2/lib/python3.8/asyncio/tasks.py", line 483, in wait_for
    return fut.result()
  File "/home/attie/proj_local/glasgow/glasgow/software/glasgow/applet/sensor/scd30/__init__.py", line 283, in report
    while not await scd30.is_data_ready():
  File "/home/attie/proj_local/glasgow/glasgow/software/glasgow/applet/sensor/scd30/__init__.py", line 96, in is_data_ready
    ready, = await self._read(CMD_DATA_READY, ">H")
  File "/home/attie/proj_local/glasgow/glasgow/software/glasgow/applet/sensor/scd30/__init__.py", line 81, in _read
    return struct.unpack(format, await self._read_raw(addr, struct.calcsize(format)))
  File "/home/attie/proj_local/glasgow/glasgow/software/glasgow/applet/sensor/scd30/__init__.py", line 54, in _read_raw
    acked = await self.lower.write(self.i2c_addr, struct.pack(">H", addr), stop=True)
  File "/home/attie/proj_local/glasgow/glasgow/software/glasgow/applet/interface/i2c_initiator/__init__.py", line 182, in write
    acked = len(data) - unacked
  File "/home/attie/proj_local/glasgow/glasgow/software/glasgow/applet/interface/i2c_initiator/__init__.py", line 163, in _data_read

  File "/home/attie/proj_local/glasgow/glasgow/software/glasgow/access/direct/demultiplexer.py", line 225, in read
    async def read(self, length=None, *, flush=True):
  File "/home/attie/proj_local/glasgow/glasgow/software/glasgow/access/direct/demultiplexer.py", line 357, in flush
    if self._out_buffer:
  File "/home/attie/proj_local/glasgow/glasgow/software/glasgow/support/task_queue.py", line 73, in wait_all
    async def wait_all(self):
  File "/home/attie/proj_local/glasgow/glasgow/software/glasgow/support/task_queue.py", line 55, in poll
    """
  File "/home/attie/proj_local/glasgow/glasgow/software/glasgow/access/direct/demultiplexer.py", line 288, in _out_task

  File "/home/attie/proj_local/glasgow/glasgow/software/glasgow/device/hardware.py", line 245, in bulk_write
    else:
  File "/home/attie/proj_local/glasgow/glasgow/software/glasgow/device/hardware.py", line 204, in _do_transfer
    def close(self):
  File "/home/attie/proj_local/glasgow/venv/lib/python3.8/site-packages/usb1/__init__.py", line 822, in submit
    raiseUSBError(result)
  File "/home/attie/proj_local/glasgow/venv/lib/python3.8/site-packages/usb1/__init__.py", line 125, in raiseUSBError
    raise __STATUS_TO_EXCEPTION_DICT.get(value, __USBError)(value)
usb1.USBErrorNoDevice: LIBUSB_ERROR_NO_DEVICE [-4]
E: asyncio: Unclosed client session
client_session: <aiohttp.client.ClientSession object at 0x7f98fa2d97f0>
E: asyncio: Task exception was never retrieved
future: <Task finished name='Task-2297363' coro=<DirectDemultiplexerInterface._in_task() done, defined at /home/attie/proj_local/glasgow/glasgow/software/glasgow/access/direct/demultiplexer.py:209> exception=GlasgowDeviceError('transfer error: LIBUSB_TRANSFER_ERROR')>
Traceback (most recent call last):
  File "/home/attie/proj_local/glasgow/glasgow/software/glasgow/access/direct/demultiplexer.py", line 217, in _in_task
    await self._in_pushback.wait()
  File "/home/attie/proj_local/glasgow/glasgow/software/glasgow/device/hardware.py", line 236, in bulk_read
    endpoint_dir = "OUT"
  File "/home/attie/proj_local/glasgow/glasgow/software/glasgow/device/hardware.py", line 206, in _do_transfer
    self.usb_handle.close()
glasgow.device.GlasgowDeviceError: transfer error: LIBUSB_TRANSFER_ERROR
@whitequark whitequark added the software Component: software label Dec 5, 2020
@whitequark

I don't know if we can "lock" a device, or determine whether it is already in use, but this may be worth investigating.

We can, by claiming an interface. But... there's a chicken-and-egg problem here: the only way we can introduce this interface is by increasing the API level, which means that claiming an interface would require reloading the firmware, which means this very interruption.

@whitequark

Per IRC discussion, it's an edge case that we probably can't do much with.

@whitequark

I actually think we can introduce this feature without significant disruption. But it would need to be carefully considered how.

@marcan

marcan commented Dec 27, 2020

Can't we just make sure we don't reload firmware on Glasgows already exposing this interface? Sure, ones running old firmware would be reloaded... but that sounds like a problem nobody will ever actually care about. Once your Glasgows are initialized with current firmware it no longer matters.

@whitequark

whitequark commented Dec 27, 2020

Can't we just make sure we don't reload firmware on Glasgows already exposing this interface?

That's what API levels are already for! And we still have the problem, because people keep running old firmware and not reflashing it, because the firmware is transparently reloaded to SRAM on every invocation.
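As a rough illustration of the check being discussed, here is a minimal sketch. It assumes, based only on the log line in the original report ("found revC1 device with API level 0 (supported API level is 1)"), that the firmware is reloaded whenever the two levels differ. The function name and the exact comparison are hypothetical, not taken from the Glasgow source.

```python
def needs_firmware_reload(device_api_level: int, supported_api_level: int) -> bool:
    """Return True when the device firmware should be reloaded.

    Assumption: any mismatch between the level reported by the device and
    the level the software supports triggers a reload into SRAM.
    """
    return device_api_level != supported_api_level
```

Under that assumption, a board still carrying API level 0 firmware in flash would be reloaded on every invocation until it is reflashed, which matches the behaviour described above.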

Essentially, the cause of this issue is the same assumption you made: I also thought this would be "a problem nobody will ever actually care about", but that turned out to not be the case.

@marcan

marcan commented Dec 27, 2020

But once you're running an applet with up to date software, like OP, it'll already be running the latest firmware, right? If this happened to them it's because their existing run wasn't using an interface, otherwise we would just try to claim it, fail, and not touch that device any more.

@whitequark

wasn't using an interface, otherwise we would just try to claim it, fail, and not touch that device any more.

The firmware is reloaded before any interfaces are claimed.

@marcan

marcan commented Dec 27, 2020

wasn't using an interface, otherwise we would just try to claim it, fail, and not touch that device any more.

The firmware is reloaded before any interfaces are claimed.

Then it sounds like an easy fix to claim the interface first, if present, as a mutex, before doing that.

@whitequark

Well... yes, that's what I suggested above? Except we don't have an interface that can be used for this purpose yet.

@marcan

marcan commented Dec 27, 2020

Sure... I was just confused as to how this was an unfixable problem. Like, just doing that means going from "any concurrent glasgow invocations fail" to "glasgow invocations concurrent with out of date ones fail" which sounds like an obvious 99% fix.

@whitequark

Yep, I was also confused when I closed this issue, which is why I reopened it!

@marcan

marcan commented Dec 27, 2020

Ok, we're in violent agreement then :)

@whitequark

whitequark commented Dec 29, 2020

Actually, I think I came up with a way that is somewhat simpler and works with every firmware version we have: just try to claim every interface that is already exposed before reflashing, then release. (AIUI, on Windows libusb requires exclusive access in the first place, so this is never an issue, and on Linux at least one of these claims must fail if the device is in use.)
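The claim-then-release probe could look roughly like the sketch below. It is not the code that was eventually merged. The names `probe_exclusive` and `DeviceInUse` are hypothetical, and `handle` is assumed to be any object exposing `claimInterface(n)` and `releaseInterface(n)`, which is how python-libusb1's device handle names those calls. The exception type that signals a busy interface is passed in by the caller.

```python
class DeviceInUse(Exception):
    """Raised when at least one interface could not be claimed."""


def probe_exclusive(handle, interface_numbers, busy_error):
    """Claim every listed interface, then release them all again.

    `handle` is assumed to provide claimInterface(n) and releaseInterface(n).
    `busy_error` is the exception type raised by a claim when another
    process already holds the interface.
    """
    claimed = []
    try:
        for number in interface_numbers:
            try:
                handle.claimInterface(number)
            except busy_error as exc:
                raise DeviceInUse(f"interface {number} is already claimed") from exc
            claimed.append(number)
    finally:
        # Release in reverse order whether or not the probe succeeded.
        for number in reversed(claimed):
            handle.releaseInterface(number)
```

With python-libusb1 the `busy_error` argument would presumably be `usb1.USBErrorBusy`. A caller would run the probe before reloading firmware and leave the device alone if `DeviceInUse` is raised. Note that releasing straight after the probe leaves a short window before the reload in which another process could still claim the device.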

whitequark added a commit to whitequark/glasgow that referenced this issue Jul 24, 2023
On Linux multiple userspace applications can send control requests
to a device without being aware of each other. Claim every interface
to ensure mutual exclusion with any other application.

On other OSes device access is already exclusive.

Fixes GlasgowEmbedded#241.
github-merge-queue bot pushed a commit that referenced this issue Aug 5, 2023
On Linux multiple userspace applications can send control requests
to a device without being aware of each other. Claim every interface
to ensure mutual exclusion with any other application.

On other OSes device access is already exclusive.

Fixes #241.
@attie attie closed this as completed in #362 Aug 5, 2023