Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] Shape Conversion Operators Round 2 #464

Closed
awygle opened this issue Aug 3, 2020 · 21 comments
Closed

[RFC] Shape Conversion Operators Round 2 #464

awygle opened this issue Aug 3, 2020 · 21 comments
Labels

Comments

@awygle
Copy link
Contributor

awygle commented Aug 3, 2020

This RFC is a continuation of #381 .

Currently, in order to change the shape of a signal, the following workarounds are needed:

  • for truncation, use v[:new_width]
  • for extension, use a no-op bitwise operation, like v | C(0, new_width).

This code is confusing when it is encountered among a bunch of arithmetic operations, and it's more of a side effect than an explicit request to change shape. This makes it fragile, because neither of these operations is guaranteed to return a value with width new_width.

There are three proposals on the table currently for improving this situation.

Proposal 1: Add three new operators, each of which takes a width

  • v.zero_extend(new_width) that works only on unsigned values, requires new_width >= len(v), and returns unsigned values,
  • v.sign_extend(new_width) that works only on signed values, requires new_width >= len(v), and returns signed values,
  • v.truncate(new_width) that works on values of either signedness, requires new_width <= len(v), returns a value of the same signedness as the input, and checks (in simulation and formal) that the truncated bits were all identical to 0 (for unsigned values) or the new MSB (for signed values).

Proposal 2: Add a single general-purpose shape conversion operator, v.as(new_shape), for truncation, zero/sign extension, and type conversion. The new operator would extend v if new_shape.width > len(v), truncate v if new_shape.width < len(v), and return a value with shape new_shape. The semantics of this operator would be to preserve the mathematical value (modulo output size), as described in this comment.

Proposal 3: Add two new operators, each of which takes a Shape

  • v.extend(new_shape) that works on signed or unsigned values and requires new_shape.width >= len(v)
  • v.truncate(new_shape) that works on values of either signedness and requires new_shape.width <= len(v)

Both of these operators would also preserve the mathematical value modulo output size and return a value with shape new_shape.

@awygle
Copy link
Contributor Author

awygle commented Aug 3, 2020

Switching hats, I'm personally strongly in favor of Proposal 1. My mental model is that Values are just wires, and it's the operations which are signed or unsigned (think of the difference between atomic_load/atomic_store and volatile).

@whitequark
Copy link
Member

cc @programmerjake -- it was your comment which had the insight about preserving mathematical value that led to a second round of this RFC, so I'd certainly like to know your opinion here as well

cc @Ravenslofty -- you had some good arguments against the previous round of the RFC, so I'd like to know what you think about this one

@Ravenslofty
Copy link
Contributor

I think proposal 1 is the best option.

The use of a shape as a target size in prop 2 and 3 makes me uncomfortable: consider somebody who knows that Signal(range(N)) will produce a Signal that can fit all values up to N. So, since as takes a shape, they can do x.as(range(48)) to have something that will fit the range of 0..47. Except somewhere in their code they have a bug that means x.as is returning something in the range of 48..63, and it's not necessarily obvious in this context that you can have a value out of bounds.

@whitequark
Copy link
Member

To add to @Ravenslofty's comment (without necessarily agreeing with it): although it might seem that the problem of .as(range(48)) is the same as the existing Signal(range(48)), which we evidently are all fine with, it is somewhat worse in case of conversion operators because only when discussing conversion operators we explicitly rely on preservation of mathematical value.

So in case of Signal we just say that range doesn't mean what one might naively assume and that's mostly fine, but in case of .as() we first talk about mathematical value and wraparound modulo, and then talk about range, which is much more counterintuitive in that context in particular.

@programmerjake
Copy link
Contributor

All of the stuff regarding preserving mathematical value looks good to me.

I disagree with proposal 1 having overflow checks for truncation, I may be somewhat biased by my experience with C, C++, Rust, Java, JavaScript, LLVM IR, D, and probably a few other programming languages that also wrap, but I think truncation should wrap modularly instead of cause an error.

If desired, there can be a checked_truncate or checked_as (inspired by Rust's checked_add/wrapping_add and similar functions) or similarly named function for failing on overflow instead of wrapping. If desired, the logical extreme naming could be used: have separate checked_truncate and
wrapping_truncate and no plain truncate. Similarly for checked/wrapping_as.

If truncation is changed to wrap when not using explicitly checked_* functions, then I'm fine with any of the 3 proposals.

Regarding casting to a shape and issues with range being misleading, a lint could be added for non-power-of-2 ranges and maybe for enums too.

@whitequark
Copy link
Member

I think truncation should wrap modularly instead of cause an error.

I agree (I missed this when reading the RFC text the first time). If nothing else, this is inconsistent with pretty much any other part of the language we have at the moment. Introducing checked_truncate or something like that later, if desired, seems good to me as well.

Regarding casting to a shape and issues with range being misleading, a lint could be added for non-power-of-2 ranges and maybe for enums too.

We discussed this on IRC. Although exceptions could be added, this seems like it would make writing generic (e.g. over a user-specified enum) code harder, which would be counterproductive.

@awygle
Copy link
Contributor Author

awygle commented Aug 3, 2020

I disagree with proposal 1 having overflow checks for truncation

I agree here, I forgot to mention it in my above comment but in general truncation removing the required number of bits, starting with the MSB, regardless of their value, seems to be the semantics we want. I would want this to be a lint.

@Ravenslofty
Copy link
Contributor

There... are no truncation overflow checks proposed? Aside from a guard against truncating to a wider value than what you started with.

As for range/enum casting being misleading, there seems to be a consensus in IRC that having some ranges/enums work but not others is going to lead to confusion. It's best to just not accept either.

@awygle
Copy link
Contributor Author

awygle commented Aug 3, 2020

@Ravenslofty programmerjake refers to this text:

and checks (in simulation and formal) that the truncated bits were all identical to 0 (for unsigned values) or the new MSB (for signed values).

which came along for the ride from the comment on #381 referenced in the RFC.

@programmerjake
Copy link
Contributor

Regarding casting to a shape and issues with range being misleading, a lint could be added for non-power-of-2 ranges and maybe for enums too.

We discussed this on IRC. Although exceptions could be added, this seems like it would make writing generic (e.g. over a user-specified enum) code harder, which would be counterproductive.

I was thinking it would lint against all enums, not just non-power-of-2 enums, though I can understand why linting against enums or ranges seems like a bad idea.

@whitequark
Copy link
Member

I was thinking it would lint against all enums, not just non-power-of-2 enums, though I can understand why linting against enums or ranges seems like a bad idea.

Ah right, so I'm fine with linting against all enums, but linting against only non-power-of-2 ranges has the same hazard.

@emeb
Copy link

emeb commented Aug 4, 2020

I like proposal #2. Proposals #1 or #3 seem fine, with a caveat in that I don't like the use of the term "truncate" to indicate chopping bits off the MS end of a signal - to me truncation implies dropping LS bits (as in mathematical truncation which reduces precision) and I suspect this could cause some confusion. Proposal #2 seems better to me because it has the same effect as 1 or 3 but avoids the use of the confusing term.

@whitequark
Copy link
Member

The use of "truncate" in the sense of cutting off MSBs has some precedent, e.g. see LLVM LangRef.

@emeb
Copy link

emeb commented Aug 4, 2020

The use of "truncate" in the sense of cutting off MSBs has some precedent, e.g. see LLVM LangRef.

Fair. Coming from a primarily hardware / DSP background my definition of truncation is likely somewhat narrower in scope than what's used in the wider software engineering world. Regardless, I'm glad to see these capabilities considered.

@whitequark
Copy link
Member

Coming from a primarily hardware / DSP background my definition of truncation is likely somewhat narrower in scope than what's used in the wider software engineering world.

I think it's a fair objection! Since hardware and DSP are both something that I would expect people interested in nMigen to know, maybe truncate is not such a good name after all. But it's not clear to me yet.

@awygle
Copy link
Contributor Author

awygle commented Aug 4, 2020

I agree, that's valuable feedback - it never even occurred to me that A = Signal(32); B = A.truncate(16) could result in B containing the 16 most-significant bits of A.

@awygle
Copy link
Contributor Author

awygle commented Aug 4, 2020

I asked the Digital Design Discord I'm in this question:

i have a 32-bit signal A. i call a function called truncate on that signal to make a 16-bit signal B, thus: B = A.truncate(16). does B contain the 16 most significant bits of A, or the 16 least significant bits of of A?

The general consensus was that the intuitive answer was "the 16 least significant bits", but that truncate was not a good name for this function. Some suggested alternatives were "resize", "lsbs/msbs", and "slice_lsb/slice_msb". One interesting point that was brought up several times was that if there's a decimal involved, the intuition flips - for floating point or fixed point people expected B to contain the most significant bits of A.

@adamgreig
Copy link
Contributor

Looking at some DSP I wrote in nmigen, I actually use slicing more frequently to select the msbits of some signals than the lsbits, i.e. I'm doing msbits = x[16:] more than lsbits = x[:16]. If one of the reasons for the new methods is to avoid having to use slice syntax for resizing signals, perhaps it does make sense to introduce matching make-signals-smaller methods for the most/least significant bit options. Naming becomes twice as hard though and it seems like most people expect truncating to the least significant bits to be the more common operation. Using the slice notation doesn't bother me especially.

This is a problem with all three proposals in so far as they all make signals shorter by always discarding the msbits, which is maybe an argument against the "universal" x.as(shape) method which doesn't obviously permit an msbit/lsbit variant.

That aside I have a weak preference for 3 over 1, since we already store signedness in the signal shape and for example right shift is arithmetic for signed signals and logical for unsigned signals. Could we just take all the proposals as described, so we have x.zero_extend(16) that only works on unsigned x with length less than 16, x.extend(16) that is either signed or unsigned depending on x, and x.as(16) which additionally will shrink a larger-than-16 x? Then the user can decide what is most appropriate/explicit/expressive or otherwise most useful in context.

@DaKnig
Copy link

DaKnig commented Aug 6, 2020

one more thing to consider is casting negative numbers to unsigned shapes. a common trick in C to set all bits of an unsigned number is to assign -1 to it- and that is equivalent to assigning the max value of that type. I think we should have the same in nmigen - so Signal(unsigned(8)).eq(-1) would set all bits (continuing from IRC)

@whitequark
Copy link
Member

whitequark commented Oct 26, 2020

Something this RFC should take into account is the issue highlighted in #481, which we definitely should address as a part of any new set of shape conversion operators.

I believe that it is an argument in favor of Proposal 1: if we allow .sign_extend() without arguments to mean "extend the smallest amount of bits required to make the value signed", then it would do the same thing as .to_signed() proposed in #481, but without the associated confusion.

@whitequark
Copy link
Member

There has not been a consensus on this proposal, and due to the complexity and impact it would have to go through our new RFC process to be accepted. In addition, few people have expressed a need for this functionality, so perhaps it is just not particularly needed.

@whitequark whitequark closed this as not planned Won't fix, can't repro, duplicate, stale Feb 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

No branches or pull requests

7 participants