Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add UUID. #2716

Closed
wants to merge 5 commits into from
Closed

Add UUID. #2716

wants to merge 5 commits into from

Conversation

mirek
Copy link
Contributor

@mirek mirek commented Jun 1, 2016

No description provided.

@fernandes
Copy link
Contributor

Hi @mirek tests are passing for std and compiler, but failed on formatting, you can run crystal tool format and commit the changes.. 👍


require "secure_random"

private def assert_hex_pair_at!(value : String, i)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is it outside struct UUID?

@jhass
Copy link
Member

jhass commented Jun 2, 2016

Should we perhaps namespace this into SecureRandom, SecureRandom::UUID?


def initialize(bytes : Slice(UInt8))
raise ArgumentError.new "Invalid bytes length #{bytes.size}, expected 16." if bytes.size != 16
@data.to_unsafe.copy_from bytes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this perhaps validate the format?

@ysbaddaden
Copy link
Contributor

@jhass UUID happens to have a random version (v4) but all other versions are neither random or secure.

@jhass
Copy link
Member

jhass commented Jun 2, 2016

There are still constant fields, the highest two bits of clock_seq_high_and_reserved should always be 1 and 0, and the highest 4 bits of time_hi_and_version should be in range of the available versions (1-5).

@ysbaddaden This class only implements version 4 though

@ysbaddaden
Copy link
Contributor

ysbaddaden commented Jun 2, 2016

What about avoiding to copy data around in initialize, to_slice, etc? All the class needs to operate safely is 16 bytes, that is a Slice(UInt8). Maybe it could even make it a struct instead of class?

struct UUID
  def initialize(@bytes : Slice(UInt8))
    unless @bytes.size == 16
      raise ArgumentError.new("Invalid UUID: expected 16 bytes but got #{@bytes.size}")
    end
  end

  def to_slice
    @bytes
  end

  def to_unsafe
    @bytes.to_unsafe
  end
end

What about having a single initialize(bytes) method (as above) then adding a bunch of constructors, and a transparent parsing of String/Slice/UUID?

struct UUID
  def self.new(uuid)
    new parse(uuid)
  end

  private def self.parse(uuid : Slice(UInt8))
    uuid
  end

  private def self.parse(uuid : UUID)
    uuid.to_slice
  end

  private def self.parse(uuid : String)
    # parse hexdump string and return Slice(UInt8)
  end

  def ==(other)
    self == UUID.parse(other)
  end

  def ==(other : UUID)
    @bytes == other.to_slice
  end
end

Last, but not least: constructors with explicit names would be nicer than version numbers. For example following the uuidtools gem:

struct UUID
  def self.timestamp_create(timestamp = nil, mac = nil)
    # ...
  end

  def self.dec_create(timestamp = nil, mac = nil)
    # ...
  end

  def self.md5_create(namespace, object)
    bytes = digest("md5", namespace, object)
    new_rfc4122(bytes, 3)
  end

  def self.random_create
    bytes = SecureRandom.random_bytes(16)
    new_rfc4122(bytes, 4)
  end

  def self.sha1_create(namespace, object)
    bytes = digest("sha1", namespace, object)
    new_rfc4122(bytes, 5)
  end

  def self.new_rfc4122(uuid : Slice(UInt8), version : Int32)
    bytes[6] = (bytes[6] & 0x0f) | (version << 4)
    bytes[8] = (bytes[8] & 0x3f) | 0x80
    new bytes
  end

  private def self.digest(type, namespace, object)
    hash = OpenSSL::Digest.new(type)
    hash << parse(namespace)
    hash << parse(object)
    hash
  end
end

@jhass shall we enforce RFC4122, or shall we allow it to be loose and accept whatever 16 bytes (allowing any variant and version) and provide means to verify? For example:

struct UUID
  def rfc4122!
    raise ArgumentError.new("Invalid UUID variant, expected RFC4122") unless rfc4122?
    raise ArgumentError.new("Invalid UUID version: #{version}") unless 1 <= version <= 5
    self
  end

  def rfc4122?
    @bytes[8] & 0x80 == 0x80
  end

  def version
    @bytes[6] >> 4
  end
end

EDIT: I use UUID v4 and v5 a lot, I guess it shows :)

@jhass
Copy link
Member

jhass commented Jun 2, 2016

I'd say enforce it, it's a UUID class after all, if you're not working with UUIDs, you won't need its functionality anyhow.

@ysbaddaden
Copy link
Contributor

Well, technically a UUID is merely 128bits, and RFC4122 is a UUID variant among others:

0 x x : Reserved, NCS backward compatibility.
1 0 x : The variant specified in this document (RFC4122).
1 1 0 : Reserved, Microsoft Corporation backward compatibility
1 1 1 : Reserved for future definition.
https://tools.ietf.org/html/rfc4122#section-4.1.1

The special NIL UUID 00000000-0000-0000-0000-000000000000 is valid too:
https://tools.ietf.org/html/rfc4122#section-4.1.7

That being said, I agree that except for the NIL UUID —often used for UUID v3 and v5— the variants are merely for obscure backward compatibility, and the reserved variant or new versions will come any time soon. Even so, one could just override UUID#initialize to skip/change the checks until Crystal catches up, or we could have a valid? method (better).

@mirek
Copy link
Contributor Author

mirek commented Jun 2, 2016

I'm not sure about enforcing, it would (unnecessarily?) restrict the usage.

Currently, for example, I'm working on a system with custom made variants of UUIDs stored in postgres (which doesn't reject them) where couple of bytes are managed manually.

I think better way of dealing with it is to accept any UUID on constructors and add methods like #version to check specific version of RFC variant - with Version::Unknown as possible outcome.

Application logic can restrict to specific, known formats if needed.

We can also decorate it with shorter helpers like #v4? and/or #v4!.

@mirek
Copy link
Contributor Author

mirek commented Jun 2, 2016

@ysbaddaden it has been struct, not class from the beginning.

@mirek
Copy link
Contributor Author

mirek commented Jun 2, 2016

@ysbaddaden regarding copying data, let's not forget that the data is just 2x i64 - it's tiny. I think it's better to keep it close as static bytes in the struct and not having slice pointer which itself will take this space. It also means we have correct value semantics, just as if UUID was an Int.

If we want to support memory critical scenarios I think we could expose some static (mostly unsafe) functions like UUID.unsafe_rfc4122_version(ptr : UInt8*) : UUID::RFC4122::Version. This could be useful if you have millions of UUIDs in external buffer and you don't want to map/allocate UUID structs.

"Mostly unsafe" static functions because when you try to avoid this tiny overhead, then you also want to avoid creating slices. For example if you have millions of records in raw binary with uuid bytes somewhere and you want to filter records that are not in valid v4 for example.

Ok, so consolidating comments I'm going to:

  • move this helper which sits outside of struct (maybe to String#digits?(base: 16, offset: X, size: 2)?)
  • extract RFC4122 variant related functionality into UUID::RFC4122 and include that in UUID.
  • add #rfc4122_v4? and #rfc4122_v4! like methods for all versions
  • add (and use internally) static methods for checking rfc variant conformity and it's versions etc.
  • document all public methods/struct
  • make sure we don't use UUID::UUID, just ::UUID struct
  • add functions to extract components, like timestamp in some formats etc.
  • add designated constructors for different versions, if I'm not going to be able to grab something from stdlib (like MAC address, I don't know if I can or - if I can - which one should be default as there are many NICs in the system, some virtual etc, it's probably better to just expose :mac argument and maybe later we can add some sensible default)
  • don't include in prelude

@ysbaddaden
Copy link
Contributor

UUIDs are very simple, it would be great to have a simple and straightforward implementation and API.

A UUID::UUID type is highly unexpected (even if hidden it will show up in errors, inspect and documentation): let's keep a UUID type.

Since most use cases conform to RFC4122, there really is no need to extract related methods. We may choose to enforce the variant/version or not,, but this is unrelated. If we do enforce it by default, we may provide a mean to change this behavior, so one can experiment with non conforming variants and versions. If someday one releases a new variant that becomes popular we'll reconsider, but it's unlikely to happen soon.

One thing I'd really like is to be able to transparently pass a UUID, String or Slice anywhere I can pass a UUID (in the struct, that is) so it's simple to use, using whatever source of data (slice from PG, string from JSON, UUID from Crystal, ...)

Maybe #rfc4122?(version = nil) so we can programmatically check/enforce a version and not hardcode it?

I don't see a need for the version enum, especially with distinct constructors.

Yes, version 1 and version 2 UUIDs will be complicated because of the MAC. We may skip them or require to pass a value.

Yes, let's keep a StaticArray contained in the struct, instead of a pointer involving the GC.

@mirek
Copy link
Contributor Author

mirek commented Jun 2, 2016

@ysbaddaden yes, agree regarding struct UUID, will change that. We can still separate logic with something like:

struct UUID
  module RFC4122
    def foo
      "foo"
    end
  end
  extend RFC4122
end

puts UUID.foo

...but that's just cosmetics.

Regarding transparent usage agree as well. Is there some idiomatic way of implementing struct/class as "transparent" with other types?

@ysbaddaden
Copy link
Contributor

I didn't mean transparent with other types, it would be great, but I think it's up to usages to deal with that, otherwise we'd inject methods like to_uuid in different types that shouldn't care about it. I meant transparent to the UUID constructors.

@jkthorne
Copy link
Contributor

jkthorne commented Jun 7, 2016

Adding things to prelude includes them at start time correct?
Just curious because because other languages seem to include a lot at start time and it impacts memory and pauses in starting.
Is this something that should be available without requiring it?
This seems small but just asking the question.

@asterite
Copy link
Member

asterite commented Jun 7, 2016

@Javajax Good catch, uuid shouldn't be included in prelude

@jkthorne jkthorne mentioned this pull request Jun 16, 2016
@jkthorne
Copy link
Contributor

I was hoping to bump this PR so I added the commit for removing prelude here #2861

Is there anything else that needs to be updated?

@jkthorne jkthorne mentioned this pull request Jun 16, 2016
@RX14 RX14 mentioned this pull request Oct 15, 2016
@jkthorne
Copy link
Contributor

@mirek are you still working on this UUID branch?

@jkthorne jkthorne mentioned this pull request May 23, 2017
@mirek
Copy link
Contributor Author

mirek commented May 25, 2017

@wontruefree no, please feel free to pick it up if you want.

@mirek mirek closed this May 25, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants