Adds digits method to the Int type #5598

FrankKair · 2018-01-16T22:24:57Z

Hi, this is my first pull request for a programming language.
I implemented the digits method for the Int type, like Ruby did in its 2.4 release.

straight-shoota · 2018-01-16T23:01:35Z

Hey @FrankKair, thanks for participating in making Crystal better!
What is the use case for this method, why is it needed? "because Ruby has it" is unlikely going to convince somebody to include this in the standard library.

Besides the general question, the implementation using to_s is by far not optimal. It is conceptually simple, but realistically it is a very slow algorithm and unnecessarily allocates a string just to throw it away.

woodruffw · 2018-01-16T23:12:09Z

What is the use case for this method, why is it needed?

Personally, I've used Integer#digits for checksum and check-digit algorithms in Ruby. It also comes in handy for code golfing/exercises, although that's less of a real-world use 🙂

Aside from performance issues, there are some differences with Ruby's Integer#digits that should be addressed:

-123.digits raises an error in Ruby, while this returns [1, 2, 3]. IMO, it makes sense to preserve that behavior here (i.e., raise an error instead of silently trimming the negative sign).
123.digits returns [3, 2, 1] in Ruby, while the current implementation returns [1, 2, 3] (MSD order). I think MSD order makes more sense, but a more efficient implementation will probably be naturally LSD.

FrankKair · 2018-01-17T00:07:30Z

(First of all, thank you for the comments!)

Hey @straight-shoota, thanks for pointing that out, I probably should've been a little more specific and explain why I think it's a nice addition to the standard library.

So, as @woodruffw just said, it's interesting for checksum, digits manipulation and some algorithms (and exercises/coding challenges, why not).

I thought it was so nice when Ruby implemented this method that I wanted Crystal to have it too. Otherwise, people would have to type self.to_s.gsub('-', "").each_char.map(&.to_i).to_a themselves, and I think that's something the Int type could take care of.

And, as you just said, this is rather inefficient. So maybe that's a good thing, instead of letting people write inefficient code (as I just did), we could provide them a clean, performant API to work with.

FrankKair · 2018-01-17T00:12:21Z

I was trying to think of a better idea for this method and I've come across this Haskell implementation:

digs :: Integral x => x -> [x]
digs 0 = []
digs x = digs (x `div` 10) ++ [x `mod` 10]

And I also saw the Elixir implementation:

  def digits(integer, base \\ 10)
      when is_integer(integer) and is_integer(base) and base >= 2 do
    do_digits(integer, base, [])
  end

  defp do_digits(integer, base, acc) when abs(integer) < base, do: [integer | acc]

  defp do_digits(integer, base, acc),
    do: do_digits(div(integer, base), base, [rem(integer, base) | acc])

This kind of approach would be more efficient, I suppose.
What do you guys think?

@woodruffw I also think that the normal order makes more sense, because it's what most people expect.
As for the negative / error raising, I really don't know. At first I just thought it was an edge case I should deal with. Whatever you guys say/prefer, I still don't have a position on this one.

FrankKair · 2018-01-17T01:06:27Z

What about this?

def digits
  quotient = self.abs / 10
  remainder = self.abs % 10
  if quotient == 0
    return [remainder]
  end
  digits(quotient) << remainder
end

I'm using self.abs so the method still works with negative numbers.
But well, as I said previously, I still don't have a position on this one.

CPU wise, I tested the method with BigInts with 30k digits and the method was running smoothly.
Memory wise, I don't know, but I guess it's better than allocating strings, as @straight-shoota pointed out.

woodruffw · 2018-01-17T01:50:00Z

That implementation looks like a solid improvement to me. You might want to compare it to an iterative version, though -- my suspicion is that LLVM can optimize a while loop better than it can a non-tail-recursive function.

luislavena · 2018-01-17T14:30:52Z

Hello!

Thank you for your interest and contribution to Crystal!. If the reason to bring this to Crystal is because Ruby has it, then the implementation needs to be as close as possible to the original implementation.

This means that 123.digits on both Crystal and Ruby should have similar results, which this implementation does not produce them.

Also, as pointed out:

to_s will allocate a new String in memory
gsub will result in a new string allocated
map(...) converts each character into integer
to_a allocates a new array and copies each of the elements of the iteration in.

Doing a quick benchmark:

require "benchmark"

struct Int
  def digits
    self.to_s.gsub('-', "").each_char.map(&.to_i).to_a
  end
end

Benchmark.ips do |x|
  x.report "123.digits (PR)" { 123.digits }
  x.report "-123.digits (PR)" { -123.digits }
end

Results in the following:

 123.digits (PR)   2.44M ( 410.6ns) (± 6.28%)  163 B/op        fastest
-123.digits (PR)   1.45M (687.87ns) (± 5.10%)  289 B/op   1.68× slower

163 bytes allocated per operation. Worse case scenario, dealing with negative values, results in 289 bytes.

There are other differences:

Ruby's implementation accepts a different base other than 10
Your code uses to_s, which doesn't work when dealing with base 16 representation (fails to convert string back to integer)

A possible alternate implementation might look like this:

struct Int
  def alt_digits(base = 10)
    arr = [] of typeof(self)
    num = self

    while num != 0
      arr << num.remainder(base).abs
      num = num.tdiv(base)
    end

    arr
  end
end

Which shows the following on the benchmarks:

 123.digits (PR)   2.63M (379.52ns) (± 6.92%)  163 B/op   4.12× slower
123.digits (alt)  10.85M ( 92.15ns) (±13.23%)   48 B/op        fastest

Faster iterations and less allocations. However, the naive implementation shown above doesn't deal with negative numbers, so that requires to be addressed if the objective is provide an implementation that is similar to Ruby (and if that is desired).

Cheers.

FrankKair · 2018-01-17T16:07:20Z

Hi, @luislavena, thank you for the in-depth feedback!

I do believe that digits is a nice addition to the std lib regardless of Ruby (I just mentioned it because I thought people would remember that it's a nice function).

Having said that, I think that the method should return the numbers in order, because it's more natural. It's not intuitive to ask for digits and receive them from lsd to msd. At least for me 😅

The first implementation with to_s was naive, the second one (which I draw inspiration from the Haskell and Elixir snippets I posted) seems more mature.

Here is my benchmark:

require "benchmark"

Benchmark.ips do |x|
  x.report "digits_1" { 12312312312312312312.digits_1 }
  x.report "digits_2" { 12312312312312312312.digits_2 }
  x.report "digits_3" { 12312312312312312312.digits_3 }
end

struct Int
  def digits_1
    self.abs.to_s.each_char.map(&.to_i).to_a
  end

  def digits_2
    quotient = self.abs.tdiv(10)
    remainder = self.abs.remainder(10)
    if quotient == 0
      return [remainder]
    end
    quotient.digits_2 << remainder
  end

  def digits_3(base = 10)
    arr = [] of typeof(self)
    num = self

    while num != 0
      arr << num.remainder(base).abs
      num = num.tdiv(base)
    end

    arr
  end
end

digits_1   1.66M (604.05ns) (± 2.48%)  1.59× slower
digits_2   1.68M (594.63ns) (± 3.81%)  1.56× slower
digits_3   2.63M (380.09ns) (± 3.96%)       fastest

At least for me, I do prefer to return the digits in order, so I tried adding reverse to your implementation (just returning arr.reverse).

digits_1    1.6M (626.85ns) (± 8.30%)  1.05× slower
digits_2   1.68M (595.21ns) (± 3.99%)       fastest
digits_3   1.67M (598.71ns) (± 4.06%)  1.01× slower

That way, the numbers changed a bit.
(These numbers are getting pretty close to each other with these implementations).

What do you guys think should be the proper behaviour and which implementation should remain?

RX14 · 2018-01-17T16:16:44Z

How is this useful for checksums and check-digits? Besides, if those two (pretty rare) usecases are the only usecases is it worth having in the stdlib? I can't say i've ever wanted a method like this.

woodruffw · 2018-01-17T16:28:24Z

How is this useful for checksums and check-digits?

It's pretty useful, at least in my experience. One of the common check-digit strategies is "perform function f on every kth digit and function g on every other digit, sum the results, and compare to the last digit," and a digits method makes that a bit easier.

Of course, like you said, that's not a common use case. It's also (now that I think about it) a bit of an abuse of the Integer type, since "numbers" that require checksums and check-digits should really be represented as strings or discrete types. However, just like phone numbers, that won't stop people from using numeric types to store them.

On an unrelated note, if this gets added, I'm in agreement with @FrankKair that MSD order is preferable (and thereby break with Ruby's behavior in that aspect).

asterite · 2018-01-17T16:46:56Z

All of the use cases you mention don't need to allocate memory. Just keep diving by ten and you get the digits. If no one can provide a real (as in "here's a real project I used it in") I don't think this should be go in the std.

RX14 · 2018-01-17T16:47:40Z

At the end of the day, you're still allocating an array for this method which makes calculating a checksum suboptimal. Perhaps it'd be better to put these checksum calculations inside the shards that require them.

If we accept this though, it should accept a base.

FrankKair · 2018-01-17T20:18:56Z

I refactored the code to this, following @RX14's idea of accepting a base:

def digits(base = 10)
  quotient = self.abs.tdiv(base)
  remainder = self.abs.remainder(base)
  if quotient == 0
    return [remainder]
  end
  quotient.digits << remainder
end

This is my first PR in a programming language, and even if it doesn't get added to the std lib, I'm quite grateful and I've learned a lot. It was very nice interacting with much more knowledgeable people, receiving feedback and discussing aspects that I didn't even consider in the first place.

Well, I still think it's nice and useful (and of course it would be awesome to contribute to the project!), but ultimately this is up to you all.

Cheers! 💎

lbguilherme · 2018-01-17T22:32:03Z

At least for me, I do prefer to return the digits in order, so I tried adding reverse to your implementation (just returning arr.reverse).
digits_1    1.6M (626.85ns) (± 8.30%)  1.05× slower
digits_2   1.68M (595.21ns) (± 3.99%)       fastest
digits_3   1.67M (598.71ns) (± 4.06%)  1.01× slower

Actually calling arr.reverse will copy the array then reverse the copy and return it. Better to reverse in-place:

  def digits_3(base = 10)
    arr = [] of typeof(self)
    num = self

    while num != 0
      arr << num.remainder(base).abs
      num = num.tdiv(base)
    end

    arr.reverse!

    arr
  end

This leads to:

digits_1   1.65M (606.38ns) (±11.00%)  1.43× slower
digits_2   1.68M (593.69ns) (± 4.77%)  1.40× slower
digits_3   2.37M (422.78ns) (±11.48%)       fastest

I can go one step further by pre-allocating the array capacity to the right size before everything to avoid dynamic resizing:

  def digits_4(base = 10)
    arr = Array(typeof(self)).new(Math.log10(self).ceil.to_i)
    num = self

    while num != 0
      arr << num.remainder(base).abs
      num = num.tdiv(base)
    end

    arr.reverse!

    arr
  end

Benchmark:

digits_1   1.71M (585.69ns) (± 4.10%)  2.00× slower
digits_2   1.55M (645.22ns) (±12.82%)  2.20× slower
digits_3   2.49M (402.04ns) (± 7.79%)  1.37× slower
digits_4   3.41M (293.01ns) (± 6.93%)       fastest

About the topic of whether or not it should be accepted into the standard library, I also question where should method could be useful outside some code golf scenario. And if it is indeed useful, its probably better to have an each_digit method instead.

straight-shoota · 2018-01-17T22:55:28Z

@FrankKair If you like to proceed contributing to the language, you might take a look at issues suited for newcomers.

FrankKair · 2018-01-20T16:00:48Z

Think that way, code golfing such as Project Euler is nice to provide comparison and benchmarks between languages. That was my first contact with Crystal, because I wanted to see how faster it could be than the other languages I used so far (such as Ruby, Python, Swift, Elixir...). I have a friend who also had his first contact with the language by trying to test its raw power.

I may be wrong, but I don't think most people start experimenting with a language trying to build web frameworks or complicated CLIs, but rather trying to compare its features with other languages previously used. So I still think it's a good thing to have a nice performant digits method on the std lib, such as this one because then people wouldn't have the bad idea I had in the beginning of this pull request (using to_s and the like), and thus they would be happy about the performance of the language.

On the Elixir digits pull request, the author of the PR said:

"While this is a pretty basic function, the way that 9/10 programmers tend to implement it is very inefficient and error prone, and it is in common usage. Thus, I would suggest that having a proper and efficient implementation in the standard lib is desirable."

And then someone else said:

"I like this! Having used the (non-standard) Haskell library Data.Digits for different tasks both yesterday and today, I can really see the benefit."

Not only that, but some languages implemented this method in a questionable way, such as Ruby. The to_s.each_char.map method is faster the std lib's digits, which is really counter intuitive. I posted a question about this on SO: Ruby digits performance. And here we seem to have a nice implementation.

So yeah, I still think it's a nice addition to the std lib.

asterite · 2018-01-20T16:40:03Z

I guess we can add this, it does no harm. It needs to be documented, and digits should probably always be an Int32 array. Plus we need to think about what to do with negative numbers.

Sija · 2018-01-20T16:44:33Z

I guess we can add this, it does no harm. It needs to be documented, and digits should probably always be an Int32 array.

Why not UInt32 or even UInt8?

Plus we need to think about what to do with negative numbers.

From my understanding -123 is still composed of the same digits as 123, what's different is the #sign.

src/int.cr

FrankKair · 2018-01-20T19:16:41Z

@asterite The method is always returning Array(Int32), which is fine by me. Do you think it's better to put this on the method's signature?

"From my understanding -123 is still composed of the same digits as 123"
I agree with that one. At least for me, it makes sense to return 123.

src/int.cr

asterite · 2018-01-20T20:47:47Z

Digits only go from 0 to 9. Int32 is the most common int type. Digits are unrelated to the number. At least that's my thinking. For example, multiply all digits then multiply by some other number, with Int8 chance of overflow. But maybe returning the same type is ok.

I wish we had a universal integer number type, it's so messy with so many integer types.

spec/std/int_spec.cr

src/int.cr

FrankKair · 2018-03-22T20:44:49Z

@asterite @Sija @RX14 Any other thoughts?

Sija · 2018-03-28T12:52:12Z

@FrankKair LGTM

RX14 · 2018-03-28T13:03:09Z

I still don't see much use in this method, if two other people want to approve it go ahead but I won't.

woodruffw

Not a member, but I'm 👍 on adding a digits method 😄

Edit: Reiterating my desire for this in 2019.

FrankKair · 2019-07-30T10:25:34Z

Two approvals, that's nice! 😊

FrankKair wants to merge 7 commits into crystal-lang:master from unknown repository

This pull request is open for over a year now, so I deleted my repo, thinking that it wouldn't go anywhere.

Will GitHub be able to merge it or should I fork the repo again? 😅

straight-shoota · 2019-07-30T10:52:14Z

It's only one approval from a core member, and as far as I see it, most others are rather skeptical of this. I don't see this as a useful addition and thus won't approve it.

In any case, this needs a rebase to be merged. It might work if you simply restore your fork and the branch.

straight-shoota · 2020-01-07T18:49:10Z

Closing this in favour of #8554
Should the RFC be approved we can decide to either reopen this PR or create a new one.

Thank you anyways, @FrankKair ❤️

FrankKair · 2020-01-09T16:54:38Z

Thank you very much @straight-shoota, feeling like a visionary now 😅

Adds digits method to the Int type

e4bd9b7

Refactors Int#digits method by using div and mod operators

886089a

Refactors digits for efficiency (with while loop and in-place reverse)

021367f

lbguilherme reviewed Jan 20, 2018

View reviewed changes

src/int.cr Outdated Show resolved Hide resolved

Uses math.log division to improve performance

e664aa9

Sija reviewed Jan 20, 2018

View reviewed changes

src/int.cr Outdated Show resolved Hide resolved

Adds space between Math.log operations for readability

848542e

Sija reviewed Jan 21, 2018

View reviewed changes

spec/std/int_spec.cr Show resolved Hide resolved

Adds UInt64 tests

b52162a

Sija reviewed Jan 21, 2018

View reviewed changes

src/int.cr Outdated Show resolved Hide resolved

Adds documentation to the digits method

609a441

woodruffw approved these changes Mar 28, 2018

View reviewed changes

straight-shoota mentioned this pull request Apr 2, 2019

Add "digits" method to Int #7611

Closed

jhass approved these changes Jul 30, 2019

View reviewed changes

asterite mentioned this pull request Dec 4, 2019

[RFC] Add Int#digits method #8554

Closed

straight-shoota closed this Jan 7, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Adds digits method to the Int type #5598

Adds digits method to the Int type #5598

FrankKair commented Jan 16, 2018

straight-shoota commented Jan 16, 2018 •

edited

Loading

woodruffw commented Jan 16, 2018

FrankKair commented Jan 17, 2018 •

edited

Loading

FrankKair commented Jan 17, 2018 •

edited

Loading

FrankKair commented Jan 17, 2018 •

edited

Loading

woodruffw commented Jan 17, 2018

luislavena commented Jan 17, 2018

FrankKair commented Jan 17, 2018 •

edited

Loading

RX14 commented Jan 17, 2018 •

edited

Loading

woodruffw commented Jan 17, 2018

asterite commented Jan 17, 2018

RX14 commented Jan 17, 2018

FrankKair commented Jan 17, 2018 •

edited

Loading

lbguilherme commented Jan 17, 2018

straight-shoota commented Jan 17, 2018

FrankKair commented Jan 20, 2018 •

edited

Loading

asterite commented Jan 20, 2018

Sija commented Jan 20, 2018

FrankKair commented Jan 20, 2018 •

edited

Loading

asterite commented Jan 20, 2018

FrankKair commented Mar 22, 2018

Sija commented Mar 28, 2018

RX14 commented Mar 28, 2018

woodruffw left a comment •

edited

Loading

FrankKair commented Jul 30, 2019

straight-shoota commented Jul 30, 2019

straight-shoota commented Jan 7, 2020

FrankKair commented Jan 9, 2020

Adds digits method to the Int type #5598

Adds digits method to the Int type #5598

Conversation

FrankKair commented Jan 16, 2018

straight-shoota commented Jan 16, 2018 • edited Loading

woodruffw commented Jan 16, 2018

FrankKair commented Jan 17, 2018 • edited Loading

FrankKair commented Jan 17, 2018 • edited Loading

FrankKair commented Jan 17, 2018 • edited Loading

woodruffw commented Jan 17, 2018

luislavena commented Jan 17, 2018

FrankKair commented Jan 17, 2018 • edited Loading

RX14 commented Jan 17, 2018 • edited Loading

woodruffw commented Jan 17, 2018

asterite commented Jan 17, 2018

RX14 commented Jan 17, 2018

FrankKair commented Jan 17, 2018 • edited Loading

lbguilherme commented Jan 17, 2018

straight-shoota commented Jan 17, 2018

FrankKair commented Jan 20, 2018 • edited Loading

asterite commented Jan 20, 2018

Sija commented Jan 20, 2018

FrankKair commented Jan 20, 2018 • edited Loading

asterite commented Jan 20, 2018

FrankKair commented Mar 22, 2018

Sija commented Mar 28, 2018

RX14 commented Mar 28, 2018

woodruffw left a comment • edited Loading

Choose a reason for hiding this comment

FrankKair commented Jul 30, 2019

straight-shoota commented Jul 30, 2019

straight-shoota commented Jan 7, 2020

FrankKair commented Jan 9, 2020

straight-shoota commented Jan 16, 2018 •

edited

Loading

FrankKair commented Jan 17, 2018 •

edited

Loading

FrankKair commented Jan 17, 2018 •

edited

Loading

FrankKair commented Jan 17, 2018 •

edited

Loading

FrankKair commented Jan 17, 2018 •

edited

Loading

RX14 commented Jan 17, 2018 •

edited

Loading

FrankKair commented Jan 17, 2018 •

edited

Loading

FrankKair commented Jan 20, 2018 •

edited

Loading

FrankKair commented Jan 20, 2018 •

edited

Loading

woodruffw left a comment •

edited

Loading