
Optimize Crypto::Bcrypt performance #3880

Merged

Conversation

ysbaddaden
Contributor

@ysbaddaden ysbaddaden commented Jan 12, 2017

The current Crypto::Bcrypt implementation is roughly 68% slower than the Ruby extension.

Replacing the heap-allocated Array with a stack-allocated StaticArray and flattening P immediately reduces the gap to ~32% slower. Using raw pointers closes it further, down to ~19% slower. Raw pointers are unsafe, but the algorithm has full control over the pointer's range, and it never goes out of bounds. It's still not on par with the Ruby implementation (which uses C), but it's getting closer.
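For readers unfamiliar with the distinction, here is a minimal sketch (not the actual Crypto::Bcrypt code) of the three storage strategies described above: a heap-allocated Array, a stack-allocated StaticArray, and a raw pointer obtained via `to_unsafe`, whose indexing skips bounds checks:

```crystal
# Heap allocation: GC-managed, every #[] access is bounds-checked.
heap = Array(UInt32).new(4, 0_u32)

# Stack allocation: no heap allocation at all; StaticArray is a value type.
stack = StaticArray(UInt32, 4).new(0_u32)
stack[1] = 42_u32

# Raw pointer into the StaticArray's buffer: indexing is unchecked,
# so the caller must guarantee the offset stays in bounds.
ptr = stack.to_unsafe # Pointer(UInt32)
puts ptr[1] # => 42
```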

Running this benchmark, generating 10 hashes at cost 12 shows a ~41% performance increase, down from 4.15s to 2.94s (vs 2.46s for Ruby).

refs #1309

      0x01c36ae4, 0xd6ebe1f9, 0x90d4f869, 0xa65cdea0, 0x3f09252d, 0xc208e69f,
      0xb74e6132, 0xce77e25b, 0x578fdfe3, 0x3ac372e6,
    ],
    private S = StaticArray[

@lribeiro lribeiro Jan 13, 2017

Why not UInt32.static_array(...) instead?
Won't it spare us all the _u32?
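For reference, a hedged sketch of the two spellings being compared (the values are arbitrary stand-ins, not the real S constants):

```crystal
# StaticArray literal: each element needs an explicit _u32 suffix,
# otherwise the literals are inferred as Int32.
a = StaticArray[0x01c36ae4_u32, 0xd6ebe1f9_u32]

# UInt32.static_array: the element type comes from the receiver,
# so the suffixes can be dropped. The result is the same StaticArray.
b = UInt32.static_array(0x01c36ae4, 0xd6ebe1f9)

puts a.to_slice == b.to_slice # => true
```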

@ysbaddaden (Contributor, Author)


Thanks. I wasn't aware of this initializer. I wish Int[1, 2, 3] would return a StaticArray instead of an Array...

@kostya
Contributor

kostya commented Jan 16, 2017

Replacing f with the following also gives ~5% performance:

  @[AlwaysInline]
  private def f(x)
    d = x.to_u8
    c = (x >> 8).to_u8
    b = (x >> 16).to_u8
    a = (x >> 24).to_u8
    ((@s.to_unsafe[a] + @s.to_unsafe[256 + b]) ^ @s.to_unsafe[512 + c]) + @s.to_unsafe[768 + d]
  end

@ysbaddaden force-pushed the enhancement-bcrypt-performance branch from 8e36580 to 8eef6af on January 17, 2017 10:31
Optimizes the Blowfish and Bcrypt algorithms by using stack
allocated objects (StaticArray) instead of a heap allocation
(Array), and raw pointer accesses.

The implementation is still not on par with alternative C
implementations (e.g. as used by the Ruby extension).
@ysbaddaden force-pushed the enhancement-bcrypt-performance branch from 8eef6af to 535c50f on January 17, 2017 11:17
@ysbaddaden
Contributor Author

@kostya I didn't notice an improvement (above benchmark, LLVM 3.9.1).

I got a 6% improvement by memoizing pointers to the @s sections instead of repeating the additions in f on every call (which is costly), and by keeping & 0xFF_u32, which appears to be faster than casting with to_u8 (at least on Linux x86_64).

It's now down to 2.81s vs 2.46s (ruby).
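A hedged sketch of the memoized variant described above, not the merged code: `s` and the base pointers `s0`..`s3` are hypothetical names standing in for the real @s state (the four concatenated 256-entry S-boxes). The section pointers are computed once rather than re-derived with additions on every call, and `& 0xff_u32` masks replace the `to_u8` casts. Note `&+` spells out the wrapping addition explicitly in current Crystal (in 2017, plain `+` wrapped).

```crystal
# Dummy S-boxes: s[i] = i, just so the sketch runs.
s = StaticArray(UInt32, 1024).new { |i| i.to_u32 }

# Memoized once, outside the hot loop.
s0 = s.to_unsafe
s1 = s0 + 256
s2 = s0 + 512
s3 = s0 + 768

# Blowfish F function: mask out each byte of x and combine the four
# S-box lookups with wrapping adds and an XOR.
f = ->(x : UInt32) do
  ((s0[(x >> 24) & 0xff_u32] &+ s1[(x >> 16) & 0xff_u32]) ^
    s2[(x >> 8) & 0xff_u32]) &+ s3[x & 0xff_u32]
end

puts f.call(0x01020304_u32) # => 1540
```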

@mverzilli mverzilli merged commit 42e2a52 into crystal-lang:master Jan 20, 2017
@mverzilli

Excellent as usual, Julien! 🙇

Thank you so much, this is an amazing contribution 💯

@ysbaddaden ysbaddaden deleted the enhancement-bcrypt-performance branch January 20, 2017 16:17
@bcardiff bcardiff added this to the Next milestone Jan 20, 2017