Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bump Gemini PostgreSQL JDBC driver to match the other projects #2626

Merged
merged 1 commit into from Mar 30, 2017

Conversation

zloster
Copy link
Contributor

@zloster zloster commented Mar 28, 2017

The other Java projects that are using JDBC are on postgresql-9.4.1208.jar. So it will be good and Gemini to use it as well.

@NateBrady23 NateBrady23 merged commit eb706f0 into TechEmpower:master Mar 30, 2017
@zloster zloster deleted the gemini-postgrejdbc branch April 5, 2017 07:44
@bhauer
Copy link
Contributor

bhauer commented Apr 12, 2017

@zloster Thanks for this! Something in the changes between 9.4.1200 and 9.4.1208 has had a significant impact on performance for our database tests as observed in Preview 3 for Round 14:

https://www.techempower.com/benchmarks/previews/round14/r14p2-vs-r14p3.html#gemini-postgres.db

It initially struck me as implausible that a micro version bump would yield such a performance boost, but looking at the Postgres JDBC driver changelog reveals several performance-oriented changes since micro version 1200:

https://jdbc.postgresql.org/documentation/changelog.html#version_9.4.1208

I am going to keep an eye on this since it still instinctively feels unlikely. I look forward to the next results to see if the observed performance in this new preview are reproduced/consistent.

@zloster
Copy link
Contributor Author

zloster commented Apr 13, 2017

@bhauer That's indeed an impressive boost.

It initially struck me as implausible that a micro version bump would yield such a performance boost, but looking at the Postgres JDBC

It's common misunderstanding that the PgJDBC releases are "micro". The recently changed radically their versioning to tackle this. The latest driver is version 42.0.0. See the discussion in the mailing list: message 1, message 2, message 3, message 4. I haven't added it because for the 42.0.0 release they've changed the way they do logging and depending on what logging framework is on the classpath there is a performance hit of about 10-15%. See the very recent messages on the mailing list. Soon 42.1.0 should be released which will address this.

The DB tests are taxing the DB drivers very hard - there are a lot of very simple queries passing. So even minor improvements in the statement processing path should be quite visible in the results. Gemini seems not to have any major bottlenecks so the results should be sensitive to improvements in the DB drivers. My PR #1937 mentioned 15% throughput increase and decrease in latencies but this was measured in a local Vagrant environment. In this environment the major bottleneck is the CPU emulation speed and it seems that if one could reliably improve a test result there with 10% than on real (or not that much restricted) hardware this will be multiplied. Seems like a new approach to the performance tuning :)

For example check the servlet JSON test between R13 and the R14 preview 3 (I can't give you direct link because the tests names anchors are missing from the HTML - https://www.techempower.com/benchmarks/previews/round14/r13-vs-r14p3.html#servlet.json). The PR is #2597 and there are several changes but if you try them one by one you will see that the Jackson library update from 2.3.0 to 2.7.8 is the major factor for improvement. The results go from 375,473 rps in R13 to 645,680 rps (or 586,897) in the R14 previews.

I am going to keep an eye on this since it still instinctively feels unlikely. I look forward to the next results to see if the observed performance in this new preview are reproduced/consistent.

There are three frameworks that are Java-based, have a mature code implementation and have tests for both MySQL and PostgreSQL - Servlet, Undertow and Dropwizard. You could measure their results with changed PgJDBC driver for comparison. Also note that with these frameworks the results with PostgreSQL DB are consistently better than the MySQL with very few exceptions (Servlet's Fortune test comes to my mind). And this wasn't the case with Gemini until this change.

Another observation - several rounds ago MySQL was beating PostgreSQL, but then PostgreSQL was 9.1 or 9.2. Currently TFB are using 9.3. I haven't checked the current MySQL version. But PostgreSQL "minor" releases are not minor at all - the current version is 9.6 and they decided the next version to be 10 to mitigate the perception of "minor". I'm thinking of preparing a PR for this. It seems to be easy because PostgreSQL is installed from their own PPA, and there is already 9.6.

And another thing (I promise to be the last): I have to make PR to revert the batch processing for the servlet-postgres - it seems to suffer from transaction deadlocks. The html report with the diff currently doesn't capture this problem - the update test is passing the lowest concurrency but gives 0 at higher concurrences. Which leads me to the question: How the current toolset implementation deals with test that created transaction deadlocks in the database? I think only PostgreSQL will be affected because transactions there can't be turned off.

Cheers and thank you for this great project

@bhauer
Copy link
Contributor

bhauer commented Apr 13, 2017

It's common misunderstanding that the PgJDBC releases are "micro". The recently changed radically their versioning to tackle this. The latest driver is version 42.0.0. See the discussion in the mailing list: message 1, message 2, message 3, message 4. I haven't added it because for the 42.0.0 release they've changed the way they do logging and depending on what logging framework is on the classpath there is a performance hit of about 10-15%. See the very recent messages on the mailing list. Soon 42.1.0 should be released which will address this.

Interesting. Thank you for the links and additional background. Indeed, having looked at the changelog, I realized that these were not "micro" versions in the traditional sense! :) Looking forward to 42.1.0.

The DB tests are taxing the DB drivers very hard - there are a lot of very simple queries passing. So even minor improvements in the statement processing path should be quite visible in the results. Gemini seems not to have any major bottlenecks so the results should be sensitive to improvements in the DB drivers. My PR #1937 mentioned 15% throughput increase and decrease in latencies but this was measured in a local Vagrant environment. In this environment the major bottleneck is the CPU emulation speed and it seems that if one could reliably improve a test result there with 10% than on real (or not that much restricted) hardware this will be multiplied. Seems like a new approach to the performance tuning :)

Yes! The database tests are intended to be a brutal exercise of the database drivers and ORM (where applicable). You're right that it stands to reason that tuning in the driver will show up as a significant gain in these tests, assuming that in so doing you don't reach a different bottleneck in the framework itself. Nevertheless, the delta in this particular test was a big surprise. The Postgres JDBC driver folks deserve a lot of credit for tuning the drivers up so well!

For example check the servlet JSON test between R13 and the R14 preview 3 (I can't give you direct link because the tests names anchors are missing from the HTML - https://www.techempower.com/benchmarks/previews/round14/r13-vs-r14p3.html#servlet.json). The PR is #2597 and there are several changes but if you try them one by one you will see that the Jackson library update from 2.3.0 to 2.7.8 is the major factor for improvement. The results go from 375,473 rps in R13 to 645,680 rps (or 586,897) in the R14 previews.

Indeed. Also a very impressive bump from what appears a single library update.

Incidentally, I fixed the linked diff file to include the named anchors. We've also fixed the diffs for Preview 3 to correctly show the "20-iteration" versions of the multi-query and updates tests.

Another observation - several rounds ago MySQL was beating PostgreSQL, but then PostgreSQL was 9.1 or 9.2. Currently TFB are using 9.3. I haven't checked the current MySQL version. But PostgreSQL "minor" releases are not minor at all - the current version is 9.6 and they decided the next version to be 10 to mitigate the perception of "minor". I'm thinking of preparing a PR for this. It seems to be easy because PostgreSQL is installed from their own PPA, and there is already 9.6.

This would be terrific. Certainly the momentum in Postgres has shown no sign of slowing; each new build brings a lot of exciting improvements.

And another thing (I promise to be the last): I have to make PR to revert the batch processing for the servlet-postgres - it seems to suffer from transaction deadlocks. The html report with the diff currently doesn't capture this problem - the update test is passing the lowest concurrency but gives 0 at higher concurrences. Which leads me to the question: How the current toolset implementation deals with test that created transaction deadlocks in the database? I think only PostgreSQL will be affected because transactions there can't be turned off.

This is a good question. We restart the database services between each test implementation so that Framework A will not affect Framework B, but we do not restart database services between each concurrency level. So if a database deadlocks at a low concurrency level, 0-RPS measurements would be expected at the subsequent higher concurrency levels. For the time being, we don't have any countermeasures in place for that, and restarting after each concurrency level would likely add a lot more wall-time to the full run.

Cheers and thank you for this great project

And thank you for your massive contributions and insights! Your participation is greatly appreciated.

@zloster
Copy link
Contributor Author

zloster commented Apr 13, 2017

So if a database deadlocks at a low concurrency level, 0-RPS measurements would be expected at the subsequent higher concurrency levels. For the time being, we don't have any countermeasures in place for that, and restarting after each concurrency level would likely add a lot more wall-time to the full run.

That's perfectly fine. The change in the diff HTML report makes the problem visible if it happens. The previous version was giving impression that everything is OK.

@vlsi
Copy link

vlsi commented Apr 16, 2017

I haven't added it because for the 42.0.0 release they've changed the way they do logging and depending on what logging framework is on the classpath there is a performance hit of about 10-15%. See the very recent messages on the mailing list

@zloster , is 10-15% based on your own measurements?
I've did a couple of benchmarks, and it seems there is no visible overhead in default configuration: https://www.postgresql.org/message-id/CAB%3DJe-GubVV%3DRyUJgdwGfoc4jsNkbnJk-385Vbyo4eabPZQqfw%40mail.gmail.com

What I mean is if current pgjdbc logging approach is bad, we need to figure out the conditions that trigger high overhead.

@zloster
Copy link
Contributor Author

zloster commented Apr 21, 2017

@vlsi No, it is not my measurement.
In this project (TFB) recently there were 2 or 3 PostgreSQL JDBC driver updates (I've managed to find 2 commits) :

  1. https://github.com/TechEmpower/FrameworkBenchmarks/pull/2682/files#diff-7c266220cad48c4c83471652054f6f8fR30
  2. 9850dfe#diff-8cfbaa8475e7f00d1ec1d2e7be7fc276L21

The second commit is reverting from 42.0.0 to 9.4.1212. This implementation is relatively new in TBF and the maintainer is quite active. Note that the project implementation is having some weird results during the Round 14 previews (preview 4 is the current one). So I can't be sure what is the reason to revert the driver version.

I'll try to get some test results myself this weekend to check for difference between 42.0.0 and 9.4.1212 and report back in the mailing list.

@vlsi
Copy link

vlsi commented Aug 30, 2017

@zloster , did you had a chance to review the benchmark?

@zloster
Copy link
Contributor Author

zloster commented Oct 25, 2017

@vlsi I've replied on the pgsql-jdbc mailing list a long time ago: https://www.postgresql.org/message-id/d5d39fffac03727934f262c793899321%40edno.moe

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants