New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reduce MAX_TERMINATED_THREADS default, improve memory use on manycore… #2656
Conversation
@biddisco What new default value for this configuration constant did you have in mind? |
@biddisco Sorry, I should have asked: how did you determine the new value you propose. reducing it down to |
On our 32 core nodes, the previous value of 1000 could leave 32,000 uncleaned up tasks (and their stacks). Moving to KNL, we will be using upwards of 300K tasks in the cleanup zone, and with large stack sizes, this is a big problem. All tasks must be cleaned up sooner or later and the previous limit of 1000 was pulled from a hat anyway. We tested with 100 and found memory issues went away, but a value of 25 seems more realistic. I would prefer lower than that since larger values provide no gain other than a small saving of detour into the cleanup code. Smaller values are more likely to allow cache coherent reuse of memory than larger values. |
Thanks for this explanation of your rationale. The main reason why this delayed cleanup was put in place to begin with was to avoid acquiring a kernel lock in the scheduler. In the end the selected default value should give a reasonable compromise between having to acquire a kernel lock in the scheduler and the memory overheads eventually imposed by not-cleaned-up threads. Are we sure that having to acquire a kernel lock every 25 executed tasks wouldn't cause too much overhead? May be we should change the cleanup logic in a way that it would be executed whenever we have to acquire that lock for other reasons anyways? |
Any plans on how to move forward with this? |
The first is easier, the second gives more flexibility. I will do the first unless several of you prefer the second |
Why not do the second with a cmake option to set the default? |
@biddisco Why did you close this? Doesn't this give you any performance benefit anymore? |
This is absolutely critical. Without this patch we cannot run our code. However, I made a reasonable request for a change and it was not accepted, and I must work on other deadlines, so when I get time, I will revisit this, but for now the PR is going nowhere and can be deleted. |
I'll reopen this and might be able to find some time to work on it. |
@biddisco: please review |
…ad value - flyby: fixed stupid macro expansion bug in runtime configuration initialization
ba990cf
to
cf1bff4
Compare
Nitpicking: I have added one commit because a boolean option like HPX_WITH_CUDA might not be available and so HPX_HAVE_CUDA will not be set. With options that are not optional, such as MAX__TERMINATED_THREADS - there is not need for WITH/HAVE - this breaks convention in naming.
is enough |
a5d6728
to
cd97e0f
Compare
There were some changes to the STRINGIZE function that caused a conflict when I merged master, so I have rebased onto master and cleaned the problem. Please check that the vars I've used are correct. I might have broken something |
cd97e0f
to
1024b04
Compare
I'm not sure if the final commit was lost yesterday, or maybe I pushed it to the wrong repo. Please check the use of BOOST_STRINGIZE that I have changed to HPX_PP_STRINGIZE which I believe is the correct new usage. |
@biddisco: |
435aa82
to
597bb70
Compare
Fixed. |
@@ -239,6 +239,8 @@ namespace hpx { namespace util | |||
"min_add_new_count = ${HPX_THREAD_QUEUE_MIN_ADD_NEW_COUNT:10}", | |||
"max_add_new_count = ${HPX_THREAD_QUEUE_MAX_ADD_NEW_COUNT:10}", | |||
"max_delete_count = ${HPX_THREAD_QUEUE_MAX_DELETE_COUNT:1000}", | |||
"max_terminated_threads = ${HPX_THREAD_QUEUE_MAX_TERMINATED_THREADS:" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question - where does this var come from "HPX_THREAD_QUEUE_MAX_TERMINATED_THREADS"
Anyone want to review/merge this. the review button is greyed out for me since I started it. |
… cpus
The default value of 1000 is too high and causes problems on many core machines. there does not seem to be any great advantage to having a high number of terminated tasks waiting for cleanup. Stack usage explodes when this number is large.