New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix incorrect pool usage masks setup in RP/thread manager #2878
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a test verifying the problem is fixed.
@@ -95,7 +95,7 @@ namespace hpx { namespace threads { namespace detail | |||
auto const& rp = resource::get_partitioner(); | |||
for (std::size_t i = 0; i != pool_threads; ++i) | |||
{ | |||
bit_or(used_processing_units_, rp.get_pu_mask(threads_offset + i)); | |||
used_processing_units_ |= rp.get_pu_mask(threads_offset + i); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may not work for all configurations. I think the correct fix would be:
used_processing_units_ =
bit_or(used_processing_units_, rp.get_pu_mask(threads_offset + i));
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No.
bit_or(a,b)
returns a boolean true or false depending on the two masks and so it only sets the lowest bit. We actually want the true bitwise or operation and the submitted patch is correct. if CPU's <= 64
then a uint64_t
is bitwise or'ed (ok), if CPUs>64
, then a bitset<blah>
is or'ed.
I believe my patch is correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, thanks for pointing this out. Still. there is a third option, namely if we use boost::dynamic_bitset
. I'm not sure if operator|=()
is provided in that case. IIRC, it provides only operator|()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I was mistaken about the small CPU count version.
boost::dynamic_bitset
has the !=
operator and it has the signature
dynamic_bitset& operator|=(const dynamic_bitset& b);
which is exactly what we want in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. Thanks for clarifying.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so we can merge then yes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, could we still add that test, please?
#2891 Does not specifically test for the used processing unit, however it uses the mask to test against the currently used processing units in the RP. Should we mark this as duplicate and consider the added test case as good enough to check for correctness of the mask? |
How much effort would it be to add an explicit check for this to this test? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unit test has been added. This looks good to me.
int main(int argc, char* argv[]) | ||
{ | ||
std::vector<std::string> cfg = { | ||
"hpx.os_threads=4" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure this is a good idea? The pu mask will be 1 for each pu used. If you run this test on a single core machine, then asking for 4 threads will set the pu flag, but only once and not 4 times. So bits in pu mask will not match num threads.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, the only thing that will fail in this scenario will the test on line 25 then... I guess having at least 4 PUs available is a prerequisite of this test (this is also true for the other RP test)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's sensible. If the machine being tested on has < N cores, - just skip the test and return pass. There is no point testing unless N is useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
Just a minor tweak to get the pool_usage_mask set correctly