
<thread>: (lack of) Support for >64 cores #2099

Closed
Chronial opened this issue Aug 6, 2021 · 13 comments
Labels
documentation (Related to documentation or comments), resolved (Successfully resolved without a commit)

Comments

@Chronial
Contributor

Chronial commented Aug 6, 2021

The STL is currently limited to using 64 cores: threads created with std::thread will only ever occupy 64 cores (see #1238 (comment)), and std::thread::hardware_concurrency returns a maximum of 64 (see #594).

With the availability of the Threadripper with 128 logical cores, this is becoming less of a philosophical issue and more relevant for real-world applications.

My understanding is that, given the current Win32 API, manual thread affinity management is necessary to access more than 64 cores (see e.g. the way TBB starts threads). So this might not actually be fixable within the STL. In that case, the STL should probably at least clearly document this limitation somewhere, since it is a rather surprising, arbitrary limit.

Some background information on the 64-core problem in Win32: https://docs.microsoft.com/en-us/windows/win32/procthread/processor-groups
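
To make "manual thread affinity management" concrete, here is a minimal sketch of how an application might spread workers across processor groups. It is not what the STL or TBB does. The helper names `group_for_worker`, `active_group_sizes`, and `move_current_thread_to_group` are hypothetical; the Win32 calls (`GetActiveProcessorGroupCount`, `GetActiveProcessorCount`, `SetThreadGroupAffinity`) are real and are only compiled on Windows.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: choose a processor group for the worker with the given
// index. Groups are filled in order according to their sizes, and the
// assignment wraps around when there are more workers than logical processors.
inline std::size_t group_for_worker(std::size_t worker,
                                    const std::vector<unsigned>& group_sizes) {
    assert(!group_sizes.empty());
    std::size_t total = 0;
    for (unsigned n : group_sizes) total += n;
    assert(total > 0);

    std::size_t slot = worker % total;
    for (std::size_t g = 0; g < group_sizes.size(); ++g) {
        if (slot < group_sizes[g]) return g;
        slot -= group_sizes[g];
    }
    return group_sizes.size() - 1;  // not reached when total > 0
}

#ifdef _WIN32
// Hypothetical helper: number of active logical processors in each group.
inline std::vector<unsigned> active_group_sizes() {
    std::vector<unsigned> sizes;
    const WORD groups = GetActiveProcessorGroupCount();
    for (WORD g = 0; g < groups; ++g) {
        sizes.push_back(static_cast<unsigned>(GetActiveProcessorCount(g)));
    }
    return sizes;
}

// Hypothetical helper: move the calling thread to `group` and allow it to run
// on the first `processors_in_group` logical processors of that group.
inline bool move_current_thread_to_group(WORD group, unsigned processors_in_group) {
    const unsigned mask_bits = static_cast<unsigned>(sizeof(KAFFINITY) * 8);
    GROUP_AFFINITY affinity = {};  // reserved fields must be zero
    affinity.Group = group;
    affinity.Mask = processors_in_group >= mask_bits
                        ? ~static_cast<KAFFINITY>(0)
                        : (static_cast<KAFFINITY>(1) << processors_in_group) - 1;
    return SetThreadGroupAffinity(GetCurrentThread(), &affinity, nullptr) != 0;
}
#endif
```

On Windows, each worker thread would call `move_current_thread_to_group` at startup with the group returned by `group_for_worker(i, active_group_sizes())`. With two groups of 64, workers 0 to 63 land in group 0 and workers 64 to 127 in group 1.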

@StephanTLavavej StephanTLavavej added the documentation label Aug 8, 2021
@AlexGuteniev
Contributor

> Threads created with std::thread will only ever occupy 64 cores (see #1238 (comment)), and std::thread::hardware_concurrency returns a maximum of 64 (see #594).

The other two affected places are std::async and the parallel algorithms.

@AlexGuteniev
Contributor

I'm not really sure where you would want to see it documented.

In the sources? You don't look at the sources often, and you probably won't look there when everything seems to work.

Here: https://docs.microsoft.com/en-us/cpp/standard-library/thread-class ?
It is possible to create a pull request there. But I guess you won't use this documentation often: since thread is standard, and recent Visual C++ has few differences from the standard, you will probably be looking at general C++ documentation, like cppreference.com.

cppreference.com? I don't think they will accept platform-specific details.

@AlexGuteniev
Contributor

It is also not simple to come up with short and precise messages. My take is as follows (not too short, and possibly imprecise):

hardware_concurrency returns the number of logical processors, which corresponds to the number of hardware threads that can execute simultaneously. It takes into account the number of physical processors, the number of cores in each physical processor, and simultaneous multithreading on each core.

However, on systems with more than 64 logical processors, this number is capped by the number of logical processors in a single group. See Processor Groups.
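
For comparison with the capped value described above, an application can ask Windows for the system-wide count directly. The following is a minimal sketch, assuming the goal is only to read the count. `total_logical_processors` is a hypothetical helper name; `GetActiveProcessorCount` and `ALL_PROCESSOR_GROUPS` are the real Win32 API, and other platforms fall back to the standard library.

```cpp
#include <cassert>
#include <thread>

#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper (not part of the STL): the number of active logical
// processors across all processor groups. May return 0 if the count cannot
// be determined.
inline unsigned total_logical_processors() {
#ifdef _WIN32
    // ALL_PROCESSOR_GROUPS requests the system-wide count instead of the
    // count for a single group.
    return static_cast<unsigned>(GetActiveProcessorCount(ALL_PROCESSOR_GROUPS));
#else
    // Other platforms have no processor groups; defer to the standard library.
    return std::thread::hardware_concurrency();
#endif
}
```

If `total_logical_processors()` is larger than `std::thread::hardware_concurrency()`, the program is running on a system where the per-group cap discussed in this issue applies.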


Parallel algorithms execute on an unspecified number of threads and divide the work into an unspecified number of chunks.

The number of threads is managed by the Windows thread pool.

For the number of chunks, the implementation tries to make use of the number of logical processors, which corresponds to the number of hardware threads that can execute simultaneously.

However, on systems with more than 64 logical processors, the number of chunks is capped by the number of logical processors in a single group. See Processor Groups.
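
To illustrate why a capped processor count limits parallelism, here is a sketch of splitting work into chunks. This is an assumption for illustration only, not the STL's actual partitioning code, and `split_into_chunks` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: split the index range [0, count) into at most
// max_chunks contiguous half-open ranges of nearly equal size.
inline std::vector<std::pair<std::size_t, std::size_t>>
split_into_chunks(std::size_t count, std::size_t max_chunks) {
    assert(max_chunks > 0);
    const std::size_t chunks = std::min(count, max_chunks);

    std::vector<std::pair<std::size_t, std::size_t>> ranges;
    ranges.reserve(chunks);

    std::size_t begin = 0;
    for (std::size_t i = 0; i < chunks; ++i) {
        // The first (count % chunks) chunks receive one extra element.
        const std::size_t size = count / chunks + (i < count % chunks ? 1 : 0);
        ranges.emplace_back(begin, begin + size);
        begin += size;
    }
    return ranges;
}
```

If `max_chunks` comes from a value capped at 64, a machine with 128 logical processors gets at most 64 chunks under this scheme, so at most 64 of them can be processed at the same time.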

@Chronial
Contributor Author

Chronial commented Aug 9, 2021

> I'm not really sure where you would want to see it documented.

That is indeed a difficult question. But I think just having this issue here is a good start :).

@AlexGuteniev
Contributor

Maybe as the most inclusive version:

  • A separate documentation page that explains logical processors, the cap imposed by processor groups, and the limitation of 32-bit builds, with a note that all of this is Microsoft-specific and subject to change
  • In the documentation and sources of std::thread::hardware_concurrency, just mention "implementation-specific: currently defined to return the number of logical processors in the current processor group" and link to the above-mentioned page
  • The same for the documentation and sources of std::thread::thread, std::async, std::execution::parallel_policy, and std::execution::parallel_unsequenced_policy

@CaseyCarter CaseyCarter changed the title from "<thread>: Support for >64 cores" to "<thread>: (lack of) Support for >64 cores" Aug 9, 2021
@StephanTLavavej
Member

FYI @TylerMSFT, this is an STL docs issue.

@mnatsuhara
Contributor

It looks like the MicrosoftDocs PR has just been merged (thanks @AlexGuteniev, @TylerMSFT). The docs PR adds this information to the thread page, the execution header page, and the future functions page.

Is this documentation sufficient to close this issue?

@YexuanXiao
Contributor

Four months after this issue was raised, the updated documentation indicates that Windows 11 and Windows Server 2022 allow programs to use all CPU cores with the default affinity. It seems that the STL will not need to address this issue, and the documentation will need to be updated again to reflect the actual behavior of Windows 11 and Windows Server 2022. @StephanTLavavej

@CaseyCarter
Contributor

@TylerMSFT Could you see about updating the pertinent docs to clarify this isn't an issue in Windows 11/Windows Server 2022 or later?

@TylerMSFT
Member

Added to my backlog.

@TylerMSFT
Member

I updated the documentation for <execution>, the <future> functions, and the thread class to call out, in a Microsoft-specific section, the differences in available threads before and after Windows 11 and Windows Server 2022.
That's in PR #5665 in the private C++ docs repo. It'll be live by Friday afternoon.

Please close this issue if those updates suffice.

@CaseyCarter
Contributor

Thanks, TylerMSFT! Hopefully we can finally put this issue to bed.

@StephanTLavavej
Member

We later merged #5459 to fix thread::hardware_concurrency() after the OS changed behavior.


7 participants