<thread>: std::thread::hardware_concurrency limited to 64, even on Windows 11 #5453

JoostHouben opened this issue May 1, 2025 · 5 comments · May be fixed by #5459
Labels
bug Something isn't working

Comments

@JoostHouben
JoostHouben commented May 1, 2025

Describe the bug

On machines with more than 64 logical processors, std::thread::hardware_concurrency() is capped to 64.

Microsoft's documentation for this function states:

hardware_concurrency returns the number of logical processors, which corresponds to the number of hardware threads that can execute simultaneously. It takes into account the number of physical processors, the number of cores in each physical processor, and simultaneous multithreading on each single core.

Before Windows 11 and Windows Server 2022, applications were limited by default to a single processor group, having at most 64 logical processors. This limited the number of concurrently executing threads to 64. For more information, see Processor Groups.

Starting with Windows 11 and Windows Server 2022, processes and their threads have processor affinities that by default span all processors in the system and across multiple groups on machines with more than 64 processors. The limit on the number of concurrent threads is now the total number of logical processors in the system.

This suggests that on Windows 11 (and Windows Server 2022 and 2025), hardware_concurrency should return the total number of logical processors in the system. But it does not. On my Threadripper 7980X system, it returns 64 instead of 128.

Command-line test case

**********************************************************************
** Visual Studio 2022 Developer Command Prompt v17.13.3
** Copyright (c) 2022 Microsoft Corporation
**********************************************************************

C:\Temp>type hardware_concurrency.cpp
#include <thread>
#include <iostream>

int main() {
    std::cout << std::thread::hardware_concurrency() << std::endl;
}
C:\Temp>cl /EHsc /W4 /WX /std:c++latest .\hardware_concurrency.cpp
Microsoft (R) C/C++ Optimizing Compiler Version 19.43.34809 for x86
Copyright (C) Microsoft Corporation.  All rights reserved.

/std:c++latest is provided as a preview of language features from the latest C++
working draft, and we're eager to hear about bugs and suggestions for improvements.
However, note that these features are provided as-is without support, and subject
to changes or removal as the working draft evolves. See
https://go.microsoft.com/fwlink/?linkid=2045807 for details.

hardware_concurrency.cpp
Microsoft (R) Incremental Linker Version 14.43.34809.0
Copyright (C) Microsoft Corporation.  All rights reserved.

/out:hardware_concurrency.exe
hardware_concurrency.obj

C:\Temp>hardware_concurrency.exe
64

C:\Temp>echo %NUMBER_OF_PROCESSORS%
128

C:\Temp>systeminfo

...
OS Name:                       Microsoft Windows 11 Pro
OS Version:                    10.0.26100 N/A Build 26100
...

Expected behavior

std::thread::hardware_concurrency() should return the number of logical CPUs available on the system, in this case 128.

STL version

Visual Studio version:

Microsoft Visual Studio Community 2022 (64-bit) - Current
Version 17.13.3

Additional context

This issue was previously reported as #1230. It was closed with this comment (#1230 (comment)):

Like the above commentators have pointed out std::hardware_concurrency is only a hint.
The reason it makes sense to return the number of threads in the processor group is for the reason that @sylveon gives. Without using Windows API calls there isn't a way to use all the threads you expect to be able to use if we returned the true number of processors on the system.

The comment by sylveon that was referred to was (#1230 (comment)):

A problem I see with making it process group aware is that if you use thread::hardware_concurrency as a reference for any kind of concurrency, without using the Windows API to configure processor groups you'll be running your 128 threads on 64 logical processors only.

Sure, making the STL aware of processor groups might be doable, but consider the case where you use thread::hardware_concurrency without using std::thread (arguably, this is bad but unfortunately fairly common; just searching for hardware_concurrency on GitHub gives you plenty of examples where this is done).

I don't think it's a standards violation, as the standard says the value should only be considered as a hint. In this case, you're being hinted at how many threads are supported by default without additional configuration.

Note that this explanation no longer holds. As described in Microsoft's official STL documentation quoted above, as well as in the Windows API documentation, starting with Windows 11 all logical processors across all processor groups can be used without making any special Windows API calls. Moreover, multithreaded STL code already makes use of more than 64 logical processors. For example, if I use a parallel execution policy with an algorithm from <algorithm>, all 128 logical processors of my machine get fully utilized.
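To illustrate why the cap matters to callers that size their work from the hint, here is a minimal sketch. The helper names `worker_count_hint` and `split_work` are hypothetical and not part of the STL. It assumes the common pattern of dividing work into one chunk per reported hardware thread, and it guards against a return value of 0, which the standard permits when the value is not computable.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical helper: hardware_concurrency() is only a hint and may be 0,
// so clamp it to at least one worker.
unsigned worker_count_hint() noexcept {
    return std::max(1u, std::thread::hardware_concurrency());
}

// Hypothetical helper: split [0, item_count) into at most `workers`
// contiguous half-open ranges whose sizes differ by at most one.
std::vector<std::pair<std::size_t, std::size_t>> split_work(std::size_t item_count, unsigned workers) {
    std::vector<std::pair<std::size_t, std::size_t>> ranges;
    if (item_count == 0 || workers == 0) {
        return ranges;
    }
    const std::size_t parts = std::min<std::size_t>(workers, item_count);
    const std::size_t base  = item_count / parts;
    const std::size_t extra = item_count % parts;  // the first `extra` ranges get one more item
    std::size_t begin = 0;
    for (std::size_t i = 0; i < parts; ++i) {
        const std::size_t len = base + (i < extra ? 1 : 0);
        ranges.emplace_back(begin, begin + len);
        begin += len;
    }
    return ranges;
}
```

With this pattern, a hint of 64 on a 128-processor machine yields at most 64 chunks, so code that spawns one thread per chunk would leave half of the logical processors idle.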

Another similar issue was #2099. That issue was closed last September by updating the documentation, including the passage I quoted above. However, as explained in this issue, the behavior of hardware_concurrency is not consistent with the updated documentation, and so hardware_concurrency should be updated.

The STL implementation of hardware_concurrency currently delegates to _Thrd_hardware_concurrency, which is:

_CRTIMP2_PURE unsigned int __cdecl _Thrd_hardware_concurrency() noexcept { // return number of processors
    SYSTEM_INFO info;
    GetNativeSystemInfo(&info);
    return info.dwNumberOfProcessors;
}

On Windows 11 and above this could be updated to use GetLogicalProcessorInformationEx. Here is a very crude implementation that counts the logical processors in a single physical package, though perhaps a different RelationshipType would be more appropriate.

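#include <bit>
#include <iostream>
#include <memory>
#include <new>

#include <Windows.h>

unsigned my_Thrd_hardware_concurrency() noexcept {  // return number of processors in the first physical package
    DWORD byte_length = 0;

    // The first call is expected to fail and report the required buffer size.
    GetLogicalProcessorInformationEx(RelationProcessorPackage, nullptr, &byte_length);
    if (byte_length == 0) {
        return 0;
    }

    // Use nothrow new so that an allocation failure cannot escape a noexcept function.
    std::unique_ptr<char[]> buffer(new (std::nothrow) char[byte_length]);
    if (!buffer) {
        return 0;
    }
    auto ptr = reinterpret_cast<PSYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX>(buffer.get());

    if (!GetLogicalProcessorInformationEx(RelationProcessorPackage, ptr, &byte_length)) {
        std::cerr << "GetLogicalProcessorInformationEx failed." << std::endl;
        return 0;  // do not read the buffer after a failed call
    }

    // Only the first record (one package) is examined; a multi-socket system returns more records.
    unsigned num_processors = 0;
    for (WORD group = 0; group < ptr->Processor.GroupCount; ++group) {
        num_processors += static_cast<unsigned>(std::popcount(ptr->Processor.GroupMask[group].Mask));
    }

    return num_processors;
}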
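
On the question of a different RelationshipType, one option might be RelationGroup, which reports the active processor count of every processor group. The following is only a sketch under that assumption, not the STL's implementation. The function name `count_via_relation_group` is hypothetical, and the non-Windows branch exists only so the snippet compiles elsewhere. It walks the variable-sized records by their Size field rather than reading only the first one.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

#ifdef _WIN32
#include <Windows.h>
#endif

// Hypothetical helper: sum the active logical processors of all processor groups.
// Falls back to std::thread::hardware_concurrency() if the query fails
// or when compiled for a non-Windows target.
unsigned count_via_relation_group() {
#ifdef _WIN32
    DWORD byte_length = 0;
    // The sizing call is expected to fail with ERROR_INSUFFICIENT_BUFFER.
    if (!GetLogicalProcessorInformationEx(RelationGroup, nullptr, &byte_length)
        && GetLastError() == ERROR_INSUFFICIENT_BUFFER) {
        std::vector<unsigned char> buffer(byte_length);
        auto* first = reinterpret_cast<PSYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX>(buffer.data());
        if (GetLogicalProcessorInformationEx(RelationGroup, first, &byte_length)) {
            unsigned total = 0;
            // Records are variable-sized, so advance by each record's Size field.
            for (DWORD offset = 0; offset < byte_length;) {
                auto* record =
                    reinterpret_cast<PSYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX>(buffer.data() + offset);
                if (record->Size == 0) {
                    break;  // defensive: avoid looping forever on a malformed record
                }
                if (record->Relationship == RelationGroup) {
                    for (WORD g = 0; g < record->Group.ActiveGroupCount; ++g) {
                        total += record->Group.GroupInfo[g].ActiveProcessorCount;
                    }
                }
                offset += record->Size;
            }
            if (total != 0) {
                return total;
            }
        }
    }
#endif
    return std::thread::hardware_concurrency();
}
```

I have not verified this on a multi-socket machine, so treat the per-group totals as an assumption to be checked there.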
StephanTLavavej added the bug label May 1, 2025
@StephanTLavavej
Member

This does seem like a bug now, given that the OS behavior has changed.

@YexuanXiao
Contributor

YexuanXiao commented May 2, 2025

If I understand correctly, according to the Processor Groups documentation, starting with Windows 11 a process spans multiple groups by default if the number of processors exceeds 64. GetActiveProcessorCount(ALL_PROCESSOR_GROUPS) retrieves the number of processors across all processor groups. GetActiveProcessorCount is provided by Kernel32, does not introduce additional DLL dependencies, and Raymond Chen claims it is fast and efficient. GetActiveProcessorCount does not appear to be affected by any affinity settings.
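
As a rough illustration of the suggestion above, a replacement could look something like the sketch below. The function name `logical_processor_count` is hypothetical and this is not the STL's actual code. It assumes that a return value of 0 from GetActiveProcessorCount indicates failure, and the non-Windows branch is there only so the snippet builds on other platforms.

```cpp
#include <thread>

#ifdef _WIN32
#include <Windows.h>
#endif

// Hypothetical helper, not the STL's implementation.
unsigned logical_processor_count() noexcept {
#ifdef _WIN32
    // Counts active logical processors across every processor group.
    const DWORD count = GetActiveProcessorCount(ALL_PROCESSOR_GROUPS);
    if (count != 0) {
        return static_cast<unsigned>(count);
    }
    // A result of 0 signals failure; fall through to the standard hint.
#endif
    return std::thread::hardware_concurrency();
}
```

Because the count ignores affinity settings, a process restricted to a subset of processors would still see the system-wide total, which may or may not be the desired hint.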

@JoostHouben
Author

JoostHouben commented May 2, 2025

@YexuanXiao if you provide instructions, I'm happy to run a test.

If you're just looking to test the behavior of GetActiveProcessorCount(ALL_PROCESSOR_GROUPS) I can do that later tonight.

@YexuanXiao
Contributor

Testing the behavior of GetActiveProcessorCount(ALL_PROCESSOR_GROUPS) is sufficient.

@JoostHouben
Author

@YexuanXiao

Confirmed, that works as expected:

**********************************************************************
** Visual Studio 2022 Developer Command Prompt v17.13.3
** Copyright (c) 2022 Microsoft Corporation
**********************************************************************
[vcvarsall.bat] Environment initialized for: 'x64'

C:\Temp>type GetActiveProcessorCount.cpp
#include <iostream>

#include <Windows.h>

int main() {
    const DWORD allProcessorCount = GetActiveProcessorCount(ALL_PROCESSOR_GROUPS);
    std::cout << "Active Processor Count in all groups: " << allProcessorCount << std::endl;

    const WORD activeGroups = GetActiveProcessorGroupCount();
    std::cout << "Active Processor Group Count: " << activeGroups << std::endl;

    for (WORD groupIdx = 0; groupIdx < activeGroups; ++groupIdx) {
        const int processorCount = GetActiveProcessorCount(groupIdx);
        std::cout << "Active Processor Count in Group " << groupIdx << ": " << processorCount << std::endl;
    }
}

C:\Temp>cl /EHsc /W4 /WX /std:c++latest .\GetActiveProcessorCount.cpp
Microsoft (R) C/C++ Optimizing Compiler Version 19.43.34809 for x64
Copyright (C) Microsoft Corporation.  All rights reserved.

/std:c++latest is provided as a preview of language features from the latest C++
working draft, and we're eager to hear about bugs and suggestions for improvements.
However, note that these features are provided as-is without support, and subject
to changes or removal as the working draft evolves. See
https://go.microsoft.com/fwlink/?linkid=2045807 for details.

GetActiveProcessorCount.cpp
Microsoft (R) Incremental Linker Version 14.43.34809.0
Copyright (C) Microsoft Corporation.  All rights reserved.

/out:GetActiveProcessorCount.exe
GetActiveProcessorCount.obj

C:\Temp>.\GetActiveProcessorCount.exe
Active Processor Count in all groups: 128
Active Processor Group Count: 2
Active Processor Count in Group 0: 64
Active Processor Count in Group 1: 64

That's only in x64 mode though. In x86 mode it reports 2 groups of 32 CPUs for a total of 64. I'm guessing 32-bit compatibility for systems with >64 CPUs is not a huge concern though.

Aside from that, I also checked that NUMA settings don't affect the result: if I configure the system to use 4 NUMA nodes the output of the test exe is unchanged from what's shown above.

I can't vouch for the behavior on multi-socket systems.
