ERROR: modpost: "__mulodi4" [drivers/block/nbd.ko] undefined! #1438

Closed
nathanchance opened this issue Aug 18, 2021 · 41 comments
Labels
[BUG] linux-next: This is an issue only seen in linux-next
[BUG] llvm: A bug that should be fixed in upstream LLVM
[FIXED][LINUX] 5.15: This bug was fixed in Linux 5.15
[FIXED][LLVM] 14: This bug was fixed in LLVM 14.x
Reported upstream: This bug was filed on LLVM’s issue tracker, Phabricator, or the kernel mailing list.

Comments

@nathanchance
Member

After commit fad7cd3310db ("nbd: add the check to prevent overflow in __nbd_ioctl()") in the block tree, the following error is seen with several different 32-bit configurations:

$ make -skj"$(nproc)" ARCH=arm CROSS_COMPILE=arm-linux-gnueabi- LLVM=1 LLVM_IAS=1 distclean allmodconfig all
...
ERROR: modpost: "__mulodi4" [drivers/block/nbd.ko] undefined!
...

check_mul_overflow() ultimately calls __builtin_mul_overflow(). I assume the problem is that the operands are long long, so LLVM emits calls to compiler-rt functions that we do not link against.
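
A minimal sketch of the pattern involved, assuming a plain userspace build. The function name `nbd_bytesize_overflows` is hypothetical and not taken from the kernel; it only mirrors the shape of the check, with both operands being signed 64-bit values, which is the case that an affected clang may lower to a `__mulodi4` call on 32-bit targets:

```c
#include <stdbool.h>

/* Hypothetical stand-in for the overflow check in __nbd_ioctl().
 * Both operands are signed 64-bit, so on an affected 32-bit target
 * the builtin may be lowered to a call to __mulodi4. */
bool nbd_bytesize_overflows(long long blocks, long long blksize,
                            long long *bytesize)
{
    return __builtin_mul_overflow(blocks, blksize, bytesize);
}
```

Compiling something like this for a 32-bit target and looking for `__mulodi4` in the generated assembly should show whether a given compiler emits the libcall.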

cc @kees

@nathanchance nathanchance added [BUG] Untriaged Something isn't working [BUG] llvm A bug that should be fixed in upstream LLVM [BUG] linux-next This is an issue only seen in linux-next labels Aug 18, 2021
@nickdesaulniers
Member

nickdesaulniers commented Aug 25, 2021

there's an upstream bug for this (@stephenhines and @kraj are on it): https://llvm.org/pr28629

@nickdesaulniers nickdesaulniers added the Reported upstream This bug was filed on LLVM’s issue tracker, Phabricator, or the kernel mailing list. label Aug 25, 2021
@stephenhines

The bad news is that this is probably still very low priority to get fixed. For Android, we just started out by including sub-components of compiler-rt directly with libgcc, and later switched completely to Clang's own builtin library with no libgcc.

@rengolin

My attempts to reproduce a long long multiplication come out lowered from clang on all Arm 32/64 soft/hard I could come up with, with the only exception being -march=armv6 -mthumb, which calls __aeabi_lmul, not __mulodi4. It's possible they're aliases and that's what the symbol points to? Is this it?

But when I try to link, it just works. clang is calling arm-linux-gnueabi-ld. I'm not trying freestanding, so that may be another reason I can't reproduce?

I think the "low priority" of it is because of the fuzzy nature of the original reports, which had __has_builtin, sanitizers and all sorts of other red herrings that it took too long to narrow down. I don't think this is a "low priority" bug per se, but it seems that it's such a corner case that was always worked around, so no one really bothered to go down the rabbit hole.

With a small enough reproducer, it should be much clearer to find where the problem is and perhaps it's even easy to implement. I'm guessing the pattern is not enabled in Thumb and ends up going to the runtime. It could be a simple case of adding a custom lowering for Thumb...

@nathanchance
Member Author

nathanchance commented Aug 25, 2021

I'll see if I can get a smaller reproducer tomorrow then.

NOTE: This is not just a 32-bit ARM error, I do also see it on 32-bit MIPS.

@nickdesaulniers
Member

We can probably creduce the kernel easily.

@rengolin

Thanks! Please attach the reproducer to the original bug, so other people can see it, and we update this if/once we fix it.

@rengolin

> NOTE: This is not just a 32-bit ARM error, I do also see it on 32-bit MIPS.

This sounds like people just didn't get to it because no one cared enough about 32-bits by the time the change was introduced...

@nathanchance
Member Author

nathanchance commented Aug 26, 2021

Reproducer posted: https://llvm.org/pr28629

@nickdesaulniers
Member

nickdesaulniers commented Aug 27, 2021

If we can fix clang for ARM and MIPS at least, then we might be able to ship in the kernel:

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 5170a630778d..2271a50ba4c0 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -1388,6 +1388,12 @@ static void nbd_set_cmd_timeout(struct nbd_device *nbd, u64 timeout)
                blk_queue_rq_timeout(nbd->disk->queue, 30 * HZ);
 }
 
+// FIXME: https://llvm.org/pr28629
+// https://github.com/ClangBuiltLinux/linux/issues/1438
+#if defined(__clang__) && (defined(__arm__) || defined(__mips__))
+#define MULODI4_BROKEN
+#endif
+
 /* Must be called with config_lock held */
 static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd,
                       unsigned int cmd, unsigned long arg)
@@ -1408,8 +1414,10 @@ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd,
        case NBD_SET_SIZE:
                return nbd_set_size(nbd, arg, config->blksize);
        case NBD_SET_SIZE_BLOCKS:
+#ifndef MULODI4_BROKEN
                if (check_mul_overflow((loff_t)arg, config->blksize, &bytesize))
                        return -EINVAL;
+#endif
                return nbd_set_size(nbd, bytesize, config->blksize);
        case NBD_SET_TIMEOUT:
                nbd_set_cmd_timeout(nbd, arg);

but with an added version check. Err, that would use bytesize uninitialized. We'd maybe go back to using arg * config->blksize instead. Errr... the kernel patch that exposed this fixes a bug; we'd probably want to use a hand-written check_mul_overflow that doesn't rely on a builtin.
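
One possible shape for such a hand-written check, as a sketch only. The names `umul64_overflow` and `smul64_overflow` are hypothetical, and this is not the kernel's implementation. It avoids both the builtin and 64-bit division (which can itself turn into a runtime library call on 32-bit targets) by splitting the operands into 32-bit halves:

```c
#include <stdbool.h>
#include <stdint.h>

/* Unsigned 64x64 multiply; returns true if the product does not fit. */
static bool umul64_overflow(uint64_t a, uint64_t b, uint64_t *res)
{
    uint32_t a_hi = (uint32_t)(a >> 32), a_lo = (uint32_t)a;
    uint32_t b_hi = (uint32_t)(b >> 32), b_lo = (uint32_t)b;
    uint64_t cross, low;

    *res = a * b; /* wrapped product, exact when no overflow occurs */

    /* Both high halves non-zero: the product is at least 2^64. */
    if (a_hi && b_hi)
        return true;

    /* At most one of these two terms is non-zero here. */
    cross = (uint64_t)a_hi * b_lo + (uint64_t)a_lo * b_hi;
    if (cross >> 32)
        return true;

    low = (uint64_t)a_lo * b_lo;
    /* Detect a carry out of bit 63 when adding the shifted cross term. */
    return (cross << 32) + low < low;
}

/* Signed 64x64 multiply built on the unsigned magnitudes. */
bool smul64_overflow(int64_t a, int64_t b, int64_t *res)
{
    uint64_t ua = a < 0 ? 0 - (uint64_t)a : (uint64_t)a;
    uint64_t ub = b < 0 ? 0 - (uint64_t)b : (uint64_t)b;
    bool negative = (a < 0) != (b < 0);
    /* A negative result may have magnitude 2^63, a positive one 2^63 - 1. */
    uint64_t limit = negative ? (uint64_t)1 << 63 : (uint64_t)INT64_MAX;
    uint64_t mag;
    bool overflow = umul64_overflow(ua, ub, &mag);

    /* Wrapped result, matching what the builtin stores on overflow. */
    *res = (int64_t)((uint64_t)a * (uint64_t)b);
    return overflow || mag > limit;
}
```

The cast back to `int64_t` relies on the wrapping conversion that GCC and Clang implement; a stricter version would need to handle that conversion explicitly.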

@nathanchance
Member Author

Shouldn't we be working around this issue generically, rather than in that specific call site? Initial idea would be breaking up the COMPILER_HAS_GENERIC_BUILTIN_OVERFLOW define into add, sub, and mul then defining the mul one only if we know we are working with a good compiler and architecture combination. See include/linux/overflow.h.
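
A rough sketch of what a per-operation switch could look like, assuming the fix lands in clang 14 as discussed later in this thread. The `DEMO_*` names are hypothetical and are not the macros in include/linux/overflow.h; the fallback is a plain division-based check, and 64-bit division on 32-bit targets may need its own runtime helpers, so it is only a placeholder:

```c
#include <stdbool.h>
#include <stdint.h>

#ifndef __has_builtin
#define __has_builtin(x) 0 /* compilers without __has_builtin use the fallback */
#endif

/* Hypothetical multiply-only switch: distrust the builtin on the
 * compiler/architecture combinations reported in this issue. */
#if defined(__clang__) && __clang_major__ < 14 && \
    (defined(__arm__) || defined(__i386__) || \
     (defined(__mips__) && !defined(__mips64)))
#define DEMO_HAS_BUILTIN_MUL_OVERFLOW 0
#elif __has_builtin(__builtin_mul_overflow)
#define DEMO_HAS_BUILTIN_MUL_OVERFLOW 1
#else
#define DEMO_HAS_BUILTIN_MUL_OVERFLOW 0
#endif

/* Signed 64-bit multiply with overflow detection. */
bool demo_check_mul_overflow_s64(int64_t a, int64_t b, int64_t *res)
{
#if DEMO_HAS_BUILTIN_MUL_OVERFLOW
    return __builtin_mul_overflow(a, b, res);
#else
    /* Store the wrapped product, then test the ranges by division. */
    *res = (int64_t)((uint64_t)a * (uint64_t)b);
    return (b > 0 && (a > INT64_MAX / b || a < INT64_MIN / b)) ||
           (b < -1 && (a > INT64_MIN / b || a < INT64_MAX / b)) ||
           (b == -1 && a == INT64_MIN);
#endif
}
```

Add and sub would get analogous defines that stay enabled, so only the multiply path changes on the affected combinations.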

@rengolin

Quick update: This problem was mostly fixed earlier this month (by @topperc) for x86 and @nickdesaulniers is working to get that propagated to Arm and Mips. It will only fix from now on (clang 14+), leaving previous clangs with the same problem.

We may be able to get this in clang 13 if we can backport the current RC3. No promises, though, as it's very late in the release process. But if it works, we would get 13 with the fix in a few weeks.

That still leaves all previous clangs (12 and older) unfixed (we don't backport beyond six months). How the kernel works around this for those older versions is up to you guys.

nickdesaulniers added a commit to llvm/llvm-project that referenced this issue Aug 27, 2021
__has_builtin(__builtin_mul_overflow) returns true for 32b ARM targets,
but Clang is deferring to compiler RT when encountering `long long`
types. This breaks sanitizer builds of the Linux kernel that are using
__builtin_mul_overflow with these types for these targets.

If the semantics of __has_builtin mean "the compiler resolves these,
always" then we shouldn't conditionally emit a libcall.

This will still need to be worked around in the Linux kernel in order to
continue to support allmodconfig builds of the Linux kernel for this
target with older releases of clang.

Link: https://bugs.llvm.org/show_bug.cgi?id=28629
Link: ClangBuiltLinux/linux#1438

Reviewed By: rengolin

Differential Revision: https://reviews.llvm.org/D108842
nickdesaulniers added a commit to llvm/llvm-project that referenced this issue Aug 27, 2021
__has_builtin(__builtin_mul_overflow) returns true for 32b MIPS targets,
but Clang is deferring to compiler RT when encountering `long long`
types. This breaks sanitizer builds of the Linux kernel that are using
__builtin_mul_overflow with these types for these targets.

If the semantics of __has_builtin mean "the compiler resolves these,
always" then we shouldn't conditionally emit a libcall.

This will still need to be worked around in the Linux kernel in order to
continue to support malta_defconfig builds of the Linux kernel for this
target with older releases of clang.

Link: https://bugs.llvm.org/show_bug.cgi?id=28629
Link: ClangBuiltLinux/linux#1438

Reviewed By: rengolin

Differential Revision: https://reviews.llvm.org/D108844
@nickdesaulniers
Member

Some toolchain patches have gone in:
32b ARM: https://reviews.llvm.org/rG5c91b98c5d45243352bf10262454bcac77cd3fed
32b MIPS 1: https://reviews.llvm.org/rGc8c176d999d2c2166dee8d67c32b4f8f42b4f1bd

MIPS is now hitting __multi3 being undefined. @rengolin mentioned that perhaps the above 2 could be cherry picked to clang-13 (but that would depend on https://reviews.llvm.org/rGb2ca4dc935859b324fd7e9ca804160913a7468a5 and @tstellar + @topperc's feedback). Since we need to work around this in the kernel anyways, I'm not interested in rushing this, unless the clang-13 release is perhaps still "far" away (like a month).

@nickdesaulniers nickdesaulniers self-assigned this Aug 27, 2021
@nickdesaulniers nickdesaulniers added [FIXED][LLVM] 14 This bug was fixed in LLVM 14.x and removed [BUG] Untriaged Something isn't working labels Aug 27, 2021
@topperc

topperc commented Aug 27, 2021

Is 32-bit X86 still broken for __mulodi4?

@rengolin

It seems so. In your original patch you only add it for 128-bits, not 64-bits. However, further up the X86Lowering code, ISD::SMULO is legal (custom lowered) for MVT::i64 and it doesn't seem to be conditional on 32/64 bits.

But when I compile the original code with -target i386 I still get the __mulodi4 call. Perhaps the X86 back-end needs the same handling for RTLIB on 64bits?

@topperc

topperc commented Aug 28, 2021

> It seems so. In your original patch you only add it for 128-bits, not 64-bits. However, further up the X86Lowering code, ISD::SMULO is legal (custom lowered) for MVT::i64 and it doesn't seem to be conditional on 32/64 bits.

> But when I compile the original code with -target i386 I still get the __mulodi4 call. Perhaps the X86 back-end needs the same handling for RTLIB on 64bits?

Isn't it in this for loop that has a continue for i64 on 32-bit targets?

  for (auto VT : { MVT::i8, MVT::i16, MVT::i32, MVT::i64 }) {
    if (VT == MVT::i64 && !Subtarget.is64Bit())
      continue;
    // Add/Sub/Mul with overflow operations are custom lowered.
    setOperationAction(ISD::SADDO, VT, Custom);
    setOperationAction(ISD::UADDO, VT, Custom);
    setOperationAction(ISD::SSUBO, VT, Custom);
    setOperationAction(ISD::USUBO, VT, Custom);
    setOperationAction(ISD::SMULO, VT, Custom);
    setOperationAction(ISD::UMULO, VT, Custom);

@rengolin

> Isn't it in this for loop that has a continue for i64 on 32-bit targets?

aha! Yes, I missed that. Been looking at this from different computers and mobiles, lacking context on all of them.

Not sure the kernel cares about i386 anymore but with your legalisation fix, it seems the same thing would work just like Arm and Mips.

@topperc

topperc commented Aug 28, 2021

> > Isn't it in this for loop that has a continue for i64 on 32-bit targets?

> aha! Yes, I missed that. Been looking at this from different computers and mobiles, lacking context on all of them.

> Not sure the kernel cares about i386 anymore but with your legalisation fix, it seems the same thing would work just like Arm and Mips.

Are we heading in the direction of removing MULO lib calls completely from llvm? My initial patch was just to fix the cases that wouldn’t link with either libgcc or compiler-rt.

@rengolin

> Are we heading in the direction of removing MULO lib calls completely from llvm? My initial patch was just to fix the cases that wouldn’t link with either libgcc or compiler-rt.

Maybe? I'm not advocating for it, I'm just trying to fix an issue that has existed for over 5 years and still doesn't have a nice solution.

Linking compiler-rt isn't mandatory and not used on many systems. Linking with compiler-rt and libgcc won't work, so if there is functionality that is only available in compiler-rt, we can't know if the linker will find it or not, so lowering calls to __mulodi4 is bound to fail on many systems.

The current legalisation is un-optimised, but Compiler-RT's implementation is plain-C for all platforms, so not the most optimised either. If we can lower the builtin with the compiler and avoid the linking errors on anything that doesn't use compiler-rt, then I think it's a good compromise.

Even if libgcc implements it, the kernel and embedded software won't have it, so are bound to have the same problem. That's why they use __has_builtin all over the place and if we say we have it, then we better implement it. But we don't, we defer that to compiler-rt, which they don't link against.

I don't think eliding the calls as a target option is the best option, but right now it's the only option. I don't really know how Clang can emit code that it knows compiler-rt implements when it doesn't know if it will be linked against it.

There was a discussion many years ago to split builtins from compiler-rt and make it so clang always linked against that. It would obviously clash with symbols from libgcc and other RT libs, but we could mark them weak or some other linkage magic that escapes me now.

Perhaps we need some more thorough discussion on the list...

@kraj

kraj commented Aug 28, 2021

For compatibility reasons I think it's best to remove it from clang.

@nickdesaulniers
Member

> But when I compile the original code with -target i386 I still get the __mulodi4 call. Perhaps the X86 back-end needs the same handling for RTLIB on 64bits?

> Not sure the kernel cares about i386 anymore but with your legalisation fix, it seems the same thing would work just like Arm and Mips.

Yes, this is broken for 32b x86 kernels as well (we just don't have CI coverage for ARCH=i386 defconfig + CONFIG_BLK_DEV_NBD=y). If I enable CONFIG_BLK_DEV_NBD=y, then I observe the following linkage failure:

ld.lld: error: undefined symbol: __mulodi4

@nickdesaulniers
Member

nickdesaulniers commented Aug 30, 2021

mips patch 2/2: https://reviews.llvm.org/D108926
i386 patch: https://reviews.llvm.org/D108928

For future travelers, llvm/include/llvm/IR/RuntimeLibcalls.def has the relevant mapping for these functions.
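
As a rough illustration of how a `.def` table like that is typically consumed, here is a small self-contained X-macro sketch. The `DEMO_*` names are hypothetical, and the two entries are stand-ins for the symbols mentioned in this thread; the enumerator spellings are an assumption and should be checked against the actual file:

```c
#include <stddef.h>

/* Stand-in for a few lines of a .def file, listed inline so the
 * example is self-contained. Each entry pairs an identifier with the
 * runtime symbol the backend would call. */
#define DEMO_LIBCALLS(X)     \
    X(MUL_I128, "__multi3")  \
    X(MULO_I64, "__mulodi4")

/* First expansion: one enumerator per entry. */
enum demo_libcall {
#define X(id, name) DEMO_##id,
    DEMO_LIBCALLS(X)
#undef X
    DEMO_LIBCALL_COUNT
};

/* Second expansion: the matching symbol names, in the same order. */
static const char *const demo_libcall_names[] = {
#define X(id, name) name,
    DEMO_LIBCALLS(X)
#undef X
};

/* Look up the symbol name for a libcall, or NULL if out of range. */
const char *demo_libcall_name(enum demo_libcall call)
{
    return call < DEMO_LIBCALL_COUNT ? demo_libcall_names[call] : NULL;
}
```

A target that wants to avoid a given libcall would clear or override its entry in a table of this kind, so that lowering falls back to inline expansion instead.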

jungpark-mlir added a commit to jungpark-mlir/rocMLIR that referenced this issue Sep 21, 2021
* [X86] Remove isel predicates for xgetbv/xsetbv instructions so they can work on Windows.

https://reviews.llvm.org/D56686  was supposed to allow these to
work on Windows without needing to enable the xsave feature to
match MSVC. It seems this didn't work because the backend isel
patterns would still block it.

This patch removes the predicates from the isel patterns.

Fixes PR51706.

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D109097

* [libc++] Remove an unused internal concept.

Removed as suggested by @Quuxplusone during the review of D109075.

* [AIX][PowerPC] Define __powerpc and __PPC macros

%%%
This patch defines the macros __powerpc and __PPC on AIX to be consistent with XL for AIX. See: https://www.ibm.com/docs/en/xl-c-and-cpp-aix/13.1.0?topic=macros-related-platform

Note: GCC does not currently define __powerpc and __PPC so users should prefer the __powerpc__ and __PPC__ forms.
%%%

Reviewed By: cebowleratibm

Differential Revision: https://reviews.llvm.org/D108917

* [Bazel] Add explicit dependency on llvm:Support to reflect layering

Differential Revision: https://reviews.llvm.org/D109173

* [InlineCost] Introduce attributes to override InlineCost for inliner testing

This patch introduces four new string attributes: function-inline-cost,
function-inline-threshold, call-inline-cost and call-threshold-bonus.
These attributes allow you to selectively override some aspects of
InlineCost analysis. That would allow us to test inliner separately from
the InlineCost analysis.

That could be useful when you're trying to write tests for inliner and
you need to test some very specific situation, like "the inline cost has
to be this high", or "the threshold has to be this low". Right now every
time someone does that, they have to get creative to come up with a way to
make the InlineCost give them the number they need (like adding ~30
load/add pairs for a trivial test). This process can be somewhat tedious
which can discourage some people from writing enough tests for their
changes. Also, that results in tests that are fragile and can be easily
broken without anyone noticing it because the test writer can't
explicitly control what input the inliner will get from the inline cost
analysis.

These new attributes will alleviate those problems to an extent.

Reviewed By: mtrofin

Differential Revision: https://reviews.llvm.org/D109033

* [MipsISelLowering] avoid emitting libcalls to __multi3

Similar to D108842 and D108844.

__has_builtin(__builtin_mul_overflow) returns true for 32b MIPS targets,
but Clang is deferring to compiler RT when encountering long long types.
This breaks MIPS malta_defconfig builds of the Linux kernel that are
using __builtin_mul_overflow with these types for these targets.

If the semantics of __has_builtin mean "the compiler resolves these,
always" then we shouldn't conditionally emit a libcall.

This will still need to be worked around in the Linux kernel in order to
continue to support malta_defconfig builds of the Linux kernel for this
target with older releases of clang.

Link: https://bugs.llvm.org/show_bug.cgi?id=28629
Link: https://github.com/ClangBuiltLinux/linux/issues/1438

Reviewed By: rengolin

Differential Revision: https://reviews.llvm.org/D108926

* [WebAssembly] Add Wasm SjLj support

This adds support for SjLj using Wasm exception handling instructions:
https://github.com/WebAssembly/exception-handling/blob/master/proposals/exception-handling/Exceptions.md

This does not yet support the mixed use of EH and SjLj within a
function. It will be added in a follow-up CL.

This currently passes all SjLj Emscripten tests for wasm0/1/2/3/s,
except for the below:
- `test_longjmp_standalone`: Uses Node
- `test_dlfcn_longjmp`: Uses NodeRAWFS
- `test_longjmp_throw`: Mixes EH and SjLj
- `test_exceptions_longjmp1`: Mixes EH and SjLj
- `test_exceptions_longjmp2`: Mixes EH and SjLj
- `test_exceptions_longjmp3`: Mixes EH and SjLj

Reviewed By: dschuff, tlively

Differential Revision: https://reviews.llvm.org/D108960

* [WebAssembly] Fix names of WebAssemblyWrapper SDNodes. NFC

Other platforms all use CamelCase as normal for these wrapper nodes.

Differential Revision: https://reviews.llvm.org/D109172

* [SCEVExpander] Simplify pointer overflow check

This is a followup to D104662 to generate slightly nicer code for
pointer overflow checks. Bypass expandAddToGEP and instead
explicitly generate i8 GEPs. This saves some bitcasts and negates
the value in a more obvious way. In particular, this prevents SCEV
from looking through the umul.with.overflow, same as in the integer
case.

The wrapping-pointer-ni.ll test deserves a comment: Previously,
this generated a typed GEP which used the umulo argument rather
than the multiplication result. This results in more compact IR in
that case, but effectively does the multiplication twice, the
second one is just hidden in the GEP. Reusing the umulo result
seems pretty reasonable to me.

Differential Revision: https://reviews.llvm.org/D109093

* [CSSPGO] Allow inlining recursive call for preinliner

When preinliner is used for CSSPGO, we try to honor global preinliner decision as much as we can except for uninlinable callees. We rely on InlineCost::Never to prevent us from illegal inlining.

However, it turns out that we use InlineCost::Never for both illegal inlining and some of the "not-so-beneficial" inlining.

The most common one is recursive inlining, while it can bloat size a lot during CGSCC bottom-up inlining, it's less of a problem when recursive inlining is guided by profile and done in top-down manner.

Ideally it'd be better to have a clear separation between inline legality check vs cost-benefit check, but that requires a bigger change.

This change enables InlineCost computation to allow inlining recursive calls, controlled by InlineParams. In SampleLoader, we now enable recursive inlining for CSSPGO when global preinliner decision is used.

With this change, we saw a few perf improvements on SPEC2017 with CSSPGO and preinliner on: 2% for povray_r, 6% for xalancbmk_s, 3% omnetpp_s, while size is about the same (no noticeable perf change for all other benchmarks)

Differential Revision: https://reviews.llvm.org/D109104

* [test][NewPM] Remove RUN lines using -analyze

Only tests in llvm/test/Analysis.

-analyze is legacy PM-specific.

This only touches files with `-passes`.

I looked through everything and made sure that everything had a new PM equivalent.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D109040

* [test] Remove missed RUN line after D109040

* Try to unbreak Win build differently after 973519826edb76

Looks like the MS STL wants StringMapKeyIterator::operator*() to be const.
Return the result by copy instead of reference to do that.
Assigning to a hash map key iterator doesn't make sense anyways.

Also reverts 123f811fe5b0b which is now hopefully no longer needed.

Differential Revision: https://reviews.llvm.org/D109167

* Revert "Try to unbreak Win build differently after 973519826edb76"

Breaks the build and failed pre-merge checks:
https://buildkite.com/llvm-project/premerge-checks/builds/54930#07373971-3d37-49cf-9def-22c0d724ee23

> llvm-project/lld/wasm/Writer.cpp:521:16: error: non-const lvalue reference to
>  type 'llvm::StringRef' cannot bind to a temporary of type 'llvm::StringRef'
>    for (auto &feature : used.keys()) {

This reverts commit 5881dcff7e76a68323edc8bb3c6e14420ad9cf7c.

* Fix lld build after 5881dcff7e76a68

* [WebAssemlby] Remove redundant SDTypeProfile. NFC

I added this back in https://reviews.llvm.org/D54647 but it wasn't
actually needed.

Differential Revision: https://reviews.llvm.org/D109176

* [test] Remove legacy PM tests in llvm/test/Other

Differential Revision: https://reviews.llvm.org/D109180

* [llvm-profgen] Turn off cold context trimming by default

We merge cold context by default to save profile size. However trimming cold context after merging doesn't save size much, so default to off to reflect how it's commonly used.

Differential Revision: https://reviews.llvm.org/D109166

* [NFC] Remove some unclear attribute methods

To any downstream users broken by this change, please examine your uses
of these methods and see if you can use a better method. For example,
getAttribute(AttributeList::FunctionIndex) => getFnAttr(), or
addAttribute(AttributeList::FirstArgIndex + ArgNo) =>
addParamAttribute(ArgNo). 0 corresponds to ReturnIndex, ~0 corresponds
to FunctionIndex. This may make future cleanups less painful.

I've made the mistake of assuming that these indexes are for parameters
multiple times, but actually they're based off of a weird indexing
scheme AttributeList::AttrIndex where 0 is the return value and ~0 is
the function. Hopefully renaming these methods will make this clearer.
Ideally users should use more specific methods like
AttributeList::getFnAttr().

This touches all relevant methods in AttributeList, CallBase, and Function.

This hopefully will make easier a future change to cleanup AttrIndex. A
previous worry about cleaning up AttrIndex was that too many downstream
users would have to look through all uses of AttrIndex and relevant
attribute method calls to see if anything was unintentionally hardcoded
(e.g. using 0 instead of ReturnIndex). With this change hopefully
downstream users will look at existing usages of these methods and clean
them up.

Reviewed By: rnk, MaskRay

Differential Revision: https://reviews.llvm.org/D108614

* [Verifier] Only allow invariant.group metadata on stores and loads

As specified by https://llvm.org/docs/LangRef.html#invariant-group-metadata.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D109182

* [MemorySSA] Properly handle liveOnEntry in the walker printer

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D109177

* Fix lldb after D108614

* [libc++] Define insert_iterator::iter with ranges::iterator_t.

The `insert_iterator::iter` member is defined as `Container::iterator` but
the standard requires `iter` to be defined in terms of `ranges::iterator_t` as
of C++20. So, if in C++20 or later, define the `iter` member as
`ranges::iterator_t`.

Original patch by Joe Loser!

Differential Revision: https://reviews.llvm.org/D108575

* [NFC] Added testcase for PR40750

* [mlir] speed up construction of LLVM IR constants when possible

The translation to LLVM IR used to construct sequential constants by recurring
down to individual elements, creating constant values for them, and wrapping
them into aggregate constants in post-order. This is highly inefficient for
large constants with known data such as DenseElementsAttr. Use LLVM's
ConstantData for the innermost dimension instead. LLVM does seem to support
data constants for nested sequential constants so the outer dimensions are
still handled recursively. Nevertheless, this speeds up the translation of
large constants with equal dimensions by up to 30x.

Users are advised to rewrite large constants to use flat types before
translating to LLVM IR if more efficiency in translation is necessary. This is
not done automatically as the translation is not aware of the expectations of
the overall compilation flow about type changes and indexing, in particular for
global constants with external linkage.

Reviewed By: silvas

Differential Revision: https://reviews.llvm.org/D109152

* [OpenCL] Remove decls for scalar vloada_half and vstorea_half* fns

These functions are not part of the OpenCL C specification.

See https://github.com/KhronosGroup/OpenCL-Docs/issues/648 for a
clarification regarding the vloada_half declarations.

Reviewed By: Anastasia

Differential Revision: https://reviews.llvm.org/D108761

* [flang] NFC: change non-nullable pointer arguments to references

Ticking off a Parser TODO: Preprocessor::Directive()'s Prescanner
argument should be a reference, not a pointer.

Differential Revision: https://reviews.llvm.org/D109094

* [flang] Fix scope in which undeclared symbols are created

Don't create new symbols in FORALL, implied DO, or other
construct scopes when an undeclared name appears; use the
innermost enclosing program unit's scope.  This clears up
a pending TODO in name resolution, and also exposes (& fixes)
an unnoticed name resolution problem in a module file test.

Differential Revision: https://reviews.llvm.org/D109095

* [NFC] Regenerate SVE ACLE intrinsics tests

Change-Id: Ic4ec50f9a53fcf58e86104bf19ba229c1dd132d0

* [Sanitizers] intercept clock_getcpuclockid on FreeBSD, and pthread_getcpuclockid.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D108884

* Revert "[CSSPGO] Honor preinliner decision for ThinLTO importing"

This reverts commit a2768b4732a0216dfd346d34e428685f03f10549.

Breaks sanitizer-x86_64-linux-fast buildbot:
https://lab.llvm.org/buildbot/#/builders/5/builds/11334

Log snippet:
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80
FAIL: LLVM :: Transforms/SampleProfile/early-inline.ll (65549 of 78729)
******************** TEST 'LLVM :: Transforms/SampleProfile/early-inline.ll' FAILED ********************
Script:
--
: 'RUN: at line 1';   /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/opt < /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/early-inline.ll -instcombine -sample-profile -sample-profile-file=/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/Inputs/einline.prof -S | /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/FileCheck /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/early-inline.ll
--
Exit Code: 2
Command Output (stderr):
--
/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1309:53: runtime error: member call on null pointer of type 'llvm::sampleprof::FunctionSamples'
    #0 0x5a730f8 in shouldInlineCandidate /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1309:53
    #1 0x5a730f8 in (anonymous namespace)::SampleProfileLoader::tryInlineCandidate((anonymous namespace)::InlineCandidate&, llvm::SmallVector<llvm::CallBase*, 8u>*) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1178:21
    #2 0x5a6cda6 in inlineHotFunctions /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1105:13
    #3 0x5a6cda6 in (anonymous namespace)::SampleProfileLoader::emitAnnotations(llvm::Function&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1633:16
    #4 0x5a5fcbe in runOnFunction /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:2008:12
    #5 0x5a5fcbe in (anonymous namespace)::SampleProfileLoader::runOnModule(llvm::Module&, llvm::AnalysisManager<llvm::Module>*, llvm::ProfileSummaryInfo*, llvm::CallGraph*) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1922:15
    #6 0x5a5de55 in llvm::SampleProfileLoaderPass::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:2038:21
    #7 0x6552a01 in llvm::detail::PassModel<llvm::Module, llvm::SampleProfileLoaderPass, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/IR/PassManagerInternal.h:88:17
    #8 0x57f807c in llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/IR/PassManager.h:526:21
    #9 0x37c8522 in llvm::runPassPipeline(llvm::StringRef, llvm::Module&, llvm::TargetMachine*, llvm::TargetLibraryInfoImpl*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::StringRef, llvm::ArrayRef<llvm::StringRef>, llvm::opt_tool::OutputKind, llvm::opt_tool::VerifierKind, bool, bool, bool, bool, bool) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/tools/opt/NewPMDriver.cpp:489:7
    #10 0x37e7c11 in main /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/tools/opt/opt.cpp:830:12
    #11 0x7fbf4de4009a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)
    #12 0x379e519 in _start (/b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/opt+0x379e519)
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1309:53 in
FileCheck error: '<stdin>' is empty.
FileCheck command line:  /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/FileCheck /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/early-inline.ll
--
********************
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80
FAIL: LLVM :: Transforms/SampleProfile/inline-cold.ll (65643 of 78729)
******************** TEST 'LLVM :: Transforms/SampleProfile/inline-cold.ll' FAILED ********************
Script:
--
: 'RUN: at line 4';   /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/opt < /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/inline-cold.ll -sample-profile -sample-profile-file=/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/Inputs/inline-cold.prof -S | /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/FileCheck -check-prefix=NOTINLINE /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/inline-cold.ll
: 'RUN: at line 5';   /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/opt < /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/inline-cold.ll -passes=sample-profile -sample-profile-file=/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/Inputs/inline-cold.prof -S | /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/FileCheck -check-prefix=NOTINLINE /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/inline-cold.ll
: 'RUN: at line 8';   /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/opt < /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/inline-cold.ll -sample-profile -sample-profile-file=/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/Inputs/inline-cold.prof -sample-profile-inline-size -S | /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/FileCheck -check-prefix=INLINE /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/inline-cold.ll
: 'RUN: at line 11';   /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/opt < /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/inline-cold.ll -passes=sample-profile -sample-profile-file=/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/Inputs/inline-cold.prof -sample-profile-inline-size -sample-profile-cold-inline-threshold=9999999 -S | /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/FileCheck -check-prefix=INLINE /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/inline-cold.ll
: 'RUN: at line 14';   /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/opt < /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/inline-cold.ll -passes=sample-profile -sample-profile-file=/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/Inputs/inline-cold.prof -sample-profile-inline-size -sample-profile-cold-inline-threshold=-500 -S | /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/FileCheck -check-prefix=NOTINLINE /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/inline-cold.ll
--
Exit Code: 2
Command Output (stderr):
--
/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1309:53: runtime error: member call on null pointer of type 'llvm::sampleprof::FunctionSamples'
    #0 0x5a730f8 in shouldInlineCandidate /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1309:53
    #1 0x5a730f8 in (anonymous namespace)::SampleProfileLoader::tryInlineCandidate((anonymous namespace)::InlineCandidate&, llvm::SmallVector<llvm::CallBase*, 8u>*) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1178:21
    #2 0x5a6cda6 in inlineHotFunctions /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1105:13
    #3 0x5a6cda6 in (anonymous namespace)::SampleProfileLoader::emitAnnotations(llvm::Function&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1633:16
    #4 0x5a5fcbe in runOnFunction /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:2008:12
    #5 0x5a5fcbe in (anonymous namespace)::SampleProfileLoader::runOnModule(llvm::Module&, llvm::AnalysisManager<llvm::Module>*, llvm::ProfileSummaryInfo*, llvm::CallGraph*) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1922:15
    #6 0x5a5de55 in llvm::SampleProfileLoaderPass::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:2038:21
    #7 0x6552a01 in llvm::detail::PassModel<llvm::Module, llvm::SampleProfileLoaderPass, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/IR/PassManagerInternal.h:88:17
    #8 0x57f807c in llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/IR/PassManager.h:526:21
    #9 0x37c8522 in llvm::runPassPipeline(llvm::StringRef, llvm::Module&, llvm::TargetMachine*, llvm::TargetLibraryInfoImpl*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::StringRef, llvm::ArrayRef<llvm::StringRef>, llvm::opt_tool::OutputKind, llvm::opt_tool::VerifierKind, bool, bool, bool, bool, bool) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/tools/opt/NewPMDriver.cpp:489:7
    #10 0x37e7c11 in main /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/tools/opt/opt.cpp:830:12
    #11 0x7fcd534a209a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)
    #12 0x379e519 in _start (/b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/opt+0x379e519)
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1309:53 in
FileCheck error: '<stdin>' is empty.
FileCheck command line:  /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/FileCheck -check-prefix=INLINE /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/inline-cold.ll
--
********************
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90..
********************
Failed Tests (2):
  LLVM :: Transforms/SampleProfile/early-inline.ll
  LLVM :: Transforms/SampleProfile/inline-cold.ll

* [asan] Fixed link error by setting jump symbol to R_X86_64_PLT32.

Fixing this link error:
ld: error: relocation R_X86_64_PC32 cannot be used against symbol __asan_report_load...; recompile with -fPIC

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D109183

* Fully qualify template template parameters when printing

I discovered this quirk when working on some DWARF - AST printing prints
type template parameters fully qualified, but printed template template
parameters the way they were written syntactically, or wholly
unqualified - instead, we should print them consistently with the way we
print type template parameters: fully qualified.

The one place this got weird was for partial specializations like in
ast-print-temp-class.cpp - hence the need for checking for
TemplateNameDependenceScope::DependentInstantiation template template
parameters. (not 100% sure that's the right solution to that, though -
open to ideas)

Differential Revision: https://reviews.llvm.org/D108794

* [GlobalISel] Combine icmp eq/ne x, 0/1 -> x when x == 0 or 1

This adds the following combines:

```
x = ... 0 or 1
c = icmp eq x, 1

->

c = x
```

and

```
x = ... 0 or 1
c = icmp ne x, 0

->

c = x
```

When the target's true value for the relevant types is 1.
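
The equivalence behind these two combines can be checked directly. The following is a minimal C++ sketch, not the GlobalISel implementation; the helper names `icmpEqOne` and `icmpNeZero` are hypothetical, and the sketch assumes a boolean "true" is represented as 1, as the condition above requires.

```cpp
#include <cstdint>

// Hypothetical helper: models `icmp eq x, 1` when the true value is 1.
constexpr std::uint8_t icmpEqOne(std::uint8_t x) { return x == 1 ? 1 : 0; }

// Hypothetical helper: models `icmp ne x, 0` when the true value is 1.
constexpr std::uint8_t icmpNeZero(std::uint8_t x) { return x != 0 ? 1 : 0; }

// For x known to be 0 or 1, both comparisons reproduce x itself.
static_assert(icmpEqOne(0) == 0 && icmpEqOne(1) == 1, "eq x, 1 folds to x");
static_assert(icmpNeZero(0) == 0 && icmpNeZero(1) == 1, "ne x, 0 folds to x");

// Outside {0, 1} the `eq` fold would be wrong, which is why the combine
// requires x to be known to be 0 or 1.
static_assert(icmpEqOne(2) != 2, "fold is invalid when x may exceed 1");
```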

This showed up in the following situation:

https://godbolt.org/z/M5jKexWTW

SDAG currently supports the `ne` case, but not the `eq` case. This can probably
be further generalized, but I don't feel like thinking that hard right now.

This gives some minor code size improvements across the board on CTMark at
-Os for AArch64. (0.1% for 7zip and pairlocalalign in particular.)

Differential Revision: https://reviews.llvm.org/D109130

* [ORC] Move callWrapper and callSPSWrapper functions to ExecutorProcessControl.

The ExecutionSession versions now just forward to the implementations in
ExecutorProcessControl.

This allows callWrapper / callSPSWrapper to be used while bootstrapping an
ExecutorProcessControl instance.

* [ORC] Add specialized SPSSerializationTraits for ArrayRef<char>.

Deserializing from an SPSSequence<char> to an ArrayRef<char> will point the
ArrayRef<char> at the input buffer.

* [ORC] Add EPCGenericJITLinkMemoryManager: memory management via EPC calls.

All ExecutorProcessControl subclasses must provide a JITLinkMemoryManager object
that can be used to allocate memory in the executor process. The
EPCGenericJITLinkMemoryManager class provides an off-the-shelf
JITLinkMemoryManager implementation for JITs that do not need (or cannot
provide) a specialized JITLinkMemoryManager implementation. This simplifies the
process of creating new ExecutorProcessControl implementations.

* [gn build] Port dad60f8071d5

* [ORC] Range check and narrow size value.

This should fix the build issues in
https://lab.llvm.org/buildbot#builders/171/builds/3149.

* [Sanitizers] remove empty test case.

* Reland "Try to unbreak Win build differently after 973519826edb76""

Build should be fixed by
https://github.com/llvm/llvm-project/commit/9d22754389

This reverts commit df052e1732ab57f5d9c684ceeaed3ab39073cd9f.

Differential Revision: https://reviews.llvm.org/D109181

* [openmp] NFC add bitcode comment

* [runtimeunroll] Under EXPENSIVE_CHECKS, validate loop info

Requested in review comment on D108476

* [runtimeunroll] Support epilogue unrolling with a parent loop

This patch adds support for unrolling inner loops using epilogue unrolling. The basic issue is that the original latch exit block of the inner loop could be outside the outer loop.  When we clone the inner loop and split the latch exit, the cloned blocks need to be in the outer loop.

Differential Revision: https://reviews.llvm.org/D108476

* [WebAssembly] Rename WrapperPIC -> WrapperREL. NFC

This ISD node/wrapper represents an address which is relative to a base
address and therefore lowers to `i32.const` rather than `global.get`.

Use this wrapper type for TLS-relative addresses, paving the way for the
non-REL wrapper to be used for external TLS addresses once those are
supported.

Differential Revision: https://reviews.llvm.org/D109179

* [AMDGPU] Fold immediates in the optimizeCompareInstr

Peephole works before the first SIFoldOperands so most of
the immediates are in registers.

Differential Revision: https://reviews.llvm.org/D109186

* [CSSPGO] Honor preinliner decision for ThinLTO importing

When pre-inliner decision is used for CSSPGO, we should take that into account for ThinLTO importing as well, so post-link sample loader inliner can favor that decision. This is handled by a small tweak in this patch. It also includes a change to transfer preinliner decision when merging context.

Differential Revision: https://reviews.llvm.org/D109088

* [Coroutines] Only run verifyFunction in debug mode

verifyFunction can be really slow on large functions. This can significantly slow down compilation in production.
Given that coroutine passes are fairly stable now, we should only run it in debug mode.

Differential Revision: https://reviews.llvm.org/D109198

* [AMDGPU] Process any power of 2 in optimizeCompareInstr

Differential Revision: https://reviews.llvm.org/D109201

* [mlir][python] Simplify python extension loading.

* Now that packaging has stabilized, removes old mechanisms for loading extensions, preferring direct importing.
* Removes _cext_loader.py, _dlloader.py as unnecessary.
* Fixes the path where the CAPI dll is written on Windows. This enables that path of least resistance loading behavior to work with no further drama (see: https://bugs.python.org/issue36085).
* With this patch, `ninja check-mlir` on Windows with Python bindings works for me, modulo some failures that are actually due to a couple of pre-existing Windows bugs. I think this is the first time the Windows Python bindings have worked upstream.
* Downstream changes needed:
  * If downstreams are using the now removed `load_extension`, `reexport_cext`, etc, then those should be replaced with normal import statements as done in this patch.

Reviewed By: jdd, aartbik

Differential Revision: https://reviews.llvm.org/D108489

* [mlir][scf] Allow runtime type of iter_args to change

The limitation on iter_args introduced with D108806 is too restrictive. Changes of the runtime type should be allowed.

Extends the dim op canonicalization with a simple analysis to determine when it is safe to canonicalize.

Differential Revision: https://reviews.llvm.org/D109125

* Fix typo in RISCVMatInt.cpp comments

* [LoopPredication] Fix MemorySSA crash in predicateLoopExits

The attached testcase crashes without the patch (Not the same accesses
in the same order).

When we move instructions before another instruction, we also need to
update the memory accesses corresponding to it.

Reviewed-By: asbirlea
Differential Revision: https://reviews.llvm.org/D109197

* Revert "[NFC] Regenerate SVE ACLE intrinsics tests"

This reverts commit 8749a556da96fb17df1a2e36b860527e557c8c7b.

* [NFC] Recommit "Regenerate SVE ACLE intrinsics tests"

Change-Id: Ida45fc41231cd71709048f2d37f228f14053514e

* [OMPIRBuilder] Add ordered directive to OMPBuilder

Add support for ordered directive in the OpenMPIRBuilder.

This patch also modifies clang to use the ordered directive when the
option -fopenmp-enable-irbuilder is enabled.

Also fix an ICE when parsing a canonical for loop with the relational
operator LE or GE in an OpenMP region, by replacing the unary increment
of the expression "Expr A" minus "Expr B" (++(Expr A - Expr B)) with a
binary addition of that expression and the constant value "1"
(Expr A - Expr B + "1").

Reviewed By: Meinersbur, kiranchandramohan

Differential Revision: https://reviews.llvm.org/D107430

* [RISCV] Add SiFive core S51

Add SiFive core s51 as rv64imac RocketModel

Reviewed-By: MaskRay, evandro
Differential Revision: https://reviews.llvm.org/D108886

* [Coroutines] [Clang] Look up coroutine component in std namespace first

Summary: Currently, in libcxx and clang, all the coroutine components are
defined in the std::experimental namespace.
The coroutine TS has now been merged into C++20, so in working drafts
like N4892 the coroutine components are defined in the std namespace
instead of the std::experimental namespace.
Coroutine support in clang also seems to be relatively stable, so I
think it may be suitable to move the coroutine components into the
std namespace now.

But moving the coroutine components into the std namespace may be a
breaking change, so I plan to split this change into two patches: one in
clang and the other in libcxx.

This patch makes clang look up coroutine_traits in the std namespace
first. For compatibility, clang looks in the std::experimental namespace
if it can't find definitions in the std namespace, and emits a warning
in that case. So existing code won't break after a compiler update.

Test Plan: check-clang, check-libcxx

Reviewed By: lxfind

Differential Revision: https://reviews.llvm.org/D108696

* AMDGPU: Remove FeatureLocalMemorySize0

There's no reason to make this an explicit feature, since it's implied
by the lack of a feature with a size.

* Revert "[HardwareLoops] Change order of SCEV expression construction for InitLoopCount."

This causes https://bugs.llvm.org/show_bug.cgi?id=51714 and
is not the right patch according to the comments in D91724

This reverts commit 42eaf4fe0adef3344adfd9fbccd49f325cb549ef.

* [PowerPC] Enable fast-isel on AIX 64 subtarget

This patch basically enables fast-isel for AIX 64-bit subtarget
(previously enabled only for ELF 64). The initial motivation is to
introduce branch folding to AIX generated code for correct debug
behavior. I also saw some compiling time improvement in a few LLVM
test-suite benchmarks. (toast, dbms, cjpeg, burg, etc.)

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D98844

* [AArch64][GlobalISel] Support for folding G_ROTR as shifted operands.

This allows selection like: eor w0, w1, w2, ror #8

Saves 500 bytes on ClamAV -Os, which is 0.1%.

Differential Revision: https://reviews.llvm.org/D109206

* Reformulate OrcJIT tutorial doc to make it more clear.

Fixed a minor writing error. The text was hard to understand.

Reviewed By: lhames, mehdi_amini

Differential Revision: https://reviews.llvm.org/D106235

* [Test] Missed opt test for D108910

We can fold loop phis after we've proved that some exit has EC=0
in IndVars.

Patch by Dmitry Makogon!

* [flang] Extend common block size to cover equivalence storage

The size of a common block should be extended to cover any storage
sequences that are storage associated with the common block via
equivalences (8.10.2.2 point 1 (2)).

In symbol size and offset computation, the size of the common block
was not always extended to cover storage association. It was only done
if the "base symbol of an equivalence group"(*) appeared in a common block
statement. Correct this to cover all cases where a symbol appearing in a
common block statement is storage associated.

(*) the base symbol of an equivalence group is the symbol whose storage
starts first in a storage association (if several symbols start first,
the base symbol is the last one visited by the algorithm going through
the equivalence sets).
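
The size rule described above can be illustrated with a small sketch. This is not flang's symbol size and offset computation; `StorageSpan` and `extendedCommonBlockSize` are hypothetical names, and the sketch assumes each storage-associated symbol is already known by its byte offset from the start of the common block and its size.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical model of one symbol's storage, measured from the start of
// the common block.
struct StorageSpan {
  std::size_t offset; // byte offset from the start of the common block
  std::size_t size;   // size in bytes
};

// Returns the common block size after extending it so that it covers every
// storage-associated span, not only the symbols named in the COMMON statement.
std::size_t extendedCommonBlockSize(std::size_t declaredSize,
                                    const std::vector<StorageSpan> &associated) {
  std::size_t size = declaredSize;
  for (const StorageSpan &span : associated)
    size = std::max(size, span.offset + span.size);
  return size;
}
```

For example, an 8-byte common block with an equivalenced 16-byte array starting at offset 4 would grow to 20 bytes under this model.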

Differential Revision: https://reviews.llvm.org/D109156

* [mlir][flang] Do not prevent integer types from being parsed as MLIR keywords

DialectAsmParser::parseKeyword is rejecting `'i' digit+` while it is
a valid identifier according to mlir/docs/LangRef.md.

Integer types actually used to be TOK_KEYWORD a while back before the
change: https://github.com/llvm/llvm-project/commit/6af866c58d21813fb243906611d02bb2a8ffa43a.

This patch modifies `isCurrentTokenAKeyword` to return true for tokens that
match integer types too.
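
As an illustration of the `'i' digit+` shape discussed above, here is a minimal standalone sketch. It is not MLIR's lexer or the code of `isCurrentTokenAKeyword`; the function name `looksLikeIntegerType` is hypothetical and only shows which spellings the pattern covers.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: true when `token` has the form 'i' digit+, e.g. "i32".
bool looksLikeIntegerType(const std::string &token) {
  // Need the leading 'i' plus at least one digit.
  if (token.size() < 2 || token[0] != 'i')
    return false;
  for (std::size_t i = 1; i < token.size(); ++i)
    if (!std::isdigit(static_cast<unsigned char>(token[i])))
      return false;
  return true;
}
```

Under this check, `i32` and `i1` match, while `i`, `index`, and `f32` do not.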

The motivation for this change is the parsing of `!fir.type<{` `component-name: component-type,`+ `}>`
type in FIR that represents Fortran derived types. The component-names are
parsed as keywords, and can very well be i32 or any ixxx (which are
valid Fortran derived type component names).

The Quant dialect type parser had to be modified since it relied on `iw` not
being parsed as keywords.

Differential Revision: https://reviews.llvm.org/D108913

* [lldb] [test] Mark *fork-follow-child* tests non-Darwin

* [flang] Remove *- C++ -* incantation from runtime .cpp files. NFC

We should only need to spell the language out in .h files.

Differential Revision: https://reviews.llvm.org/D109138

* [lldb/lua] Force Lua version to be 5.3

Due to CMake cache, find_package in FindLuaAndSwig.cmake
will be ignored. This commit adds EXACT and REQUIRED flags
to it and removes find_package in Lua ScriptInterpreter.

Signed-off-by: Siger Yang <[email protected]>

Reviewed By: tammela, JDevlieghere

Differential Revision: https://reviews.llvm.org/D108515

* [flang] COMMAND_ARGUMENT_COUNT runtime implementation

Grab whatever ProgramStart has stored in executionEnvironment.argc and
subtract 1 (based on the assumption that ProgramStart is called with
a C-style argc that counts the command name as an argument).
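
A minimal sketch of that computation follows. It is not the flang runtime code: `commandArgumentCount` is a hypothetical free function taking the stored argc as a parameter, and clamping at zero is an added assumption for the case where no command name was recorded.

```cpp
// Hypothetical stand-in for the runtime entry point; `argc` plays the role
// of the value ProgramStart stored in executionEnvironment.argc.
int commandArgumentCount(int argc) {
  // A C-style argc counts the command name, so subtract 1.
  // Assumption: clamp at zero so the result is never negative.
  return argc > 1 ? argc - 1 : 0;
}
```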

Spoiler alert: The tests will evolve into fixtures when we implement
GET_COMMAND_ARGUMENT etc.

Differential Revision: https://reviews.llvm.org/D109048

* [AArch64][ISel] NFC: DAG.getMachineFunction() -> MF

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D109135

* [AArch64][SME] Support NEON vector to GPR integer moves in streaming mode

A small subset of the NEON instruction set is legal in streaming mode.
This patch adds support for the following vector to integer move
instructions:

  0x00 1110 0000 0001 0010 11xx xxxx xxxx # SMOV W|Xd,Vn.B[0]
  0x00 1110 0000 0010 0010 11xx xxxx xxxx # SMOV W|Xd,Vn.H[0]
  0100 1110 0000 0100 0010 11xx xxxx xxxx # SMOV Xd,Vn.S[0]
  0000 1110 0000 0001 0011 11xx xxxx xxxx # UMOV Wd,Vn.B[0]
  0000 1110 0000 0010 0011 11xx xxxx xxxx # UMOV Wd,Vn.H[0]
  0000 1110 0000 0100 0011 11xx xxxx xxxx # UMOV Wd,Vn.S[0]
  0100 1110 0000 1000 0011 11xx xxxx xxxx # UMOV Xd,Vn.D[0]

Only the zero index variants are legal; all other indexes are illegal.
To support this, new instructions are defined specifically for zero
index which is hardcoded, along with an implicit 'VectorIndex0' operand.
Since the index operand is implicit and takes no bits in the encoding,
custom decoding is required to add the operand.

I'm not sure if this is the best approach but the predicate constraint
on a subset of an operand is unusual. Would be interested to hear some
alternatives.

The instructions are predicated on 'HasNEONorStreamingSVE', i.e. they're
enabled by either +neon or +streaming-sve. This follows on from the work
in D106272 to support the subset of SVE(2) instructions that are legal
in streaming mode.

Depends on D107902.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D107903

* [sanitizer_common] Define wordexp_wrde_dooffs for Solaris

The Solaris buildbots have been broken for some time:

  In file included from /opt/llvm-buildbot/home/solaris11-amd64/clang-solaris11-amd64/llvm/compiler-rt/lib/asan/asan_interceptors.cpp:174:
  /opt/llvm-buildbot/home/solaris11-amd64/clang-solaris11-amd64/llvm/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:4000:19: error: use of undeclared identifier 'wordexp_wrde_dooffs'
          ((flags & wordexp_wrde_dooffs) ? p->we_offs : 0) + p->we_wordc;
                    ^

This was caused by D108646 <https://reviews.llvm.org/D108646>; the fix is
equivalent to D108838 <https://reviews.llvm.org/D108838>.

Tested on `amd64-pc-solaris2.11` and `sparcv9-sun-solaris2.11`.

Differential Revision: https://reviews.llvm.org/D109193

* [LoopBoundSplit] Update phi node in exit block

It fixes https://bugs.llvm.org/show_bug.cgi?id=51700

Differential Revision:

* [JITLink] Add initial AArch64 support

Set up basic infrastructure for 64-bit ARM architecture support in JITLink. It allows for loading a minimal object file and resolving a single relocation. Advanced features like GOT and PLT handling or relaxations were intentionally left out for the moment.

This patch follows the idea to keep implementations for ARM (32-bit) and AArch64 (64-bit) separate, because:
* it might be easier to share code with the MachO "arm64" JITLink backend
* LLVM has individual targets for ARM and AArch64 as well

Reviewed By: lhames

Differential Revision: https://reviews.llvm.org/D108986

* [gn build] Port 2ed91da0f1f3

* [hwasan] Support more complicated lifetimes.

This is important as with exceptions enabled, non-POD allocas often have
two lifetime ends: the exception handler, and the normal one.

Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D108365

* Revert "[lldb/lua] Force Lua version to be 5.3"

This commit causes buildbot failures if SWIG is available but Lua is
not present.

This reverts commit 7bb42dc6b114f57200abfebaaa01160914be6bba.

* [OpenCL] Supports optional 64-bit floating point types in C++ for OpenCL 2021

Adds support for a feature macro `__opencl_c_fp64` in C++ for OpenCL
2021 enabling a respective optional core feature from OpenCL 3.0.

This change aims to achieve compatibility between C++ for OpenCL
2021 and OpenCL 3.0.

Differential Revision: https://reviews.llvm.org/D108989

* [AMDGPU][MC][NFC][DOC] Updated description of registers

Corrected list of available register tuples to reflect changes introduced by
commits https://reviews.llvm.org/D103672 and https://reviews.llvm.org/D103800

See bug https://bugs.llvm.org/show_bug.cgi?id=51388

* [OptTable] Reapply Improve error message output for grouped short options

This reapplies 71d7fed3bc2ad6c22729d446526a59fcfd99bd03 which was
reverted by 3e2bd82f02c6cbbfb0544897c7645867f04b3a7e. This change
includes the fix for breaking the sanitizer bots.

As seen in https://bugs.llvm.org/show_bug.cgi?id=48880 the current
implementation for parsing grouped short options can return unclear
error messages. This change fixes the example given in the ticket in
which a flag is incorrectly given an argument. Also when parsing a
group we now keep reading past the first incorrect option and output
errors for all incorrect options in the group.
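
The "keep reading past the first incorrect option" behavior can be sketched as follows. This is not the OptTable implementation; `checkShortGroup`, its signature, and the error text are hypothetical and only illustrate collecting one error per bad option in a group.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: checks a group such as "-axby" against the known
// single-letter flags and returns one error per unknown letter, instead of
// stopping at the first one.
std::vector<std::string> checkShortGroup(const std::string &group,
                                         const std::set<char> &knownFlags) {
  std::vector<std::string> errors;
  // Skip the leading '-' and inspect every remaining letter.
  for (std::size_t i = 1; i < group.size(); ++i)
    if (knownFlags.count(group[i]) == 0)
      errors.push_back(std::string("unknown option '-") + group[i] + "'");
  return errors;
}
```

With known flags `a` and `b`, the group `-axby` yields two errors in this sketch, one for `x` and one for `y`.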

Differential Revision: https://reviews.llvm.org/D108770

* [X86][SLM] Fix PBLENDVB uops and throughput

SLM PBLENDVB is just as bad as BLENDVPD/PS - so model it as such, fixing the rr vs rm uops diff as well. The Intel AoM appears to have a copy+paste typo with PBLENDW, it doesn't match Agner or InstLatX64.

Noticed while investigating some of the weird discrepancies reported by the D103695 helper script (SLM had much better vector shift throughputs than it should).

* [GlobalISel] Add convenience constructors to MemDesc

This allows constructing a MemDesc from a MachineMemOperand, a pattern that starts to show up more frequently.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D109161

* [LoopDeletion] Move ICmpInst handling to getValueOnFirstIteration()

As noticed in https://reviews.llvm.org/D105688, it would be great to move
handling of ICmpInst which was in canProveExitOnFirstIteration() to
getValueOnFirstIteration().

Patch by Dmitry Makogon!

Differential Revision: https://reviews.llvm.org/D108978
Reviewed By: reames

* [analyzer][NFCI] Allow clients of NoStateChangeFuncVisitor to check entire function calls, rather than each ExplodedNode in it

D105553 added NoStateChangeFuncVisitor, an abstract class to aid in creating
notes such as "Returning without writing to 'x'", or "Returning without changing
the ownership status of allocated memory". Its clients need to define, among
other things, what a change of state is.

For code like this:

f() {
  g();
}

foo() {
  f();
  h();
}

We'd have a path in the ExplodedGraph that looks like this:

             -- <g> -->
            /          \
         ---     <f>    -------->        --- <h> --->
        /                        \      /            \
--------        <foo>             ------    <foo>     -->

When we're interested in whether f neglected to change some property,
NoStateChangeFuncVisitor asks these questions:

                       ÷×~
                -- <g> -->
           ß   /          \$    @&#*
            ---     <f>    -------->        --- <h> --->
           /                        \      /            \
   --------        <foo>             ------    <foo>     -->

Has anything changed in between # and *?
Has anything changed in between & and *?
Has anything changed in between @ and *?
...
Has anything changed in between $ and *?
Has anything changed in between × and ~?
Has anything changed in between ÷ and ~?
...
Has anything changed in between ß and *?
...

This is a rather thorough line of questioning, which is why in D105819, I was
only interested in whether state *right before* and *right after* a function
call changed, and early returned to the CallEnter location:

if (!CurrN->getLocationAs<CallEnter>())
  return;

Except that I made a typo, and forgot to negate the condition. So, in this
patch, I'm fixing that, and at the same time allowing all clients to choose
this whole-function check instead of the thorough one.

Differential Revision: https://reviews.llvm.org/D108695

* [gn build] Port a375bfb5b729

* Reland "[clang-repl] Re-implement clang-interpreter as a test case."
Original commit message: "
    Original commit message:"
      The current infrastructure in lib/Interpreter has a tool, clang-repl, very
      similar to clang-interpreter which also allows incremental compilation.

      This patch moves clang-interpreter as a test case and drops it as conditionally
      built example as we already have clang-repl in place.

      Differential revision: https://reviews.llvm.org/D107049
    "

    This patch also ignores ppc due to missing weak symbol for __gxx_personality_v0
    which may be a feature request for the jit infrastructure. Also, adds a missing
    build system dependency to the orc jit.
"

Additionally, this patch defines a custom exception type and thus avoids the
requirement to include header <exception>, making it easier to deploy across
systems without a standard location for the C++ headers.

Differential revision: https://reviews.llvm.org/D107049

* [ORC] Static cast more uint64_t to size_t

These instances don't have an obvious way to fail
nicely so I've just asserted they are within range.

Fixes the Arm 32 bit builds.
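
The pattern described here, asserting that a 64-bit value is in range before narrowing it, might look like the sketch below. It is not the ORC code; `narrowToSizeT` is a hypothetical helper name used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical helper: narrows a uint64_t to size_t, asserting that the
// value fits. On 32-bit hosts size_t is narrower than uint64_t, so the
// cast could otherwise silently truncate.
inline std::size_t narrowToSizeT(std::uint64_t value) {
  assert(value <= std::numeric_limits<std::size_t>::max() &&
         "value does not fit in size_t");
  return static_cast<std::size_t>(value);
}
```

On a 64-bit host the assertion can never fire; it only matters where `size_t` is 32 bits wide, such as the 32-bit Arm builds mentioned above.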

* [compiler-rt][Profile] Disable test on Arm/AArch64 Linux

While a fix for flaky results is being reviewed.

* [gn build] (manually) port 6fe2beba7d2a (ExceptionTests)

* Revert "Reland "[clang-repl] Re-implement clang-interpreter as a test case.""

This reverts commit 6fe2beba7d2a41964af658c8c59dd172683ef739 which fails on
clang-hexagon-elf

* Revert "[gn build] (manually) port 6fe2beba7d2a (ExceptionTests)"

This reverts commit da47c2719b1094a29427917ddb157c9c716e876d.
6fe2beba7d2a was reverted in 885964046114.

* [lldb] Support .debug_rnglists.dwo sections in dwp file

This patch considers the CU index entry
when reading the .debug_rnglists.dwo section.

Reviewed By: jankratochvil

Differential Revision: https://reviews.llvm.org/D107456

* Revert "[NFC] Recommit "Regenerate SVE ACLE intrinsics tests""

This reverts commit 91eda9c30f33da6ec6da70b59a5f5da6c6397039.
Breaks tests on macOS, both Intel and Arm. See e.g.
https://logs.chromium.org/logs/chromium/buildbucket/cr-buildbucket/8837137028177680097/+/u/package_clang/stdout?format=raw
https://logs.chromium.org/logs/chromium/buildbucket/cr-buildbucket/8837137028177680081/+/u/package_clang/stdout?format=raw
http://45.33.8.238/macm1/17258/step_7.txt
http://45.33.8.238/mac/35004/step_7.txt

* [lldb] [test] Mark vfork-follow-child-* tests unsupported (flaky) on aarch64

* [lldb] [test] Mark the remaining vfork-follow-child test unsupported (flaky) on aarch64

* [CUDA][NFC] Fix wrong assert information

Reviewed By: fodinabor

Differential Revision: https://reviews.llvm.org/D109232

* Remove blank from NaN string representation

Flang front end function DumpHexadecimal generates a string
representation of a REAL value.  When the value is a NaN, the string
contains a blank, as in "NaN 0x7fc00000".  This function is used by
lowering to generate a string that is then passed to llvm Support
function convertFromStringSpecials, which does not expect a blank
in the string.  Remove the blank to allow correct recognition of a
NaN by this llvm function.

Note that function DumpHexadecimal is not exercised by the front end
itself.  This functionality is only exercised by code that is not yet
present in llvm.

* [mlir] Update EmitC documentation

* [mlir][sparse] refine heuristic for iteration graph topsort

The sparse index order must always be satisfied, but this
may give a choice in topsorts for several cases. We broke
ties in favor of any dense index order, since this gives
good locality. However, breaking ties in favor of pushing
unrelated indices into sparse iteration spaces gives better
asymptotic complexity. This revision improves the heuristic.

Note that in the long run, we are really interested in using
ML for ML to find the best loop ordering as a replacement for
such heuristics.

Reviewed By: bixia

Differential Revision: https://reviews.llvm.org/D109100

* [clangd] Use the active file's language for hover code blocks

This helps improve the syntax highlighting for Objective-C code,
although it currently doesn't work well in VS Code with
methods/properties/ivars since we don't currently include the proper
decl context (e.g. class).

Differential Revision: https://reviews.llvm.org/D108584

* [CMake] Add targets for generating coverage reports

This is a pretty small bit of CMake goop to generate code coverage
reports. I always forget the right script invocation and end up
fumbling around too much.

Wouldn't it be great to have targets that "Just Work"?

Well, I thought so.

At present this only really works correctly for LLVM, but I'll extend
it in subsequent patches to work for subprojects.

Reviewed By: phosek

Differential Revision: https://reviews.llvm.org/D109019

* [mlir][linalg] Extend tiled_loop to SCF conversion to generate scf.parallel.

Differential Revision: https://reviews.llvm.org/D109230

* [RISCV] Change how we encode AVL operands in vector pseudoinstructions to use GPRNoX0.

This patch changes the register class to avoid accidentally setting
the AVL operand to X0 through MachineIR optimizations.

There are cases where we really want to use X0, but we can't get that
past the MachineVerifier with the register class as GPRNoX0. So I've
used a 64-bit -1 as a sentinel for X0. All other immediate values should
be uimm5. I convert it to X0 at the earliest possible point in the VSETVLI
insertion pass to avoid touching the rest of the algorithm. In
SelectionDAG lowering I'm using a -1 TargetConstant to hide it from
instruction selection and treat it differently than if the user
used -1. A user -1 should be selected to a register since it doesn't
fit in uimm5.

This is the rest of the changes started in D109110. As mentioned there,
I don't have a failing test from MachineIR optimizations anymore.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D109116

* [lld/mac] Don't assert during thunk insertion if there are undefined symbols

We end up calling resolveBranchVA(), which asserts for Undefineds.

As a fix, just return early in Writer::run() if there are any diagnostics
after processing relocations (which is where undefined symbol errors are
emitted). This matches what the ELF port does.

Differential Revision: https://reviews.llvm.org/D109079

* Add missing `REQUIRES: asserts` to combine-icmp-to-lhs-known-bits.mir

* [ARM] Add VFP lowering for fptosi.sat

This extends D107865 to the VFP instructions, lowering llvm.fptosi.sat
and llvm.fptoui.sat to VCVT instructions that inherently perform the
saturate.

Differential Revision: https://reviews.llvm.org/D107866

* [libc++][NFC] Remove uses of 'using namespace std;' in the test suite

Differential Revision: https://reviews.llvm.org/D109120

* Revert "[analyzer][NFCI] Allow clients of NoStateChangeFuncVisitor to check entire function calls, rather than each ExplodedNode in it"

This reverts commit a375bfb5b729e0f3ca8d5e001f423fa89e74de87.

This was causing a bot to crash:

https://green.lab.llvm.org/green/job/clang-stage1-cmake-RA-incremental/23380/

* [lldb/Plugins] Introduce Scripted Interface Factory

This patch splits the previous `ScriptedProcessPythonInterface` into
multiple specific classes:

1. The `ScriptedInterface` abstract class that carries the interface
   instance object and its virtual pure abstract creation method.

2. The `ScriptedPythonInterface` that holds a generic `Dispatch` method that
   can be used by various interfaces to call python methods and also keeps a
   reference to the Python Script Interpreter instance.

3. The `ScriptedProcessInterface` that describes the base Scripted
   Process model with all the methods used in the underlying script.

All these components are used to refactor the `ScriptedProcessPythonInterface`
class, making it more modular.

This patch is also a requirement for the upcoming work on `ScriptedThread`.

Differential Revision: https://reviews.llvm.org/D107521

Signed-off-by: Med Ismail Bennani <[email protected]>

* [gn build] Port b9e57e030560

* [NFC][CSSPGO] Add end of file newline to test input

On some platforms (e.g. AIX), diff will complain about the missing newline:

diff: Missing newline at the end of file
.../llvm/test/tools/llvm-profdata/Inputs/cs-sample.proftext.

* [flang] Move runtime API headers to flang/include/flang/Runtime

Move the closure of the subset of flang/runtime/*.h header files that
are referenced by source files outside flang/runtime (apart from unit tests)
into a new directory (flang/include/flang/Runtime) so that relative
include paths into ../runtime need not be used.

flang/runtime/pgmath.h.inc is moved to flang/include/flang/Evaluate;
it's not used by the runtime.

Differential Revision: https://reviews.llvm.org/D109107

* [modules] Use `HashBuilder` and `MD5` for the module hash.

Per the comments, `hash_code` values "are not stable to save or
persist", so are unsuitable for the module hash, which must persist
across compilations for the implicit module hashes to match. Note that
in practice, today, `hash_code` are stable. But this is an
implementation detail, with a clear `FIXME` indicating we should switch
to a per-execution seed.

The stability of `MD5` also allows modules cross-compilation use-cases.
The `size_t` underlying storage for `hash_code` varying across platforms
could cause mismatching hashes when cross-compiling from a 64bit
target to a 32bit target.

Note that native endianness is still used for the hash computation. So hashes
will differ between platforms of different endianness.

Reviewed By: jansvoboda11

Differential Revision: https://reviews.llvm.org/D102943

* [NFC][DWARF] Add triple to new TAG test file

The file is requiring x86, but using llc without triple.

This will cause problems on non-x86 platforms, as the default triple will
not be x86.

eg: On PowerPC le, it will emit warnings as:

'x86-64' is not a recognized processor for this target (ignoring
processor)
'+cx8' is not a recognized feature for this target (ignoring feature)
'+fxsr' is not a recognized feature for this target (ignoring feature)
'+mmx' is not a recognized feature for this target (ignoring feature)
'+sse' is not a recognized feature for this target (ignoring feature)
..

On some other platforms, it may even crash if some of the features
share a name (e.g. soft-float).

Add the triple, as x86 was the intended test target.

* [gn build] Reformat all files

Ran `git ls-files '*.gn' '*.gni' | xargs llvm/utils/gn/gn.py format`.

* [ARM] Add patterns for store(fptosisat(..))

As an extension to D107866, this adds store(fptosisat(..)) patterns,
similar to the existing fptosi patterns, to prevent unnecessarily moving
into gpr regs where we can use fp stores directly.

Differential Revision: https://reviews.llvm.org/D108378

* [libc++abi] Remove workarounds for missing -Wno-exceptions on older GCCs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97675 has now been resolved
in GCC 11, so we can remove those workarounds.

Differential Revision: https://reviews.llvm.org/D109188

* [libc++] Remove _LIBCPP_HAS_NO_LONG_LONG in favour of using_if_exists

_LIBCPP_HAS_NO_LONG_LONG was only defined on FreeBSD. Instead, use the
using_if_exists attribute to skip over declarations that are not available
on the base system. Note that there's an annoying limitation that we can't
conditionally define a function based on whether the base system provides
a function, so for example we still need preprocessor logic to define the
abs() and div() overloads.

Differential Revision: https://reviews.llvm.org/D108630

* [AMDGPU] Small cleanup in optimizeCompareInstr. NFC.

* [clang] fix error recovery ICE on copy elision when returing invalid variable

See PR51708.

Attempting copy elision in dependent contexts with invalid variable,
such as a variable with incomplete type, would cause a crash when attempting
to calculate its alignment.

The fix is to just skip this optimization on invalid VarDecl, as otherwise this
provides no benefit to error recovery: This functionality does not try to
diagnose anything, it only calculates a flag which will affect where the
variable will be allocated during codegen.

Signed-off-by: Matheus Izvekov <[email protected]>

Reviewed By: rtrieu

Differential Revision: https://reviews.llvm.org/D109191

* [compiler-rt][Profile] Wait for child threads in set-file-object test

We've been seeing this test return 31 instead of 32 for the "functions"
line in this test on our AArch64 bots.

One possible cause is some of the children not finishing in time
before the llvm-profdata commands are run, if the machine is heavily loaded.

Wait for all the children to finish before exiting the parent.

Reviewed By: zequanwu

Differential Revision: https://reviews.llvm.org/D109222

* [InstCombine] add tests for icmp of rotate (PR51566); NFC

* [InstCombine] reduce code duplication; NFC

* [InstCombine] fold (rotate X) eq/ne (0/-1)

This generalizes the examples shown in:
https://llvm.org/PR51566

https://alive2.llvm.org/ce/z/V-sEy9

* [libc++][NFC] Mark values in gdb pretty print comparison functions as live to prevent values being optimized out.

It appears when testing LLVM 13 on Power, we run into failures with the
`libcxx/test/libcxx/gdb/gdb_pretty_printer_test.sh.cpp` test case optimizing
values out.

Despite some of the functions in the test already being marked with optnone,
adding the `MarkAsLive()` calls inside of the pretty printer comparison functions
resolves the issues of the values being optimized out.

This patch aims to address https://llvm.org/PR51675.

Differential Revision: https://reviews.llvm.org/D109204

* [SampleFDO] Fix -Wnon-virtual-dtor

Make the dtor virtual to fix the warning.

* DebugInfo: Correct/improve type formatting (pointers to function types especially)

This does add some extra superfluous whitespace (eg: "int *") intended
to make the Simplified Template Names work easier - this makes the
DIE-based names match more exactly the clang-generated names, so it's
easier to identify cases that don't generate matching names.

(arguably we could change clang to skip that whitespace or add some
fuzzy matching to accommodate differences in certain whitespace - but
this seemed easier and fairly low-impact)

* Revert "[Coroutines] [Clang] Look up coroutine component in std namespace first"

This reverts commit 2fbd254aa46b, which broke the libc++ CI. I'm reverting
to get things stable again until we've figured out a way forward.

Differential Revision: https://reviews.llvm.org/D108696

* [libc++] Add an assertion in the subrange constructors with a size hint

Those constructors are very easy to misuse -- one could easily think that
the size passed to the constructor is the size of the range to exhibit
from the subrange. Instead, it's a size hint and it's UB to get it wrong.
Hence, when it's cheap to compute the real size of the range, it's cheap
to make sure that the user didn't get it wrong.

Differential Revision: https://reviews.llvm.org/D108827

* [lldb] Adjust parse_frames for unnamed images

Follow up to 2cbd3b04feaaaff7fab4c6500476839a23180886 which added
support for unnamed images but missed the use case in parse_frames.

* [NFC][OpenMP] Use clang_cc1 to driver tests

The test driver-fopenmp-extensions.c is failing on platforms that do
not use integrated-as. It can be reproduced using -fno-integrated-as on
Linux too.

bin/clang -c -Xclang -verify=omp -fopenmp      -fopenmp-extensions
-fno-openmp-extensions
../llvm-project/clang/test/OpenMP/driver-fopenmp-extensions.c
-fno-integrated-as
Assembler messages:
Error: can't open /tmp/driver-fopenmp-extensions-8fafe8.s for reading:
No such file or directory
clang-14: error: assembler command failed with exit code 1 (use -v to
see invocation)

The goal of this test is to verify syntax diags only,
so we should use clang_cc1 to test.

Reviewed By: jdenny, ABataev

Differential Revision: https://reviews.llvm.org/D109255

* [mlir][sparse] add convenience method for sparse tensor setup

This simplifies setting up sparse tensors through C-style data structures.
Useful for runtimes that want to interact with MLIR-generated code
without knowing about all bufferization details (viz. memrefs).

Reviewed By: bixia

Differential Revision: https://reviews.llvm.org/D109251

* [libc] fix strtointeger hex prefix parsing

Fix edge case where "0x" would be considered a complete hexadecimal
number for purposes of str_end. Now the hexadecimal prefix needs a valid
digit after it, else just the 0 will be counted as the number.

Reviewed By: sivachandra

Differential Revision: https://reviews.llvm.org/D109084

* [flang] Use CMake to determine endianness.

The preprocessor definitions __BYTE_ORDER__, __ORDER_BIG_ENDIAN__, and
__ORDER_LITTLE_ENDIAN__ are gcc extensions (also supported by clang),
but msvc (and others) do not define them. As a result, the preprocessor
evaluates both __BYTE_ORDER__ and __ORDER_BIG_ENDIAN__ to 0, so
__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__, the first `#if` condition,
evaluates to 1, and the wrong byte order is assumed for x86(_64).

This patch instead uses CMake's TestBigEndian module to determine
target architecture's endianness at configure-time.

Note this also uses the same mechanism for the runtime. If compiling
flang as a cross-compiler, the runtime for the compile-target must be
built separately (Flang does not support the LLVM_ENABLE_RUNTIMES
mechanism yet).

Fixes llvm.org/PR51597

Reviewed By: ijan1, Leporacanthicus

Differential Revision: https://reviews.llvm.org/D109108

* DebugInfo: Fix a few bot failures for type dumping fixes

* [clang] Allow the OpenBSD driver to link the libclang_rt.profile library.

Differential Revision: https://reviews.llvm.org/D109244

* Make LLVM Linkage a first class attribute instead of using an integer attribute

This makes the IR more readable, in particular when this will be used on
the builtin func outside of the LLVM dialect.

Reviewed By: wsmoses

Differential Revision: https://reviews.llvm.org/D109209

* OpenBSD also needs execinfo

* [lldb/Plugins] Move member template specialization out of class

This patch should fix the build failure that surfaced when building llvm
with GCC: https://lab.llvm.org/staging/#/builders/16/builds/10450

GCC complained that I explicitly specialized
 `ScriptedPythonInterface::ExtractValueFromPythonObject` in a
non-namespace scope, which is tolerated by Clang.

To solve this issue, the specializations were declared out of the class
and implemented in the source file.

Signed-off-by: Med Ismail Bennani <[email protected]>

* DebugInfo: additional fix missed in bc066e2.

* [ORC] Silence a buggy GCC unused argument warning.

* [AArch64] Implement target hook function to decide folding (mul (add x, c1), c2)

Prevent the folding if it leads to worse code.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D108871

* Support linking against OpenMP runtime on OpenBSD.

* [MLIR] Primitive linkage lowering of FuncOp

FuncOp always lowers to an LLVM external linkage presently. This makes it …
hisenyiu2015 pushed a commit to hisenyiu2015/msm-4.14 that referenced this issue Sep 23, 2021
commit fad7cd3 ("nbd: add the check to prevent overflow in __nbd_ioctl()")
exposed something that's long been broken for semi-hosted environments
like the kernel in Clang:

check_mul_overflow() is implemented in terms of
__builtin_mul_overflow(). For 64b operands on 32b hosts, LLVM was
emitting libcalls to __mulodi4() which assumes that compiler-rt is being
linked against. The kernel does not do so, so LLVM was emitting calls to
functions that have no definition, resulting in the linkage failure:

ERROR: modpost: "__mulodi4" [drivers/block/nbd.ko] undefined!

I have been fixing LLVM upstream, see the six fixes linked from:
https://bugs.llvm.org/show_bug.cgi?id=28629#c23.

I still need to detect older toolchains that we'd still like to support,
then find an appropriate workaround for the kernel.

Disable network block devices for now, so that we don't lose coverage of
32b ARM allmodconfig builds which are currently red in our CI.

Bug: 199191028
Link: ClangBuiltLinux/linux#1438
Signed-off-by: Nick Desaulniers <[email protected]>
Change-Id: I79a597177f75370f60621b984cb8e21ca2a268d6
lantianyu pushed a commit to lantianyu/linux that referenced this issue Sep 29, 2021
commit fad7cd3 ("nbd: add the check to prevent overflow in
__nbd_ioctl()") raised an issue from the fallback helpers added in
commit f090782 ("compiler.h: enable builtin overflow checkers and
add fallback code")

Specifically, the helpers for checking whether the results of a
multiplication overflowed (__unsigned_mul_overflow,
__signed_add_overflow) use the division operator when
!COMPILER_HAS_GENERIC_BUILTIN_OVERFLOW.  This is problematic for 64b
operands on 32b hosts.

Also, because the macro is type agnostic, it is very difficult to write
a similarly type generic macro that dispatches to one of:
 * div64_s64
 * div64_u64
 * div_s64
 * div_u64

Raising the minimum supported versions allows us to remove all of the
fallback helpers for !COMPILER_HAS_GENERIC_BUILTIN_OVERFLOW, instead
dispatching the compiler builtins.

arm64 has already raised the minimum supported GCC version to 5.1, do
this for all targets now.  See the link below for the previous
discussion.

Link: https://lore.kernel.org/all/[email protected]/
Link: https://lore.kernel.org/lkml/CAK7LNASs6dvU6D3jL2GG3jW58fXfaj6VNOe55NJnTB8UPuk2pA@mail.gmail.com/
Link: ClangBuiltLinux/linux#1438
Reported-by: Stephen Rothwell <[email protected]>
Reported-by: Nathan Chancellor <[email protected]>
Suggested-by: Rasmus Villemoes <[email protected]>
Signed-off-by: Nick Desaulniers <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Reviewed-by: Nathan Chancellor <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
lantianyu pushed a commit to lantianyu/linux that referenced this issue Sep 29, 2021
Once upgrading the minimum supported version of GCC to 5.1, we can drop
the fallback code for !COMPILER_HAS_GENERIC_BUILTIN_OVERFLOW.

This is effectively a revert of commit f090782 ("compiler.h: enable
builtin overflow checkers and add fallback code")

Link: ClangBuiltLinux/linux#1438 (comment)
Suggested-by: Rasmus Villemoes <[email protected]>
Signed-off-by: Nick Desaulniers <[email protected]>
Acked-by: Kees Cook <[email protected]>
Reviewed-by: Nathan Chancellor <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
@nickdesaulniers
Member

https://lore.kernel.org/lkml/[email protected]/ is the latest kernel patch in this saga.

ammarfaizi2 pushed a commit to ammarfaizi2/linux-block that referenced this issue Sep 30, 2021
commit fad7cd3 ("nbd: add the check to prevent overflow in
__nbd_ioctl()") raised an issue from the fallback helpers added in
commit f090782 ("compiler.h: enable builtin overflow checkers and
add fallback code")

ERROR: modpost: "__divdi3" [drivers/block/nbd.ko] undefined!

As Stephen Rothwell notes:
  The added check_mul_overflow() call is being passed 64 bit values.
  COMPILER_HAS_GENERIC_BUILTIN_OVERFLOW is not set for this build (see
  include/linux/overflow.h).

Specifically, the helpers for checking whether the results of a
multiplication overflowed (__unsigned_mul_overflow,
__signed_add_overflow) use the division operator when
!COMPILER_HAS_GENERIC_BUILTIN_OVERFLOW.  This is problematic for 64b
operands on 32b hosts.

This was fixed upstream by
commit 76ae847 ("Documentation: raise minimum supported version of
GCC to 5.1")
which is not suitable to be backported to stable.

Further, __builtin_mul_overflow() would emit a libcall to a
compiler-rt-only symbol when compiling with clang < 14 for 32b targets.

ld.lld: error: undefined symbol: __mulodi4

In order to keep stable buildable with GCC 4.9 and clang < 14, modify
struct nbd_config to instead track the number of bits of the block size;
reconstructing the block size using runtime checked shifts that are not
problematic for those compilers and in a way that can be backported to
stable.

In nbd_set_size, we do validate that the value of blksize must be a
power of two (POT) and is in the range of [512, PAGE_SIZE] (both
inclusive).

This does modify the debugfs interface.

Cc: [email protected]
Cc: Arnd Bergmann <[email protected]>
Cc: Rasmus Villemoes <[email protected]>
Link: ClangBuiltLinux/linux#1438
Link: https://lore.kernel.org/all/[email protected]/
Link: https://lore.kernel.org/stable/CAHk-=whiQBofgis_rkniz8GBP9wZtSZdcDEffgSLO62BUGV3gg@mail.gmail.com/
Reported-by: Naresh Kamboju <[email protected]>
Reported-by: Nathan Chancellor <[email protected]>
Reported-by: Stephen Rothwell <[email protected]>
Suggested-by: Kees Cook <[email protected]>
Suggested-by: Linus Torvalds <[email protected]>
Suggested-by: Pavel Machek <[email protected]>
Signed-off-by: Nick Desaulniers <[email protected]>
Reviewed-by: Josef Bacik <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
@nickdesaulniers
Member

accepted: https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git/commit/?h=block-5.15&id=41e76c6a3c83c85e849f10754b8632ea763d9be4

(once this lands in mainline, I'll need to chase this into stable, then re-enable CONFIG_BLK_DEV_NBD=y in ACK)

@nickdesaulniers nickdesaulniers added [PATCH] Accepted A submitted patch has been accepted upstream Needs Backport Should be backported to either linux-stable tree or latest llvm release branch. labels Sep 30, 2021
@nickdesaulniers
Member

41e76c6 is in v5.15-rc4. GKH pulled this into stable 5.14.

@nickdesaulniers nickdesaulniers added [FIXED][LINUX] 5.15 This bug was fixed in Linux 5.15 and removed Needs Backport Should be backported to either linux-stable tree or latest llvm release branch. [PATCH] Accepted A submitted patch has been accepted upstream labels Oct 4, 2021
nareshkamboju pushed a commit to nareshkamboju/linux that referenced this issue Oct 5, 2021
commit 41e76c6 upstream.

commit fad7cd3 ("nbd: add the check to prevent overflow in
__nbd_ioctl()") raised an issue from the fallback helpers added in
commit f090782 ("compiler.h: enable builtin overflow checkers and
add fallback code")

ERROR: modpost: "__divdi3" [drivers/block/nbd.ko] undefined!

As Stephen Rothwell notes:
  The added check_mul_overflow() call is being passed 64 bit values.
  COMPILER_HAS_GENERIC_BUILTIN_OVERFLOW is not set for this build (see
  include/linux/overflow.h).

Specifically, the helpers for checking whether the results of a
multiplication overflowed (__unsigned_mul_overflow,
__signed_add_overflow) use the division operator when
!COMPILER_HAS_GENERIC_BUILTIN_OVERFLOW.  This is problematic for 64b
operands on 32b hosts.

This was fixed upstream by
commit 76ae847 ("Documentation: raise minimum supported version of
GCC to 5.1")
which is not suitable to be backported to stable.

Further, __builtin_mul_overflow() would emit a libcall to a
compiler-rt-only symbol when compiling with clang < 14 for 32b targets.

ld.lld: error: undefined symbol: __mulodi4

In order to keep stable buildable with GCC 4.9 and clang < 14, modify
struct nbd_config to instead track the number of bits of the block size;
reconstructing the block size using runtime checked shifts that are not
problematic for those compilers and in a way that can be backported to
stable.

In nbd_set_size, we do validate that the value of blksize must be a
power of two (POT) and is in the range of [512, PAGE_SIZE] (both
inclusive).

This does modify the debugfs interface.

Cc: [email protected]
Cc: Arnd Bergmann <[email protected]>
Cc: Rasmus Villemoes <[email protected]>
Link: ClangBuiltLinux#1438
Link: https://lore.kernel.org/all/[email protected]/
Link: https://lore.kernel.org/stable/CAHk-=whiQBofgis_rkniz8GBP9wZtSZdcDEffgSLO62BUGV3gg@mail.gmail.com/
Reported-by: Naresh Kamboju <[email protected]>
Reported-by: Nathan Chancellor <[email protected]>
Reported-by: Stephen Rothwell <[email protected]>
Suggested-by: Kees Cook <[email protected]>
Suggested-by: Linus Torvalds <[email protected]>
Suggested-by: Pavel Machek <[email protected]>
Signed-off-by: Nick Desaulniers <[email protected]>
Reviewed-by: Josef Bacik <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
nareshkamboju pushed a commit to nareshkamboju/linux that referenced this issue Oct 6, 2021
commit 41e76c6 upstream.

Whissi pushed a commit to Whissi/linux-stable that referenced this issue Oct 7, 2021
commit 41e76c6 upstream.

commit fad7cd3 ("nbd: add the check to prevent overflow in
__nbd_ioctl()") raised an issue from the fallback helpers added in
commit f090782 ("compiler.h: enable builtin overflow checkers and
add fallback code")

ERROR: modpost: "__divdi3" [drivers/block/nbd.ko] undefined!

As Stephen Rothwell notes:
  The added check_mul_overflow() call is being passed 64 bit values.
  COMPILER_HAS_GENERIC_BUILTIN_OVERFLOW is not set for this build (see
  include/linux/overflow.h).

Specifically, the helpers for checking whether the results of a
multiplication overflowed (__unsigned_mul_overflow,
__signed_add_overflow) use the division operator when
!COMPILER_HAS_GENERIC_BUILTIN_OVERFLOW.  This is problematic for 64b
operands on 32b hosts.

This was fixed upstream by
commit 76ae847 ("Documentation: raise minimum supported version of
GCC to 5.1")
which is not suitable to be backported to stable.

Further, __builtin_mul_overflow() would emit a libcall to a
compiler-rt-only symbol when compiling with clang < 14 for 32b targets.

ld.lld: error: undefined symbol: __mulodi4

In order to keep stable buildable with GCC 4.9 and clang < 14, modify
struct nbd_config to instead track the number of bits of the block size,
reconstructing the block size using runtime-checked shifts that are not
problematic for those compilers, in a way that can be backported to
stable.
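
The idea can be sketched in userspace as follows. This is an illustration under stated assumptions, not the actual patch: `struct blk_cfg`, `blk_size()` and `blocks_to_bytes()` are hypothetical names standing in for the relevant part of struct nbd_config and its users.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant part of struct nbd_config. */
struct blk_cfg {
	unsigned int blksize_bits;	/* log2 of the block size */
};

/* Reconstruct the block size from its bit count. */
static uint64_t blk_size(const struct blk_cfg *cfg)
{
	return (uint64_t)1 << cfg->blksize_bits;
}

/*
 * Compute blocks << blksize_bits as a signed 64-bit byte count.
 * Returns true on overflow. Only shifts and comparisons are used, so no
 * 64-bit multiply or divide helper is needed.
 */
static bool blocks_to_bytes(uint64_t blocks, const struct blk_cfg *cfg,
			    int64_t *bytes)
{
	unsigned int bits = cfg->blksize_bits;

	if (bits >= 63 || blocks > ((uint64_t)INT64_MAX >> bits))
		return true;
	*bytes = (int64_t)(blocks << bits);
	return false;
}
```

Because the block size is always a power of two, multiplying by it and shifting by its bit count give the same result, which is what makes this substitution possible.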

nbd_set_size() already validates that blksize is a power of two (POT)
in the range [512, PAGE_SIZE] (both inclusive).
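
A minimal sketch of that validation and of the conversion to a bit count, assuming 4 KiB pages. `SKETCH_PAGE_SIZE`, `blksize_is_valid()` and `blksize_to_bits()` are hypothetical names used only for illustration, and `__builtin_ctzl` is a GCC/clang builtin.

```c
#include <stdbool.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumption: 4 KiB pages */

/* True if blksize is a power of two within [512, SKETCH_PAGE_SIZE]. */
static bool blksize_is_valid(unsigned long blksize)
{
	bool is_pot = blksize != 0 && (blksize & (blksize - 1)) == 0;

	return is_pot && blksize >= 512 && blksize <= SKETCH_PAGE_SIZE;
}

/* log2 of a power-of-two block size; only call this on a validated size. */
static unsigned int blksize_to_bits(unsigned long blksize)
{
	return (unsigned int)__builtin_ctzl(blksize);
}
```

With the size restricted this way, storing only the bit count loses no information.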

This does modify the debugfs interface.

Cc: [email protected]
Cc: Arnd Bergmann <[email protected]>
Cc: Rasmus Villemoes <[email protected]>
Link: ClangBuiltLinux/linux#1438
Link: https://lore.kernel.org/all/[email protected]/
Link: https://lore.kernel.org/stable/CAHk-=whiQBofgis_rkniz8GBP9wZtSZdcDEffgSLO62BUGV3gg@mail.gmail.com/
Reported-by: Naresh Kamboju <[email protected]>
Reported-by: Nathan Chancellor <[email protected]>
Reported-by: Stephen Rothwell <[email protected]>
Suggested-by: Kees Cook <[email protected]>
Suggested-by: Linus Torvalds <[email protected]>
Suggested-by: Pavel Machek <[email protected]>
Signed-off-by: Nick Desaulniers <[email protected]>
Reviewed-by: Josef Bacik <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
pundiramit pushed a commit to pundiramit/linux that referenced this issue Nov 7, 2021
This reverts commit ca7dad5.

This re-enables coverage of BLK_DEV_NBD. The clang-13 bug was worked
around in
commit 41e76c6 ("nbd: use shifts rather than multiplies")

Bug: 199191028
Link: ClangBuiltLinux/linux#1438
Signed-off-by: Nick Desaulniers <[email protected]>
Change-Id: Ifcb6131e5c8b12e4c784320adac639b573198b1f
vinzv pushed a commit to tuxedocomputers/linux that referenced this issue Dec 1, 2021
BugLink: https://bugs.launchpad.net/bugs/1950516

commit 41e76c6 upstream.

Signed-off-by: Kamal Mostafa <[email protected]>
Signed-off-by: Stefan Bader <[email protected]>
mem-frob pushed a commit to draperlaboratory/hope-llvm-project that referenced this issue Oct 7, 2022
__has_builtin(__builtin_mul_overflow) returns true for 32b ARM targets,
but Clang is deferring to compiler RT when encountering `long long`
types. This breaks sanitizer builds of the Linux kernel that are using
__builtin_mul_overflow with these types for these targets.

If the semantics of __has_builtin mean "the compiler resolves these,
always" then we shouldn't conditionally emit a libcall.
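
To make the expectation concrete, here is a sketch of how code commonly guards the builtin with `__has_builtin`. The function name `checked_mul_ll` and the fallback branch are illustrative assumptions, not code from LLVM or the kernel. A guard like this only helps if a true result really means the compiler expands the builtin inline. If the compiler reports true and still emits a libcall, the first branch is selected and the link can fail anyway.

```c
#include <limits.h>
#include <stdbool.h>

#ifndef __has_builtin
#define __has_builtin(x) 0	/* compilers that lack __has_builtin */
#endif

/* Returns true if a * b overflows long long. */
static bool checked_mul_ll(long long a, long long b, long long *res)
{
#if __has_builtin(__builtin_mul_overflow)
	return __builtin_mul_overflow(a, b, res);
#else
	/* Portable fallback; unlike the builtin, leaves *res unset on overflow. */
	if (a > 0 ? (b > 0 ? a > LLONG_MAX / b : b < LLONG_MIN / a)
		  : (b > 0 ? a < LLONG_MIN / b
			   : (a != 0 && b < LLONG_MAX / a)))
		return true;
	*res = a * b;
	return false;
#endif
}
```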

This will still need to be worked around in the Linux kernel in order to
continue to support allmodconfig builds of the Linux kernel for this
target with older releases of clang.

Link: https://bugs.llvm.org/show_bug.cgi?id=28629
Link: ClangBuiltLinux/linux#1438

Reviewed By: rengolin

Differential Revision: https://reviews.llvm.org/D108842
mem-frob pushed a commit to draperlaboratory/hope-llvm-project that referenced this issue Oct 7, 2022
__has_builtin(__builtin_mul_overflow) returns true for 32b MIPS targets,
but Clang is deferring to compiler RT when encountering `long long`
types. This breaks sanitizer builds of the Linux kernel that are using
__builtin_mul_overflow with these types for these targets.

If the semantics of __has_builtin mean "the compiler resolves these,
always" then we shouldn't conditionally emit a libcall.

This will still need to be worked around in the Linux kernel in order to
continue to support malta_defconfig builds of the Linux kernel for this
target with older releases of clang.

Link: https://bugs.llvm.org/show_bug.cgi?id=28629
Link: ClangBuiltLinux/linux#1438

Reviewed By: rengolin

Differential Revision: https://reviews.llvm.org/D108844
mem-frob pushed a commit to draperlaboratory/hope-llvm-project that referenced this issue Oct 7, 2022
Similar to D108842, D108844, and D108926.

__has_builtin(__builtin_mul_overflow) returns true for 32b PPC targets,
but Clang is deferring to compiler RT when encountering long long types.
This breaks ppc44x_defconfig + CONFIG_BLK_DEV_NBD=y builds of the Linux
kernel that are using __builtin_mul_overflow with these types for these
targets.

If the semantics of __has_builtin mean "the compiler resolves these,
always" then we shouldn't conditionally emit a libcall.

This will still need to be worked around in the Linux kernel in order to
continue to support these builds of the Linux kernel for this
target with older releases of clang.

Link: https://bugs.llvm.org/show_bug.cgi?id=28629
Link: ClangBuiltLinux/linux#1438

Reviewed By: nemanjai

Differential Revision: https://reviews.llvm.org/D108936
mem-frob pushed a commit to draperlaboratory/hope-llvm-project that referenced this issue Oct 7, 2022
Similar to D108842 and D108844.

__has_builtin(__builtin_mul_overflow) returns true for 32b MIPS targets,
but Clang is deferring to compiler RT when encountering long long types.
This breaks MIPS malta_defconfig builds of the Linux kernel that are
using __builtin_mul_overflow with these types for these targets.

If the semantics of __has_builtin mean "the compiler resolves these,
always" then we shouldn't conditionally emit a libcall.

This will still need to be worked around in the Linux kernel in order to
continue to support malta_defconfig builds of the Linux kernel for this
target with older releases of clang.

Link: https://bugs.llvm.org/show_bug.cgi?id=28629
Link: ClangBuiltLinux/linux#1438

Reviewed By: rengolin

Differential Revision: https://reviews.llvm.org/D108926
mem-frob pushed a commit to draperlaboratory/hope-llvm-project that referenced this issue Oct 7, 2022
Similar to D108842, D108844, and D108926.

__has_builtin(__builtin_mul_overflow) returns true for 32b x86 targets,
but Clang is deferring to compiler RT when encountering long long types.
This breaks ARCH=i386 + CONFIG_BLK_DEV_NBD=y builds of the Linux kernel
that are using __builtin_mul_overflow with these types for these targets.

If the semantics of __has_builtin mean "the compiler resolves these,
always" then we shouldn't conditionally emit a libcall.

This will still need to be worked around in the Linux kernel in order to
continue to support these builds of the Linux kernel for this
target with older releases of clang.

Link: https://bugs.llvm.org/show_bug.cgi?id=28629
Link: https://bugs.llvm.org/show_bug.cgi?id=35922
Link: ClangBuiltLinux/linux#1438

Reviewed By: lebedev.ri, RKSimon

Differential Revision: https://reviews.llvm.org/D108928
RahifM pushed a commit to RahifM/android_kernel_xiaomi_sdm845 that referenced this issue Jun 5, 2023
Once upgrading the minimum supported version of GCC to 5.1, we can drop
the fallback code for !COMPILER_HAS_GENERIC_BUILTIN_OVERFLOW.

This is effectively a revert of commit f0907827a8a9 ("compiler.h: enable
builtin overflow checkers and add fallback code")
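
With the fallback gone, an overflow-checking helper can be a thin, type-generic wrapper around the compiler builtin. The sketch below is an assumption about what such a wrapper might look like; `sketch_check_mul_overflow`, `mul_u32` and `mul_s64` are hypothetical names, not the kernel's definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name: type-generic wrapper over the compiler builtin. */
#define sketch_check_mul_overflow(a, b, d) \
	__builtin_mul_overflow((a), (b), (d))

/* The same macro serves operands of different widths and signedness. */
static bool mul_u32(uint32_t a, uint32_t b, uint32_t *out)
{
	return sketch_check_mul_overflow(a, b, out);
}

static bool mul_s64(int64_t a, int64_t b, int64_t *out)
{
	return sketch_check_mul_overflow(a, b, out);
}
```

The builtin infers the overflow condition from the type of the destination pointer, so no per-type fallback implementations are needed.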

Link: ClangBuiltLinux/linux#1438 (comment)
Suggested-by: Rasmus Villemoes <[email protected]>
Signed-off-by: Nick Desaulniers <[email protected]>
Acked-by: Kees Cook <[email protected]>
Reviewed-by: Nathan Chancellor <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

 Conflicts:
	include/linux/compiler-clang.h
	include/linux/compiler-gcc.h
	tools/include/linux/compiler-gcc.h
	tools/include/linux/overflow.h
nathanchance added a commit to nathanchance/continuous-integration2 that referenced this issue Jan 22, 2025
After commit 9aa0ebde0014 ("bpf, verifier: Improve precision of
BPF_MUL") [1], there is an error in certain ARM configurations that
enable CONFIG_BPF_SYSCALL:

  ld.lld: error: undefined symbol: __mulodi4
  >>> referenced by verifier.c:14221 (/builds/linux/kernel/bpf/verifier.c:14221)
  >>>               kernel/bpf/verifier.o:(adjust_reg_min_max_vals) in archive vmlinux.a
  >>> referenced by verifier.c:14222 (/builds/linux/kernel/bpf/verifier.c:14222)
  >>>               kernel/bpf/verifier.o:(adjust_reg_min_max_vals) in archive vmlinux.a
  >>> referenced by verifier.c:14223 (/builds/linux/kernel/bpf/verifier.c:14223)
  >>>               kernel/bpf/verifier.o:(adjust_reg_min_max_vals) in archive vmlinux.a
  >>> referenced 1 more times

This was encountered previously [2], where it was fixed in clang-14 and
avoided in the kernel with a source code workaround (that ended up being
cleaner anyway).

This time around, inserting a source code workaround would not be as
clean, as it may involve disabling a core part of the kernel on a
limited condition (as it only impacts one supported LLVM version and
architecture combination) or having a separate code path for this
situation.

For now, just disable the builds that are impacted by this. If more
people notice this problem, we can explore bumping the minimum supported
version of LLVM for building `ARCH=arm` to 14.

Link: https://git.kernel.org/bpf/bpf-next/c/9aa0ebde0014f01a8ca82adcbf43b92345da0d50 [1]
Link: ClangBuiltLinux/linux#1438 [2]
Signed-off-by: Nathan Chancellor <[email protected]>
nathanchance added a commit to nathanchance/continuous-integration2 that referenced this issue Mar 5, 2025
… with clang-13
