excessive compile time building linux/lib/zstd/compress.c #1314

arndb · 2021-02-28T20:47:01Z

I did a scripted search for files with long compile time. lib/zstd/compress.c is one of the files that came out on top.

$ perf stat clang-11 -c lib/zstd/compress.i -O3 -Wall -Wno-unused  -fno-strict-aliasing -fno-common -fshort-wchar -fno-PIE   --target=aarch64-linux -Werror=unknown-warning-option -mgeneral-regs-only -Wno-psabi -fno-asynchronous-unwind-tables -fno-unwind-tables -mbranch-protection=pac-ret+leaf  -fno-delete-null-pointer-checks -Wframe-larger-than=2048 -fno-stack-protector -mno-global-merge -fno-omit-frame-pointer -fno-optimize-sibling-calls -mno-omit-leaf-frame-pointer -pg -fno-strict-overflow -fno-stack-check -Werror=date-time -Werror=incompatible-pointer-types   -fsanitize=kernel-hwaddress -mllvm -hwasan-instrument-stack=0 -mllvm -hwasan-use-short-granules=0 -mllvm -hwasan-mapping-offset=0xefffc00000000000  -fsanitize=array-bounds -fsanitize=unreachable -fsanitize=object-size -fsanitize=bool -fsanitize=enum
          26925.09 msec task-clock                #    0.998 CPUs utilized          
              2738      context-switches          #    0.102 K/sec                  
                22      cpu-migrations            #    0.001 K/sec                  
             50399      page-faults               #    0.002 M/sec                  
       87646286353      cycles                    #    3.255 GHz                      (83.32%)
       10257322275      stalled-cycles-frontend   #   11.70% frontend cycles idle     (83.31%)
        5991572775      stalled-cycles-backend    #    6.84% backend cycles idle      (83.34%)
       59088749914      instructions              #    0.67  insn per cycle         
                                                  #    0.17  stalled cycles per insn  (83.35%)
       11508007401      branches                  #  427.408 M/sec                    (83.35%)
         491383656      branch-misses             #    4.27% of all branches          (83.33%)

      26.972371393 seconds time elapsed

      26.757734000 seconds user
       0.159723000 seconds sys

Attaching the preprocessed version:
compress.i.gz

The profile shows these functions on top:

   3.20%  clang-13  libLLVM-13.so.1      [.] llvm::BasicBlock::getTerminator
   2.95%  clang-13  libLLVM-13.so.1      [.] llvm::LiveRangeCalc::findReachingDefs
   2.46%  clang-13  libLLVM-13.so.1      [.] llvm::LoopBase<llvm::MachineBasicBlock, llvm::MachineLoop>::getExitingBlocks
   2.00%  clang-13  libLLVM-13.so.1      [.] llvm::SmallPtrSetImplBase::FindBucketFor
   1.89%  clang-13  libc-2.33.so         [.] _int_malloc
   1.31%  clang-13  libLLVM-13.so.1      [.] llvm::SmallPtrSetImplBase::insert_imp_big
   1.25%  clang-13  libLLVM-13.so.1      [.] llvm::LiveRange::extendInBlock
   1.18%  clang-13  libLLVM-13.so.1      [.] llvm::SelectionDAG::Combine
   1.18%  clang-13  libLLVM-13.so.1      [.] llvm::ScalarEvolution::forgetLoop
   0.83%  clang-13  libLLVM-13.so.1      [.] llvm::SpillPlacement::activate
   0.73%  clang-13  libc-2.33.so         [.] _int_free
   0.70%  clang-13  libLLVM-13.so.1      [.] llvm::LiveRangeUpdater::add
   0.69%  clang-13  libc-2.33.so         [.] msort_with_tmp.part.0
   0.68%  clang-13  libLLVM-13.so.1      [.] llvm::DominatorTreeBase<llvm::BasicBlock, false>::dominates
   0.66%  clang-13  libLLVM-13.so.1      [.] llvm::LiveIntervals::extendSegmentsToUses
   0.65%  clang-13  libc-2.33.so         [.] malloc
   0.64%  clang-13  libLLVM-13.so.1      [.] llvm::LiveVariables::MarkVirtRegAliveInBlock
   0.62%  clang-13  libLLVM-13.so.1      [.] llvm::ValueHandleBase::AddToUseList
   0.62%  clang-13  libc-2.33.so         [.] unlink_chunk.constprop.0
   0.60%  clang-13  libLLVM-13.so.1      [.] llvm::SpillPlacement::addLinks
   0.60%  clang-13  libc-2.33.so         [.] cfree@GLIBC_2.2.5
   0.58%  clang-13  libc-2.33.so         [.] malloc_consolidate
   0.56%  clang-13  libLLVM-13.so.1      [.] llvm::Value::getMetadata
   0.54%  clang-13  libLLVM-13.so.1      [.] llvm::SmallPtrSetImplBase::Grow
   0.52%  clang-13  libLLVM-13.so.1      [.] llvm::Instruction::getSuccessor
   0.51%  clang-13  libLLVM-13.so.1      [.] 0x0000000001029ad0
   0.50%  clang-13  libLLVM-13.so.1      [.] 0x0000000000e5eb1b

The text was updated successfully, but these errors were encountered:

arndb · 2021-02-28T21:50:36Z

I played around with this a little and found that -fsanitize=kernel-hwaddress is the main reason for the slow compilation, without this, compile time goes down to 12 seconds instead of 27.

nickdesaulniers · 2021-03-01T00:59:55Z

IIRC, doesn't this file have __always_inline on every function? I remember there was one zstd related file that had a relatively large time spent in the optimizer than the frontend, unlike most other source files.

#1086 (comment) is when I looked at it, though it seems I failed to make such notes public...

nickdesaulniers · 2021-03-01T01:16:13Z

cc @eugenis for feedback on -fsanitize=kernel-hwaddress.

arndb · 2021-03-01T09:15:28Z

Indeed, removing every __always_inline in the preprocessed file makes compilation three to four times faster, both with and without -fsanitize=kernel-address.

With both kasan and always_inline removed, the file now compiles in three seconds, which is still rather slow, but is ten times faster than the maximum I found.

arndb · 2021-03-01T09:39:17Z

Comparing clang-13 against gcc-10:

clang-13, hwasan, always_inline  25.6s
clang-13, hwasan                  6.3s
clang-13, kasan, always_inline   23.4s
clang-13, kasan                   4.9s
clang-13, always_inline          11.2s
clang-13, plain                   3.4s
gcc-10.1, kasan, always_inline   13.4s
gcc-10.1, kasan                   7.1s
gcc-10.1, always_inline          11.7s
gcc-10.1, plain                   3.6s

arndb · 2021-03-01T20:42:17Z

This one seems related but even worse, at over 100 CPU seconds to compile the preprocessed file with KASAN:

$ perf stat clang-13 -Wno-unused --target=arm-linux-gnueabi  -O2  -fsanitize=kernel-address   -c qla_dbg-clang.i   -mllvm -asan-instrumentation-with-call-threshold=0  

 Performance counter stats for 'clang-13 -Wno-unused --target=arm-linux-gnueabi -O2 -fsanitize=kernel-address -c qla_dbg-clang.i -mllvm -asan-instrumentation-with-call-threshold=0':

         104607.14 msec task-clock                #    0.998 CPUs utilized          
             10588      context-switches          #    0.101 K/sec                  
                59      cpu-migrations            #    0.001 K/sec                  
             54679      page-faults               #    0.523 K/sec                  
      340985929529      cycles                    #    3.260 GHz                      (83.33%)
       67694323496      stalled-cycles-frontend   #   19.85% frontend cycles idle     (83.33%)
       18623444901      stalled-cycles-backend    #    5.46% backend cycles idle      (83.33%)
      151628665799      instructions              #    0.44  insn per cycle         
                                                  #    0.45  stalled cycles per insn  (83.34%)
       27060513254      branches                  #  258.687 M/sec                    (83.33%)
        1384301145      branch-misses             #    5.12% of all branches          (83.34%)

     104.785569716 seconds time elapsed

     104.325451000 seconds user
       0.251598000 seconds sys

Without KASAN, it takes 7.6s to compile.

With gcc, it's 4.6.s with kasan or 4.0s without kasan.

qla_dbg-clang.i.gz

arndb · 2021-03-01T22:07:50Z

I found that this one-line change reduces the build time from 104 seconds to 2.7 seconds:

--- a/drivers/scsi/qla2xxx/qla_dbg.c
+++ b/drivers/scsi/qla2xxx/qla_dbg.c
@@ -280,7 +280,7 @@ qla24xx_dump_memory(struct qla_hw_data *ha, __be32 *code_ram,
        return rval;
 }
 
-static __be32 *
+static noinline __be32 *
 qla24xx_read_window(struct device_reg_24xx __iomem *reg, uint32_t iobase,
                    uint32_t count, __be32 *buf)
 {

arndb · 2021-03-02T10:20:02Z

I ran the qla_dbg test case (without the noinline) through a "clang-13 -ftime-report" (full report attached), the interesting bits are

                      Instruction Selection and Scheduling
===-------------------------------------------------------------------------===
  Total Execution Time: 96.9025 seconds (97.1371 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
  95.4756 ( 98.6%)   0.0677 ( 57.6%)  95.5432 ( 98.6%)  95.7762 ( 98.6%)  DAG Combining 2
   0.4311 (  0.4%)   0.0109 (  9.2%)   0.4420 (  0.5%)   0.4426 (  0.5%)  Instruction Selection

                      ... Pass execution timing report ...
===-------------------------------------------------------------------------===
  Total Execution Time: 104.6836 seconds (104.9413 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
  97.8214 ( 94.0%)   0.1408 ( 22.5%)  97.9622 ( 93.6%)  98.2003 ( 93.6%)  ARM Instruction Selection
   3.1954 (  3.1%)   0.0506 (  8.1%)   3.2459 (  3.1%)   3.2578 (  3.1%)  Greedy Register Allocator
   1.6001 (  1.5%)   0.3769 ( 60.3%)   1.9770 (  1.9%)   1.9829 (  1.9%)  ARM Assembly Printer

The results are the same for earlier versions of clang (tested 10 and up), so this is not a regression.
In fact, clang-10 and clang-11 are a few percent slower overall.

execution-timing.txt

arndb · 2021-03-07T22:40:34Z

Here is a reduced test case for qla_dbg.c, which comes down to the same function getting inline many times when probably should not be inlined.

qla_dbg-clang.c.gz

arndb · 2021-03-07T22:49:57Z

https://godbolt.org/z/9PGKo8

nickdesaulniers added the compile time Issues affecting compile times label Mar 1, 2021

kees mentioned this issue Oct 13, 2023

pathological build time with KASAN and inline functions llvm/llvm-project#69016

Open

kees added Reported upstream This bug was filed on LLVM’s issue tracker, Phabricator, or the kernel mailing list. [BUG] llvm A bug that should be fixed in upstream LLVM labels Oct 13, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

excessive compile time building linux/lib/zstd/compress.c #1314

excessive compile time building linux/lib/zstd/compress.c #1314

arndb commented Feb 28, 2021

arndb commented Feb 28, 2021

Uh oh!

nickdesaulniers commented Mar 1, 2021 •

edited

Loading

Uh oh!

nickdesaulniers commented Mar 1, 2021

Uh oh!

arndb commented Mar 1, 2021

Uh oh!

arndb commented Mar 1, 2021

Uh oh!

arndb commented Mar 1, 2021

Uh oh!

arndb commented Mar 1, 2021

Uh oh!

arndb commented Mar 2, 2021

Uh oh!

arndb commented Mar 7, 2021

Uh oh!

arndb commented Mar 7, 2021

Uh oh!

excessive compile time building linux/lib/zstd/compress.c #1314

excessive compile time building linux/lib/zstd/compress.c #1314

Comments

arndb commented Feb 28, 2021

arndb commented Feb 28, 2021

Uh oh!

nickdesaulniers commented Mar 1, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

nickdesaulniers commented Mar 1, 2021

Uh oh!

arndb commented Mar 1, 2021

Uh oh!

arndb commented Mar 1, 2021

Uh oh!

arndb commented Mar 1, 2021

Uh oh!

arndb commented Mar 1, 2021

Uh oh!

arndb commented Mar 2, 2021

Uh oh!

arndb commented Mar 7, 2021

Uh oh!

arndb commented Mar 7, 2021

Uh oh!

nickdesaulniers commented Mar 1, 2021 •

edited

Loading