Cache __contains_no_intrinsic_headers and thus speedup parse_options ~2x #4479

irishrover · 2025-03-06T18:09:52Z

When analyzing Chromium or any other large project, __contains_no_intrinsic_headers is called thousands of times with the same directory names, and each call performs an expensive OS call.
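A minimal sketch of the caching idea follows. It assumes the check globs an include directory for `*intrin.h` files. The real body of `__contains_no_intrinsic_headers` is not shown in this thread, so `contains_no_intrinsic_headers` below is a hypothetical stand-in. `functools.lru_cache(maxsize=None)` keys the cache on the directory string, so repeated calls for the same directory skip the filesystem entirely:

```python
import functools
import glob
import os
import tempfile


# Hypothetical stand-in for __contains_no_intrinsic_headers; the '*intrin.h'
# pattern is an assumption, not taken from log_parser.py.
@functools.lru_cache(maxsize=None)
def contains_no_intrinsic_headers(include_dir: str) -> bool:
    """Return True if include_dir contains no '*intrin.h' headers."""
    # glob.glob() lists the directory: this is the OS work being cached.
    return not glob.glob(os.path.join(include_dir, "*intrin.h"))


with tempfile.TemporaryDirectory() as include_dir:
    # Simulate many compile commands sharing the same include directory.
    for _ in range(1000):
        contains_no_intrinsic_headers(include_dir)

info = contains_no_intrinsic_headers.cache_info()
print(info.hits, info.misses)  # prints: 999 1
```

The trade-off is that a cached answer is not refreshed if headers appear or disappear during the run. That seems acceptable for a one-shot parse of a build log.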

irishrover · 2025-03-06T18:27:38Z

According to line_profiler (https://github.com/pyutils/line_profiler)

BEFORE

0.00 seconds - codechecker_analyzer\buildlog\log_parser.py:645 - determine_compiler
0.00 seconds - codechecker_analyzer\buildlog\log_parser.py:629 - get_language
0.06 seconds - codechecker_analyzer\buildlog\log_parser.py:731 - os_path_normpath
0.10 seconds - codechecker_analyzer\buildlog\log_parser.py:1184 - extend_compilation_database_entries
0.14 seconds - codechecker_analyzer\buildlog\log_parser.py:680 - __is_not_include_fixed
0.16 seconds - codechecker_analyzer\buildlog\log_parser.py:886 - __get_output
0.16 seconds - codechecker_analyzer\buildlog\log_parser.py:832 - __get_arch
0.17 seconds - codechecker_analyzer\buildlog\log_parser.py:287 - filter_compiler_includes_extra_args
0.17 seconds - codechecker_analyzer\buildlog\log_parser.py:850 - __get_target
0.18 seconds - codechecker_analyzer\buildlog\log_parser.py:868 - __get_language
0.19 seconds - codechecker_analyzer\buildlog\log_parser.py:702 - __collect_clang_compile_opts
0.25 seconds - codechecker_analyzer\buildlog\log_parser.py:798 - __skip_sources
0.26 seconds - codechecker_analyzer\buildlog\log_parser.py:915 - __skip_clang
0.27 seconds - codechecker_analyzer\buildlog\log_parser.py:712 - __collect_transform_xclang_opts
0.47 seconds - codechecker_analyzer\buildlog\log_parser.py:810 - __determine_action_type
0.96 seconds - codechecker_analyzer\buildlog\log_parser.py:736 - __collect_transform_include_opts
40.85 seconds - codechecker_analyzer\buildlog\log_parser.py:690 - __contains_no_intrinsic_headers <<<----- non-cached
81.00 seconds - codechecker_analyzer\buildlog\log_parser.py:945 - parse_options
82.18 seconds - codechecker_analyzer\buildlog\log_parser.py:1250 - parse_unique_log

AFTER

0.00 seconds - codechecker_analyzer\buildlog\log_parser.py:645 - determine_compiler
0.00 seconds - codechecker_analyzer\buildlog\log_parser.py:629 - get_language
0.06 seconds - codechecker_analyzer\buildlog\log_parser.py:689 - __contains_no_intrinsic_headers <<<----- cached
0.06 seconds - codechecker_analyzer\buildlog\log_parser.py:731 - os_path_normpath
0.11 seconds - codechecker_analyzer\buildlog\log_parser.py:1184 - extend_compilation_database_entries
0.15 seconds - codechecker_analyzer\buildlog\log_parser.py:680 - __is_not_include_fixed
0.18 seconds - codechecker_analyzer\buildlog\log_parser.py:832 - __get_arch
0.18 seconds - codechecker_analyzer\buildlog\log_parser.py:886 - __get_output
0.18 seconds - codechecker_analyzer\buildlog\log_parser.py:287 - filter_compiler_includes_extra_args
0.18 seconds - codechecker_analyzer\buildlog\log_parser.py:850 - __get_target
0.20 seconds - codechecker_analyzer\buildlog\log_parser.py:868 - __get_language
0.20 seconds - codechecker_analyzer\buildlog\log_parser.py:702 - __collect_clang_compile_opts
0.27 seconds - codechecker_analyzer\buildlog\log_parser.py:798 - __skip_sources
0.27 seconds - codechecker_analyzer\buildlog\log_parser.py:915 - __skip_clang
0.30 seconds - codechecker_analyzer\buildlog\log_parser.py:712 - __collect_transform_xclang_opts
0.51 seconds - codechecker_analyzer\buildlog\log_parser.py:810 - __determine_action_type
0.99 seconds - codechecker_analyzer\buildlog\log_parser.py:736 - __collect_transform_include_opts
43.43 seconds - codechecker_analyzer\buildlog\log_parser.py:945 - parse_options
44.61 seconds - codechecker_analyzer\buildlog\log_parser.py:1250 - parse_unique_log
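The headline number can be checked from the two runs with a quick calculation over the reported `parse_options` and `parse_unique_log` totals. The values are copied from the output above; `speedup` is just an illustrative helper:

```python
# Timings (seconds) copied from the line_profiler output above.
BEFORE = {"parse_options": 81.00, "parse_unique_log": 82.18}
AFTER = {"parse_options": 43.43, "parse_unique_log": 44.61}


def speedup(before: float, after: float) -> float:
    """Ratio of old to new runtime (2.0 means twice as fast)."""
    return before / after


for name in BEFORE:
    saved = BEFORE[name] - AFTER[name]
    ratio = speedup(BEFORE[name], AFTER[name])
    print(f"{name}: {saved:.2f} s saved, {ratio:.2f}x faster")
# parse_options: 37.57 s saved, 1.87x faster
# parse_unique_log: 37.57 s saved, 1.84x faster
```

Both totals improve by about 1.84x to 1.87x, consistent with the "~2x" in the title. The ~37.6 s saved is of the same order as the 40.85 s to 0.06 s drop in `__contains_no_intrinsic_headers`.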


irishrover · 2025-03-08T16:44:03Z

Looks like only the GUI tests fail, and that is unrelated to this PR.

irishrover · 2025-03-10T12:37:20Z

@bruntib, @vodorok can you please have a look at this PR?

bruntib

Thanks, caching this slow function is a great improvement. I just have two small questions.

bruntib · 2025-03-11T14:15:36Z

analyzer/codechecker_analyzer/buildlog/log_parser.py

@@ -1289,6 +1288,7 @@ def parse_unique_log(compilation_database,
            uniqueing_re = re.compile(compile_uniqueing)

        skipped_cmp_cmd_count = 0
+        __contains_no_intrinsic_headers.cache_clear()


What is the purpose of clearing the cache in the beginning?

In fact, it's only needed for tests, since they share the same static cache storage.
In real scenarios, this function is called only once, so clearing the cache is a no-op.
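The reason the clear matters for tests can be sketched as follows. An `lru_cache` lives on the module-level function object, so it survives across test cases running in the same process. If two tests reuse a directory path whose contents differ, the second test sees a stale answer unless the cache is cleared. The function below is again a hypothetical stand-in that assumes a `*intrin.h` glob:

```python
import functools
import glob
import os
import tempfile


# Hypothetical stand-in for __contains_no_intrinsic_headers (assumed glob).
@functools.lru_cache(maxsize=None)
def contains_no_intrinsic_headers(include_dir: str) -> bool:
    return not glob.glob(os.path.join(include_dir, "*intrin.h"))


with tempfile.TemporaryDirectory() as include_dir:
    # "Test 1": empty directory, so True is computed and cached.
    print(contains_no_intrinsic_headers(include_dir))  # prints: True

    # "Test 2" reuses the same path, which now holds an intrinsic header.
    with open(os.path.join(include_dir, "xmmintrin.h"), "w"):
        pass
    print(contains_no_intrinsic_headers(include_dir))  # prints: True (stale)

    # Clearing the cache, as parse_unique_log now does at its start,
    # forces the next call to hit the filesystem again.
    contains_no_intrinsic_headers.cache_clear()
    print(contains_no_intrinsic_headers(include_dir))  # prints: False
```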

bruntib · 2025-03-11T14:36:26Z

analyzer/codechecker_analyzer/buildlog/log_parser.py

@@ -1184,12 +1185,10 @@ def extend_compilation_database_entries(compilation_database):
                for source_file in source_files:
                    new_entry = dict(entry)
                    new_entry['file'] = source_file
-                    entries.append(new_entry)


Why is it useful to build this list?

Well, it's not directly related to the speed-up, but I noticed that the whole `entries` list is never used at once, so I decided to save a bit of memory here.

Oh, sorry, for some reason I thought it was a "yield -> append" change, but it's just the reverse :D
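The hunk only shows `entries.append(new_entry)` being removed. Assuming it is replaced by a `yield`, the memory argument looks roughly like the sketch below. The `sources` key and the `expand_entries` name are hypothetical, since the thread does not show how `source_files` is computed:

```python
import types
from typing import Any, Dict, Iterable, Iterator


def expand_entries(
    compilation_database: Iterable[Dict[str, Any]],
) -> Iterator[Dict[str, Any]]:
    """Yield one copy of each entry per source file (hypothetical sketch)."""
    for entry in compilation_database:
        source_files = entry["sources"]  # hypothetical key, see above
        for source_file in source_files:
            new_entry = dict(entry)  # shallow copy, as in the hunk
            new_entry["file"] = source_file
            yield new_entry  # instead of entries.append(new_entry)


db = [{"directory": "/src", "command": "cc -c a.c b.c",
       "sources": ["a.c", "b.c"]}]
gen = expand_entries(db)
print(isinstance(gen, types.GeneratorType))  # prints: True
print([e["file"] for e in gen])  # prints: ['a.c', 'b.c']
```

Expanded entries are produced one at a time, so peak memory no longer grows with the total number of source files. Callers that need `len()` or several passes over the result would have to materialize it with `list()`.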

irishrover requested review from bruntib and vodorok as code owners March 6, 2025 18:09

irishrover force-pushed the speedup-parse-options branch from 0eb3f00 to c3017f4 on March 6, 2025 18:22

irishrover force-pushed the speedup-parse-options branch from c3017f4 to 1fe5d4a on March 6, 2025 19:12

Cache __contains_no_intrinsic_headers and thus speedup parse_options ~2x

fbed9eb


irishrover force-pushed the speedup-parse-options branch from 1fe5d4a to fbed9eb on March 8, 2025 16:28

bruntib requested changes Mar 11, 2025


bruntib approved these changes Mar 11, 2025


bruntib merged commit 22c5e34 into Ericsson:master Mar 11, 2025
7 of 8 checks passed
