Add support for caching Rust compilations. Fixes #42 #77

luser · 2017-03-07T19:10:57Z

This is a pretty large set of changes, but it breaks down into a few parts:

First, I spent some time refactoring the existing logic from CompilerInfo::get_cached_or_compile, since it was all tangled with the logic of running the C preprocessor. I split that out into a separate method and moved a few other things around while I was there. This is the "Pre-Rust support refactoring, Part 1" change.
Next, I added some traits to hide implementation details of the actual compilation process: Compiler, CompilerHasher, and Compilation. I changed get_cached_or_compile and everywhere else that was interacting with compilers to only work with boxed trait objects of these types, and made it so that Compiler::parse_arguments returns a boxed CompilerHasher, and then CompilerHasher::generate_hash_key returns a boxed Compilation. This way the C compiler implementation and the Rust compiler implementation can each persist some private state through the process. I refactored the C compilation bits further into an implementation of these traits that's generic over a CCompilerImpl trait, implementations of which simply call into the existing gcc/clang/msvc functions. This is the "Pre-Rust support refactoring, Part 2" and "Part 3" changes.
Finally, I added a Rust implementation of the traits. A few refactoring-type changes snuck into this changeset, but I was too lazy to split them out at this point.
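
The three-stage flow can be sketched as follows. `Compiler::parse_arguments` hands back a boxed `CompilerHasher`, whose `generate_hash_key` yields a cache key plus a boxed `Compilation`, so each implementation can carry its own private state between stages. This is a minimal synchronous sketch under assumed, simplified signatures (no futures, no cache storage, plain `String` keys and errors). `ToyCompiler` and its companions are hypothetical; the real traits in this PR take more parameters and return futures.

```rust
/// Result of argument parsing: a hasher for cacheable invocations, or a marker
/// saying this command line cannot be cached. (Simplified; signatures assumed.)
pub enum CompilerArguments {
    Ok(Box<dyn CompilerHasher>),
    CannotCache,
}

/// Stage 1: a detected compiler that knows how to parse its own arguments.
pub trait Compiler {
    fn parse_arguments(&self, args: &[String]) -> CompilerArguments;
}

/// Stage 2: holds private state from parsing and turns it into a cache key.
pub trait CompilerHasher {
    fn generate_hash_key(self: Box<Self>) -> (String, Box<dyn Compilation>);
}

/// Stage 3: holds whatever state is needed to run the real compile.
pub trait Compilation {
    fn compile(self: Box<Self>) -> Result<Vec<u8>, String>;
}

// Hypothetical toy implementation: cacheable only with exactly one argument.
pub struct ToyCompiler;
struct ToyHasher {
    input: String,
}
struct ToyCompilation {
    input: String,
}

impl Compiler for ToyCompiler {
    fn parse_arguments(&self, args: &[String]) -> CompilerArguments {
        match args {
            [single] => CompilerArguments::Ok(Box::new(ToyHasher { input: single.clone() })),
            _ => CompilerArguments::CannotCache,
        }
    }
}

impl CompilerHasher for ToyHasher {
    fn generate_hash_key(self: Box<Self>) -> (String, Box<dyn Compilation>) {
        let ToyHasher { input } = *self;
        let key = format!("toy:{}", input);
        // The private state moves on into the next stage.
        (key, Box::new(ToyCompilation { input }))
    }
}

impl Compilation for ToyCompilation {
    fn compile(self: Box<Self>) -> Result<Vec<u8>, String> {
        // Stand-in for running the compiler: "output" is just the input bytes.
        Ok(self.input.into_bytes())
    }
}
```

The caller only ever sees `Box<dyn ...>` values, which is what lets the C and Rust implementations keep different state without the server knowing about it.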

There are also a few tiny refactoring changesets mixed in there; I tried to split small, separable changes out when possible to make the whole thing easier to follow.

luser · 2017-03-08T13:17:02Z

The Travis builds are broken on 1.13 for some reason, I will take a look at that.

alexcrichton · 2017-03-08T15:44:57Z

src/compiler/compiler.rs

+    Box::new(detect.and_then(move |compiler| {
+        match compiler {
+            Some(compiler) => {
+                Box::new(sha1_digest(executable.clone(), &pool)


This is actually somewhat interesting in the rustc case. I'd imagine that quite commonly this is actually the rustup version of rustc.exe rather than the rustc version of rustc.exe. I wonder if we could automatically detect rustup and go fetch the actual compiler instead?

I believe that would work best by looking for rustup.exe in the directory next to rustc.exe (if found). If that's there we can run rustup which rustc which should print out the full path to the actual compiler.

Also as a side note here, I wonder if we should handle the dynamic libraries that rustc itself links to? I can't really imagine a use case though where the dylibs are changed and the rustc executable wouldn't change, though, so it's perhaps not an issue.

That's a really good point and I can't believe I didn't realize it! I don't have a non-rustup install handy, but would it be enough to run rustc --print sysroot and hash $sysroot/bin/rustc? I'd have to sanity check how that works with the tooltool-downloaded rustc we use in Firefox CI.

We discussed the dylib issue in Hawaii and we kind of hand-waved it away. I definitely don't want to be parsing the output of ldd or reading the binary headers to figure this out, but maybe it's enough to hash rustc + everything in $sysroot/lib/? Actually given that rustc is just a stub, maybe we skip hashing rustc entirely, hash $sysroot/lib and that's good enough?

but would it be enough to run rustc --print sysroot and hash $sysroot/bin/rustc

Oh right yeah that'd be perfect. (and also solve the rustup case)
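
A minimal sketch of that approach, assuming the stdout of `rustc --print sysroot` is a single path and that the real compiler lives at `$sysroot/bin/rustc` (`rustc.exe` on Windows). Running this through a rustup proxy should resolve to the toolchain's actual binary. The function names are hypothetical.

```rust
use std::path::PathBuf;
use std::process::Command;

/// Turn the stdout of `rustc --print sysroot` into the path of the rustc
/// binary inside that sysroot (assumed layout: `$sysroot/bin/rustc[.exe]`).
pub fn rustc_in_sysroot(sysroot_stdout: &[u8]) -> Option<PathBuf> {
    let sysroot = std::str::from_utf8(sysroot_stdout).ok()?.trim();
    if sysroot.is_empty() {
        return None;
    }
    let exe = if cfg!(windows) { "rustc.exe" } else { "rustc" };
    Some(PathBuf::from(sysroot).join("bin").join(exe))
}

/// Ask a (possibly rustup-proxied) rustc for its sysroot; None on any failure.
pub fn real_rustc(rustc: &str) -> Option<PathBuf> {
    let output = Command::new(rustc).arg("--print").arg("sysroot").output().ok()?;
    if !output.status.success() {
        return None;
    }
    rustc_in_sysroot(&output.stdout)
}
```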

This may also throw a wrench into sccache's own caching of the compiler as well unfortunately b/c rustup can have a different rustc depending on which directory it's invoked in (and that can change over time). I think sccache assumes the main compiler doesn't change much, right?

Also yeah, I think we can handwave it away, b/c every release has different hashes in the dylib filenames, which I believe implies that the contents of rustc will be different (e.g. the binary somehow references those dylibs). That being said, hashing $sysroot/lib would put a nail in the coffin and work perfectly as well.

The only catch there is that dylibs are in $sysroot/bin on Windows and $sysroot/lib everywhere else.
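
A sketch of choosing the dylib directory per platform and listing its files in a stable order before hashing. It assumes a non-recursive listing of that directory is sufficient (subdirectories such as `rustlib` are skipped); the function names are hypothetical.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Directory holding rustc's shared libraries:
/// `$sysroot/bin` on Windows, `$sysroot/lib` everywhere else.
pub fn dylib_dir(sysroot: &Path) -> PathBuf {
    sysroot.join(if cfg!(windows) { "bin" } else { "lib" })
}

/// Regular files directly inside `dir`, sorted so hash input order is stable.
pub fn sorted_files(dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut files = Vec::new();
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        if entry.file_type()?.is_file() {
            files.push(entry.path());
        }
    }
    files.sort();
    Ok(files)
}
```

Sorting matters because `read_dir` makes no ordering guarantee, so hashing in directory order could give different keys for identical toolchains.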

This may also throw a wrench into sccache's own caching of the compiler as well unfortunately b/c rustup can have a different rustc depending on which directory it's invoked in (and that can change over time). I think sccache assumes the main compiler doesn't change much, right?

Also a good point, we'll have to do something smarter here. sccache caches compiler info by (path, mtime), so rustup will totally break this expectation. I'll have to think about that for a bit, I don't want to have an extra rustc invocation for every single compile if we can avoid it.
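
To make the failure mode concrete, here is a hypothetical cache keyed the way described above, by `(path, mtime)`. A rustup proxy keeps the same path and mtime while the toolchain behind it changes, so a lookup can return info for the wrong toolchain. The type and method names are illustrative, not sccache's.

```rust
use std::collections::HashMap;
use std::path::PathBuf;
use std::time::SystemTime;

/// Hypothetical compiler-info cache keyed by (executable path, mtime).
pub struct CompilerInfoCache<T> {
    entries: HashMap<(PathBuf, SystemTime), T>,
}

impl<T: Clone> CompilerInfoCache<T> {
    pub fn new() -> Self {
        Self { entries: HashMap::new() }
    }

    /// Return cached info for (path, mtime), running `detect` only on a miss.
    pub fn get_or_insert_with<F: FnOnce() -> T>(
        &mut self,
        path: PathBuf,
        mtime: SystemTime,
        detect: F,
    ) -> T {
        self.entries.entry((path, mtime)).or_insert_with(detect).clone()
    }
}
```

In the test, a second lookup with the same key returns `"stable"` even though detection "would have" found `"nightly"`, which is exactly the stale result a rustup proxy can cause.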

alexcrichton · 2017-03-08T15:48:04Z

src/compiler/rust.rs

+                Err(e) => return Box::new(future::err(e)),
+            }
+        } else {
+            Box::new(future::err("Failed run rustc --dep-info".into()))


Could this contain the rustc error logs (e.g. output above)?

That's a good idea. We return the preprocessor stderr for C compilation when it fails, we should do something smarter here. The error propagation in this code isn't great in general, I should make sure we're providing enough information that users can attempt to diagnose failures.

alexcrichton · 2017-03-08T15:49:01Z

src/compiler/rust.rs

+                    drop(temp_dir);
+                    hash_all(files, &pool)
+                }
+                Err(e) => return Box::new(future::err(e)),


Perhaps this could use chain_err to say "failed to parse dep info at {}"
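
A sketch of parsing the dep-info output and attaching context on failure. It uses `map_err`/`ok_or_else` with `String` errors as a dependency-free stand-in for error-chain's `chain_err`. It sorts the resulting file list, and it assumes the first Makefile rule lists every source file and that the only escaping is backslash-escaped spaces. Both function names are hypothetical.

```rust
use std::fs;
use std::path::Path;

/// Parse the dependency list from the first rule of a Makefile-style
/// dep-info file ("target: dep1 dep2 ..."), unescaping `\ ` in paths.
pub fn parse_dep_info(contents: &str) -> Option<Vec<String>> {
    let line = contents.lines().find(|l| !l.trim().is_empty())?;
    let (_target, deps) = line.split_once(": ")?;
    let mut files = Vec::new();
    let mut current = String::new();
    let mut chars = deps.chars().peekable();
    while let Some(c) = chars.next() {
        match c {
            // `\ ` is an escaped space that belongs to the path.
            '\\' if chars.peek() == Some(&' ') => {
                current.push(' ');
                chars.next();
            }
            // An unescaped space separates two paths.
            ' ' => {
                if !current.is_empty() {
                    files.push(std::mem::take(&mut current));
                }
            }
            _ => current.push(c),
        }
    }
    if !current.is_empty() {
        files.push(current);
    }
    files.sort(); // stable order for hashing
    Some(files)
}

/// Read and parse a dep-info file, adding the path to any error message.
pub fn read_dep_info(path: &Path) -> Result<Vec<String>, String> {
    let contents = fs::read_to_string(path)
        .map_err(|e| format!("failed to read dep info at {}: {}", path.display(), e))?;
    parse_dep_info(&contents)
        .ok_or_else(|| format!("failed to parse dep info at {}", path.display()))
}
```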

alexcrichton · 2017-03-08T15:50:10Z

src/compiler/rust.rs

+            let outstr = String::from_utf8(output.stdout).chain_err(|| "Error parsing rustc output")?;
+            Ok(outstr.lines().map(|l| l.to_owned()).collect())
+        } else {
+            bail!("Failed to run `rustc --print file-names`");


I wonder if this may actually make a good helper as part of run_input_output, having a method to run a command and auto-check output.status.success() and bundle up all the relevant error info in the case of failure.

I'll have to look at the various places where I call this--I do think I wrote this pattern more than once.
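
One possible shape for such a helper, as a sketch: run a `std::process::Command`, and on a non-zero exit return an error carrying the command description, exit status, and stderr. `run_checked` and `check_output` are hypothetical names, not part of the existing `run_input_output`.

```rust
use std::process::{Command, Output};

/// Ok(stdout) on success; otherwise an error bundling status and stderr.
pub fn check_output(what: &str, output: Output) -> Result<Vec<u8>, String> {
    if output.status.success() {
        Ok(output.stdout)
    } else {
        Err(format!(
            "{} failed ({}):\n{}",
            what,
            output.status,
            String::from_utf8_lossy(&output.stderr)
        ))
    }
}

/// Run `cmd` with captured output and check its exit status.
pub fn run_checked(what: &str, cmd: &mut Command) -> Result<Vec<u8>, String> {
    let output = cmd
        .output()
        .map_err(|e| format!("failed to spawn {}: {}", what, e))?;
    check_output(what, output)
}
```

Call sites such as the `--print file-names` and `--dep-info` invocations could then surface rustc's own error text instead of a bare "failed to run" message.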

alexcrichton · 2017-03-08T15:51:14Z

src/compiler/rust.rs

+            // so we can *find* them.
+            "-l" => return CompilerArguments::CannotCache,
+            v if v.starts_with("--emit") => {
+                //XXX: do we need to handle --emit specified more than once?


Ah yeah I think this does happen from time to time (I think I type it in at least every once in a while!)

I'm not overly concerned with caching all possible commandlines (clearly), but it'd be good to ensure that we're at least not producing incorrect behavior in that case.

Yeah I think it's fine to just return "not cacheable" for multiple --emit calls for now.
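
A sketch of that rule: accept at most one `--emit` (as either `--emit X` or `--emit=X`) and report a second occurrence as not cacheable. The `EmitCheck` enum is a hypothetical stand-in for `CompilerArguments`.

```rust
/// Hypothetical outcome of scanning a rustc command line for `--emit`.
#[derive(Debug, PartialEq)]
pub enum EmitCheck {
    /// Exactly one `--emit`, split into its comma-separated kinds.
    Single(Vec<String>),
    /// No `--emit` at all.
    Missing,
    /// `--emit` given more than once (or without a value).
    CannotCache,
}

pub fn check_emit(args: &[&str]) -> EmitCheck {
    let mut found: Option<&str> = None;
    let mut iter = args.iter().copied();
    while let Some(arg) = iter.next() {
        let value = if arg == "--emit" {
            match iter.next() {
                Some(v) => v,
                None => return EmitCheck::CannotCache, // flag with no value
            }
        } else if let Some(v) = arg.strip_prefix("--emit=") {
            v
        } else {
            continue;
        };
        if found.is_some() {
            return EmitCheck::CannotCache; // second --emit: don't guess
        }
        found = Some(value);
    }
    match found {
        Some(v) => EmitCheck::Single(v.split(',').map(str::to_owned).collect()),
        None => EmitCheck::Missing,
    }
}
```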

alexcrichton · 2017-03-08T15:56:06Z

src/compiler/rust.rs

+
+/// Calculate the SHA-1 digest of each file in `files` on background threads
+/// in `pool`.
+fn hash_all(files: Vec<String>, pool: &CpuPool) -> SFuture<Vec<String>>


Should these files be sorted to ensure that they're always hashed in the same order? I think the dep-info is relatively deterministic (but that's likely not guaranteed), and I wouldn't be surprised if the order of --extern from Cargo was also slightly nondeterministic.

I sort the list of source files from the dep-info, but not the list of externs; they're input to the hash in the order they're listed on the commandline.

If cargo produces nondeterministic commandlines we've got a bigger problem, since we're feeding the entire commandline as hash input! Can we verify that assumption?

Ah yeah just confirmed, Cargo's order of --extern is nondeterministic.

If you'd prefer to change in Cargo I can also do that!

Fixing that in cargo would certainly make my life easier, and we're likely to need to wait for your dep-info speedup for this to all be useful anyway, so we can probably wait for a fix for that.

Is anything else about the commandline nondeterministic? If so, we'll have to fix sccache to not use the commandline as a direct hash input.

Ok I just checked, everything should be deterministic except the order of --extern which is nondeterministic.
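
Until Cargo's ordering is fixed, one hedge is to sort the `--extern` values before they go into the hash. In this sketch `DefaultHasher` stands in for the SHA-1 digest (it is not cryptographic, and its output is not guaranteed stable across Rust releases). `args_hash` is a hypothetical name, and only the `--extern VALUE` form is handled.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Hash a rustc command line with `--extern` values in sorted order.
pub fn args_hash(args: &[&str]) -> u64 {
    let mut externs = Vec::new();
    let mut others = Vec::new();
    let mut iter = args.iter().copied();
    while let Some(arg) = iter.next() {
        if arg == "--extern" {
            if let Some(v) = iter.next() {
                externs.push(v);
            }
        } else {
            others.push(arg);
        }
    }
    externs.sort();
    let mut h = DefaultHasher::new();
    for a in &others {
        h.write(a.as_bytes());
        h.write_u8(0); // separator so ["ab", "c"] and ["a", "bc"] differ
    }
    for e in &externs {
        h.write(b"--extern\0"); // keep externs distinguishable from plain args
        h.write(e.as_bytes());
        h.write_u8(0);
    }
    h.finish()
}
```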

alexcrichton · 2017-03-08T15:57:39Z

src/compiler/rust.rs

+                m.update(h.as_bytes());
+            }
+            // 6. TODO: Environment variables:
+            //    RUSTFLAGS, maybe CARGO_PKG_*?


RUSTFLAGS is safe to ignore as it's just used by Cargo, not rustc.

Other env vars though yeah like CARGO_* should probably be hashed. That being said I don't think the env from the child ships over to the server, right? (this is a case where that may actually be quite important to implement!)

In any case I think it's safe to assume that all of the child's CARGO_* env vars should be hashed here (although I guess arguably all env vars should be hashed due to env!)

I'm planning on fixing #48 immediately to remedy this.

Hashing all of the environment variables seems like it's going to result in us having very few cache hits, unless cargo cleans the environment before passing it to rustc. Just looking at the output of env on my Linux machine shows a bunch of vars with PIDs and other junk in them.

I wonder if it'd be possible to teach rustc to tell us about the usage of env!, similar to emitting dep-info? If we could get a list of env! usage then we'd only have to hash those variables.

I wonder if it'd be possible to teach rustc to tell us about the usage of env!, similar to emitting dep-info

I was thinking the same thing yeah. I filed rust-lang/rust#40364 to track this.

I also definitely agree that there's tons of junk in env vars typically. We could try to maintain a blacklist of "don't worry about caching this" but otherwise I think blanket hashing CARGO_* should work fine for now.
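
A sketch of hashing only the `CARGO_*` variables, sorted by name so the result does not depend on environment ordering. `cargo_env_hash` is hypothetical, and `DefaultHasher` again stands in for the real digest. In practice the pairs would be the client's environment forwarded to the server (e.g. collected with `std::env::vars()`), which this sketch does not cover.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Hash only `CARGO_*` variables, in sorted order, ignoring everything else.
pub fn cargo_env_hash<I>(vars: I) -> u64
where
    I: IntoIterator<Item = (String, String)>,
{
    let mut cargo_vars: Vec<(String, String)> = vars
        .into_iter()
        .filter(|(k, _)| k.starts_with("CARGO_"))
        .collect();
    cargo_vars.sort();
    let mut h = DefaultHasher::new();
    for (k, v) in &cargo_vars {
        h.write(k.as_bytes());
        h.write_u8(b'=');
        h.write(v.as_bytes());
        h.write_u8(0);
    }
    h.finish()
}
```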

alexcrichton · 2017-03-08T16:01:19Z

Overall looks great to me, nice!

I wonder if we should perhaps document somewhere the limitations of rustc caching? Off the top of my head what I can think of is:

* Only some flag combinations work (e.g. --emit dep-info,link)
* Values from env! may not be tracked in caching
* Procedural macros that read files from the filesystem may not be cached correctly (yet)
* The system linker isn't cached (e.g. changing gcc/ld won't invalidate)
* Target specs aren't hashed (e.g. custom target specs I believe)

luser · 2017-03-08T16:38:30Z

We should definitely document those! I wrote down a bunch of things in the parse_arguments implementation as comments, but it'd be good to have them somewhere else.

The system linker isn't cached (e.g. changing gcc/ld won't invalidate)

This is a good point. Is there a way to ask rustc what linker it would use? If this isn't simple to fix I wonder if we shouldn't avoid caching dylib/bin crates for now.

Target specs aren't hashed (e.g. custom target specs I believe)

I'm not clear on what this means--do you mean that we don't hash the target when it's not passed on the commandline?

alexcrichton · 2017-03-08T17:08:42Z

This is a good point. Is there a way to ask rustc what linker it would use? If this isn't simple to fix I wonder if we shouldn't avoid caching dylib/bin crates for now.

Not currently, unfortunately. Let's stick w/ just caching rlibs for now which is conservative and most of the benefit I think anyway.
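
A conservative check along those lines, as a sketch: treat a compilation as cacheable only if every `--crate-type` value is `rlib`, or `lib`, which rustc's documentation describes as currently producing an rlib. With no `--crate-type` at all, rustc builds a binary, so that case is rejected. The sketch ignores `#![crate_type]` attributes in the source, an assumption a real implementation would need to revisit; `only_rlibs` is a hypothetical name.

```rust
/// True only if every requested crate type is a library that needs no linker.
pub fn only_rlibs(args: &[&str]) -> bool {
    let mut types: Vec<&str> = Vec::new();
    let mut iter = args.iter().copied();
    while let Some(arg) = iter.next() {
        if arg == "--crate-type" {
            match iter.next() {
                Some(v) => types.extend(v.split(',')),
                None => return false, // malformed command line
            }
        } else if let Some(v) = arg.strip_prefix("--crate-type=") {
            types.extend(v.split(','));
        }
    }
    // No --crate-type means the default (a binary), which needs linking.
    !types.is_empty() && types.iter().all(|t| *t == "rlib" || *t == "lib")
}
```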

I'm not clear on what this means--do you mean that we don't hash the target when it's not passed on the commandline?

Oh sorry! So for more esoteric use cases you can actually pass --target foo where foo.json is in the cwd. The foo.json file then says all the various options for your "custom target" (aka these are called "custom target specs").

Note that this is a pretty niche use case though. We can probably protect against this by running rustc --print target-list and then checking if the --target value is not in there. If it's not then it's a custom target and we probably shouldn't cache for now. (again though, seems like a prime candidate for a TODO somewhere as opposed to blocking this on fixing)
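
A sketch of that guard, assuming the caller has already captured the stdout of `rustc --print target-list` (one target triple per line). Any `--target` value not in that list, such as `foo` resolving to `foo.json`, is treated as a custom target and not cached. The function names are hypothetical.

```rust
/// Is `target` one of the built-in triples printed by `rustc --print target-list`?
pub fn is_builtin_target(target: &str, target_list_stdout: &str) -> bool {
    target_list_stdout.lines().map(str::trim).any(|t| t == target)
}

/// Cacheable unless an explicit `--target` names something outside the list.
pub fn target_is_cacheable(args: &[&str], target_list_stdout: &str) -> bool {
    let mut iter = args.iter().copied();
    while let Some(arg) = iter.next() {
        let target = if arg == "--target" {
            iter.next()
        } else {
            arg.strip_prefix("--target=")
        };
        if let Some(t) = target {
            return is_builtin_target(t, target_list_stdout);
        }
    }
    true // no --target: compiling for the host, which is built in
}
```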

luser · 2017-03-23T20:43:27Z

OK, I think I addressed all the review comments.

Pull the "run the preprocessor and generate the hash key" chunk of `get_cached_or_compile` out into a separate method, in expectation of making things more generic in a followup commit to support the Rust compiler, which won't be running a preprocessor. There are a few major changes here:
* put the refactored bits from `get_cached_or_compile` into a `hash_key_from_c_preprocessor_output`
* added a `generate_hash_key` method to `CompilerKind`, which just calls the previously mentioned function
* removed `Compiler::compile`, inlined it into `get_cached_or_compile` since that was the only call site anyway, and it made some lifetime issues easier
* changed `get_cached_or_compile` to take ownership of `self`, and changed a few other functions to just take the compiler path as a `&str` instead of taking a `&Compiler`.

The goal here was to make the state that persists between running the preprocessor and running the compiler private, since the Rust compiler does not have a preprocessor, but it will likely have other state it would like to persist.
* Add a `Compiler` trait to make the interface to compilation generic.
* Add a `Compilation` trait that can be returned from a method on `Compiler` to hold the preprocessor output that is reused for compilation while still allowing the calling code to box `Compiler` and `Compilation` as trait objects.
* Add a `CCompiler` struct that impls `Compiler`, but is generic over a second `CCompilerImpl` trait for specific C compilers, since most of the logic of running the preprocessor to generate the hash key is shared. Move all of `hash_key_from_c_preprocessor_output` into the `Compiler` impl on `CCompiler`.
* Add {GCC,Clang,MSVC} structs that impl `CCompilerImpl` so they can be used with `CCompiler`.
* Rework `CompilerKind` to be a simple utility enum and make `CompilerInfo` actually hold a `Box<Compiler>` and call methods on it directly.
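
A heavily simplified sketch of the "shared logic in a generic struct, per-compiler details in a small trait" layout this commit describes. Only the preprocessor flag differs between the two hypothetical impls here (`-E` for GCC, `/E` for MSVC); the method name `preprocess_args` and the struct fields are assumptions, not the PR's code.

```rust
/// Per-compiler details (hypothetical, reduced to a single method).
pub trait CCompilerImpl {
    /// Arguments that make this compiler only run the preprocessor.
    fn preprocess_args(&self, input: &str) -> Vec<String>;
}

pub struct Gcc;
pub struct Msvc;

impl CCompilerImpl for Gcc {
    fn preprocess_args(&self, input: &str) -> Vec<String> {
        vec!["-E".to_string(), input.to_string()]
    }
}

impl CCompilerImpl for Msvc {
    fn preprocess_args(&self, input: &str) -> Vec<String> {
        vec!["/E".to_string(), input.to_string()]
    }
}

/// Shared C-compiler logic, generic over the specific implementation.
pub struct CCompiler<I: CCompilerImpl> {
    executable: String,
    imp: I,
}

impl<I: CCompilerImpl> CCompiler<I> {
    /// Shared step: the full preprocessor command line for `input`.
    pub fn preprocessor_command(&self, input: &str) -> Vec<String> {
        let mut cmd = vec![self.executable.clone()];
        cmd.extend(self.imp.preprocess_args(input));
        cmd
    }
}
```

Because `CCompiler<I>` is generic, one implementation of the shared hashing logic serves every C compiler, while each `CCompilerImpl` stays small.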

The goal here was to make more of the state that's currently persisted from `parse_arguments` -> `generate_hash_key` -> `compile` private so the C compilers and the Rust compiler can store different kinds of state.
* Split the `Compiler` trait further into a `CompilerHasher` trait, which now gets returned in a Box from `Compiler::parse_arguments` as a field of `CompilerArguments`.
* Move the existing `ParsedArguments` struct into compiler/c to make it specific to C compilers.

… $sysroot/lib instead of compiler executable. This changeset removes `CompilerInfo` entirely, moving `get_cached_or_compile` into the `Compiler` trait. The server now deals exclusively with objects of `Box<Compiler>`. Also fixes a few other review comments.

luser · 2017-03-24T10:41:21Z

I rebased on top of master and built it locally on Linux/Mac/Windows, so everything should be OK, but I figured I'd push the rebased changes here for sanity's sake. If the CI is green I'll merge this and then we can get on with rebasing and merging all your other PRs.

alexcrichton · 2017-03-24T18:15:15Z

src/compiler/rust.rs

+{
+    let mut f = File::open(file)?;
+    let mut deps = String::new();
+    f.read_to_string(&mut deps)?;


FWIW it looks like this happens on the main event loop thread

Good catch, thanks!

alexcrichton · 2017-03-24T18:22:58Z

I've pushed a commit that parses dep-info files on a cpu pool, but otherwise looks great! r=me w/ green CI
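
For illustration, a sketch of moving the blocking read and parse off the calling thread. The PR uses a `CpuPool`; this version uses `std::thread::spawn` to stay dependency-free. Its parser is deliberately naive (first rule only, whitespace-separated paths, no escape handling), and `parse_dep_info_in_background` is a hypothetical name.

```rust
use std::fs;
use std::path::PathBuf;
use std::thread::{self, JoinHandle};

/// Read and parse a dep-info file on a worker thread instead of the event loop.
pub fn parse_dep_info_in_background(path: PathBuf) -> JoinHandle<Result<Vec<String>, String>> {
    thread::spawn(move || -> Result<Vec<String>, String> {
        let contents = fs::read_to_string(&path)
            .map_err(|e| format!("failed to read dep info at {}: {}", path.display(), e))?;
        // Naive parse: deps of the first rule, split on whitespace.
        let first = contents.lines().next().unwrap_or("");
        let deps = first.split_once(": ").map(|(_, d)| d).unwrap_or("");
        let mut files: Vec<String> = deps.split_whitespace().map(str::to_owned).collect();
        files.sort();
        Ok(files)
    })
}
```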


luser self-assigned this Mar 7, 2017

luser requested a review from alexcrichton March 7, 2017 19:10

alexcrichton reviewed Mar 8, 2017


luser added 10 commits March 24, 2017 06:29

Rename Compiler to CompilerInfo

83ea89a

Change the parameters of get_cached_or_compile from borrowed to owned

d219283

Move the current cache::hash_key into compiler::c

d18bcee

Add support for caching Rust compilation.

5a7602c

Add some more timings into the logs

4fe4b6b

document rust caveats

a81ee9d

luser force-pushed the rust-support branch from 9a18370 to a81ee9d March 24, 2017 10:38

alexcrichton reviewed Mar 24, 2017


Parse dep-info files on the cpu pool

9c189ca

luser merged commit 907ed3b into mozilla:master Mar 24, 2017

crystalin pushed a commit to PureStake/sccache that referenced this pull request Jul 2, 2021

Merge pull request mozilla#77 from paritytech/igor-smoke-test-experiment

f42ca5f

Set up a smoke test pass that check if Substrate/Polkadot build using dist sccache
