-
Notifications
You must be signed in to change notification settings - Fork 576
Add support for caching Rust compilations. Fixes #42 #77
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The Travis builds are broken on 1.13 for some reason, I will take a look at that. |
src/compiler/compiler.rs
Outdated
Box::new(detect.and_then(move |compiler| { | ||
match compiler { | ||
Some(compiler) => { | ||
Box::new(sha1_digest(executable.clone(), &pool) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is actually somewhat interesting in the rustc case. I'd imagine that quite commonly this is actually the rustup version of rustc.exe
rather than the rustc version of rustc.exe
. I wonder if we could automatically detect rustup and go fetch the actual compiler instead?
I believe that would work best by looking for rustup.exe
in the directory next to rustc.exe
(if found). If that's there we can run rustup which rustc
which should print out the full path to the actual compiler.
Also as a side note here, I wonder if we should handle the dynamic libraries that rustc itself links to? I can't really imagine a use case though where the dylibs are changed and the rustc executable wouldn't change, though, so it's perhaps not an issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a really good point and I can't believe I didn't realize it! I don't have a non-rustup install handy, but would it be enough to run rustc --print sysroot
and hash $sysroot/bin/rustc
? I'd have to sanity check how that works with the tooltool-downloaded rustc we use in Firefox CI.
We discussed the dylib issue in Hawaii and we kind of hand-waved it away. I definitely don't want to be parsing the output of ldd
or reading the binary headers to figure this out, but maybe it's enough to hash rustc + everything in $sysroot/lib/
? Actually given that rustc is just a stub, maybe we skip hashing rustc entirely, hash $sysroot/lib
and that's good enough?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but would it be enough to run rustc --print sysroot and hash $sysroot/bin/rustc
Oh right yeah that'd be perfect. (and also solve the rustup case)
This may also throw a wrench into sccache's own caching of the compiler as well unfortunately b/c rustup can have a different rustc depending on which directory it's invoked in (and that can change over time). I think sccache assumes the main compiler doesn't change much, right?
Also yeah I think we can handwave away b/c every release has different hashes in the dylib filenames which I believe implies that the contents of rustc
will be different (e.g. the file somehow references those dylibs). That being said hashing $sysroot/lib
would put a nail in the coffin and work perfectly as well.
The only catch there is that dylibs are in $sysroot/bin
on Windows and $sysroot/lib
everywhere else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may also throw a wrench into sccache's own caching of the compiler as well unfortunately b/c rustup can have a different rustc depending on which directory it's invoked in (and that can change over time). I think sccache assumes the main compiler doesn't change much, right?
Also a good point, we'll have to do something smarter here. sccache caches compiler info by (path, mtime), so rustup will totally break this expectation. I'll have to think about that for a bit, I don't want to have an extra rustc invocation for every single compile if we can avoid it.
src/compiler/rust.rs
Outdated
Err(e) => return Box::new(future::err(e)), | ||
} | ||
} else { | ||
Box::new(future::err("Failed run rustc --dep-info".into())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this contain the rustc errors logs (e.g. output
above)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good idea. We return the preprocessor stderr for C compilation when it fails, we should do something smarter here. The error propagation in this code isn't great in general, I should make sure we're providing enough information that users can attempt to diagnose failures.
src/compiler/rust.rs
Outdated
drop(temp_dir); | ||
hash_all(files, &pool) | ||
} | ||
Err(e) => return Box::new(future::err(e)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps this could use chain_err
to say "failed to parse dep info at {}"
let outstr = String::from_utf8(output.stdout).chain_err(|| "Error parsing rustc output")?; | ||
Ok(outstr.lines().map(|l| l.to_owned()).collect()) | ||
} else { | ||
bail!("Failed to run `rustc --print file-names`"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this may actually make a good helper as part of run_input_output
, having a method to run a command and auto-check output.status.success()
and bundle up all the relevant error info in the case of failure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll have to look at the various places where I call this--I do think I wrote this pattern more than once.
src/compiler/rust.rs
Outdated
// so we can *find* them. | ||
"-l" => return CompilerArguments::CannotCache, | ||
v if v.starts_with("--emit") => { | ||
//XXX: do we need to handle --emit specified more than once? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yeah I think this does happen from time to time (I think I type it in at least every once in a while!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not overly concerned with caching all possible commandlines (clearly), but it'd be good to ensure that we're at least not producing incorrect behavior in that case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I think it's fine to just return "not cacheable" for multiple --emit
calls for now.
|
||
/// Calculate the SHA-1 digest of each file in `files` on background threads | ||
/// in `pool`. | ||
fn hash_all(files: Vec<String>, pool: &CpuPool) -> SFuture<Vec<String>> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should these files be sorted to ensure that they're always hashed in the same order? I think the dep-info is relatively deterministic (but that's likely not guaranteed) and I wouldn't be surprised if the order of --extern
from Cargo was also slightly nondeterministic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I sort the list of source files from the dep-info, but not the list of externs, they're input to the hash in the order they're listed on the commandline.
If cargo produces nondeterministic commandlines we've got a bigger problem, since we're feeding the entire commandline as hash input! Can we verify that assumption?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yeah just confirmed, Cargo's order of --extern
is nondeterministic.
If you'd prefer to change in Cargo I can also do that!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixing that in cargo would certainly make my life easier, and we're likely to need to wait for your dep-info speedup for this to all be useful anyway, so we can probably wait for a fix for that.
Is anything else about the commandline nondeterministic? If so, we'll have to fix sccache to not use the commandline as a direct hash input.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I just checked, everything should be deterministic except the order of --extern
which is nondeterministic.
src/compiler/rust.rs
Outdated
m.update(h.as_bytes()); | ||
} | ||
// 6. TODO: Environment variables: | ||
// RUSTFLAGS, maybe CARGO_PKG_*? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RUSTFLAGS is safe to ignore as it's just used by Cargo, not rustc.
Other env vars though yeah like CARGO_* should probably be hashed. That being said I don't think the env from the child ships over to the server, right? (this is a case where that may actually be quite important to implement!)
In any case I think it's safe to assume that all of the child's CARGO_*
env vars should be hashed here (although I guess arguably all env vars should be hashed due to env!
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm planning on fixing #48 immediately to remedy this.
Caching all of the environment variables seems like it's going to result in us having very few cache hits, unless cargo cleans the environment before passing it to rustc. Just looking at the output of env
on my Linux machine shows a bunch of vars with PIDs and other junk in them.
I wonder if it'd be possible to teach rustc to tell us about the usage of env!, similar to emitting dep-info? If we could get a list of env! usage then we'd only have to hash those variables.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if it'd be possible to teach rustc to tell us about the usage of env!, similar to emitting dep-info
I was thinking the same thing yeah. I filed rust-lang/rust#40364 to track this.
I also definitely agree that there's tons of junk in env vars typically. We could try to maintain a blacklist of "don't worry about caching this" but otherwise I think blanket hashing CARGO_*
should work fine for now.
Overall looks great to me, nice! I wonder if we should perhaps document somewhere the limitations of rustc caching? Off the top of my head what I can think of is:
|
We should definitely document those! I wrote down a bunch of things in the
This is a good point. Is there a way to ask rustc what linker it would use? If this isn't simple to fix I wonder if we shouldn't avoid caching dylib/bin crates for now.
I'm not clear on what this means--do you mean that we don't hash the target when it's not passed on the commandline? |
Not currently, unfortunately. Let's stick w/ just caching rlibs for now which is conservative and most of the benefit I think anyway.
Oh sorry! So for more esoteric use cases you can actually pass Note that this is a pretty niche use case though. We can probably protect against this by running |
OK, I think I addressed all the review comments. |
Pull the "run the preprocessor and generate the hash key" chunk of `get_cached_or_compile` out into a separate method, in expectation of making things more generic in a followup commit to support the Rust compiler, which won't be running a preprocessor. There are a few major changes here: * put the refactored bits from `get_cached_or_compile` into a `hash_key_from_c_preprocessor_output` * added a `generate_hash_key` method to `CompilerKind`, which just calls the previously mentioned function * removed `Compiler::compile`, inlined it into `get_cached_or_compile` since that was the only call site anyway, and it made some lifetime issues easier * changed `get_cached_or_compile` to take ownership of `self`, and changed a few other functions to just take the compiler path as a `&str` instead of taking a `&Compiler`.
The goal here was to make the state that persists between running the preprocessor and running the compiler private, since the Rust compiler does not have a preprocessor, but it will likely have other state it would like to persist. * Add a `Compiler` trait to make the interface to compilation generic. * Add a `Compilation` trait that can be returned from a method on `Compiler` to hold the preprocessor output that is reused for compilation while still allowing the calling code to box `Compiler` and `Compilation` as trait objects. * Add a `CCompiler` struct that impls `Compiler`, but is generic over a second `CCompilerImpl` trait for specific C compilers, since most of the logic of running the preprocessor to generate the hash key is shared. Move all of `hash_key_from_c_preprocessor_output` into the `Compiler` impl on `CCompiler`. * Add {GCC,Clang,MSVC} structs that impl `CCompilerImpl` so they can be used with `CCompiler`. * Rework `CompilerKind` to be a simple utility enum and make `CompilerInfo` actually hold a `Box<Compiler>` and call methods on it directly.
The goal here was to make more of the state that's currently persisted from `parse_arguments` -> `generate_hash_key` -> `compile` private so the C compilers and the Rust compiler can store different kinds of state. * Split the `Compiler` trait further into a `CompilerHasher` trait, which now gets returned in a Box from `Compiler::parse_arguments` as a field of `CompilerArguments`. * Move the existing `ParsedArguments` struct into compiler/c to make it specific to C compilers.
… $sysroot/lib instead of compiler executable. This changeset removes `CompilerInfo` entirely, moving `get_cached_or_compile` into the `Compiler` trait. The server now deals exclusively with objects of `Box<Compiler>`. Also fixes a few other review comments.
I rebased on top of master and built it locally on Linux/Mac/Windows, so everything should be OK, but I figured I'd push the rebased changes here for sanity's sake. If the CI is green I'll merge this and then we can get on with rebasing and merging all your other PRs. |
{ | ||
let mut f = File::open(file)?; | ||
let mut deps = String::new(); | ||
f.read_to_string(&mut deps)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW it looks like this happens on the main event loop thread
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, thanks!
I've pushed a commit that parses dep-info files on a cpu pool, but otherwise looks great! r=me w/ green CI |
Set up a smoke test pass that check if Substrate/Polkadot build using dist sccache
This is a pretty large set of changes, but it breaks down into a few parts:
CompilerInfo::get_cached_or_compile
, since it was all tangled with the logic of running the C preprocessor. I split that out into a separate method and moved a few other things around while I was there. This is the "Pre-Rust support refactoring, Part 1" change.Compiler
,CompilerHasher
, andCompilation
. I changedget_cached_or_compile
and everywhere else that was interacting with compilers to only work with boxed trait objects of these types, and made it so thatCompiler::parse_arguments
returns a boxedCompilerHasher
, and thenCompilerHasher::generate_hash_key
returns a boxedCompilation
. This way the C compiler implementation and the Rust compiler implementation can each persist some private state through the process. I refactored the C compilation bits further into an implementation of these traits that's generic over aCCompilerImpl
trait, implementations of which simply call into the existing gcc/clang/msvc functions. This is the "Pre-Rust support refactoring, Part 2" and "Part 3" changes.There are also a few tiny refactoring changesets mixed in there, I tried to split small, separable changes out when possible to make the whole thing easier to follow.