
Commit 5168ff5

koute authored and kaffarell committed
log: add an interest cache for logs emitted through the log crate (tokio-rs#1636)
## Motivation

We use `tracing` as our logger in [`substrate`](https://github.com/paritytech/substrate). We've noticed that as soon as *any* `trace` log is enabled (even one which doesn't exist) the whole logging machinery starts to take a lot of time, even if nothing at all is actually printed!

In one of our quick-and-dirty reproduction benchmarks (JIT-ing a WASM program) we saw the total real runtime rise from around ~1.3s to ~7s just by adding a `trace` log filter which doesn't match anything. (Depending on the hardware and on how many threads are simultaneously logging this figure varies pretty widely, but the slowdown is always very significant.)

After looking into this problem I found that the culprits of the slowdown were `trace!` and `debug!` logs sprinkled quite liberally in some of the hotter codepaths. When there are no `trace`-level filters defined on the logger it can reject those inert `trace!` and `debug!` logs purely based on the current maximum logging level (which is cheap!), but as soon as you define *any* trace filter the current maximum logging level changes, and then every `trace!` and `debug!` log has to go through the whole filtering machinery before it can be rejected. While this is cheap if you only do it once, it becomes very expensive when you do it a lot, especially when you're running multiple threads and enable log reloading. (This is related to tokio-rs#1632.)

## Solution

I've added an opt-in per-thread LRU cache which caches whether the logger is actually interested in a given `target` + `level` pair for every log emitted through the `log` crate.

I've also added a benchmark very roughly replicating the situation from our code. Here's the performance *without* the cache (`cargo bench`):

```
[838.67 ns 846.51 ns 854.04 ns]
```

And here's the performance *with* the cache (`cargo bench --features interest-cache`):

```
[25.322 ns 25.556 ns 25.820 ns]
```

As you can see, the per-call cost was cut down to less than ~3% of what it was before.
1 parent 93654b0 commit 5168ff5
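For a rough intuition of what such a cache does, here is a minimal sketch of the idea, assuming a plain per-thread `HashMap` rather than the bounded LRU the commit actually ships; `run_expensive_filter`, `is_interested`, and every other name below are hypothetical stand-ins for illustration, not the real `tracing-log` internals:

```rust
use std::cell::RefCell;
use std::collections::HashMap;

// Stand-in for `log::Level`; the derives make it usable as a map key.
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
enum Level {
    Trace,
    Debug,
}

// Stand-in for the full filtering machinery that walks every directive;
// this is the expensive call the cache is meant to avoid repeating.
fn run_expensive_filter(target: &str, level: Level) -> bool {
    let _ = (target, level);
    false
}

thread_local! {
    // Per-thread map from a (target, level) pair to whether the logger
    // is interested in it; thread-locality means lookups take no locks.
    static INTEREST_CACHE: RefCell<HashMap<(String, Level), bool>> =
        RefCell::new(HashMap::new());
}

// Answers "would this log actually be consumed?", paying for the full
// filter only on a cache miss. A real implementation would also evict
// old entries (LRU) so the map stays bounded.
fn is_interested(target: &str, level: Level) -> bool {
    INTEREST_CACHE.with(|cache| {
        *cache
            .borrow_mut()
            .entry((target.to_owned(), level))
            .or_insert_with(|| run_expensive_filter(target, level))
    })
}

fn main() {
    // The first call misses and runs the filter; the second is a cheap hit.
    assert!(!is_interested("cranelift_codegen", Level::Trace));
    assert!(!is_interested("cranelift_codegen", Level::Trace));
}
```

The actual cache in this commit is additionally bounded and configured through the `InterestCacheConfig` type used in the benchmark below.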

File tree

5 files changed: +695, -1 lines changed

tracing-log/Cargo.toml

Lines changed: 4 additions & 0 deletions
```diff
@@ -37,3 +37,7 @@ maintenance = { status = "actively-maintained" }
 [package.metadata.docs.rs]
 all-features = true
 rustdoc-args = ["--cfg", "docsrs"]
+
+[[bench]]
+name = "logging"
+harness = false
```
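The `harness = false` line turns off the default libtest harness so that criterion can supply its own `main`. For the bench to build, `criterion` also has to be declared as a dev-dependency; that declaration isn't part of this hunk, but it would look roughly like the following (the version number is an assumption, not taken from the diff):

```toml
# Hypothetical dev-dependency entry; not shown in the hunk above.
[dev-dependencies]
criterion = "0.3"
```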

tracing-log/benches/logging.rs

Lines changed: 91 additions & 0 deletions
```rust
use criterion::{criterion_group, criterion_main, Criterion};
use log::trace;
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;
use std::time::{Duration, Instant};
use tracing_subscriber::{EnvFilter, FmtSubscriber};

// This creates a bunch of threads and makes sure they start executing
// a given callback almost exactly at the same time.
fn run_on_many_threads<F, R>(thread_count: usize, callback: F) -> Vec<R>
where
    F: Fn() -> R + 'static + Send + Clone,
    R: Send + 'static,
{
    let started_count = Arc::new(AtomicUsize::new(0));
    let barrier = Arc::new(AtomicBool::new(false));
    #[allow(clippy::needless_collect)]
    let threads: Vec<_> = (0..thread_count)
        .map(|_| {
            let started_count = started_count.clone();
            let barrier = barrier.clone();
            let callback = callback.clone();

            std::thread::spawn(move || {
                started_count.fetch_add(1, Ordering::SeqCst);
                while !barrier.load(Ordering::SeqCst) {
                    std::thread::yield_now();
                }

                callback()
            })
        })
        .collect();

    while started_count.load(Ordering::SeqCst) != thread_count {
        std::thread::yield_now();
    }
    barrier.store(true, Ordering::SeqCst);

    threads
        .into_iter()
        .map(|handle| handle.join())
        .collect::<Result<Vec<R>, _>>()
        .unwrap()
}

fn bench_logger(c: &mut Criterion) {
    let env_filter = EnvFilter::default()
        .add_directive("info".parse().unwrap())
        .add_directive("ws=off".parse().unwrap())
        .add_directive("yamux=off".parse().unwrap())
        .add_directive("regalloc=off".parse().unwrap())
        .add_directive("cranelift_codegen=off".parse().unwrap())
        .add_directive("cranelift_wasm=warn".parse().unwrap())
        .add_directive("hyper=warn".parse().unwrap())
        .add_directive("dummy=trace".parse().unwrap());

    let builder = tracing_log::LogTracer::builder().with_max_level(log::LevelFilter::Trace);

    #[cfg(feature = "interest-cache")]
    let builder = builder.with_interest_cache(tracing_log::InterestCacheConfig::default());

    builder.init().unwrap();

    let builder = FmtSubscriber::builder()
        .with_env_filter(env_filter)
        .with_filter_reloading();

    let subscriber = builder.finish();
    tracing::subscriber::set_global_default(subscriber).unwrap();

    const THREAD_COUNT: usize = 8;

    c.bench_function("log_from_multiple_threads", |b| {
        b.iter_custom(|count| {
            let durations = run_on_many_threads(THREAD_COUNT, move || {
                let start = Instant::now();
                for _ in 0..count {
                    trace!("A dummy log!");
                }
                start.elapsed()
            });

            let total_time: Duration = durations.into_iter().sum();
            Duration::from_nanos((total_time.as_nanos() / THREAD_COUNT as u128) as u64)
        })
    });
}

criterion_group!(benches, bench_logger);
criterion_main!(benches);
```
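Since the cache is opt-in, applications enable it exactly as the benchmark does: pass an `InterestCacheConfig` to the `LogTracer` builder (this requires the `interest-cache` cargo feature). A sketch of what a tuned setup might look like; the `with_min_verbosity` and `with_lru_cache_size` setters are my assumption about the config API, not something shown in this diff:

```rust
// Sketch of enabling the cache with non-default settings; the two
// `InterestCacheConfig` setters below are assumptions about the
// `interest-cache` API, not taken from this diff.
use tracing_log::{InterestCacheConfig, LogTracer};

fn init_log_bridge() -> Result<(), log::SetLoggerError> {
    let config = InterestCacheConfig::default()
        // Only cache interest for `debug!`/`trace!` logs; rarer, more
        // severe levels go through the normal filtering path.
        .with_min_verbosity(log::LevelFilter::Debug)
        // Bound the per-thread LRU so it cannot grow without limit.
        .with_lru_cache_size(1024);

    LogTracer::builder()
        .with_max_level(log::LevelFilter::Trace)
        .with_interest_cache(config)
        .init()
}
```

Running `cargo bench` versus `cargo bench --features interest-cache` then reproduces the two timings quoted in the commit message.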
