Migrate from cgroup profiling to system-wide profiling #627

javierhonduco · 2022-07-28T09:11:06Z

This PR reworks how we attach the CPU profiler to create one global perf event, instead of creating one per cgroup. This has several advantages, including:

Enabling host-wide profiling: the whole machine is profiled, without excluding a single process! As a side effect, we can also profile processes that aren't containerised (*)
Reduced resource usage, both in userland, where memory and CPU usage grew linearly with the number of NewCgroupProfilers instances, and in kernel space (fewer data structures that the kernel needs to create, fewer events that have to be delivered, etc.)
Making the architecture more flexible: this design will allow us to incorporate new profilers more easily
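To illustrate the resource argument, here is a minimal sketch that only counts the perf_event_open(2) calls each model needs; it does not open any events, and the type and function names are hypothetical rather than Parca Agent's code. It relies on two kernel rules: cgroup-scoped events (PERF_FLAG_PID_CGROUP) must be bound to a specific CPU, and pid = -1 ("all processes") also requires a specific CPU, so a "global" perf event is typically realised as one event per CPU.

```go
package main

// perfFlagPIDCgroup mirrors PERF_FLAG_PID_CGROUP (1U << 2) from
// include/uapi/linux/perf_event.h: with it, the pid argument of
// perf_event_open(2) is a file descriptor for a cgroup directory.
const perfFlagPIDCgroup = 1 << 2

// perfEventTarget holds the (pid, cpu, flags) arguments that would be passed
// to perf_event_open(2) for one event. Hypothetical type, for illustration.
type perfEventTarget struct {
	PID   int // -1 = all processes; a cgroup directory fd when Flags has perfFlagPIDCgroup
	CPU   int
	Flags int
}

// cgroupTargets models per-cgroup profiling: the kernel rejects cgroup events
// with cpu == -1, so every cgroup needs one event per CPU.
func cgroupTargets(cgroupFDs []int, numCPUs int) []perfEventTarget {
	targets := make([]perfEventTarget, 0, len(cgroupFDs)*numCPUs)
	for _, fd := range cgroupFDs {
		for cpu := 0; cpu < numCPUs; cpu++ {
			targets = append(targets, perfEventTarget{PID: fd, CPU: cpu, Flags: perfFlagPIDCgroup})
		}
	}
	return targets
}

// systemWideTargets models system-wide profiling: pid == -1 must be paired
// with a concrete CPU, so one logical "global" event becomes one event per
// CPU, independent of how many cgroups exist.
func systemWideTargets(numCPUs int) []perfEventTarget {
	targets := make([]perfEventTarget, 0, numCPUs)
	for cpu := 0; cpu < numCPUs; cpu++ {
		targets = append(targets, perfEventTarget{PID: -1, CPU: cpu})
	}
	return targets
}
```

For example, with 50 cgroups on a 16-CPU host the per-cgroup model needs 800 events, while the system-wide model needs 16 no matter how many containers come and go.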

Design

The discovery mechanism (Kubernetes or Systemd) no longer drives the creation of cgroup profilers. Instead:

A new CPU profiler, which is not attached to any cgroup, is created when the agent starts;
When the Kubernetes discovery mechanism detects a new container, it adds some metadata, a mapping of PID to labels, that can be collected on demand;
Once a profiler 'round' finishes, we create the profiles. Instead of creating one per cgroup, we now generate one per PID, looking up the PID-to-labels mapping and adding the matching labels to each profile.
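A minimal sketch of the PID-to-labels mapping described above, assuming a plain in-memory map guarded by a read-write mutex. The names (processLabels, LabelsForProfile) and the "pid" label key are hypothetical and not taken from the PR.

```go
package main

import (
	"strconv"
	"sync"
)

// processLabels is a hypothetical PID -> labels store. A discovery mechanism
// (e.g. Kubernetes) writes to it when it sees a container; the profiler reads
// from it on demand when it builds profiles at the end of a round.
type processLabels struct {
	mu     sync.RWMutex
	labels map[int]map[string]string
}

func newProcessLabels() *processLabels {
	return &processLabels{labels: make(map[int]map[string]string)}
}

// Set stores a copy of the labels for pid, so later changes made by the
// caller do not leak into the store.
func (p *processLabels) Set(pid int, labels map[string]string) {
	cp := make(map[string]string, len(labels))
	for k, v := range labels {
		cp[k] = v
	}
	p.mu.Lock()
	defer p.mu.Unlock()
	p.labels[pid] = cp
}

// Delete drops the metadata of a process that has gone away.
func (p *processLabels) Delete(pid int) {
	p.mu.Lock()
	defer p.mu.Unlock()
	delete(p.labels, pid)
}

// LabelsForProfile returns the labels to attach to pid's profile: always a
// "pid" label, plus any discovered metadata. Processes without metadata
// (for example, non-containerised ones) still get a profile.
func (p *processLabels) LabelsForProfile(pid int) map[string]string {
	p.mu.RLock()
	defer p.mu.RUnlock()
	out := map[string]string{"pid": strconv.Itoa(pid)}
	for k, v := range p.labels[pid] {
		out[k] = v
	}
	return out
}
```

Keeping the lookup on the read path means discovery and profiling stay decoupled: the profiler never waits for discovery, it just attaches whatever metadata exists when the round ends.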

Changes vs the initial proposal

Cgroup filtering

One of the initial goals of this work was to keep compatibility with the current filtering in Parca Agent. For example, we could add a new flag to enable system-wide collection (such as --system-wide), and otherwise keep collecting profiles only for the Kubernetes clusters or cgroups that are discovered or specified via arguments.

However, this proved to be trickier than I anticipated, because the bpf_get_current_cgroup_id() helper used to retrieve the cgroup ID in BPF does not work under cgroups v1, which is widely used in Kubernetes environments. While a migration to cgroups v2 is under way, we definitely want to support both cgroup versions.

Filtering in BPF would have been the most efficient way to faithfully replicate the current behaviour, as we could avoid counting the filtered-out stacks and reading them in userspace. As an alternative approach, I tried fetching the cgroup ID in userspace and filtering there. However, I was not able to read a cgroups v1 ID that way.

In the end, while the idea of replicating the current behaviour (and enabling system-wide profile collection by default in the future) has some merit, I am not sure it's worth it. Not only would this be temporary, but doing the cgroup ID filtering in userspace would still generate more data and force us to read the same data we would read with system-wide profiling enabled by default (minus the profile generation itself). Finding the correct paths for the cgroups can be difficult, too, which might introduce bugs and would require much more testing.
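The userspace alternative discussed above would look roughly like the following hypothetical sketch, which reads a process's cgroup membership from /proc/&lt;pid&gt;/cgroup and checks it against a configured cgroup path. It is not code from the PR, which dropped cgroup filtering; it only shows where the path-matching subtleties come from: cgroups v1 lists one line per hierarchy, cgroups v2 a single "0::" line, and the paths differ between container runtimes and cgroup drivers.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// cgroupEntry is one line of /proc/<pid>/cgroup, whose format (see cgroups(7))
// is "hierarchy-ID:controller-list:cgroup-path".
type cgroupEntry struct {
	HierarchyID string
	Controllers []string // empty for the cgroup v2 unified hierarchy ("0::/path")
	Path        string
}

// parseProcCgroup parses the contents of /proc/<pid>/cgroup.
func parseProcCgroup(contents string) ([]cgroupEntry, error) {
	var entries []cgroupEntry
	sc := bufio.NewScanner(strings.NewReader(contents))
	for sc.Scan() {
		line := sc.Text()
		if line == "" {
			continue
		}
		// Split into at most three fields: the path itself may contain ':'.
		parts := strings.SplitN(line, ":", 3)
		if len(parts) != 3 {
			return nil, fmt.Errorf("malformed cgroup line %q", line)
		}
		var controllers []string
		if parts[1] != "" {
			controllers = strings.Split(parts[1], ",")
		}
		entries = append(entries, cgroupEntry{HierarchyID: parts[0], Controllers: controllers, Path: parts[2]})
	}
	return entries, sc.Err()
}

// underCgroup reports whether any of the process's cgroups is prefix itself
// or nested below it, comparing whole path components so that "/kubepods"
// does not match "/kubepods-besteffort".
func underCgroup(entries []cgroupEntry, prefix string) bool {
	base := strings.TrimSuffix(prefix, "/")
	for _, e := range entries {
		if e.Path == base || strings.HasPrefix(e.Path, base+"/") {
			return true
		}
	}
	return false
}
```

Every sample would need this lookup (or a cache keyed by PID) before it could be dropped, which is why the data still has to be read from the kernel in full.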

Profile generation

Initially, we proposed to generate a profile per cgroup to maintain the current behaviour and to allow filtering by cgroup (see point above).

In this PR we generate one profile per PID.
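A minimal sketch of generating one profile per PID at the end of a round, assuming the round yields flat (PID, stack, count) samples. The types are hypothetical stand-ins; a real implementation would build pprof profiles rather than a map of stack strings.

```go
package main

import "sort"

// stackSample is a hypothetical record read from the BPF maps at the end of
// a profiling round.
type stackSample struct {
	PID   int
	Stack string // e.g. a serialized stack ID or list of frames
	Count uint64
}

// processProfile is a minimal stand-in for a per-process profile.
type processProfile struct {
	PID     int
	Samples map[string]uint64 // stack -> aggregated count
	Total   uint64
}

// groupByPID builds one profile per PID from a round's samples, merging
// repeated stacks, and returns the profiles sorted by PID.
func groupByPID(samples []stackSample) []*processProfile {
	byPID := make(map[int]*processProfile)
	for _, s := range samples {
		p, ok := byPID[s.PID]
		if !ok {
			p = &processProfile{PID: s.PID, Samples: make(map[string]uint64)}
			byPID[s.PID] = p
		}
		p.Samples[s.Stack] += s.Count
		p.Total += s.Count
	}
	profiles := make([]*processProfile, 0, len(byPID))
	for _, p := range byPID {
		profiles = append(profiles, p)
	}
	sort.Slice(profiles, func(i, j int) bool { return profiles[i].PID < profiles[j].PID })
	return profiles
}
```

Sorting by PID is only there to make the output deterministic, since Go map iteration order is randomised.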

Other significant changes

The web UI has undergone a lot of changes: it now only shows the profilers, and downloading pprof profiles has been removed. We can re-add this in the future if needed, but most of the time Parca Agent is tested with Parca, which receives the profiles and performs symbolication, making the profiles easier to read.
Removal of the Systemd / Cgroup discovery mechanism, as it's no longer needed. We are losing the unit names, which we can re-add later on.

Future work

Remove unneeded code
Add back the systemd/cgroup metadata, if needed
Tackle the TODOs

Test plan

Bare metal

Minikube

Web UI

Fixes #260

javierhonduco · 2022-07-28T09:20:25Z

Fix web tests

cmd/parca-agent/main.go

brancz

Woah. Amazing!! 💯

kakkoyun · 2022-07-28T09:29:28Z

Regarding the "Other significant changes" and "Future work" sections of the description: could you please open issues for all these?

Note to the releaser: These should be mentioned in the release notes.

kakkoyun

Looking good. I did the first round, and I'll continue to review this later on.

Especially, I haven't checked pkg/profiler/cpu.go yet.

Awesome work so far 💯

cmd/parca-agent/main.go

pkg/profiler/pool.go

pkg/profiler/noop.go

pkg/profiler/cpu.go

This commit changes the array of groups to a single group, as this is the interface that the Kubernetes discovery mechanism uses, and removes the systemd discoverer, as it's no longer needed. All cgroups and systemd units on the machine should be profiled automatically. If there's additional metadata, it will be added (see the commit above). Signed-off-by: Francisco Javier Honduvilla Coto <[email protected]>

pkg/profiler/cpu.go

cmd/parca-agent/main.go

kakkoyun · 2022-07-28T13:30:47Z

Just a couple of nits and then we can handle the rest in subsequent PRs

Signed-off-by: Francisco Javier Honduvilla Coto <[email protected]>

javierhonduco · 2022-07-28T13:34:41Z

Address feedback

javierhonduco mentioned this pull request Jul 28, 2022

[WIP] *: System wide v2 #532

Closed

4 tasks

javierhonduco added 4 commits July 28, 2022 10:11

system-wide/profiler: cgroup profiler => CPU profiler

32a65fa

This also allows us to have multiple profilers in the future Signed-off-by: Francisco Javier Honduvilla Coto <[email protected]>

system-wide/target: Initialise the CPU profiler in the new way

45c3694

Temporarily comments out UI code that should be dealt with later

system-wide/profiler: Generate a profile per process

0a74b06

system-wide/profiler: Add skeleton for process metadata

c0653af

javierhonduco force-pushed the system-wide-v3 branch from 15949e5 to 2f041d5 Compare July 28, 2022 09:12

javierhonduco requested a review from kakkoyun July 28, 2022 09:13

javierhonduco force-pushed the system-wide-v3 branch from 2f041d5 to 40eb5f9 Compare July 28, 2022 09:20

brancz reviewed Jul 28, 2022

cmd/parca-agent/main.go (outdated)

brancz approved these changes Jul 28, 2022

kakkoyun mentioned this pull request Jul 28, 2022

Update parca-agent docs after system-wide profiling changes parca-dev/docs#161

Open

kakkoyun reviewed Jul 28, 2022

cmd/parca-agent/main.go (outdated)

pkg/profiler/pool.go (outdated)

pkg/profiler/noop.go

pkg/profiler/cpu.go

javierhonduco force-pushed the system-wide-v3 branch 2 times, most recently from 6de5176 to d298e12 Compare July 28, 2022 13:07

javierhonduco mentioned this pull request Jul 28, 2022

system-wide: Cleanups and further work #629

Closed

12 tasks

kakkoyun reviewed Jul 28, 2022

pkg/profiler/cpu.go (outdated)

cmd/parca-agent/main.go (outdated)

javierhonduco added 3 commits July 28, 2022 14:34

system-wide/*: Add process metadata

4008a6d

Signed-off-by: Francisco Javier Honduvilla Coto <[email protected]>

system-wide/profiler: Use groups

05243d6

Signed-off-by: Francisco Javier Honduvilla Coto <[email protected]>

system-wide/web: Add basic web ui back

2228657

Signed-off-by: Francisco Javier Honduvilla Coto <[email protected]>

javierhonduco force-pushed the system-wide-v3 branch from d298e12 to 2228657 Compare July 28, 2022 13:34

javierhonduco mentioned this pull request Jul 28, 2022

discovery: Rename Systemd to Cgroup #611

Closed

javierhonduco requested a review from kakkoyun July 28, 2022 14:58

kakkoyun approved these changes Jul 28, 2022

kakkoyun changed the title from "System wide v3" to "Migrate from cgroup profiling to system-wide profiling" Jul 28, 2022

kakkoyun merged commit 4772d95 into parca-dev:main Jul 28, 2022

javierhonduco deleted the system-wide-v3 branch July 28, 2022 15:12

This was referenced Aug 1, 2022

profiler: Fix race condition in the profile's buffer #641

Merged

Remove profile listener #648

Closed

docs: Update design.md with the new architecture #650

Closed

This was referenced Aug 3, 2022

Refactor after and cleanup system-wide profiling changes #663

Merged

metadata/discovery: Support hybrid cgroup v1/v2 #180

Closed

javierhonduco mentioned this pull request Aug 4, 2022

Errors on default installation on Linode managed Kubernetes (K8s version 1.23) #416

Closed

kakkoyun mentioned this pull request Oct 10, 2022

Add ability to specify targets to profile #690

Closed

5 tasks

hardys mentioned this pull request Nov 22, 2022

Docs refer to removed systemd-units/kubernetes flags parca-dev/docs#201

Open
