Apps with a huge amount of files possibly lead to K8s node DoS #80
Comments
Thanks for raising this. I need to look into this more, but I suspect that these labels are part of the BOM (Bill of Materials) that gets generated. This is an auditing mechanism that provides information about the container and can be used for things like security scanners, license scanners, etc. I'll update you when I have some more info. Thanks.
@dmikusa-pivotal do I understand correctly that this metadata is not required from a functional perspective? I.e. we could happily live without that metadata and not lose any functionality? (Thinking of a workaround for now)
@dmikusa-pivotal, the runtime security app (https://falco.org/) also marks the image as insecure based on the tags.
Still digging into this. It may also be related to how the app is tracking changes & caching layers. That would be functional, but also much easier to fix. I'll share more details shortly.
Can you expand on this? I'm not familiar with the site you listed. What doesn't it like about the tags?
I'm seeing similar problems, not necessarily because of a huge number of files, but because of the listing of all available boot options. 513174 bytes of that message size are for the labels, with the …
We are also having this issue with the Docker images built from Spring Boot apps. This is a major blocker for us, as we are preparing our system to go to production with cf4ks. Fixing this issue is therefore a requirement for us to be able to go live.
UPDATE: I have a PR that will be merged shortly to address the issues raised here. It implements #1 from my previous note as described. It has a different implementation for item #2 though. Rather than add flags to turn on/off these labels, we have just removed the label. The rationale being that if 95% of people turn them off, you can't build any sort of useful tooling around these labels. Thus we just removed them. The exception to this is for Spring DataFlow applications. I believe that the DataFlow UI is using these labels and so we can't remove them in this case without breaking functionality elsewhere.
OK, v4.3.0 has this: https://github.com/paketo-buildpacks/spring-boot/releases/tag/v4.3.0. That version should get picked up by the multi buildpacks and builders shortly.
Spring Cloud Data Flow Server uses … I tried to override the label by adding … The only workaround for overriding (since labels can't be removed) that I found is building a new image based on the image created by buildpacks: … @dmikusa, could you reconsider adding flags to exclude labels?
@imitbn Can you open a new issue for this? I think we can discuss adding an opt-out. Technically it wouldn't be hard. I think the reason we left this in is that there is software outside of buildpacks that is reading the label; if we just removed it, that would be a breaking change. Anyway, we can discuss the impact of adding an opt-out in a new issue. Thanks.
@dmikusa Is it supposed to be part of just the spring-boot buildpack, or something more general (like tiny-builder/base-builder/full-builder)?
It's specific to this buildpack, so you can open the issue here.
What happened?
We are packaging Spring Boot based apps, which also serve static UI content files. Due to the nature of UI development, we now have some apps with a huge number of static files that are served via the Spring Boot app. Not sure if this is an anti-pattern or not, but this situation was the root cause of an outage of some of our K8s nodes.
The reason for this was quite obvious once we had investigated the issue. The buildpack puts a file listing of the app code as metadata into the resulting image label `io.buildpacks.lifecycle.metadata`. As a result, the image metadata and later the K8s pod metadata get quite big; in our case the image metadata grows to around 3.2 MB in size. When we run multiple such pods on the same K8s node, the combined metadata of all pods can get quite huge, and exactly this is what forces the K8s node to go down. So, this is my current understanding of what leads to this situation:
The K8s kubelet component responsible for retrieving the current state of all containers (part of PLEG) uses gRPC to communicate with the container runtime. This gRPC channel has a maximum message size of 16 MB. If the metadata of all containers on a node grows beyond this 16 MB limit, kubelet can't query the state of its containers any more, finally leading to an unusable K8s node. The error message that we see in this situation is:
See the corresponding K8s issue: kubernetes/kubernetes#63858
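For a rough sense of scale (simple arithmetic based on the numbers above, assuming each affected container contributes its full label payload to the container runtime responses):

```
16 MB (gRPC max message size) / ~3.2 MB (label metadata per container) ≈ 5 containers
```

So on the order of five such containers on a node can already be enough to exhaust the limit.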
A restart of kubelet or even a reboot of the node does not fix this situation, as the containers with the huge overall metadata are still there. The only way to get out of this situation is to manually remove some of the containers started from such a Paketo-buildpack-created image.
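As a rough way to check which images are affected, a minimal sketch (not part of the original report) that sums label sizes from `docker inspect` output could look like the following. It assumes the standard `docker inspect` JSON (an array of objects with a `Config.Labels` map) piped in on stdin, e.g. `docker inspect <image> | go run labelsize.go`:

```go
// labelsize.go: rough diagnostic that reports how many bytes of an image's
// metadata are consumed by each label, given `docker inspect` output on stdin.
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// inspectEntry models only the part of the `docker inspect` output we need:
// the image config with its label map.
type inspectEntry struct {
	Config struct {
		Labels map[string]string `json:"Labels"`
	} `json:"Config"`
}

func main() {
	var entries []inspectEntry
	if err := json.NewDecoder(os.Stdin).Decode(&entries); err != nil {
		fmt.Fprintln(os.Stderr, "failed to decode docker inspect output:", err)
		os.Exit(1)
	}
	for _, e := range entries {
		total := 0
		for key, value := range e.Config.Labels {
			size := len(key) + len(value)
			total += size
			fmt.Printf("%-50s %10d bytes\n", key, size)
		}
		fmt.Printf("total label metadata: %d bytes\n", total)
	}
}
```

For a buildpack-built image like the ones described here, labels such as `io.buildpacks.lifecycle.metadata` and `io.paketo.stack.packages` would be expected to dominate the total.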
From checking the buildpack sources, I understand that on line 38 of `spring-boot/boot/web_application_type.go` (commit d085524) …
I also see quite a huge amount of data in other labels, e.g. `io.paketo.stack.packages`. In the end this strategy could be a problem when adding data unconditionally. I would consider this also to be a security issue, since one could quickly take down a cf-for-k8s cluster.
How to reproduce
Additional info
Push several Spring Boot applications with a huge amount of static files via CF-for-K8s. Everything is packed into a single jar file, so I'm doing a `cf push -p app.jar`. The same behavior can be triggered when scaling such an application horizontally.
Expected: all pushed applications are running and the K8s cluster is in good condition ;)
Actual: some K8s nodes go out of service (NotReady) and kubelet logs start looking like this:
Build Configuration
What platform (`pack`, `kpack`, `tekton` buildpacks plugin, etc.) are you using? Please include a version.
CF-for-k8s v3.0.0, kpack 0.2.2

Log output of the staging process: …

What builder are you using (`pack inspect-builder <builder>`)?
cf-default-builder, with stack: bionic-stack (CF-for-k8s default configuration)

Have you set any custom configuration (`buildpack.yml`, `nginx.conf`, etc.)?
CF-for-k8s default configuration.
Checklist