Occasional KVM_EXIT_SHUTDOWN and bad syscall (14) during VM shutdown #1456

Closed
@sipsma

Description

Firecracker-containerd recently updated a test case that previously ran 5 VMs to run 100 instead, and we started seeing the following errors in roughly 0-3 VMs per run of that test case (issue).

As part of terminating VMs, we tell systemd in the guest to reboot the system. Most of the time this works fine: the systemd logs show "reboot: machine restart" and the VMM exits with 0 right afterwards, as expected.

Occasionally, though, the guest reaches that "reboot: machine restart" line and then we see a few log lines from Firecracker like:

2019-12-06T20:29:30.007995741 [anonymous-instance:INFO] Received KVM_EXIT_SHUTDOWN signal
2019-12-06T20:29:30.008087608 [anonymous-instance:ERROR] Shutting down VM after intercepting a bad syscall (14).

This is followed by the VMM exiting with code 148. The KVM_EXIT_SHUTDOWN case and the bad syscall (14, which appears to be rt_sigprocmask) seem to only ever occur together (never just one or the other), and only right after the point where the reboot call should be occurring.

  • EDIT: I've now very rarely seen KVM_EXIT_SHUTDOWN occur without an accompanying rt_sigprocmask call that results in the seccomp violation. When KVM_EXIT_SHUTDOWN occurs by itself, the VMM exits with 0 with no apparent issues.
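To make the three outcomes described above easier to tally across a 100 VM run, here is a minimal sketch that classifies one VM's shutdown from its Firecracker log text and exit code. The function name `classify_shutdown` and the small syscall table are hypothetical helpers for illustration, not part of Firecracker or firecracker-containerd. The table assumes x86_64 syscall numbering and lists only 14 and its neighbours.

```python
import re

# Assumption: x86_64 syscall numbering; only 14 and its neighbours are listed.
X86_64_SYSCALLS = {13: "rt_sigaction", 14: "rt_sigprocmask", 15: "rt_sigreturn"}

# Matches the ERROR line quoted in this issue and captures the syscall number.
BAD_SYSCALL_RE = re.compile(r"intercepting a bad syscall \((\d+)\)")


def classify_shutdown(log_text, exit_code):
    """Summarise one VM's shutdown (hypothetical helper, not a Firecracker API)."""
    saw_kvm_shutdown = "Received KVM_EXIT_SHUTDOWN signal" in log_text
    match = BAD_SYSCALL_RE.search(log_text)
    syscall_nr = int(match.group(1)) if match else None
    syscall_name = None
    if syscall_nr is not None:
        syscall_name = X86_64_SYSCALLS.get(syscall_nr, "unknown")
    return {
        "kvm_exit_shutdown": saw_kvm_shutdown,
        "bad_syscall": syscall_nr,
        "bad_syscall_name": syscall_name,
        "clean_exit": exit_code == 0,
    }


# The two log lines quoted in this issue, paired with the reported exit code.
sample = (
    "2019-12-06T20:29:30.007995741 [anonymous-instance:INFO] "
    "Received KVM_EXIT_SHUTDOWN signal\n"
    "2019-12-06T20:29:30.008087608 [anonymous-instance:ERROR] "
    "Shutting down VM after intercepting a bad syscall (14).\n"
)
result = classify_shutdown(sample, 148)
print(result)
```

Running this over each VM's log should separate the common failure (KVM_EXIT_SHUTDOWN plus the bad syscall, non-zero exit) from the rare case where KVM_EXIT_SHUTDOWN appears alone and the VMM still exits with 0.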

Additionally, I occasionally get a line like this in dmesg after a run of the test case where failures occur, though not for every VM that exits this way:

[10548769.845897] traps: fc_vcpu0[52462] trap invalid opcode ip:527d36 sp:7f7ee67f0590 error:0 in firecracker[400000+1ab000]

No idea if it's related or, if it is, why it wouldn't be written every time a VM exits with KVM_EXIT_SHUTDOWN+the bad syscall, but figured it was worth mentioning.
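In case it helps with debugging, the dmesg line above can be decoded mechanically. The sketch below parses the trap line and computes the offset of the faulting instruction pointer within the firecracker mapping. The regular expression is written against the single line quoted here, and `parse_trap_line` is a hypothetical helper, so other trap line layouts may need adjustments.

```python
import re

# Written against the one dmesg line quoted in this issue (an assumption about
# the general layout): "<comm>[<id>] trap <name> ip:.. sp:.. error:.. in <image>[<base>+<size>]"
TRAP_RE = re.compile(
    r"traps: (?P<comm>\S+)\[(?P<task_id>\d+)\] trap (?P<trap>.+?) "
    r"ip:(?P<ip>[0-9a-f]+) sp:(?P<sp>[0-9a-f]+) error:(?P<error>[0-9a-f]+) "
    r"in (?P<image>[^\[]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def parse_trap_line(line):
    """Return the decoded fields of a trap line, or None if it does not match."""
    m = TRAP_RE.search(line)
    if m is None:
        return None
    ip = int(m.group("ip"), 16)
    base = int(m.group("base"), 16)
    size = int(m.group("size"), 16)
    return {
        "thread": m.group("comm"),
        "task_id": int(m.group("task_id")),  # the number in brackets after the name
        "trap": m.group("trap"),
        "image": m.group("image"),
        "ip": ip,
        "offset": ip - base,  # position of ip relative to the start of the mapping
        "in_mapping": base <= ip < base + size,
    }


line = (
    "[10548769.845897] traps: fc_vcpu0[52462] trap invalid opcode "
    "ip:527d36 sp:7f7ee67f0590 error:0 in firecracker[400000+1ab000]"
)
info = parse_trap_line(line)
print(info["thread"], info["trap"], hex(info["offset"]), info["in_mapping"])
```

Whether a symbolizer such as addr2line wants the raw ip or the mapping-relative offset depends on how the binary was linked, which isn't stated here, so both values are kept in the result.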

This same behavior occurs in both 0.19 (w/ rust1.35+musl) and a 0.20 binary I built from here (w/ rust1.38+musl).

Let us know what, if anything, would be helpful for further reproducing or debugging, as it seems to be a pretty rare issue. We can run our 100 VM test case with debugging binaries and/or see if it's possible to make a Docker container reproducer for you to use, or something like that.

Labels

Type: Bug (indicates an unexpected problem or unintended behavior)
