[1.6] fix(vmm): only use memfd if vhost-user-blk devices are configured #4502

Closed
9 changes: 9 additions & 0 deletions CHANGELOG.md
@@ -5,6 +5,15 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

### Changed

- [#4502](https://github.com/firecracker-microvm/firecracker/pull/4502): Only
  use memfd to back guest memory if a vhost-user-blk device is configured;
  otherwise use anonymous private memory. This is because serving page faults
  for the shared memory used by memfd is slower and may impact workloads.

## [1.6.0]

### Added
22 changes: 22 additions & 0 deletions docs/api_requests/block-vhost-user.md
@@ -72,6 +72,28 @@ be coming from the fact that the custom logic is implemented in the same
process that handles Virtio queues, which reduces the number of required
context switches.

## Disadvantages

In order for the backend to process virtio requests, guest memory needs to be
shared by the frontend with the backend. This means a shared memory mapping is
required to back guest memory. When a vhost-user device is configured,
Firecracker uses `memfd_create` instead of creating an anonymous private
mapping to achieve this. We observed that page faults on a shared memory
mapping take significantly longer (up to 24% longer in our testing), because
the Linux memory subsystem has to use atomic memory operations to update page
status, which is expensive under certain conditions. We advise users to
profile the performance of their workloads when considering the use of
vhost-user devices.
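
To make the distinction concrete, here is an illustrative sketch (not
Firecracker's actual code) of the two mapping strategies, assuming Linux and
the `libc` crate; error handling is omitted and the function names are
hypothetical:

```rust
// Illustrative only: two ways of backing guest memory on Linux.
use std::ptr;

/// Anonymous private mapping: no file object, cheaper page faults.
unsafe fn map_private(size: usize) -> *mut libc::c_void {
    libc::mmap(
        ptr::null_mut(),
        size,
        libc::PROT_READ | libc::PROT_WRITE,
        libc::MAP_PRIVATE | libc::MAP_ANONYMOUS | libc::MAP_NORESERVE,
        -1,
        0,
    )
}

/// memfd-backed shared mapping: the returned fd can be handed to the
/// vhost-user backend so it can map the same physical pages.
unsafe fn map_memfd_shared(size: usize) -> (libc::c_int, *mut libc::c_void) {
    let fd = libc::memfd_create(b"guest_mem\0".as_ptr().cast(), libc::MFD_CLOEXEC);
    libc::ftruncate(fd, size as libc::off_t);
    let addr = libc::mmap(
        ptr::null_mut(),
        size,
        libc::PROT_READ | libc::PROT_WRITE,
        libc::MAP_SHARED | libc::MAP_NORESERVE,
        fd,
        0,
    );
    (fd, addr)
}
```

The file descriptor is what makes sharing possible: the frontend sends it to
the backend over the vhost-user Unix socket (as `SCM_RIGHTS` ancillary data),
so the backend can `mmap` the same region; an anonymous private mapping has no
fd to send.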

## Other considerations

Compared to a virtio block device, where Firecracker interacts with a drive
file on the host, a vhost-user block device is handled directly by the
backend. Some workloads may benefit from the caching and readahead that the
host page cache offers for the backing file; this benefit is not available in
the vhost-user block case. Users may need to implement caching internally in
the backend if they find it appropriate.
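
As a purely hypothetical illustration of that last point, a backend could keep
a naive read cache in front of its backing file. The sketch below (fixed 4 KiB
blocks, unbounded growth, no write path or invalidation) only shows the shape
such caching might take, not a production design:

```rust
// Hypothetical, illustrative only: a naive block cache a vhost-user backend
// might place in front of its backing file.
use std::collections::HashMap;
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};

const BLOCK_SIZE: usize = 4096;

struct CachedDisk {
    file: File,
    cache: HashMap<u64, Vec<u8>>, // block index -> block contents
}

impl CachedDisk {
    fn new(file: File) -> Self {
        Self { file, cache: HashMap::new() }
    }

    /// Serve a block from the cache, reading it from the file on a miss.
    fn read_block(&mut self, index: u64) -> io::Result<&[u8]> {
        if !self.cache.contains_key(&index) {
            let mut buf = vec![0u8; BLOCK_SIZE];
            self.file.seek(SeekFrom::Start(index * BLOCK_SIZE as u64))?;
            self.file.read_exact(&mut buf)?;
            self.cache.insert(index, buf);
        }
        Ok(self.cache[&index].as_slice())
    }
}
```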

## Backends

There are a number of open source implementations of a vhost-user backend
31 changes: 27 additions & 4 deletions src/vmm/src/builder.rs
@@ -239,10 +239,33 @@ pub fn build_microvm_for_boot(
         .ok_or(MissingKernelConfig)?;

     let track_dirty_pages = vm_resources.track_dirty_pages();
-    let memfd = create_memfd(vm_resources.vm_config.mem_size_mib)
-        .map_err(StartMicrovmError::GuestMemory)?;
-    let guest_memory = GuestMemoryMmap::with_file(memfd.as_file(), track_dirty_pages)
-        .map_err(StartMicrovmError::GuestMemory)?;
+
+    let vhost_user_device_used = vm_resources
+        .block
+        .devices
+        .iter()
+        .any(|b| matches!(b, BlockDeviceType::VhostUserBlock(_)));
+
+    // Page faults are more expensive for shared memory mappings, including memfd.
+    // For this reason, we only back guest memory with a memfd if a vhost-user-blk
+    // device is configured in the VM; otherwise we fall back to anonymous private
+    // memory.
+    //
+    // The vhost-user-blk branch is not currently covered by integration tests in
+    // Rust, because that would require running a backend process. If in the future
+    // we converge on a single way of backing guest memory for the vhost-user and
+    // non-vhost-user cases, such a test would not be worth the effort.
+    let guest_memory = if vhost_user_device_used {
+        let memfd = create_memfd(vm_resources.vm_config.mem_size_mib)
+            .map_err(StartMicrovmError::GuestMemory)?;
+        GuestMemoryMmap::with_file(memfd.as_file(), track_dirty_pages)
+            .map_err(StartMicrovmError::GuestMemory)?
+    } else {
+        let regions = crate::arch::arch_memory_regions(vm_resources.vm_config.mem_size_mib << 20);
+        GuestMemoryMmap::from_raw_regions(&regions, track_dirty_pages)
+            .map_err(StartMicrovmError::GuestMemory)?
+    };

     let entry_addr = load_kernel(boot_config, &guest_memory)?;
     let initrd = load_initrd_from_config(boot_config, &guest_memory)?;
     // Clone the command-line so that a failed boot doesn't pollute the original.
2 changes: 1 addition & 1 deletion tests/integration_tests/functional/test_api.py
@@ -393,7 +393,7 @@ def test_api_machine_config(test_microvm_with_api):
     test_microvm.api.machine_config.patch(mem_size_mib=bad_size)

     fail_msg = re.escape(
-        "Invalid Memory Configuration: Cannot resize memfd file: Custom { kind: InvalidInput, error: TryFromIntError(()) }"
+        "Invalid Memory Configuration: Cannot create mmap region: Out of memory (os error 12)"
     )
     with pytest.raises(RuntimeError, match=fail_msg):
         test_microvm.start()
22 changes: 12 additions & 10 deletions tests/integration_tests/security/test_jail.py
@@ -489,21 +489,23 @@ def test_positive_file_size_limit(uvm_plain):

 def test_negative_file_size_limit(uvm_plain):
     """
-    Test creating vm fails when memory size exceeds `fsize` limit.
-    This is caused by the fact that we back guest memory by memfd.
+    Test that creating a snapshot file fails when its size exceeds the `fsize` limit.
     """
-    vm_mem_size = 128
-    jail_limit = (vm_mem_size - 1) << 20
-
     test_microvm = uvm_plain
-    test_microvm.jailer.resource_limits = [f"fsize={jail_limit}"]
+    # Limit file size to 1 MiB, leaving room for logs and metrics.
+    test_microvm.jailer.resource_limits = [f"fsize={2**20}"]
     test_microvm.spawn()
-    test_microvm.basic_config(mem_size_mib=vm_mem_size)
+    test_microvm.basic_config()
+    test_microvm.start()

-    # Attempt to start a vm.
+    test_microvm.pause()
+
+    # Attempt to create a snapshot.
     try:
-        test_microvm.start()
+        test_microvm.api.snapshot_create.put(
+            mem_file_path="/vm.mem",
+            snapshot_path="/vm.vmstate",
+        )
     except (
         http_client.RemoteDisconnected,
         urllib3.exceptions.ProtocolError,
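
For background on the mechanism this test now exercises: the jailer's `fsize`
rlimit caps the size of any file the Firecracker process may create. A memfd is
a file, so with this PR guest memory no longer counts against the limit at
boot; the limit is first hit when the snapshot memory file is written. Below is
a hypothetical standalone sketch of that rlimit behavior, assuming Linux and
the `libc` crate (the file path is illustrative):

```rust
// Hypothetical sketch: writes that would grow a file past RLIMIT_FSIZE fail
// with EFBIG, which is why snapshot creation fails under a 1 MiB limit.
use std::fs::File;
use std::io::Write;

fn main() -> std::io::Result<()> {
    // Cap the maximum file size this process may create to 1 MiB.
    let limit = libc::rlimit {
        rlim_cur: 1 << 20,
        rlim_max: 1 << 20,
    };
    unsafe { libc::setrlimit(libc::RLIMIT_FSIZE, &limit) };

    // By default, exceeding the limit raises SIGXFSZ; ignore it so the
    // offending write returns EFBIG instead of killing the process.
    unsafe { libc::signal(libc::SIGXFSZ, libc::SIG_IGN) };

    let mut f = File::create("/tmp/fsize_demo")?;
    let chunk = vec![0u8; 1 << 20];
    f.write_all(&chunk)?; // exactly 1 MiB: still allowed
    let err = f.write_all(&chunk).unwrap_err(); // would exceed the limit
    println!("second write failed as expected: {err}");
    Ok(())
}
```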