
Commit d1b4c68

Florian Westphal authored and davem330 committed
netlink: remove mmapped netlink support
mmapped netlink has a number of unresolved issues:

- TX zerocopy support had to be disabled more than a year ago via commit 4682a03 ("netlink: Always copy on mmap TX.") because the content of the mmapped area can change after netlink attribute validation but before message processing.

- RX support was implemented mainly to speed up nfqueue dumping packet payload to userspace. However, since commit ae08ce0 ("netfilter: nfnetlink_queue: zero copy support") we avoid one copy with the socket-based interface too (via the skb_zerocopy helper).

The other problem is that skbs attached to an mmapped netlink socket behave differently from normal skbs:

- they don't have a shinfo area, so no function that uses skb_shinfo() (e.g. skb_clone) can be used on them.

- reserving headroom prevents userspace from seeing the content, as it expects the message to start at skb->head. See for instance commit aa3a022 ("netlink: not trim skb for mmaped socket when dump").

- skbs handed e.g. to netlink_ack must have a non-NULL skb->sk, else we crash because it needs the sk to check whether a tx ring is attached. This is also not obvious and leads to non-intuitive bug fixes such as 7c7bdf3 ("netfilter: nfnetlink: use original skbuff when acking batches").

mmapped netlink also didn't play nicely with the skb_zerocopy helper used by nfqueue and openvswitch. Daniel Borkmann fixed this via commit 6bb0fef ("netlink, mmap: fix edge-case leakages in nf queue zero-copy"), but at the cost of also needing to provide the remaining length to the allocation function.

nfqueue also has problems when used with mmapped rx netlink:

- mmapped netlink doesn't allow use of nfqueue batch verdict messages. The problem is that in the mmap case, the allocation time also determines the order in which the frame will be seen by userspace (A allocating before B means that A is located in an earlier ring slot, but this also means that B might get a lower sequence number than A, since the seqno is decided later). To fix this we would need to extend the spinlocked region to also cover the allocation and message setup, which isn't desirable.

- nfqueue can now be configured to queue large (GSO) skbs to userspace. Queuing GSO packets is faster than having to force a software segmentation in the kernel, so this is a desirable option. However, with an mmap-based ring one has to use 64 KiB per ring slot element, else mmap has to fall back to the socket path (NL_MMAP_STATUS_COPY) for all large packets.

To use the mmap interface, userspace not only has to probe for mmap netlink support, it also has to implement a recv/socket receive path in order to handle messages that exceed the size of an rx ring element.

Cc: Daniel Borkmann <[email protected]>
Cc: Ken-ichirou MATSUZAWA <[email protected]>
Cc: Pablo Neira Ayuso <[email protected]>
Cc: Patrick McHardy <[email protected]>
Cc: Thomas Graf <[email protected]>
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
1 parent 7e6e18f commit d1b4c68


7 files changed: +15, -1140 lines


Documentation/networking/netlink_mmap.txt

Lines changed: 0 additions & 332 deletions
This file was deleted.

include/uapi/linux/netlink.h

Lines changed: 4 additions & 0 deletions
```diff
@@ -107,8 +107,10 @@ struct nlmsgerr {
 #define NETLINK_PKTINFO 3
 #define NETLINK_BROADCAST_ERROR 4
 #define NETLINK_NO_ENOBUFS 5
+#ifndef __KERNEL__
 #define NETLINK_RX_RING 6
 #define NETLINK_TX_RING 7
+#endif
 #define NETLINK_LISTEN_ALL_NSID 8
 #define NETLINK_LIST_MEMBERSHIPS 9
 #define NETLINK_CAP_ACK 10
```
```diff
@@ -134,6 +136,7 @@ struct nl_mmap_hdr {
 	__u32 nm_gid;
 };
 
+#ifndef __KERNEL__
 enum nl_mmap_status {
 	NL_MMAP_STATUS_UNUSED,
 	NL_MMAP_STATUS_RESERVED,
```
```diff
@@ -145,6 +148,7 @@ enum nl_mmap_status {
 #define NL_MMAP_MSG_ALIGNMENT NLMSG_ALIGNTO
 #define NL_MMAP_MSG_ALIGN(sz) __ALIGN_KERNEL(sz, NL_MMAP_MSG_ALIGNMENT)
 #define NL_MMAP_HDRLEN NL_MMAP_MSG_ALIGN(sizeof(struct nl_mmap_hdr))
+#endif
 
 #define NET_MAJOR 36 /* Major 36 is reserved for networking */
 
```
include/uapi/linux/netlink_diag.h

Lines changed: 2 additions & 0 deletions
```diff
@@ -48,6 +48,8 @@ enum {
 
 #define NDIAG_SHOW_MEMINFO 0x00000001 /* show memory info of a socket */
 #define NDIAG_SHOW_GROUPS 0x00000002 /* show groups of a netlink socket */
+#ifndef __KERNEL__
 #define NDIAG_SHOW_RING_CFG 0x00000004 /* show ring configuration */
+#endif
 
 #endif
```

net/netlink/Kconfig

Lines changed: 0 additions & 9 deletions
```diff
@@ -2,15 +2,6 @@
 # Netlink Sockets
 #
 
-config NETLINK_MMAP
-	bool "NETLINK: mmaped IO"
-	---help---
-	  This option enables support for memory mapped netlink IO. This
-	  reduces overhead by avoiding copying data between kernel- and
-	  userspace.
-
-	  If unsure, say N.
-
 config NETLINK_DIAG
 	tristate "NETLINK: socket monitoring interface"
 	default n
```
