Age | Commit message (Collapse) | Author |
|
On kernels before v5.15, calling read() on a unix socket after
shutdown(SHUT_RD) or shutdown(SHUT_RDWR) would return the data
previously written or EOF. But now, while read() after
shutdown(SHUT_RD) still behaves the same way, read() after
shutdown(SHUT_RDWR) always fails with -EINVAL.
This behaviour change was apparently inadvertently introduced as part of
a bug fix for a different regression caused by the commit adding sockmap
support to af_unix, commit 94531cfcbe79c359 ("af_unix: Add
unix_stream_proto for sockmap"). Those commits, for unclear reasons,
started setting the socket state to TCP_CLOSE on shutdown(SHUT_RDWR),
while this state change had previously only been done in
unix_release_sock().
Restore the original behaviour. The sockmap tests in
tests/selftests/bpf continue to pass after this patch.
Fixes: d0c6416bd7091647f60 ("unix: Fix an issue in unix_shutdown causing the other end read/write failures")
Link: https://lore.kernel.org/lkml/20211111140000.GA10779@axis.com/
Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Tested-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Yucong noticed we can't poll() sockets in sockmap even
when they are the destination sockets of redirections.
This is because we never poll any psock queues in ->poll(),
except for TCP. With ->sock_is_readable() now we can
overwrite >sock_is_readable(), invoke and implement it for
both UDP and AF_UNIX sockets.
Reported-by: Yucong Sun <sunyucong@gmail.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211008203306.37525-4-xiyou.wangcong@gmail.com
|
|
Then name of this protocol changed in commit 94531cfcbe79 ("af_unix: Add
unix_stream_proto for sockmap") because that commit added stream support
to the af_unix protocol. Renaming the existing protocol makes a ChromeOS
protocol test[1] fail now that the name has changed in
/proc/net/protocols from "UNIX" to "UNIX-DGRAM".
Let's put the name back to how it was while keeping the stream protocol
as "UNIX-STREAM" so that the procfs interface doesn't change. This fixes
the test and maintains backwards compatibility in proc.
Cc: Jiang Wang <jiang.wang@bytedance.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Cong Wang <cong.wang@bytedance.com>
Cc: Jakub Sitnicki <jakub@cloudflare.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Dmitry Osipenko <digetx@gmail.com>
Link: https://source.chromium.org/chromiumos/chromiumos/codesearch/+/main:src/platform/tast-tests/src/chromiumos/tast/local/bundles/cros/network/supported_protocols.go;l=50;drc=e8b1c3f94cb40a054f4aa1ef1aff61e75dc38f18 [1]
Fixes: 94531cfcbe79 ("af_unix: Add unix_stream_proto for sockmap")
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Daniel Borkmann says:
====================
pull-request: bpf 2021-10-07
We've added 7 non-merge commits during the last 8 day(s) which contain
a total of 8 files changed, 38 insertions(+), 21 deletions(-).
The main changes are:
1) Fix ARM BPF JIT to preserve caller-saved regs for DIV/MOD JIT-internal
helper call, from Johan Almbladh.
2) Fix integer overflow in BPF stack map element size calculation when
used with preallocation, from Tatsuhiko Yasumatsu.
3) Fix an AF_UNIX regression due to added BPF sockmap support related
to shutdown handling, from Jiang Wang.
4) Fix a segfault in libbpf when generating light skeletons from objects
without BTF, from Kumar Kartikeya Dwivedi.
5) Fix a libbpf memory leak in strset to free the actual struct strset
itself, from Andrii Nakryiko.
6) Dual-license bpf_insn.h similarly as we did for libbpf and bpftool,
with ACKs from all contributors, from Luca Boccassi.
====================
Link: https://lore.kernel.org/r/20211007135010.21143-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Commit 94531cfcbe79 ("af_unix: Add unix_stream_proto for sockmap") sets
unix domain socket peer state to TCP_CLOSE in unix_shutdown. This could
happen when the local end is shutdown but the other end is not. Then,
the other end will get read or write failures which is not expected.
Fix the issue by setting the local state to shutdown.
Fixes: 94531cfcbe79 ("af_unix: Add unix_stream_proto for sockmap")
Reported-by: Casey Schaufler <casey@schaufler-ca.com>
Suggested-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Jiang Wang <jiang.wang@bytedance.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Casey Schaufler <casey@schaufler-ca.com>
Reviewed-by: Casey Schaufler <casey@schaufler-ca.com>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20211004232530.2377085-1-jiang.wang@bytedance.com
|
|
Jann Horn reported that SO_PEERCRED and SO_PEERGROUPS implementations
are racy, as af_unix can concurrently change sk_peer_pid and sk_peer_cred.
In order to fix this issue, this patch adds a new spinlock that needs
to be used whenever these fields are read or written.
Jann also pointed out that l2cap_sock_get_peer_pid_cb() is currently
reading sk->sk_peer_pid which makes no sense, as this field
is only possibly set by AF_UNIX sockets.
We will have to clean this in a separate patch.
This could be done by reverting b48596d1dc25 "Bluetooth: L2CAP: Add get_peer_pid callback"
or implementing what was truly expected.
Fixes: 109f6e39fa07 ("af_unix: Allow SO_PEERCRED to work across namespaces.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Jann Horn <jannh@google.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Cc: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
unix_create1() returns NULL on error, and the callers assume that it never
fails for reasons other than out of memory. So, the callers always return
-ENOMEM when unix_create1() fails.
However, it also returns NULL when the number of af_unix sockets exceeds
twice the limit controlled by sysctl: fs.file-max. In this case, the
callers should return -ENFILE like alloc_empty_file().
This patch changes unix_create1() to return the correct error value instead
of NULL on error.
Out of curiosity, the assumption has been wrong since 1999 due to this
change introduced in 2.2.4 [0].
diff -u --recursive --new-file v2.2.3/linux/net/unix/af_unix.c linux/net/unix/af_unix.c
--- v2.2.3/linux/net/unix/af_unix.c Tue Jan 19 11:32:53 1999
+++ linux/net/unix/af_unix.c Sun Mar 21 07:22:00 1999
@@ -388,6 +413,9 @@
{
struct sock *sk;
+ if (atomic_read(&unix_nr_socks) >= 2*max_files)
+ return NULL;
+
MOD_INC_USE_COUNT;
sk = sk_alloc(PF_UNIX, GFP_KERNEL, 1);
if (!sk) {
[0]: https://cdn.kernel.org/pub/linux/kernel/v2.2/patch-2.2.4.gz
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
syzbot reported another data-race in af_unix [1]
Lets change __skb_insert() to use WRITE_ONCE() when changing
skb head qlen.
Also, change unix_dgram_poll() to use lockless version
of unix_recvq_full()
It is verry possible we can switch all/most unix_recvq_full()
to the lockless version, this will be done in a future kernel version.
[1] HEAD commit: 8596e589b787732c8346f0482919e83cc9362db1
BUG: KCSAN: data-race in skb_queue_tail / unix_dgram_poll
write to 0xffff88814eeb24e0 of 4 bytes by task 25815 on cpu 0:
__skb_insert include/linux/skbuff.h:1938 [inline]
__skb_queue_before include/linux/skbuff.h:2043 [inline]
__skb_queue_tail include/linux/skbuff.h:2076 [inline]
skb_queue_tail+0x80/0xa0 net/core/skbuff.c:3264
unix_dgram_sendmsg+0xff2/0x1600 net/unix/af_unix.c:1850
sock_sendmsg_nosec net/socket.c:703 [inline]
sock_sendmsg net/socket.c:723 [inline]
____sys_sendmsg+0x360/0x4d0 net/socket.c:2392
___sys_sendmsg net/socket.c:2446 [inline]
__sys_sendmmsg+0x315/0x4b0 net/socket.c:2532
__do_sys_sendmmsg net/socket.c:2561 [inline]
__se_sys_sendmmsg net/socket.c:2558 [inline]
__x64_sys_sendmmsg+0x53/0x60 net/socket.c:2558
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3d/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
read to 0xffff88814eeb24e0 of 4 bytes by task 25834 on cpu 1:
skb_queue_len include/linux/skbuff.h:1869 [inline]
unix_recvq_full net/unix/af_unix.c:194 [inline]
unix_dgram_poll+0x2bc/0x3e0 net/unix/af_unix.c:2777
sock_poll+0x23e/0x260 net/socket.c:1288
vfs_poll include/linux/poll.h:90 [inline]
ep_item_poll fs/eventpoll.c:846 [inline]
ep_send_events fs/eventpoll.c:1683 [inline]
ep_poll fs/eventpoll.c:1798 [inline]
do_epoll_wait+0x6ad/0xf00 fs/eventpoll.c:2226
__do_sys_epoll_wait fs/eventpoll.c:2238 [inline]
__se_sys_epoll_wait fs/eventpoll.c:2233 [inline]
__x64_sys_epoll_wait+0xf6/0x120 fs/eventpoll.c:2233
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3d/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
value changed: 0x0000001b -> 0x00000001
Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 25834 Comm: syz-executor.1 Tainted: G W 5.14.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Fixes: 86b18aaa2b5b ("skbuff: fix a data race in skb_queue_len()")
Cc: Qian Cai <cai@lca.pw>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
syzbot was able to trigger NULL deref in unix_dgram_connect() [1]
This happens in
if (unix_peer(sk))
sk->sk_state = other->sk_state = TCP_ESTABLISHED; // crash because @other is NULL
Because locks have been dropped, unix_peer() might be non NULL,
while @other is NULL (AF_UNSPEC case)
We need to move code around, so that we no longer access
unix_peer() and sk_state while locks have been released.
[1]
general protection fault, probably for non-canonical address 0xdffffc0000000002: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000010-0x0000000000000017]
CPU: 0 PID: 10341 Comm: syz-executor239 Not tainted 5.14.0-rc7-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:unix_dgram_connect+0x32a/0xc60 net/unix/af_unix.c:1226
Code: 00 00 45 31 ed 49 83 bc 24 f8 05 00 00 00 74 69 e8 eb 5b a6 f9 48 8d 7d 12 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04 02 48 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 e0 07 00 00
RSP: 0018:ffffc9000a89fcd8 EFLAGS: 00010202
RAX: dffffc0000000000 RBX: 0000000000000004 RCX: 0000000000000000
RDX: 0000000000000002 RSI: ffffffff87cf4ef5 RDI: 0000000000000012
RBP: 0000000000000000 R08: 0000000000000000 R09: ffff88802e1917c3
R10: ffffffff87cf4eba R11: 0000000000000001 R12: ffff88802e191740
R13: 0000000000000000 R14: ffff88802e191d38 R15: ffff88802e1917c0
FS: 00007f3eb0052700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000004787d0 CR3: 0000000029c0a000 CR4: 00000000001506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
__sys_connect_file+0x155/0x1a0 net/socket.c:1890
__sys_connect+0x161/0x190 net/socket.c:1907
__do_sys_connect net/socket.c:1917 [inline]
__se_sys_connect net/socket.c:1914 [inline]
__x64_sys_connect+0x6f/0xb0 net/socket.c:1914
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x446a89
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 a1 15 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f3eb0052208 EFLAGS: 00000246 ORIG_RAX: 000000000000002a
RAX: ffffffffffffffda RBX: 00000000004cc4d8 RCX: 0000000000446a89
RDX: 000000000000006e RSI: 0000000020000180 RDI: 0000000000000003
RBP: 00000000004cc4d0 R08: 00007f3eb0052700 R09: 0000000000000000
R10: 00007f3eb0052700 R11: 0000000000000246 R12: 00000000004cc4dc
R13: 00007ffd791e79cf R14: 00007f3eb0052300 R15: 0000000000022000
Modules linked in:
---[ end trace 4eb809357514968c ]---
RIP: 0010:unix_dgram_connect+0x32a/0xc60 net/unix/af_unix.c:1226
Code: 00 00 45 31 ed 49 83 bc 24 f8 05 00 00 00 74 69 e8 eb 5b a6 f9 48 8d 7d 12 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04 02 48 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 e0 07 00 00
RSP: 0018:ffffc9000a89fcd8 EFLAGS: 00010202
RAX: dffffc0000000000 RBX: 0000000000000004 RCX: 0000000000000000
RDX: 0000000000000002 RSI: ffffffff87cf4ef5 RDI: 0000000000000012
RBP: 0000000000000000 R08: 0000000000000000 R09: ffff88802e1917c3
R10: ffffffff87cf4eba R11: 0000000000000001 R12: ffff88802e191740
R13: 0000000000000000 R14: ffff88802e191d38 R15: ffff88802e1917c0
FS: 00007f3eb0052700(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffd791fe960 CR3: 0000000029c0a000 CR4: 00000000001506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Fixes: 83301b5367a9 ("af_unix: Set TCP_ESTABLISHED for datagram sockets too")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Cong Wang <cong.wang@bytedance.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Daniel Borkmann says:
====================
bpf-next 2021-08-31
We've added 116 non-merge commits during the last 17 day(s) which contain
a total of 126 files changed, 6813 insertions(+), 4027 deletions(-).
The main changes are:
1) Add opaque bpf_cookie to perf link which the program can read out again,
to be used in libbpf-based USDT library, from Andrii Nakryiko.
2) Add bpf_task_pt_regs() helper to access userspace pt_regs, from Daniel Xu.
3) Add support for UNIX stream type sockets for BPF sockmap, from Jiang Wang.
4) Allow BPF TCP congestion control progs to call bpf_setsockopt() e.g. to switch
to another congestion control algorithm during init, from Martin KaFai Lau.
5) Extend BPF iterator support for UNIX domain sockets, from Kuniyuki Iwashima.
6) Allow bpf_{set,get}sockopt() calls from setsockopt progs, from Prankur Gupta.
7) Add bpf_get_netns_cookie() helper for BPF_PROG_TYPE_{SOCK_OPS,CGROUP_SOCKOPT}
progs, from Xu Liu and Stanislav Fomichev.
8) Support for __weak typed ksyms in libbpf, from Hao Luo.
9) Shrink struct cgroup_bpf by 504 bytes through refactoring, from Dave Marchevsky.
10) Fix a smatch complaint in verifier's narrow load handling, from Andrey Ignatov.
11) Fix BPF interpreter's tail call count limit, from Daniel Borkmann.
12) Big batch of improvements to BPF selftests, from Magnus Karlsson, Li Zhijian,
Yucong Sun, Yonghong Song, Ilya Leoshkevich, Jussi Maki, Ilya Leoshkevich, others.
13) Another big batch to revamp XDP samples in order to give them consistent look
and feel, from Kumar Kartikeya Dwivedi.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (116 commits)
MAINTAINERS: Remove self from powerpc BPF JIT
selftests/bpf: Fix potential unreleased lock
samples: bpf: Fix uninitialized variable in xdp_redirect_cpu
selftests/bpf: Reduce more flakyness in sockmap_listen
bpf: Fix bpf-next builds without CONFIG_BPF_EVENTS
bpf: selftests: Add dctcp fallback test
bpf: selftests: Add connect_to_fd_opts to network_helpers
bpf: selftests: Add sk_state to bpf_tcp_helpers.h
bpf: tcp: Allow bpf-tcp-cc to call bpf_(get|set)sockopt
selftests: xsk: Preface options with opt
selftests: xsk: Make enums lower case
selftests: xsk: Generate packets from specification
selftests: xsk: Generate packet directly in umem
selftests: xsk: Simplify cleanup of ifobjects
selftests: xsk: Decrease sending speed
selftests: xsk: Validate tx stats on tx thread
selftests: xsk: Simplify packet validation in xsk tests
selftests: xsk: Rename worker_* functions that are not thread entry points
selftests: xsk: Disassociate umem size with packets sent
selftests: xsk: Remove end-of-test packet
...
====================
Link: https://lore.kernel.org/r/20210830225618.11634-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Commit 94531cfcbe79 ("af_unix: Add unix_stream_proto for sockmap")
introduced a bug for af_unix SEQPACKET type. In unix_shutdown, the
unhash function will call prot->unhash(), which is NULL for SEQPACKET.
And kernel will panic. On ARM32, it will show following messages: (it
likely affects x86 too).
Fix the bug by checking the prot->unhash is NULL or not first.
Kernel log:
<--- cut here ---
Unable to handle kernel NULL pointer dereference at virtual address
00000000
pgd = 2fba1ffb
*pgd=00000000
Internal error: Oops: 80000005 [#1] PREEMPT SMP THUMB2
Modules linked in:
CPU: 1 PID: 1999 Comm: falkon Tainted: G W
5.14.0-rc5-01175-g94531cfcbe79-dirty #9240
Hardware name: NVIDIA Tegra SoC (Flattened Device Tree)
PC is at 0x0
LR is at unix_shutdown+0x81/0x1a8
pc : [<00000000>] lr : [<c08f3311>] psr: 600f0013
sp : e45aff70 ip : e463a3c0 fp : beb54f04
r10: 00000125 r9 : e45ae000 r8 : c4a56664
r7 : 00000001 r6 : c4a56464 r5 : 00000001 r4 : c4a56400
r3 : 00000000 r2 : c5a6b180 r1 : 00000000 r0 : c4a56400
Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
Control: 50c5387d Table: 05aa804a DAC: 00000051
Register r0 information: slab PING start c4a56400 pointer offset 0
Register r1 information: NULL pointer
Register r2 information: slab task_struct start c5a6b180 pointer offset 0
Register r3 information: NULL pointer
Register r4 information: slab PING start c4a56400 pointer offset 0
Register r5 information: non-paged memory
Register r6 information: slab PING start c4a56400 pointer offset 100
Register r7 information: non-paged memory
Register r8 information: slab PING start c4a56400 pointer offset 612
Register r9 information: non-slab/vmalloc memory
Register r10 information: non-paged memory
Register r11 information: non-paged memory
Register r12 information: slab filp start e463a3c0 pointer offset 0
Process falkon (pid: 1999, stack limit = 0x9ec48895)
Stack: (0xe45aff70 to 0xe45b0000)
ff60: e45ae000 c5f26a00 00000000 00000125
ff80: c0100264 c07f7fa3 beb54f04 fffffff7 00000001 e6f3fc0e b5e5e9ec beb54ec4
ffa0: b5da0ccc c010024b b5e5e9ec beb54ec4 0000000f 00000000 00000000 beb54ebc
ffc0: b5e5e9ec beb54ec4 b5da0ccc 00000125 beb54f58 00785238 beb5529c beb54f04
ffe0: b5da1e24 beb54eac b301385c b62b6ee8 600f0030 0000000f 00000000 00000000
[<c08f3311>] (unix_shutdown) from [<c07f7fa3>] (__sys_shutdown+0x2f/0x50)
[<c07f7fa3>] (__sys_shutdown) from [<c010024b>]
(__sys_trace_return+0x1/0x16)
Exception stack(0xe45affa8 to 0xe45afff0)
Fixes: 94531cfcbe79 ("af_unix: Add unix_stream_proto for sockmap")
Reported-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: Jiang Wang <jiang.wang@bytedance.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Dmitry Osipenko <digetx@gmail.com>
Acked-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Link: https://lore.kernel.org/bpf/20210821180738.1151155-1-jiang.wang@bytedance.com
|
|
Previously, sockmap for AF_UNIX protocol only supports
dgram type. This patch add unix stream type support, which
is similar to unix_dgram_proto. To support sockmap, dgram
and stream cannot share the same unix_proto anymore, because
they have different implementations, such as unhash for stream
type (which will remove closed or disconnected sockets from the map),
so rename unix_proto to unix_dgram_proto and add a new
unix_stream_proto.
Also implement stream related sockmap functions.
And add dgram key words to those dgram specific functions.
Signed-off-by: Jiang Wang <jiang.wang@bytedance.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Cong Wang <cong.wang@bytedance.com>
Acked-by: Jakub Sitnicki <jakub@cloudflare.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20210816190327.2739291-3-jiang.wang@bytedance.com
|
|
To support sockmap for af_unix stream type, implement
read_sock, which is similar to the read_sock for unix
dgram sockets.
Signed-off-by: Jiang Wang <jiang.wang@bytedance.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Cong Wang <cong.wang@bytedance.com>
Acked-by: Jakub Sitnicki <jakub@cloudflare.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20210816190327.2739291-2-jiang.wang@bytedance.com
|
|
edumazet@google.com pointed out that queue_oob
does not check socket state after acquiring
the lock. He also pointed to an incorrect usage
of kfree_skb and an unnecessary setting of skb
length. This patch addresses those issue.
Signed-off-by: Rao Shoaib <Rao.Shoaib@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch implements the BPF iterator for the UNIX domain socket.
Currently, the batch optimisation introduced for the TCP iterator in the
commit 04c7820b776f ("bpf: tcp: Bpf iter batching and lock_sock") is not
used for the UNIX domain socket. It will require replacing the big lock
for the hash table with small locks for each hash list not to block other
processes.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210814015718.42704-2-kuniyu@amazon.co.jp
|
|
syzkaller found that OOB code was holding spinlock
while calling a function in which it could sleep.
Reported-by: syzbot+8760ca6c1ee783ac4abd@syzkaller.appspotmail.com
Fixes: 314001f0bf92 ("af_unix: Add OOB support")
Signed-off-by: Rao Shoaib <rao.shoaib@oracle.com>
Link: https://lore.kernel.org/r/20210811220652.567434-1-Rao.Shoaib@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Daniel Borkmann says:
====================
bpf-next 2021-08-10
We've added 31 non-merge commits during the last 8 day(s) which contain
a total of 28 files changed, 3644 insertions(+), 519 deletions(-).
1) Native XDP support for bonding driver & related BPF selftests, from Jussi Maki.
2) Large batch of new BPF JIT tests for test_bpf.ko that came out as a result from
32-bit MIPS JIT development, from Johan Almbladh.
3) Rewrite of netcnt BPF selftest and merge into test_progs, from Stanislav Fomichev.
4) Fix XDP bpf_prog_test_run infra after net to net-next merge, from Andrii Nakryiko.
5) Follow-up fix in unix_bpf_update_proto() to enforce socket type, from Cong Wang.
6) Fix bpf-iter-tcp4 selftest to print the correct dest IP, from Jose Blanquicet.
7) Various misc BPF XDP sample improvements, from Niklas Söderlund, Matthew Cover,
and Muhammad Falak R Wani.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (31 commits)
bpf, tests: Add tail call test suite
bpf, tests: Add tests for BPF_CMPXCHG
bpf, tests: Add tests for atomic operations
bpf, tests: Add test for 32-bit context pointer argument passing
bpf, tests: Add branch conversion JIT test
bpf, tests: Add word-order tests for load/store of double words
bpf, tests: Add tests for ALU operations implemented with function calls
bpf, tests: Add more ALU64 BPF_MUL tests
bpf, tests: Add more BPF_LSH/RSH/ARSH tests for ALU64
bpf, tests: Add more ALU32 tests for BPF_LSH/RSH/ARSH
bpf, tests: Add more tests of ALU32 and ALU64 bitwise operations
bpf, tests: Fix typos in test case descriptions
bpf, tests: Add BPF_MOV tests for zero and sign extension
bpf, tests: Add BPF_JMP32 test cases
samples, bpf: Add an explict comment to handle nested vlan tagging.
selftests/bpf: Add tests for XDP bonding
selftests/bpf: Fix xdp_tx.c prog section name
net, core: Allow netdev_lower_get_next_private_rcu in bh context
bpf, devmap: Exclude XDP broadcast to master device
net, bonding: Add XDP support to the bonding driver
...
====================
Link: https://lore.kernel.org/r/20210810130038.16927-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This patch adds OOB support for AF_UNIX sockets.
The semantics is same as TCP.
The last byte of a message with the OOB flag is
treated as the OOB byte. The byte is separated into
a skb and a pointer to the skb is stored in unix_sock.
The pointer is used to enforce OOB semantics.
Signed-off-by: Rao Shoaib <rao.shoaib@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
As of now, only AF_UNIX datagram socket supports sockmap. But
unix_proto is shared for all kinds of AF_UNIX sockets, so we
have to check the socket type in unix_bpf_update_proto() to
explicitly reject other types, otherwise they could be added
into sockmap, too.
Fixes: c63829182c37 ("af_unix: Implement ->psock_update_sk_prot()")
Reported-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/20210731195038.8084-1-xiyou.wangcong@gmail.com
|
|
Andrii Nakryiko says:
====================
bpf-next 2021-07-30
We've added 64 non-merge commits during the last 15 day(s) which contain
a total of 83 files changed, 5027 insertions(+), 1808 deletions(-).
The main changes are:
1) BTF-guided binary data dumping libbpf API, from Alan.
2) Internal factoring out of libbpf CO-RE relocation logic, from Alexei.
3) Ambient BPF run context and cgroup storage cleanup, from Andrii.
4) Few small API additions for libbpf 1.0 effort, from Evgeniy and Hengqi.
5) bpf_program__attach_kprobe_opts() fixes in libbpf, from Jiri.
6) bpf_{get,set}sockopt() support in BPF iterators, from Martin.
7) BPF map pinning improvements in libbpf, from Martynas.
8) Improved module BTF support in libbpf and bpftool, from Quentin.
9) Bpftool cleanups and documentation improvements, from Quentin.
10) Libbpf improvements for supporting CO-RE on old kernels, from Shuyi.
11) Increased maximum cgroup storage size, from Stanislav.
12) Small fixes and improvements to BPF tests and samples, from various folks.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (64 commits)
tools: bpftool: Complete metrics list in "bpftool prog profile" doc
tools: bpftool: Document and add bash completion for -L, -B options
selftests/bpf: Update bpftool's consistency script for checking options
tools: bpftool: Update and synchronise option list in doc and help msg
tools: bpftool: Complete and synchronise attach or map types
selftests/bpf: Check consistency between bpftool source, doc, completion
tools: bpftool: Slightly ease bash completion updates
unix_bpf: Fix a potential deadlock in unix_dgram_bpf_recvmsg()
libbpf: Add btf__load_vmlinux_btf/btf__load_module_btf
tools: bpftool: Support dumping split BTF by id
libbpf: Add split BTF support for btf__load_from_kernel_by_id()
tools: Replace btf__get_from_id() with btf__load_from_kernel_by_id()
tools: Free BTF objects at various locations
libbpf: Rename btf__get_from_id() as btf__load_from_kernel_by_id()
libbpf: Rename btf__load() as btf__load_into_kernel()
libbpf: Return non-null error on failures in libbpf_find_prog_btf_id()
bpf: Emit better log message if bpf_iter ctx arg btf_id == 0
tools/resolve_btfids: Emit warnings and patch zero id for missing symbols
bpf: Increase supported cgroup storage value size
libbpf: Fix race when pinning maps in parallel
...
====================
Link: https://lore.kernel.org/r/20210730225606.1897330-1-andrii@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Conflicting commits, all resolutions pretty trivial:
drivers/bus/mhi/pci_generic.c
5c2c85315948 ("bus: mhi: pci-generic: configurable network interface MRU")
56f6f4c4eb2a ("bus: mhi: pci_generic: Apply no-op for wake using sideband wake boolean")
drivers/nfc/s3fwrn5/firmware.c
a0302ff5906a ("nfc: s3fwrn5: remove unnecessary label")
46573e3ab08f ("nfc: s3fwrn5: fix undefined parameter values in dev_err()")
801e541c79bb ("nfc: s3fwrn5: fix undefined parameter values in dev_err()")
MAINTAINERS
7d901a1e878a ("net: phy: add Maxlinear GPY115/21x/24x driver")
8a7b46fa7902 ("MAINTAINERS: add Yasushi SHOJI as reviewer for the Microchip CAN BUS Analyzer Tool driver")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
As Eric noticed, __unix_dgram_recvmsg() may acquire u->iolock
too, so we have to release it before calling this function.
Fixes: 9825d866ce0d ("af_unix: Implement unix_dgram_bpf_recvmsg()")
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jakub Sitnicki <jakub@cloudflare.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
|
|
unix_gc() assumes that candidate sockets can never gain an external
reference (i.e. be installed into an fd) while the unix_gc_lock is
held. Except for MSG_PEEK this is guaranteed by modifying inflight
count under the unix_gc_lock.
MSG_PEEK does not touch any variable protected by unix_gc_lock (file
count is not), yet it needs to be serialized with garbage collection.
Do this by locking/unlocking unix_gc_lock:
1) increment file count
2) lock/unlock barrier to make sure incremented file count is visible
to garbage collection
3) install file into fd
This is a lock barrier (unlike smp_mb()) that ensures that garbage
collection is run completely before or completely after the barrier.
Cc: <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Alexei Starovoitov says:
====================
pull-request: bpf-next 2021-07-15
The following pull-request contains BPF updates for your *net-next* tree.
We've added 45 non-merge commits during the last 15 day(s) which contain
a total of 52 files changed, 3122 insertions(+), 384 deletions(-).
The main changes are:
1) Introduce bpf timers, from Alexei.
2) Add sockmap support for unix datagram socket, from Cong.
3) Fix potential memleak and UAF in the verifier, from He.
4) Add bpf_get_func_ip helper, from Jiri.
5) Improvements to generic XDP mode, from Kumar.
6) Support for passing xdp_md to XDP programs in bpf_prog_run, from Zvi.
===================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We have to implement unix_dgram_bpf_recvmsg() to replace the
original ->recvmsg() to retrieve skmsg from ingress_msg.
AF_UNIX is again special here because the lack of
sk_prot->recvmsg(). I simply add a special case inside
unix_dgram_recvmsg() to call sk->sk_prot->recvmsg() directly.
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210704190252.11866-8-xiyou.wangcong@gmail.com
|
|
Now we can implement unix_bpf_update_proto() to update
sk_prot, especially prot->close().
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210704190252.11866-7-xiyou.wangcong@gmail.com
|
|
Unlike af_inet, unix_proto is very different, it does not even
have a ->close(). We have to add a dummy implementation to
satisfy sockmap. Normally it is just a nop, it is introduced only
for sockmap to replace it.
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210704190252.11866-6-xiyou.wangcong@gmail.com
|
|
Currently only unix stream socket sets TCP_ESTABLISHED,
datagram socket can set this too when they connect to its
peer socket. At least __ip4_datagram_connect() does the same.
This will be used to determine whether an AF_UNIX datagram
socket can be redirected to in sockmap.
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210704190252.11866-5-xiyou.wangcong@gmail.com
|
|
Implement ->read_sock() for AF_UNIX datagram socket, it is
pretty much similar to udp_read_sock().
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210704190252.11866-4-xiyou.wangcong@gmail.com
|
|
It has 'if (err >0 )' statement in nlmsg_unicast(), so use nlmsg_unicast()
instead of netlink_unicast(), this looks more concise.
v2: remove the change in netfilter.
Signed-off-by: Yajun Deng <yajun.deng@linux.dev>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch introduces a function wrapper to call the sk_error_report
callback. That will prepare to add additional handling whenever
sk_error_report is called, for example to trace socket errors.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We only care about exclusive or of those, so pass that directly.
Makes life simpler for callers as well...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We can do that more or less safely, since the parent is
held locked all along. Yes, somebody might observe the
object via dcache, only to have it disappear afterwards,
but there's really no good way to prevent that. It won't
race with other bind(2) or attempts to move the sucker
elsewhere, or put something else in its place - locked
parent prevents that.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Final preparations for doing unlink on failure past the successful
mknod. We can't hold ->bindlock over ->mknod() or ->unlink(), since
either might do sb_start_write() (e.g. on overlayfs). However, we
can do it while holding filesystem and VFS locks - doing
kern_path_create()
vfs_mknod()
grab ->bindlock
if u->addr had been set
drop ->bindlock
done_path_create
return -EINVAL
else
assign the address to socket
drop ->bindlock
done_path_create
return 0
would be deadlock-free. Here we massage unix_bind_bsd() to that
form. We are still doing equivalent transformations.
Next commit will *not* be an equivalent transformation - it will
add a call of vfs_unlink() before done_path_create() in "alread bound"
case.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
unix_bind_bsd() and unix_bind_abstract() respectively.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We do get some duplication that way, but it's minor compared to
parts that are different. What we get is an ability to change
locking in BSD case without making failure exits very hard to
follow.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
makes it easier to massage; we do pay for that by extra work
(kmalloc+memcpy+kfree) in some error cases, but those are not
on the hot paths anyway.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Duplicated logics in all bind variants (autobind, bind-to-path,
bind-to-abstract) gets taken into a common helper.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Trivial conflicts in net/can/isotp.c and
tools/testing/selftests/net/mptcp/mptcp_connect.sh
scaled_ppm_to_ppb() was moved from drivers/ptp/ptp_clock.c
to include/linux/ptp_clock_kernel.h in -next so re-apply
the fix there.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
While unix_may_send(sk, osk) is called while osk is locked, it appears
unix_release_sock() can overwrite unix_peer() after this lock has been
released, making KCSAN unhappy.
Changing unix_release_sock() to access/change unix_peer()
before lock is released should fix this issue.
BUG: KCSAN: data-race in unix_dgram_sendmsg / unix_release_sock
write to 0xffff88810465a338 of 8 bytes by task 20852 on cpu 1:
unix_release_sock+0x4ed/0x6e0 net/unix/af_unix.c:558
unix_release+0x2f/0x50 net/unix/af_unix.c:859
__sock_release net/socket.c:599 [inline]
sock_close+0x6c/0x150 net/socket.c:1258
__fput+0x25b/0x4e0 fs/file_table.c:280
____fput+0x11/0x20 fs/file_table.c:313
task_work_run+0xae/0x130 kernel/task_work.c:164
tracehook_notify_resume include/linux/tracehook.h:189 [inline]
exit_to_user_mode_loop kernel/entry/common.c:175 [inline]
exit_to_user_mode_prepare+0x156/0x190 kernel/entry/common.c:209
__syscall_exit_to_user_mode_work kernel/entry/common.c:291 [inline]
syscall_exit_to_user_mode+0x20/0x40 kernel/entry/common.c:302
do_syscall_64+0x56/0x90 arch/x86/entry/common.c:57
entry_SYSCALL_64_after_hwframe+0x44/0xae
read to 0xffff88810465a338 of 8 bytes by task 20888 on cpu 0:
unix_may_send net/unix/af_unix.c:189 [inline]
unix_dgram_sendmsg+0x923/0x1610 net/unix/af_unix.c:1712
sock_sendmsg_nosec net/socket.c:654 [inline]
sock_sendmsg net/socket.c:674 [inline]
____sys_sendmsg+0x360/0x4d0 net/socket.c:2350
___sys_sendmsg net/socket.c:2404 [inline]
__sys_sendmmsg+0x315/0x4b0 net/socket.c:2490
__do_sys_sendmmsg net/socket.c:2519 [inline]
__se_sys_sendmmsg net/socket.c:2516 [inline]
__x64_sys_sendmmsg+0x53/0x60 net/socket.c:2516
do_syscall_64+0x4a/0x90 arch/x86/entry/common.c:47
entry_SYSCALL_64_after_hwframe+0x44/0xae
value changed: 0xffff888167905400 -> 0x0000000000000000
Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 20888 Comm: syz-executor.0 Not tainted 5.13.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Remove the repeated word "and".
Signed-off-by: gushengxian <gushengxian@yulong.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When binding a non-abstract AF_UNIX socket it will gain a representation
in the filesystem. Enable the socket infrastructure to handle idmapped
mounts by passing down the user namespace of the mount the socket will
be created from. If the initial user namespace is passed nothing changes
so non-idmapped mounts will see identical behavior as before.
Link: https://lore.kernel.org/r/20210121131959.646623-18-christian.brauner@ubuntu.com
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: James Morris <jamorris@linux.microsoft.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
|
|
The various vfs_*() helpers are called by filesystems or by the vfs
itself to perform core operations such as create, link, mkdir, mknod, rename,
rmdir, tmpfile and unlink. Enable them to handle idmapped mounts. If the
inode is accessed through an idmapped mount map it into the
mount's user namespace and pass it down. Afterwards the checks and
operations are identical to non-idmapped mounts. If the initial user
namespace is passed nothing changes so non-idmapped mounts will see
identical behavior as before.
Link: https://lore.kernel.org/r/20210121131959.646623-15-christian.brauner@ubuntu.com
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
|
|
Add two simple helpers to check permissions on a file and path
respectively and convert over some callers. It simplifies quite a few
codepaths and also reduces the churn in later patches quite a bit.
Christoph also correctly points out that this makes codepaths (e.g.
ioctls) way easier to follow that would otherwise have to do more
complex argument passing than necessary.
Link: https://lore.kernel.org/r/20210121131959.646623-4-christian.brauner@ubuntu.com
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Suggested-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: James Morris <jamorris@linux.microsoft.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
- Add redirect_neigh() BPF packet redirect helper, allowing to limit
stack traversal in common container configs and improving TCP
back-pressure.
Daniel reports ~10Gbps => ~15Gbps single stream TCP performance gain.
- Expand netlink policy support and improve policy export to user
space. (Ge)netlink core performs request validation according to
declared policies. Expand the expressiveness of those policies
(min/max length and bitmasks). Allow dumping policies for particular
commands. This is used for feature discovery by user space (instead
of kernel version parsing or trial and error).
- Support IGMPv3/MLDv2 multicast listener discovery protocols in
bridge.
- Allow more than 255 IPv4 multicast interfaces.
- Add support for Type of Service (ToS) reflection in SYN/SYN-ACK
packets of TCPv6.
- In Multi-patch TCP (MPTCP) support concurrent transmission of data on
multiple subflows in a load balancing scenario. Enhance advertising
addresses via the RM_ADDR/ADD_ADDR options.
- Support SMC-Dv2 version of SMC, which enables multi-subnet
deployments.
- Allow more calls to same peer in RxRPC.
- Support two new Controller Area Network (CAN) protocols - CAN-FD and
ISO 15765-2:2016.
- Add xfrm/IPsec compat layer, solving the 32bit user space on 64bit
kernel problem.
- Add TC actions for implementing MPLS L2 VPNs.
- Improve nexthop code - e.g. handle various corner cases when nexthop
objects are removed from groups better, skip unnecessary
notifications and make it easier to offload nexthops into HW by
converting to a blocking notifier.
- Support adding and consuming TCP header options by BPF programs,
opening the doors for easy experimental and deployment-specific TCP
option use.
- Reorganize TCP congestion control (CC) initialization to simplify
life of TCP CC implemented in BPF.
- Add support for shipping BPF programs with the kernel and loading
them early on boot via the User Mode Driver mechanism, hence reusing
all the user space infra we have.
- Support sleepable BPF programs, initially targeting LSM and tracing.
- Add bpf_d_path() helper for returning full path for given 'struct
path'.
- Make bpf_tail_call compatible with bpf-to-bpf calls.
- Allow BPF programs to call map_update_elem on sockmaps.
- Add BPF Type Format (BTF) support for type and enum discovery, as
well as support for using BTF within the kernel itself (current use
is for pretty printing structures).
- Support listing and getting information about bpf_links via the bpf
syscall.
- Enhance kernel interfaces around NIC firmware update. Allow
specifying overwrite mask to control if settings etc. are reset
during update; report expected max time operation may take to users;
support firmware activation without machine reboot incl. limits of
how much impact reset may have (e.g. dropping link or not).
- Extend ethtool configuration interface to report IEEE-standard
counters, to limit the need for per-vendor logic in user space.
- Adopt or extend devlink use for debug, monitoring, fw update in many
drivers (dsa loop, ice, ionic, sja1105, qed, mlxsw, mv88e6xxx,
dpaa2-eth).
- In mlxsw expose critical and emergency SFP module temperature alarms.
Refactor port buffer handling to make the defaults more suitable and
support setting these values explicitly via the DCBNL interface.
- Add XDP support for Intel's igb driver.
- Support offloading TC flower classification and filtering rules to
mscc_ocelot switches.
- Add PTP support for Marvell Octeontx2 and PP2.2 hardware, as well as
fixed interval period pulse generator and one-step timestamping in
dpaa-eth.
- Add support for various auth offloads in WiFi APs, e.g. SAE (WPA3)
offload.
- Add Lynx PHY/PCS MDIO module, and convert various drivers which have
this HW to use it. Convert mvpp2 to split PCS.
- Support Marvell Prestera 98DX3255 24-port switch ASICs, as well as
7-port Mediatek MT7531 IP.
- Add initial support for QCA6390 and IPQ6018 in ath11k WiFi driver,
and wcn3680 support in wcn36xx.
- Improve performance for packets which don't require much offloads on
recent Mellanox NICs by 20% by making multiple packets share a
descriptor entry.
- Move chelsio inline crypto drivers (for TLS and IPsec) from the
crypto subtree to drivers/net. Move MDIO drivers out of the phy
directory.
- Clean up a lot of W=1 warnings, reportedly the actively developed
subsections of networking drivers should now build W=1 warning free.
- Make sure drivers don't use in_interrupt() to dynamically adapt their
code. Convert tasklets to use new tasklet_setup API (sadly this
conversion is not yet complete).
* tag 'net-next-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2583 commits)
Revert "bpfilter: Fix build error with CONFIG_BPFILTER_UMH"
net, sockmap: Don't call bpf_prog_put() on NULL pointer
bpf, selftest: Fix flaky tcp_hdr_options test when adding addr to lo
bpf, sockmap: Add locking annotations to iterator
netfilter: nftables: allow re-computing sctp CRC-32C in 'payload' statements
net: fix pos incrementment in ipv6_route_seq_next
net/smc: fix invalid return code in smcd_new_buf_create()
net/smc: fix valid DMBE buffer sizes
net/smc: fix use-after-free of delayed events
bpfilter: Fix build error with CONFIG_BPFILTER_UMH
cxgb4/ch_ipsec: Replace the module name to ch_ipsec from chcr
net: sched: Fix suspicious RCU usage while accessing tcf_tunnel_info
bpf: Fix register equivalence tracking.
rxrpc: Fix loss of final ack on shutdown
rxrpc: Fix bundle counting for exclusive connections
netfilter: restore NF_INET_NUMHOOKS
ibmveth: Identify ingress large send packets.
ibmveth: Switch order of ibmveth_helper calls.
cxgb4: handle 4-tuple PEDIT to NAT mode translation
selftests: Add VRF route leaking tests
...
|
|
Commit 109f6e39fa07c48f5801 ("af_unix: Allow SO_PEERCRED
to work across namespaces.") introduced the old_pid variable
in unix_listen, but it's never used.
Remove the declaration and the call to put_pid.
Signed-off-by: Or Cohen <orcohen@paloaltonetworks.com>
Link: https://lore.kernel.org/r/20201011153527.18628-1-orcohen@paloaltonetworks.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Now we have a io_uring kernel header, move this definition out of fs.h
and into io_uring.h where it belongs.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
After commit 37ab4fa7844a ("net: unix: allow bind to fail on mutex lock"),
the assignment to err is redundant. So remove it.
Signed-off-by: Jing Xiangfeng <jingxiangfeng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Replace the existing /* fall through */ comments and its variants with
the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
fall-through markings when it is the case.
[1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
|