linux.git - Linux Kernel

Age	Commit message (Collapse)	Author
2022-07-26	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf	Jakub Kicinski
	Florian Westphal says: ==================== netfilter updates for net Three late fixes for netfilter: 1) If nf_queue user requests packet truncation below size of l3 header, we corrupt the skb, then crash. Reject such requests. 2) add cond_resched() calls when doing cycle detection in the nf_tables graph. This avoids softlockup warning with certain rulesets. 3) Reject rulesets that use nftables 'queue' expression in family/chain combinations other than those that are supported. Currently the ruleset will load, but when userspace attempts to reinject you get WARN splat + packet drops. * git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nft_queue: only allow supported familes and hooks netfilter: nf_tables: add rescheduling points during loop detection walks netfilter: nf_queue: do not allow packet truncation below transport header offset ==================== Link: https://lore.kernel.org/r/20220726192056.13497-1-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-26	Merge tag 'for-net-2022-07-26' of ↵	Jakub Kicinski
	git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth Luiz Augusto von Dentz says: ==================== bluetooth pull request for net: - Fix early wakeup after suspend - Fix double free on error - Fix use-after-free on l2cap_chan_put * tag 'for-net-2022-07-26' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth: Bluetooth: L2CAP: Fix use-after-free caused by l2cap_chan_put Bluetooth: Always set event mask on suspend Bluetooth: mgmt: Fix double free on error path ==================== Link: https://lore.kernel.org/r/20220726221328.423714-1-luiz.dentz@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-26	tls: rx: do not use the standard strparser	Jakub Kicinski
	TLS is a relatively poor fit for strparser. We pause the input every time a message is received, wait for a read which will decrypt the message, start the parser, repeat. strparser is built to delineate the messages, wrap them in individual skbs and let them float off into the stack or a different socket. TLS wants the data pages and nothing else. There's no need for TLS to keep cloning (and occasionally skb_unclone()'ing) the TCP rx queue. This patch uses a pre-allocated skb and attaches the skbs from the TCP rx queue to it as frags. TLS is careful never to modify the input skb without CoW'ing / detaching it first. Since we call TCP rx queue cleanup directly we also get back the benefit of skb deferred free. Overall this results in a 6% gain in my benchmarks. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-26	tls: rx: device: add input CoW helper	Jakub Kicinski
	Wrap the remaining skb_cow_data() into a helper, so it's easier to replace down the lane. The new version will change the skb so make sure relevant pointers get reloaded after the call. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-26	tcp: allow tls to decrypt directly from the tcp rcv queue	Jakub Kicinski
	Expose TCP rx queue accessor and cleanup, so that TLS can decrypt directly from the TCP queue. The expectation is that the caller can access the skb returned from tcp_recv_skb() and up to inq bytes worth of data (some of which may be in ->next skbs) and then call tcp_read_done() when data has been consumed. The socket lock must be held continuously across those two operations. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-26	tls: rx: device: keep the zero copy status with offload	Jakub Kicinski
	The non-zero-copy path assumes a full skb with decrypted contents. This means the device offload would have to CoW the data. Try to keep the zero-copy status instead, copy the data to user space. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-26	tls: rx: don't free the output in case of zero-copy	Jakub Kicinski
	In the future we'll want to reuse the input skb in case of zero-copy so we shouldn't always free darg.skb. Move the freeing of darg.skb into the non-zc cases. All cases will now free ctx->recv_pkt (inside let tls_rx_rec_done()). Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-26	tls: rx: factor SW handling out of tls_rx_one_record()	Jakub Kicinski
	After recent changes the SW side of tls_rx_one_record() can be nicely encapsulated in its own function. Move the pad handling as well. This will be useful for ->zc handling in tls_decrypt_device(). Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-26	tls: rx: wrap recv_pkt accesses in helpers	Jakub Kicinski
	To allow for the logic to change later wrap accesses which interrogate the input skb in helper functions. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-26	net: devlink: introduce nested devlink entity for line card	Jiri Pirko
	For the purpose of exposing device info and allow flash update which is going to be implemented in follow-up patches, introduce a possibility for a line card to expose relation to nested devlink entity. The nested devlink entity represents the line card. Example: $ devlink lc show pci/0000:01:00.0 lc 1 pci/0000:01:00.0: lc 1 state active type 16x100G nested_devlink auxiliary/mlxsw_core.lc.0 supported_types: 16x100G $ devlink dev show auxiliary/mlxsw_core.lc.0 auxiliary/mlxsw_core.lc.0 Signed-off-by: Jiri Pirko <jiri@nvidia.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-26	net: devlink: move net check into devlinks_xa_for_each_registered_get()	Jiri Pirko
	Benefit from having devlinks iterator helper devlinks_xa_for_each_registered_get() and move the net pointer check inside. Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-26	net: devlink: make sure that devlink_try_get() works with valid pointer ↵	Jiri Pirko
	during xarray iteration Remove dependency on devlink_mutex during devlinks xarray iteration. The reason is that devlink_register/unregister() functions taking devlink_mutex would deadlock during devlink reload operation of devlink instance which registers/unregisters nested devlink instances. The devlinks xarray consistency is ensured internally by xarray. There is a reference taken when working with devlink using devlink_try_get(). But there is no guarantee that devlink pointer picked during xarray iteration is not freed before devlink_try_get() is called. Make sure that devlink_try_get() works with valid pointer. Achieve it by: 1) Splitting devlink_put() so the completion is sent only after grace period. Completion unblocks the devlink_unregister() routine, which is followed-up by devlink_free() 2) During devlinks xa_array iteration, get devlink pointer from xa_array holding RCU read lock and taking reference using devlink_try_get() before unlock. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-26	Bluetooth: L2CAP: Fix use-after-free caused by l2cap_chan_put	Luiz Augusto von Dentz
	This fixes the following trace which is caused by hci_rx_work starting up after the final channel reference has been put() during sock_close() but before the references to the channel have been destroyed, so instead the code now rely on kref_get_unless_zero/l2cap_chan_hold_unless_zero to prevent referencing a channel that is about to be destroyed. refcount_t: increment on 0; use-after-free. BUG: KASAN: use-after-free in refcount_dec_and_test+0x20/0xd0 Read of size 4 at addr ffffffc114f5bf18 by task kworker/u17:14/705 CPU: 4 PID: 705 Comm: kworker/u17:14 Tainted: G S W 4.14.234-00003-g1fb6d0bd49a4-dirty #28 Hardware name: Qualcomm Technologies, Inc. SM8150 V2 PM8150 Google Inc. MSM sm8150 Flame DVT (DT) Workqueue: hci0 hci_rx_work Call trace: dump_backtrace+0x0/0x378 show_stack+0x20/0x2c dump_stack+0x124/0x148 print_address_description+0x80/0x2e8 __kasan_report+0x168/0x188 kasan_report+0x10/0x18 __asan_load4+0x84/0x8c refcount_dec_and_test+0x20/0xd0 l2cap_chan_put+0x48/0x12c l2cap_recv_frame+0x4770/0x6550 l2cap_recv_acldata+0x44c/0x7a4 hci_acldata_packet+0x100/0x188 hci_rx_work+0x178/0x23c process_one_work+0x35c/0x95c worker_thread+0x4cc/0x960 kthread+0x1a8/0x1c4 ret_from_fork+0x10/0x18 Cc: stable@kernel.org Reported-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Tested-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-07-26	Bluetooth: Always set event mask on suspend	Abhishek Pandit-Subedi
	When suspending, always set the event mask once disconnects are successful. Otherwise, if wakeup is disallowed, the event mask is not set before suspend continues and can result in an early wakeup. Fixes: 182ee45da083 ("Bluetooth: hci_sync: Rework hci_suspend_notifier") Cc: stable@vger.kernel.org Signed-off-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-07-26	Bluetooth: mgmt: Fix double free on error path	Dan Carpenter
	Don't call mgmt_pending_remove() twice (double free). Fixes: 6b88eff43704 ("Bluetooth: hci_sync: Refactor remove Adv Monitor") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-07-26	wifi: mac80211: do not abuse fq.lock in ieee80211_do_stop()	Tetsuo Handa
	lockdep complains use of uninitialized spinlock at ieee80211_do_stop() [1], for commit f856373e2f31ffd3 ("wifi: mac80211: do not wake queues on a vif that is being stopped") guards clear_bit() using fq.lock even before fq_init() from ieee80211_txq_setup_flows() initializes this spinlock. According to discussion [2], Toke was not happy with expanding usage of fq.lock. Since __ieee80211_wake_txqs() is called under RCU read lock, we can instead use synchronize_rcu() for flushing ieee80211_wake_txqs(). Link: https://syzkaller.appspot.com/bug?extid=eceab52db7c4b961e9d6 [1] Link: https://lkml.kernel.org/r/874k0zowh2.fsf@toke.dk [2] Reported-by: syzbot <syzbot+eceab52db7c4b961e9d6@syzkaller.appspotmail.com> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Fixes: f856373e2f31ffd3 ("wifi: mac80211: do not wake queues on a vif that is being stopped") Tested-by: syzbot <syzbot+eceab52db7c4b961e9d6@syzkaller.appspotmail.com> Acked-by: Toke Høiland-Jørgensen <toke@kernel.org> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/9cc9b81d-75a3-3925-b612-9d0ad3cab82b@I-love.SAKURA.ne.jp [ pick up commit 3598cb6e1862 ("wifi: mac80211: do not abuse fq.lock in ieee80211_do_stop()") from -next] Link: https://lore.kernel.org/all/87o7xcq6qt.fsf@kernel.org/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-26	netfilter: nft_queue: only allow supported familes and hooks	Florian Westphal
	Trying to use 'queue' statement in ingress (for example) triggers a splat on reinject: WARNING: CPU: 3 PID: 1345 at net/netfilter/nf_queue.c:291 ... because nf_reinject cannot find the ruleset head. The netdev family doesn't support async resume at the moment anyway, so disallow loading such rulesets with a more appropriate error message. v2: add 'validate' callback and also check hook points, v1 did allow ingress use in 'table inet', but that doesn't work either. (Pablo) Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-07-26	netfilter: nf_tables: add rescheduling points during loop detection walks	Florian Westphal
	Add explicit rescheduling points during ruleset walk. Switching to a faster algorithm is possible but this is a much smaller change, suitable for nf tree. Link: https://bugzilla.netfilter.org/show_bug.cgi?id=1460 Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-07-26	netfilter: nf_queue: do not allow packet truncation below transport header ↵	Florian Westphal
	offset Domingo Dirutigliano and Nicola Guerrera report kernel panic when sending nf_queue verdict with 1-byte nfta_payload attribute. The IP/IPv6 stack pulls the IP(v6) header from the packet after the input hook. If user truncates the packet below the header size, this skb_pull() will result in a malformed skb (skb->len < 0). Fixes: 7af4cc3fa158 ("[NETFILTER]: Add "nfnetlink_queue" netfilter queue handler over nfnetlink") Reported-by: Domingo Dirutigliano <pwnzer0tt1@proton.me> Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-07-26	flow_offload: Introduce flow_match_pppoe	Wojciech Drewek
	Allow to offload PPPoE filters by adding flow_rule_match_pppoe. Drivers can extract PPPoE specific fields from now on. Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-07-26	net/sched: flower: Add PPPoE filter	Wojciech Drewek
	Add support for PPPoE specific fields for tc-flower. Those fields can be provided only when protocol was set to ETH_P_PPP_SES. Defines, dump, load and set are being done here. Overwrite basic.n_proto only in case of PPP_IP and PPP_IPV6, otherwise leave it as ETH_P_PPP_SES. Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Acked-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-07-26	flow_dissector: Add PPPoE dissectors	Wojciech Drewek
	Allow to dissect PPPoE specific fields which are: - session ID (16 bits) - ppp protocol (16 bits) - type (16 bits) - this is PPPoE ethertype, for now only ETH_P_PPP_SES is supported, possible ETH_P_PPP_DISC in the future The goal is to make the following TC command possible: # tc filter add dev ens6f0 ingress prio 1 protocol ppp_ses \ flower \ pppoe_sid 12 \ ppp_proto ip \ action drop Note that only PPPoE Session is supported. Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Acked-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-07-26	bridge: Do not send empty IFLA_AF_SPEC attribute	Benjamin Poirier
	After commit b6c02ef54913 ("bridge: Netlink interface fix."), br_fill_ifinfo() started to send an empty IFLA_AF_SPEC attribute when a bridge vlan dump is requested but an interface does not have any vlans configured. iproute2 ignores such an empty attribute since commit b262a9becbcb ("bridge: Fix output with empty vlan lists") but older iproute2 versions as well as other utilities have their output changed by the cited kernel commit, resulting in failed test cases. Regardless, emitting an empty attribute is pointless and inefficient. Avoid this change by canceling the attribute if no AF_SPEC data was added. Fixes: b6c02ef54913 ("bridge: Netlink interface fix.") Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://lore.kernel.org/r/20220725001236.95062-1-bpoirier@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-07-26	bpf: Fix bpf_xdp_pointer return pointer	Joanne Koong
	For the case where offset + len == size, bpf_xdp_pointer should return a valid pointer to the addr because that access is permitted. We should only return NULL in the case where offset + len exceeds size. Fixes: 3f364222d032 ("net: xdp: introduce bpf_xdp_pointer utility routine") Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://lore.kernel.org/bpf/20220722220105.2065466-1-joannelkoong@gmail.com
2022-07-26	bpf: Set flow flag to allow any source IP in bpf_tunnel_key	Paul Chaignon
	Commit 26101f5ab6bd ("bpf: Add source ip in "struct bpf_tunnel_key"") added support for getting and setting the outer source IP of encapsulated packets via the bpf_skb_{get,set}_tunnel_key BPF helper. This change allows BPF programs to set any IP address as the source, including for example the IP address of a container running on the same host. In that last case, however, the encapsulated packets are dropped when looking up the route because the source IP address isn't assigned to any interface on the host. To avoid this, we need to set the FLOWI_FLAG_ANYSRC flag. Fixes: 26101f5ab6bd ("bpf: Add source ip in "struct bpf_tunnel_key"") Signed-off-by: Paul Chaignon <paul@isovalent.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/76873d384e21288abe5767551a0799ac93ec07fb.1658759380.git.paul@isovalent.com
2022-07-25	sctp: fix sleep in atomic context bug in timer handlers	Duoming Zhou
	There are sleep in atomic context bugs in timer handlers of sctp such as sctp_generate_t3_rtx_event(), sctp_generate_probe_event(), sctp_generate_t1_init_event(), sctp_generate_timeout_event(), sctp_generate_t3_rtx_event() and so on. The root cause is sctp_sched_prio_init_sid() with GFP_KERNEL parameter that may sleep could be called by different timer handlers which is in interrupt context. One of the call paths that could trigger bug is shown below: (interrupt context) sctp_generate_probe_event sctp_do_sm sctp_side_effects sctp_cmd_interpreter sctp_outq_teardown sctp_outq_init sctp_sched_set_sched n->init_sid(..,GFP_KERNEL) sctp_sched_prio_init_sid //may sleep This patch changes gfp_t parameter of init_sid in sctp_sched_set_sched() from GFP_KERNEL to GFP_ATOMIC in order to prevent sleep in atomic context bugs. Fixes: 5bbbbe32a431 ("sctp: introduce stream scheduler foundations") Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Link: https://lore.kernel.org/r/20220723015809.11553-1-duoming@zju.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-25	net: delete extra space and tab in blank line	William Dean
	delete extra space and tab in blank line, there is no functional change. Reported-by: Hacash Robot <hacashRobot@santino.com> Signed-off-by: William Dean <williamsukatube@gmail.com> Link: https://lore.kernel.org/r/20220723073222.2961602-1-williamsukatube@163.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-25	net: dsa: fix reference counting for LAG FDBs	Vladimir Oltean
	Due to an invalid conflict resolution on my side while working on 2 different series (LAG FDBs and FDB isolation), dsa_switch_do_lag_fdb_add() does not store the database associated with a dsa_mac_addr structure. So after adding an FDB entry associated with a LAG, dsa_mac_addr_find() fails to find it while deleting it, because &a->db is zeroized memory for all stored FDB entries of lag->fdbs, and dsa_switch_do_lag_fdb_del() returns -ENOENT rather than deleting the entry. Fixes: c26933639b54 ("net: dsa: request drivers to perform FDB isolation") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20220723012411.1125066-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-25	Merge tag 'wireless-next-2022-07-25' of ↵	Jakub Kicinski
	git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Kalle Valo says: ==================== wireless-next patches for v5.20 Third set of patches for v5.20. MLO work continues and we have a lot of stack changes due to that, including driver API changes. Not much driver patches except on mt76. Major changes: cfg80211/mac80211 - more prepartion for Wi-Fi 7 Multi-Link Operation (MLO) support, works with one link now - align with IEEE Draft P802.11be_D2.0 - hardware timestamps for receive and transmit mt76 - preparation for new chipset support - ACPI SAR support * tag 'wireless-next-2022-07-25' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (254 commits) wifi: mac80211: fix link data leak wifi: mac80211: mlme: fix disassoc with MLO wifi: mac80211: add macros to loop over active links wifi: mac80211: remove erroneous sband/link validation wifi: mac80211: mlme: transmit assoc frame with address translation wifi: mac80211: verify link addresses are different wifi: mac80211: rx: track link in RX data wifi: mac80211: optionally implement MLO multicast TX wifi: mac80211: expand ieee80211_mgmt_tx() for MLO wifi: nl80211: add MLO link ID to the NL80211_CMD_FRAME TX API wifi: mac80211: report link ID to cfg80211 on mgmt RX wifi: cfg80211: report link ID in NL80211_CMD_FRAME wifi: mac80211: add hardware timestamps for RX and TX wifi: cfg80211: add hardware timestamps to frame RX info wifi: cfg80211/nl80211: move rx management data into a struct wifi: cfg80211: add a function for reporting TX status with hardware timestamps wifi: nl80211: add RX and TX timestamp attributes wifi: ieee80211: add helper functions for detecting TM/FTM frames wifi: mac80211_hwsim: handle links for wmediumd/virtio wifi: mac80211: sta_info: fix link_sta insertion ... ==================== Link: https://lore.kernel.org/r/20220725174547.EA465C341C6@smtp.kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-25	Merge branch 'master' of ↵	David S. Miller
	git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2022-07-20 1) Don't set DST_NOPOLICY in IPv4, a recent patch made this superfluous. From Eyal Birger. 2) Convert alg_key to flexible array member to avoid an iproute2 compile warning when built with gcc-12. From Stephen Hemminger. 3) xfrm_register_km and xfrm_unregister_km do always return 0 so change the type to void. From Zhengchao Shao. 4) Fix spelling mistake in esp6.c From Zhang Jiaming. 5) Improve the wording of comment above XFRM_OFFLOAD flags. From Petr Vaněk. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-25	ipv4: Fix data-races around sysctl_fib_notify_on_flag_change.	Kuniyuki Iwashima
	While reading sysctl_fib_notify_on_flag_change, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: 680aea08e78c ("net: ipv4: Emit notification when fib hardware flags are changed") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-25	tcp: Fix data-races around sysctl_tcp_reflect_tos.	Kuniyuki Iwashima
	While reading sysctl_tcp_reflect_tos, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: ac8f1710c12b ("tcp: reflect tos value received in SYN to the socket") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Acked-by: Wei Wang <weiwan@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-25	tcp: Fix a data-race around sysctl_tcp_comp_sack_nr.	Kuniyuki Iwashima
	While reading sysctl_tcp_comp_sack_nr, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: 9c21d2fc41c0 ("tcp: add tcp_comp_sack_nr sysctl") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-25	tcp: Fix a data-race around sysctl_tcp_comp_sack_slack_ns.	Kuniyuki Iwashima
	While reading sysctl_tcp_comp_sack_slack_ns, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: a70437cc09a1 ("tcp: add hrtimer slack to sack compression") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-25	tcp: Fix a data-race around sysctl_tcp_comp_sack_delay_ns.	Kuniyuki Iwashima
	While reading sysctl_tcp_comp_sack_delay_ns, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: 6d82aa242092 ("tcp: add tcp_comp_sack_delay_ns sysctl") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-25	net: Fix data-races around sysctl_[rw]mem(_offset)?.	Kuniyuki Iwashima
	While reading these sysctl variables, they can be changed concurrently. Thus, we need to add READ_ONCE() to their readers. - .sysctl_rmem - .sysctl_rwmem - .sysctl_rmem_offset - .sysctl_wmem_offset - sysctl_tcp_rmem[1, 2] - sysctl_tcp_wmem[1, 2] - sysctl_decnet_rmem[1] - sysctl_decnet_wmem[1] - sysctl_tipc_rmem[1] Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-25	tcp: Fix data-races around sk_pacing_rate.	Kuniyuki Iwashima
	While reading sysctl_tcp_pacing_(ss\|ca)_ratio, they can be changed concurrently. Thus, we need to add READ_ONCE() to their readers. Fixes: 43e122b014c9 ("tcp: refine pacing rate determination") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-25	net: mld: fix reference count leak in mld_{query \| report}_work()	Taehee Yoo
	mld_{query \| report}_work() processes queued events. If there are too many events in the queue, it re-queue a work. And then, it returns without in6_dev_put(). But if queuing is failed, it should call in6_dev_put(), but it doesn't. So, a reference count leak would occur. THREAD0 THREAD1 mld_report_work() spin_lock_bh() if (!mod_delayed_work()) in6_dev_hold(); spin_unlock_bh() spin_lock_bh() schedule_delayed_work() spin_unlock_bh() Script to reproduce(by Hangbin Liu): ip netns add ns1 ip netns add ns2 ip netns exec ns1 sysctl -w net.ipv6.conf.all.force_mld_version=1 ip netns exec ns2 sysctl -w net.ipv6.conf.all.force_mld_version=1 ip -n ns1 link add veth0 type veth peer name veth0 netns ns2 ip -n ns1 link set veth0 up ip -n ns2 link set veth0 up for i in `seq 50`; do for j in `seq 100`; do ip -n ns1 addr add 2021:${i}::${j}/64 dev veth0 ip -n ns2 addr add 2022:${i}::${j}/64 dev veth0 done done modprobe -r veth ip -a netns del splat looks like: unregister_netdevice: waiting for veth0 to become free. Usage count = 2 leaked reference. ipv6_add_dev+0x324/0xec0 addrconf_notify+0x481/0xd10 raw_notifier_call_chain+0xe3/0x120 call_netdevice_notifiers+0x106/0x160 register_netdevice+0x114c/0x16b0 veth_newlink+0x48b/0xa50 [veth] rtnl_newlink+0x11a2/0x1a40 rtnetlink_rcv_msg+0x63f/0xc00 netlink_rcv_skb+0x1df/0x3e0 netlink_unicast+0x5de/0x850 netlink_sendmsg+0x6c9/0xa90 ____sys_sendmsg+0x76a/0x780 __sys_sendmsg+0x27c/0x340 do_syscall_64+0x43/0x90 entry_SYSCALL_64_after_hwframe+0x63/0xcd Tested-by: Hangbin Liu <liuhangbin@gmail.com> Fixes: f185de28d9ae ("mld: add new workqueues for process mld events") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-24	Merge branch 'for-5.20/io_uring' into for-5.20/io_uring-zerocopy-send	Jens Axboe
	* for-5.20/io_uring: (716 commits) io_uring: ensure REQ_F_ISREG is set async offload net: fix compat pointer in get_compat_msghdr() io_uring: Don't require reinitable percpu_ref io_uring: fix types in io_recvmsg_multishot_overflow io_uring: Use atomic_long_try_cmpxchg in __io_account_mem io_uring: support multishot in recvmsg net: copy from user before calling __get_compat_msghdr net: copy from user before calling __copy_msghdr io_uring: support 0 length iov in buffer select in compat io_uring: fix multishot ending when not polled io_uring: add netmsg cache io_uring: impose max limit on apoll cache io_uring: add abstraction around apoll cache io_uring: move apoll cache to poll.c io_uring: consolidate hash_locked io-wq handling io_uring: clear REQ_F_HASH_LOCKED on hash removal io_uring: don't race double poll setting REQ_F_ASYNC_DATA io_uring: don't miss setting REQ_F_DOUBLE_POLL io_uring: disable multishot recvmsg io_uring: only trace one of complete or overflow ... Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24	net: fix compat pointer in get_compat_msghdr()	Jens Axboe
	A previous change enabled external users to copy the data before calling __get_compat_msghdr(), but didn't modify get_compat_msghdr() or __io_compat_recvmsg_copy_hdr() to take that into account. They are both stil passing in the __user pointer rather than the copied version. Ensure we pass in the kernel struct, not the pointer to the user data. Link: https://lore.kernel.org/all/46439555-644d-08a1-7d66-16f8f9a320f0@samsung.com/ Fixes: 1a3e4e94a1b9 ("net: copy from user before calling __get_compat_msghdr") Reported-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24	net: copy from user before calling __get_compat_msghdr	Dylan Yudaken
	this is in preparation for multishot receive from io_uring, where it needs to have access to the original struct user_msghdr. functionally this should be a no-op. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Dylan Yudaken <dylany@fb.com> Link: https://lore.kernel.org/r/20220714110258.1336200-3-dylany@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24	net: copy from user before calling __copy_msghdr	Dylan Yudaken
	this is in preparation for multishot receive from io_uring, where it needs to have access to the original struct user_msghdr. functionally this should be a no-op. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Dylan Yudaken <dylany@fb.com> Link: https://lore.kernel.org/r/20220714110258.1336200-2-dylany@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24	net/tls: Remove the context from the list in tls_device_down	Maxim Mikityanskiy
	tls_device_down takes a reference on all contexts it's going to move to the degraded state (software fallback). If sk_destruct runs afterwards, it can reduce the reference counter back to 1 and return early without destroying the context. Then tls_device_down will release the reference it took and call tls_device_free_ctx. However, the context will still stay in tls_device_down_list forever. The list will contain an item, memory for which is released, making a memory corruption possible. Fix the above bug by properly removing the context from all lists before any call to tls_device_free_ctx. Fixes: 3740651bf7e2 ("tls: Fix context leak on tls_device_down") Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-22	Merge tag 'for-net-next-2022-07-22' of ↵	Jakub Kicinski
	git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Luiz Augusto von Dentz says: ==================== bluetooth-next pull request for net-next: - Add support for IM Networks PID 0x3568 - Add support for BCM4349B1 - Add support for CYW55572 - Add support for MT7922 VID/PID 0489/e0e2 - Add support for Realtek RTL8852C - Initial support for Isochronous Channels/ISO sockets - Remove HCI_QUIRK_BROKEN_ERR_DATA_REPORTING quirk * tag 'for-net-next-2022-07-22' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next: (58 commits) Bluetooth: btusb: Detect if an ACL packet is in fact an ISO packet Bluetooth: btusb: Add support for ISO packets Bluetooth: ISO: Add broadcast support Bluetooth: Add initial implementation of BIS connections Bluetooth: Add BTPROTO_ISO socket type Bluetooth: Add initial implementation of CIS connections Bluetooth: hci_core: Introduce hci_recv_event_data Bluetooth: Convert delayed discov_off to hci_sync Bluetooth: Remove update_scan hci_request dependancy Bluetooth: Remove dead code from hci_request.c Bluetooth: btrtl: Fix typo in comment Bluetooth: MGMT: Fix holding hci_conn reference while command is queued Bluetooth: mgmt: Fix using hci_conn_abort Bluetooth: Use bt_status to convert from errno Bluetooth: Add bt_status Bluetooth: hci_sync: Split hci_dev_open_sync Bluetooth: hci_sync: Refactor remove Adv Monitor Bluetooth: hci_sync: Refactor add Adv Monitor Bluetooth: hci_sync: Remove HCI_QUIRK_BROKEN_ERR_DATA_REPORTING Bluetooth: btusb: Remove HCI_QUIRK_BROKEN_ERR_DATA_REPORTING for fake CSR ... ==================== Link: https://lore.kernel.org/r/20220723002232.964796-1-luiz.dentz@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-22	Bluetooth: ISO: Add broadcast support	Luiz Augusto von Dentz
	This adds broadcast support for BTPROTO_ISO by extending the sockaddr_iso with a new struct sockaddr_iso_bc where the socket user can set the broadcast address when receiving, the SID and the BIS indexes it wants to synchronize. When using BTPROTO_ISO for broadcast the roles are: Broadcaster -> uses connect with address set to BDADDR_ANY: > tools/isotest -s 00:00:00:00:00:00 Broadcast Receiver -> uses listen with address set to broadcaster: > tools/isotest -d 00:AA:01:00:00:00 Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-07-22	Bluetooth: Add initial implementation of BIS connections	Luiz Augusto von Dentz
	This adds initial support for BIS/BIG which includes: == Broadcaster role: Setup a periodic advertising and create a BIG == > tools/isotest -s 00:00:00:00:00:00 isotest[63]: Connected [00:00:00:00:00:00] isotest[63]: QoS BIG 0x00 BIS 0x00 Packing 0x00 Framing 0x00] isotest[63]: Output QoS [Interval 10000 us Latency 10 ms SDU 40 PHY 0x02 RTN 2] isotest[63]: Sending ... isotest[63]: Number of packets: 1 isotest[63]: Socket jitter buffer: 80 buffer < HCI Command: LE Set Perio.. (0x08\|0x003e) plen 7 ... > HCI Event: Command Complete (0x0e) plen 4 LE Set Periodic Advertising Parameters (0x08\|0x003e) ncmd 1 Status: Success (0x00) < HCI Command: LE Set Perio.. (0x08\|0x003f) plen 7 ... > HCI Event: Command Complete (0x0e) plen 4 LE Set Periodic Advertising Data (0x08\|0x003f) ncmd 1 Status: Success (0x00) < HCI Command: LE Set Perio.. (0x08\|0x0040) plen 2 ... > HCI Event: Command Complete (0x0e) plen 4 LE Set Periodic Advertising Enable (0x08\|0x0040) ncmd 1 Status: Success (0x00) < HCI Command: LE Create B.. (0x08\|0x0068) plen 31 ... > HCI Event: Command Status (0x0f) plen 4 LE Create Broadcast Isochronous Group (0x08\|0x0068) ncmd 1 Status: Success (0x00) > HCI Event: LE Meta Event (0x3e) plen 21 LE Broadcast Isochronous Group Complete (0x1b) ... == Broadcast Receiver role: Create a PA Sync and BIG Sync == > tools/isotest -i hci1 -d 00:AA:01:00:00:00 isotest[66]: Waiting for connection 00:AA:01:00:00:00... < HCI Command: LE Periodic Advert.. (0x08\|0x0044) plen 14 ... > HCI Event: Command Status (0x0f) plen 4 LE Periodic Advertising Create Sync (0x08\|0x0044) ncmd 1 Status: Success (0x00) < HCI Command: LE Set Extended Sca.. (0x08\|0x0041) plen 8 ... > HCI Event: Command Complete (0x0e) plen 4 LE Set Extended Scan Parameters (0x08\|0x0041) ncmd 1 Status: Success (0x00) < HCI Command: LE Set Extended Sca.. (0x08\|0x0042) plen 6 ... > HCI Event: Command Complete (0x0e) plen 4 LE Set Extended Scan Enable (0x08\|0x0042) ncmd 1 Status: Success (0x00) > HCI Event: LE Meta Event (0x3e) plen 29 LE Extended Advertising Report (0x0d) ... > HCI Event: LE Meta Event (0x3e) plen 16 LE Periodic Advertising Sync Established (0x0e) ... < HCI Command: LE Broadcast Isoch.. (0x08\|0x006b) plen 25 ... > HCI Event: Command Status (0x0f) plen 4 LE Broadcast Isochronous Group Create Sync (0x08\|0x006b) ncmd 1 Status: Success (0x00) > HCI Event: LE Meta Event (0x3e) plen 17 LE Broadcast Isochronous Group Sync Estabilished (0x1d) ... Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-07-22	Bluetooth: Add BTPROTO_ISO socket type	Luiz Augusto von Dentz
	This introduces a new socket type BTPROTO_ISO which can be enabled with use of ISO Socket experiemental UUID, it can used to initiate/accept connections and transfer packets between userspace and kernel similarly to how BTPROTO_SCO works: Central -> uses connect with address set to destination bdaddr: > tools/isotest -s 00:AA:01:00:00:00 Peripheral -> uses listen: > tools/isotest -d Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-07-22	Bluetooth: Add initial implementation of CIS connections	Luiz Augusto von Dentz
	This adds the initial implementation of CIS connections and introduces the ISO packets/links. == Central: Set CIG Parameters, create a CIS and Setup Data Path == > tools/isotest -s <address> < HCI Command: LE Extended Create... (0x08\|0x0043) plen 26 ... > HCI Event: Command Status (0x0f) plen 4 LE Extended Create Connection (0x08\|0x0043) ncmd 1 Status: Success (0x00) > HCI Event: LE Meta Event (0x3e) plen 31 LE Enhanced Connection Complete (0x0a) ... < HCI Command: LE Create Connected... (0x08\|0x0064) plen 5 ... > HCI Event: Command Status (0x0f) plen 4 LE Create Connected Isochronous Stream (0x08\|0x0064) ncmd 1 Status: Success (0x00) > HCI Event: LE Meta Event (0x3e) plen 29 LE Connected Isochronous Stream Established (0x19) ... < HCI Command: LE Setup Isochronou.. (0x08\|0x006e) plen 13 ... > HCI Event: Command Complete (0x0e) plen 6 LE Setup Isochronous Data Path (0x08\|0x006e) ncmd 1 Status: Success (0x00) Handle: 257 < HCI Command: LE Setup Isochronou.. (0x08\|0x006e) plen 13 ... > HCI Event: Command Complete (0x0e) plen 6 LE Setup Isochronous Data Path (0x08\|0x006e) ncmd 1 Status: Success (0x00) Handle: 257 == Peripheral: Accept CIS and Setup Data Path == > tools/isotest -d HCI Event: LE Meta Event (0x3e) plen 7 LE Connected Isochronous Stream Request (0x1a) ... < HCI Command: LE Accept Co.. (0x08\|0x0066) plen 2 ... > HCI Event: LE Meta Event (0x3e) plen 29 LE Connected Isochronous Stream Established (0x19) ... < HCI Command: LE Setup Is.. (0x08\|0x006e) plen 13 ... > HCI Event: Command Complete (0x0e) plen 6 LE Setup Isochronous Data Path (0x08\|0x006e) ncmd 1 Status: Success (0x00) Handle: 257 < HCI Command: LE Setup Is.. (0x08\|0x006e) plen 13 ... > HCI Event: Command Complete (0x0e) plen 6 LE Setup Isochronous Data Path (0x08\|0x006e) ncmd 1 Status: Success (0x00) Handle: 257 Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-07-22	Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next	Jakub Kicinski
	Daniel Borkmann says: ==================== bpf-next 2022-07-22 We've added 73 non-merge commits during the last 12 day(s) which contain a total of 88 files changed, 3458 insertions(+), 860 deletions(-). The main changes are: 1) Implement BPF trampoline for arm64 JIT, from Xu Kuohai. 2) Add ksyscall/kretsyscall section support to libbpf to simplify tracing kernel syscalls through kprobe mechanism, from Andrii Nakryiko. 3) Allow for livepatch (KLP) and BPF trampolines to attach to the same kernel function, from Song Liu & Jiri Olsa. 4) Add new kfunc infrastructure for netfilter's CT e.g. to insert and change entries, from Kumar Kartikeya Dwivedi & Lorenzo Bianconi. 5) Add a ksym BPF iterator to allow for more flexible and efficient interactions with kernel symbols, from Alan Maguire. 6) Bug fixes in libbpf e.g. for uprobe binary path resolution, from Dan Carpenter. 7) Fix BPF subprog function names in stack traces, from Alexei Starovoitov. 8) libbpf support for writing custom perf event readers, from Jon Doron. 9) Switch to use SPDX tag for BPF helper man page, from Alejandro Colomar. 10) Fix xsk send-only sockets when in busy poll mode, from Maciej Fijalkowski. 11) Reparent BPF maps and their charging on memcg offlining, from Roman Gushchin. 12) Multiple follow-up fixes around BPF lsm cgroup infra, from Stanislav Fomichev. 13) Use bootstrap version of bpftool where possible to speed up builds, from Pu Lehui. 14) Cleanup BPF verifier's check_func_arg() handling, from Joanne Koong. 15) Make non-prealloced BPF map allocations low priority to play better with memcg limits, from Yafang Shao. 16) Fix BPF test runner to reject zero-length data for skbs, from Zhengchao Shao. 17) Various smaller cleanups and improvements all over the place. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (73 commits) bpf: Simplify bpf_prog_pack_[size\|mask] bpf: Support bpf_trampoline on functions with IPMODIFY (e.g. livepatch) bpf, x64: Allow to use caller address from stack ftrace: Allow IPMODIFY and DIRECT ops on the same function ftrace: Add modify_ftrace_direct_multi_nolock bpf/selftests: Fix couldn't retrieve pinned program in xdp veth test bpf: Fix build error in case of !CONFIG_DEBUG_INFO_BTF selftests/bpf: Fix test_verifier failed test in unprivileged mode selftests/bpf: Add negative tests for new nf_conntrack kfuncs selftests/bpf: Add tests for new nf_conntrack kfuncs selftests/bpf: Add verifier tests for trusted kfunc args net: netfilter: Add kfuncs to set and change CT status net: netfilter: Add kfuncs to set and change CT timeout net: netfilter: Add kfuncs to allocate and insert CT net: netfilter: Deduplicate code in bpf_{xdp,skb}_ct_lookup bpf: Add documentation for kfuncs bpf: Add support for forcing kfunc args to be trusted bpf: Switch to new kfunc flags infrastructure tools/resolve_btfids: Add support for 8-byte BTF sets bpf: Introduce 8-byte BTF set ... ==================== Link: https://lore.kernel.org/r/20220722221218.29943-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-22	Revert "tcp: change pingpong threshold to 3"	Wei Wang
	This reverts commit 4a41f453bedfd5e9cd040bad509d9da49feb3e2c. This to-be-reverted commit was meant to apply a stricter rule for the stack to enter pingpong mode. However, the condition used to check for interactive session "before(tp->lsndtime, icsk->icsk_ack.lrcvtime)" is jiffy based and might be too coarse, which delays the stack entering pingpong mode. We revert this patch so that we no longer use the above condition to determine interactive session, and also reduce pingpong threshold to 1. Fixes: 4a41f453bedf ("tcp: change pingpong threshold to 3") Reported-by: LemmyHuang <hlm3280@163.com> Suggested-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20220721204404.388396-1-weiwan@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>