summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2022-08-17net: mscc: ocelot: fix address of SYS_COUNT_TX_AGING counterVladimir Oltean
This register, used as part of stats->tx_dropped in ocelot_get_stats64(), has a wrong address. At the address currently given, there is actually the c_tx_green_prio_6 counter. Fixes: a556c76adc05 ("net: mscc: Add initial Ocelot switch support") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-17net: mscc: ocelot: fix incorrect ndo_get_stats64 packet countersVladimir Oltean
Reading stats using the SYS_COUNT_* register definitions is only used by ocelot_get_stats64() from the ocelot switchdev driver, however, currently the bucket definitions are incorrect. Separately, on both RX and TX, we have the following problems: - a 256-1023 bucket which actually tracks the 256-511 packets - the 1024-1526 bucket actually tracks the 512-1023 packets - the 1527-max bucket actually tracks the 1024-1526 packets => nobody tracks the packets from the real 1527-max bucket Additionally, the RX_PAUSE, RX_CONTROL, RX_LONGS and RX_CLASSIFIED_DROPS all track the wrong thing. However this doesn't seem to have any consequence, since ocelot_get_stats64() doesn't use these. Even though this problem only manifests itself for the switchdev driver, we cannot split the fix for ocelot and for DSA, since it requires fixing the bucket definitions from enum ocelot_reg, which makes us necessarily adapt the structures from felix and seville as well. Fixes: 84705fc16552 ("net: dsa: felix: introduce support for Seville VSC9953 switch") Fixes: 56051948773e ("net: dsa: ocelot: add driver for Felix switch family") Fixes: a556c76adc05 ("net: mscc: Add initial Ocelot switch support") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-17net: dsa: felix: fix ethtool 256-511 and 512-1023 TX packet countersVladimir Oltean
What the driver actually reports as 256-511 is in fact 512-1023, and the TX packets in the 256-511 bucket are not reported. Fix that. Fixes: 56051948773e ("net: dsa: ocelot: add driver for Felix switch family") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-17net: dsa: don't warn in dsa_port_set_state_now() when driver doesn't support itVladimir Oltean
ds->ops->port_stp_state_set() is, like most DSA methods, optional, and if absent, the port is supposed to remain in the forwarding state (as standalone). Such is the case with the mv88e6060 driver, which does not offload the bridge layer. DSA warns that the STP state can't be changed to FORWARDING as part of dsa_port_enable_rt(), when in fact it should not. The error message is also not up to modern standards, so take the opportunity to make it more descriptive. Fixes: fd3645413197 ("net: dsa: change scope of STP state setter") Reported-by: Sergei Antonov <saproj@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Sergei Antonov <saproj@gmail.com> Link: https://lore.kernel.org/r/20220816201445.1809483-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-17net: dsa: sja1105: fix buffer overflow in sja1105_setup_devlink_regions()Rustam Subkhankulov
If an error occurs in dsa_devlink_region_create(), then 'priv->regions' array will be accessed by negative index '-1'. Found by Linux Verification Center (linuxtesting.org) with SVACE. Signed-off-by: Rustam Subkhankulov <subkhankulov@ispras.ru> Fixes: bf425b82059e ("net: dsa: sja1105: expose static config as devlink region") Link: https://lore.kernel.org/r/20220817003845.389644-1-subkhankulov@ispras.ru Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nfJakub Kicinski
Florian Westphal says: ==================== netfilter: conntrack and nf_tables bug fixes The following patchset contains netfilter fixes for net. Broken since 5.19: A few ancient connection tracking helpers assume TCP packets cannot exceed 64kb in size, but this isn't the case anymore with 5.19 when BIG TCP got merged, from myself. Regressions since 5.19: 1. 'conntrack -E expect' won't display anything because nfnetlink failed to enable events for expectations, only for normal conntrack events. 2. partially revert change that added resched calls to a function that can be in atomic context. Both broken and fixed up by myself. Broken for several releases (up to original merge of nf_tables): Several fixes for nf_tables control plane, from Pablo. This fixes up resource leaks in error paths and adds more sanity checks for mutually exclusive attributes/flags. Kconfig: NF_CONNTRACK_PROCFS is very old and doesn't provide all info provided via ctnetlink, so it should not default to y. From Geert Uytterhoeven. Selftests: rework nft_flowtable.sh: it frequently indicated failure; the way it tried to detect an offload failure did not work reliably. * git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: testing: selftests: nft_flowtable.sh: rework test to detect offload failure testing: selftests: nft_flowtable.sh: use random netns names netfilter: conntrack: NF_CONNTRACK_PROCFS should no longer default to y netfilter: nf_tables: check NFT_SET_CONCAT flag if field_count is specified netfilter: nf_tables: disallow NFT_SET_ELEM_CATCHALL and NFT_SET_ELEM_INTERVAL_END netfilter: nf_tables: NFTA_SET_ELEM_KEY_END requires concat and interval flags netfilter: nf_tables: validate NFTA_SET_ELEM_OBJREF based on NFT_SET_OBJECT flag netfilter: nf_tables: really skip inactive sets when allocating name netfilter: nfnetlink: re-enable conntrack expectation events netfilter: nf_tables: fix scheduling-while-atomic splat netfilter: nf_ct_irc: cap packet search space to 4k netfilter: nf_ct_ftp: prefer skb_linearize netfilter: nf_ct_h323: cap packet size at 64k netfilter: nf_ct_sane: remove pseudo skb linearization netfilter: nf_tables: possible module reference underflow in error path netfilter: nf_tables: disallow NFTA_SET_ELEM_KEY_END with NFT_SET_ELEM_INTERVAL_END flag netfilter: nf_tables: use READ_ONCE and WRITE_ONCE for shared generation id access ==================== Link: https://lore.kernel.org/r/20220817140015.25843-1-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-17net: Fix suspicious RCU usage in bpf_sk_reuseport_detach()David Howells
bpf_sk_reuseport_detach() calls __rcu_dereference_sk_user_data_with_flags() to obtain the value of sk->sk_user_data, but that function is only usable if the RCU read lock is held, and neither that function nor any of its callers hold it. Fix this by adding a new helper, __locked_read_sk_user_data_with_flags() that checks to see if sk->sk_callback_lock() is held and use that here instead. Alternatively, making __rcu_dereference_sk_user_data_with_flags() use rcu_dereference_checked() might suffice. Without this, the following warning can be occasionally observed: ============================= WARNING: suspicious RCU usage 6.0.0-rc1-build2+ #563 Not tainted ----------------------------- include/net/sock.h:592 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 5 locks held by locktest/29873: #0: ffff88812734b550 (&sb->s_type->i_mutex_key#9){+.+.}-{3:3}, at: __sock_release+0x77/0x121 #1: ffff88812f5621b0 (sk_lock-AF_INET){+.+.}-{0:0}, at: tcp_close+0x1c/0x70 #2: ffff88810312f5c8 (&h->lhash2[i].lock){+.+.}-{2:2}, at: inet_unhash+0x76/0x1c0 #3: ffffffff83768bb8 (reuseport_lock){+...}-{2:2}, at: reuseport_detach_sock+0x18/0xdd #4: ffff88812f562438 (clock-AF_INET){++..}-{2:2}, at: bpf_sk_reuseport_detach+0x24/0xa4 stack backtrace: CPU: 1 PID: 29873 Comm: locktest Not tainted 6.0.0-rc1-build2+ #563 Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014 Call Trace: <TASK> dump_stack_lvl+0x4c/0x5f bpf_sk_reuseport_detach+0x6d/0xa4 reuseport_detach_sock+0x75/0xdd inet_unhash+0xa5/0x1c0 tcp_set_state+0x169/0x20f ? lockdep_sock_is_held+0x3a/0x3a ? __lock_release.isra.0+0x13e/0x220 ? reacquire_held_locks+0x1bb/0x1bb ? hlock_class+0x31/0x96 ? mark_lock+0x9e/0x1af __tcp_close+0x50/0x4b6 tcp_close+0x28/0x70 inet_release+0x8e/0xa7 __sock_release+0x95/0x121 sock_close+0x14/0x17 __fput+0x20f/0x36a task_work_run+0xa3/0xcc exit_to_user_mode_prepare+0x9c/0x14d syscall_exit_to_user_mode+0x18/0x44 entry_SYSCALL_64_after_hwframe+0x63/0xcd Fixes: cf8c1e967224 ("net: refactor bpf_sk_reuseport_detach()") Signed-off-by: David Howells <dhowells@redhat.com> cc: Hawkins Jiawei <yin31149@gmail.com> Link: https://lore.kernel.org/r/166064248071.3502205.10036394558814861778.stgit@warthog.procyon.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-17net: dsa: microchip: ksz9477: fix fdb_dump last invalid entryArun Ramadoss
In the ksz9477_fdb_dump function it reads the ALU control register and exit from the timeout loop if there is valid entry or search is complete. After exiting the loop, it reads the alu entry and report to the user space irrespective of entry is valid. It works till the valid entry. If the loop exited when search is complete, it reads the alu table. The table returns all ones and it is reported to user space. So bridge fdb show gives ff:ff:ff:ff:ff:ff as last entry for every port. To fix it, after exiting the loop the entry is reported only if it is valid one. Fixes: b987e98e50ab ("dsa: add DSA switch driver for Microchip KSZ9477") Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Link: https://lore.kernel.org/r/20220816105516.18350-1-arun.ramadoss@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-17testing: selftests: nft_flowtable.sh: rework test to detect offload failureFlorian Westphal
This test fails on current kernel releases because the flotwable path now calls dst_check from packet path and will then remove the offload. Test script has two purposes: 1. check that file (random content) can be sent to other netns (and vv) 2. check that the flow is offloaded (rather than handled by classic forwarding path). Since dst_check is in place, 2) fails because the nftables ruleset in router namespace 1 intentionally blocks traffic under the assumption that packets are not passed via classic path at all. Rework this: Instead of blocking traffic, create two named counters, one for original and one for reverse direction. The first three test cases are handled by classic forwarding path (path mtu discovery is disabled and packets exceed MTU). But all other tests enable PMTUD, so the originator and responder are expected to lower packet size and flowtable is expected to do the packet forwarding. For those tests, check that the packet counters (which are only incremented for packets that are passed up to classic forward path) are significantly lower than the file size transferred. I've tested that the counter-checks fail as expected when the 'flow add' statement is removed from the ruleset. Signed-off-by: Florian Westphal <fw@strlen.de>
2022-08-17Merge branch '40GbE' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue ==================== Intel Wired LAN Driver Updates 2022-08-16) This series contains updates to i40e driver only. Przemyslaw fixes issue with checksum offload on VXLAN tunnels. Alan disables VSI for Tx timeout when all recovery methods have failed. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-17tls: rx: react to strparser initialization errorsJakub Kicinski
Even though the normal strparser's init function has a return value we got away with ignoring errors until now, as it only validates the parameters and we were passing correct parameters. tls_strp can fail to init on memory allocation errors, which syzbot duly induced and reported. Reported-by: syzbot+abd45eb849b05194b1b6@syzkaller.appspotmail.com Fixes: 84c61fe1a75b ("tls: rx: do not use the standard strparser") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-17testing: selftests: nft_flowtable.sh: use random netns namesFlorian Westphal
"ns1" is a too generic name, use a random suffix to avoid errors when such a netns exists. Also allows to run multiple instances of the script in parallel. Signed-off-by: Florian Westphal <fw@strlen.de>
2022-08-17netfilter: conntrack: NF_CONNTRACK_PROCFS should no longer default to yGeert Uytterhoeven
NF_CONNTRACK_PROCFS was marked obsolete in commit 54b07dca68557b09 ("netfilter: provide config option to disable ancient procfs parts") in v3.3. Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Florian Westphal <fw@strlen.de>
2022-08-16net: sched: fix misuse of qcpu->backlog in gnet_stats_add_queue_cpuZhengchao Shao
In the gnet_stats_add_queue_cpu function, the qstats->qlen statistics are incorrectly set to qcpu->backlog. Fixes: 448e163f8b9b ("gen_stats: Add gnet_stats_add_queue()") Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Link: https://lore.kernel.org/r/20220815030848.276746-1-shaozhengchao@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-16i40e: Fix to stop tx_timeout recovery if GLOBR failsAlan Brady
When a tx_timeout fires, the PF attempts to recover by incrementally resetting. First we try a PFR, then CORER and finally a GLOBR. If the GLOBR fails, then we keep hitting the tx_timeout and incrementing the recovery level and issuing dmesgs, which is both annoying to the user and accomplishes nothing. If the GLOBR fails, then we're pretty much totally hosed, and there's not much else we can do to recover, so this makes it such that we just kill the VSI and stop hitting the tx_timeout in such a case. Fixes: 41c445ff0f48 ("i40e: main driver core") Signed-off-by: Alan Brady <alan.brady@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-08-16i40e: Fix tunnel checksum offload with fragmented trafficPrzemyslaw Patynowski
Fix checksum offload on VXLAN tunnels. In case, when mpls protocol is not used, set l4 header to transport header of skb. This fixes case, when user tries to offload checksums of VXLAN tunneled traffic. Steps for reproduction (requires link partner with tunnels): ip l s enp130s0f0 up ip a f enp130s0f0 ip a a 10.10.110.2/24 dev enp130s0f0 ip l s enp130s0f0 mtu 1600 ip link add vxlan12_sut type vxlan id 12 group 238.168.100.100 dev \ enp130s0f0 dstport 4789 ip l s vxlan12_sut up ip a a 20.10.110.2/24 dev vxlan12_sut iperf3 -c 20.10.110.1 #should connect Without this patch, TX descriptor was using wrong data, due to l4 header pointing wrong address. NIC would then drop those packets internally, due to incorrect TX descriptor data, which increased GLV_TEPC register. Fixes: b4fb2d33514a ("i40e: Add support for MPLS + TSO") Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Marek Szlosek <marek.szlosek@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-08-15Merge branch '40GbE' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2022-08-12 (iavf) This series contains updates to iavf driver only. Przemyslaw frees memory for admin queues in initialization error paths, prevents freeing of vf_res which is causing null pointer dereference, and adjusts calls in error path of reset to avoid iavf_close() which could cause deadlock. Ivan Vecera avoids deadlock that can occur when driver if part of failover. * '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: iavf: Fix deadlock in initialization iavf: Fix reset error handling iavf: Fix NULL pointer dereference in iavf_get_link_ksettings iavf: Fix adminq error handling ==================== Link: https://lore.kernel.org/r/20220812172309.853230-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-15net: rtnetlink: fix module reference count leak issue in rtnetlink_rcv_msgZhengchao Shao
When bulk delete command is received in the rtnetlink_rcv_msg function, if bulk delete is not supported, module_put is not called to release the reference counting. As a result, module reference count is leaked. Fixes: a6cec0bcd342 ("net: rtnetlink: add bulk delete support flag") Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://lore.kernel.org/r/20220815024629.240367-1-shaozhengchao@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-15net: moxa: pass pdev instead of ndev to DMA functionsSergei Antonov
dma_map_single() calls fail in moxart_mac_setup_desc_ring() and moxart_mac_start_xmit() which leads to an incessant output of this: [ 16.043925] moxart-ethernet 92000000.mac eth0: DMA mapping error [ 16.050957] moxart-ethernet 92000000.mac eth0: DMA mapping error [ 16.058229] moxart-ethernet 92000000.mac eth0: DMA mapping error Passing pdev to DMA is a common approach among net drivers. Fixes: 6c821bd9edc9 ("net: Add MOXA ART SoCs ethernet driver") Signed-off-by: Sergei Antonov <saproj@gmail.com> Suggested-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20220812171339.2271788-1-saproj@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-15netfilter: nf_tables: check NFT_SET_CONCAT flag if field_count is specifiedPablo Neira Ayuso
Since f3a2181e16f1 ("netfilter: nf_tables: Support for sets with multiple ranged fields"), it possible to combine intervals and concatenations. Later on, ef516e8625dd ("netfilter: nf_tables: reintroduce the NFT_SET_CONCAT flag") provides the NFT_SET_CONCAT flag for userspace to report that the set stores a concatenation. Make sure NFT_SET_CONCAT is set on if field_count is specified for consistency. Otherwise, if NFT_SET_CONCAT is specified with no field_count, bail out with EINVAL. Fixes: ef516e8625dd ("netfilter: nf_tables: reintroduce the NFT_SET_CONCAT flag") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-15netfilter: nf_tables: disallow NFT_SET_ELEM_CATCHALL and ↵Pablo Neira Ayuso
NFT_SET_ELEM_INTERVAL_END These flags are mutually exclusive, report EINVAL in this case. Fixes: aaa31047a6d2 ("netfilter: nftables: add catch-all set element support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-15netfilter: nf_tables: NFTA_SET_ELEM_KEY_END requires concat and interval flagsPablo Neira Ayuso
If the NFT_SET_CONCAT|NFT_SET_INTERVAL flags are set on, then the netlink attribute NFTA_SET_ELEM_KEY_END must be specified. Otherwise, NFTA_SET_ELEM_KEY_END should not be present. For catch-all element, NFTA_SET_ELEM_KEY_END should not be present. The NFT_SET_ELEM_INTERVAL_END is never used with this set flags combination. Fixes: 7b225d0b5c6d ("netfilter: nf_tables: add NFTA_SET_ELEM_KEY_END attribute") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-15Merge branch 'mlxsw-fixes'David S. Miller
Petr Machata says: ==================== mlxsw: Fixes for PTP support This set fixes several issues in mlxsw PTP code. - Patch #1 fixes compilation warnings. - Patch #2 adjusts the order of operation during cleanup, thereby closing the window after PTP state was already cleaned in the ASIC for the given port, but before the port is removed, when the user could still in theory make changes to the configuration. - Patch #3 protects the PTP configuration with a custom mutex, instead of relying on RTNL, which is not held in all access paths. - Patch #4 forbids enablement of PTP only in RX or only in TX. The driver implicitly assumed this would be the case, but neglected to sanitize the configuration. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-15mlxsw: spectrum_ptp: Forbid PTP enablement only in RX or in TXAmit Cohen
Currently mlxsw driver configures one global PTP configuration for all ports. The reason is that the switch behaves like a transparent clock between CPU port and front-panel ports. When time stamp is enabled in any port, the hardware is configured to update the correction field. The fact that the configuration of CPU port affects all the ports, makes the correction field update to be global for all ports. Otherwise, user will see odd values in the correction field, as the switch will update the correction field in the CPU port, but not in all the front-panel ports. The CPU port is relevant in both RX and TX, so to avoid problematic configuration, forbid PTP enablement only in one direction, i.e., only in RX or TX. Without the change: $ hwstamp_ctl -i swp1 -r 12 -t 0 current settings: tx_type 0 rx_filter 0 new settings: tx_type 0 rx_filter 2 $ echo $? 0 With the change: $ hwstamp_ctl -i swp1 -r 12 -t 0 current settings: tx_type 1 rx_filter 2 SIOCSHWTSTAMP failed: Invalid argument Fixes: 08ef8bc825d96 ("mlxsw: spectrum_ptp: Support SIOCGHWTSTAMP, SIOCSHWTSTAMP ioctls") Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-15mlxsw: spectrum_ptp: Protect PTP configuration with a mutexAmit Cohen
Currently the functions mlxsw_sp2_ptp_{configure, deconfigure}_port() assume that they are called when RTNL is locked and they warn otherwise. The deconfigure function can be called when port is removed, for example as part of device reload, then there is no locked RTNL and the function warns [1]. To avoid such case, do not assume that RTNL protects this code, add a dedicated mutex instead. The mutex protects 'ptp_state->config' which stores the existing global configuration in hardware. Use this mutex also to protect the code which configures the hardware. Then, there will be only one configuration in any time, which will be updated in 'ptp_state' and a race will be avoided. [1]: RTNL: assertion failed at drivers/net/ethernet/mellanox/mlxsw/spectrum_ptp.c (1600) WARNING: CPU: 1 PID: 1583493 at drivers/net/ethernet/mellanox/mlxsw/spectrum_ptp.c:1600 mlxsw_sp2_ptp_hwtstamp_set+0x2d3/0x300 [mlxsw_spectrum] [...] CPU: 1 PID: 1583493 Comm: devlink Not tainted5.19.0-rc8-custom-127022-gb371dffda095 #789 Hardware name: Mellanox Technologies Ltd.MSN3420/VMOD0005, BIOS 5.11 01/06/2019 RIP: 0010:mlxsw_sp2_ptp_hwtstamp_set+0x2d3/0x300[mlxsw_spectrum] [...] Call Trace: <TASK> mlxsw_sp_port_remove+0x7e/0x190 [mlxsw_spectrum] mlxsw_sp_fini+0xd1/0x270 [mlxsw_spectrum] mlxsw_core_bus_device_unregister+0x55/0x280 [mlxsw_core] mlxsw_devlink_core_bus_device_reload_down+0x1c/0x30[mlxsw_core] devlink_reload+0x1ee/0x230 devlink_nl_cmd_reload+0x4de/0x580 genl_family_rcv_msg_doit+0xdc/0x140 genl_rcv_msg+0xd7/0x1d0 netlink_rcv_skb+0x49/0xf0 genl_rcv+0x1f/0x30 netlink_unicast+0x22f/0x350 netlink_sendmsg+0x208/0x440 __sys_sendto+0xf0/0x140 __x64_sys_sendto+0x1b/0x20 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x63/0xcd Fixes: 08ef8bc825d96 ("mlxsw: spectrum_ptp: Support SIOCGHWTSTAMP, SIOCSHWTSTAMP ioctls") Reported-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-15mlxsw: spectrum: Clear PTP configuration after unregistering the netdeviceAmit Cohen
Currently as part of removing port, PTP API is called to clear the existing configuration and set the 'rx_filter' and 'tx_type' to zero. The clearing is done before unregistering the netdevice, which means that there is a window of time in which the user can reconfigure PTP in the port, and this configuration will not be cleared. Reorder the operations, clear PTP configuration after unregistering the netdevice. Fixes: 8748642751ede ("mlxsw: spectrum: PTP: Support SIOCGHWTSTAMP, SIOCSHWTSTAMP ioctls") Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-15mlxsw: spectrum_ptp: Fix compilation warningsAmit Cohen
In case that 'CONFIG_PTP_1588_CLOCK' is not enabled in the config file, there are implementations for the functions mlxsw_{sp,sp2}_ptp_txhdr_construct() as part of 'spectrum_ptp.h'. In this case, they should be defined as 'static' as they are not supposed to be used out of this file. Make the functions 'static', otherwise the following warnings are returned: "warning: no previous prototype for 'mlxsw_sp_ptp_txhdr_construct'" "warning: no previous prototype for 'mlxsw_sp2_ptp_txhdr_construct'" In addition, make the functions 'inline' for case that 'spectrum_ptp.h' will be included anywhere else and the functions would probably not be used, so compilation warnings about unused static will be returned. Fixes: 24157bc69f45 ("mlxsw: Send PTP packets as data packets to overcome a limitation") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-15net_sched: cls_route: disallow handle of 0Jamal Hadi Salim
Follows up on: https://lore.kernel.org/all/20220809170518.164662-1-cascardo@canonical.com/ handle of 0 implies from/to of universe realm which is not very sensible. Lets see what this patch will do: $sudo tc qdisc add dev $DEV root handle 1:0 prio //lets manufacture a way to insert handle of 0 $sudo tc filter add dev $DEV parent 1:0 protocol ip prio 100 \ route to 0 from 0 classid 1:10 action ok //gets rejected... Error: handle of 0 is not valid. We have an error talking to the kernel, -1 //lets create a legit entry.. sudo tc filter add dev $DEV parent 1:0 protocol ip prio 100 route from 10 \ classid 1:10 action ok //what did the kernel insert? $sudo tc filter ls dev $DEV parent 1:0 filter protocol ip pref 100 route chain 0 filter protocol ip pref 100 route chain 0 fh 0x000a8000 flowid 1:10 from 10 action order 1: gact action pass random type none pass val 0 index 1 ref 1 bind 1 //Lets try to replace that legit entry with a handle of 0 $ sudo tc filter replace dev $DEV parent 1:0 protocol ip prio 100 \ handle 0x000a8000 route to 0 from 0 classid 1:10 action drop Error: Replacing with handle of 0 is invalid. We have an error talking to the kernel, -1 And last, lets run Cascardo's POC: $ ./poc 0 0 -22 -22 -22 Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-15net: fix potential refcount leak in ndisc_router_discovery()Xin Xiong
The issue happens on specific paths in the function. After both the object `rt` and `neigh` are grabbed successfully, when `lifetime` is nonzero but the metric needs change, the function just deletes the route and set `rt` to NULL. Then, it may try grabbing `rt` and `neigh` again if above conditions hold. The function simply overwrite `neigh` if succeeds or returns if fails, without decreasing the reference count of previous `neigh`. This may result in memory leaks. Fix it by decrementing the reference count of `neigh` in place. Fixes: 6b2e04bc240f ("net: allow user to set metric on default route learned via Router Advertisement") Signed-off-by: Xin Xiong <xiongx18@fudan.edu.cn> Signed-off-by: Xin Tan <tanxin.ctf@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-15Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/netDavid S. Miller
-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2022-08-11 (ice) This series contains updates to ice driver only. Benjamin corrects a misplaced parenthesis for a WARN_ON check. Michal removes WARN_ON from a check as its recoverable and not warranting of a call trace. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-15neighbour: make proxy_queue.qlen limit per-deviceAlexander Mikhalitsyn
Right now we have a neigh_param PROXY_QLEN which specifies maximum length of neigh_table->proxy_queue. But in fact, this limitation doesn't work well because check condition looks like: tbl->proxy_queue.qlen > NEIGH_VAR(p, PROXY_QLEN) The problem is that p (struct neigh_parms) is a per-device thing, but tbl (struct neigh_table) is a system-wide global thing. It seems reasonable to make proxy_queue limit per-device based. v2: - nothing changed in this patch v3: - rebase to net tree Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: David Ahern <dsahern@kernel.org> Cc: Yajun Deng <yajun.deng@linux.dev> Cc: Roopa Prabhu <roopa@nvidia.com> Cc: Christian Brauner <brauner@kernel.org> Cc: netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com> Cc: Konstantin Khorenko <khorenko@virtuozzo.com> Cc: kernel@openvz.org Cc: devel@openvz.org Suggested-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com> Reviewed-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-15neigh: fix possible DoS due to net iface start/stop loopDenis V. Lunev
Normal processing of ARP request (usually this is Ethernet broadcast packet) coming to the host is looking like the following: * the packet comes to arp_process() call and is passed through routing procedure * the request is put into the queue using pneigh_enqueue() if corresponding ARP record is not local (common case for container records on the host) * the request is processed by timer (within 80 jiffies by default) and ARP reply is sent from the same arp_process() using NEIGH_CB(skb)->flags & LOCALLY_ENQUEUED condition (flag is set inside pneigh_enqueue()) And here the problem comes. Linux kernel calls pneigh_queue_purge() which destroys the whole queue of ARP requests on ANY network interface start/stop event through __neigh_ifdown(). This is actually not a problem within the original world as network interface start/stop was accessible to the host 'root' only, which could do more destructive things. But the world is changed and there are Linux containers available. Here container 'root' has an access to this API and could be considered as untrusted user in the hosting (container's) world. Thus there is an attack vector to other containers on node when container's root will endlessly start/stop interfaces. We have observed similar situation on a real production node when docker container was doing such activity and thus other containers on the node become not accessible. The patch proposed doing very simple thing. It drops only packets from the same namespace in the pneigh_queue_purge() where network interface state change is detected. This is enough to prevent the problem for the whole node preserving original semantics of the code. v2: - do del_timer_sync() if queue is empty after pneigh_queue_purge() v3: - rebase to net tree Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: David Ahern <dsahern@kernel.org> Cc: Yajun Deng <yajun.deng@linux.dev> Cc: Roopa Prabhu <roopa@nvidia.com> Cc: Christian Brauner <brauner@kernel.org> Cc: netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com> Cc: Konstantin Khorenko <khorenko@virtuozzo.com> Cc: kernel@openvz.org Cc: devel@openvz.org Investigated-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-15net: qrtr: start MHI channel after endpoit creationMaxim Kochetkov
MHI channel may generates event/interrupt right after enabling. It may leads to 2 race conditions issues. 1) Such event may be dropped by qcom_mhi_qrtr_dl_callback() at check: if (!qdev || mhi_res->transaction_status) return; Because dev_set_drvdata(&mhi_dev->dev, qdev) may be not performed at this moment. In this situation qrtr-ns will be unable to enumerate services in device. --------------------------------------------------------------- 2) Such event may come at the moment after dev_set_drvdata() and before qrtr_endpoint_register(). In this case kernel will panic with accessing wrong pointer at qcom_mhi_qrtr_dl_callback(): rc = qrtr_endpoint_post(&qdev->ep, mhi_res->buf_addr, mhi_res->bytes_xferd); Because endpoint is not created yet. -------------------------------------------------------------- So move mhi_prepare_for_transfer_autoqueue after endpoint creation to fix it. Fixes: a2e2cc0dbb11 ("net: qrtr: Start MHI channels during init") Signed-off-by: Maxim Kochetkov <fido_max@inbox.ru> Reviewed-by: Hemant Kumar <quic_hemantk@quicinc.com> Reviewed-by: Manivannan Sadhasivam <mani@kernel.org> Reviewed-by: Loic Poulain <loic.poulain@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-13ip6_tunnel: Fix the type of functionsHongbin Wang
Functions ip6_tnl_change, ip6_tnl_update and ip6_tnl0_update do always return 0, change the type of functions to void. Signed-off-by: Hongbin Wang <wh_bin@126.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-12net: dsa: mv88e6060: prevent crash on an unused portSergei Antonov
If the port isn't a CPU port nor a user port, 'cpu_dp' is a null pointer and a crash happened on dereferencing it in mv88e6060_setup_port(): [ 9.575872] Unable to handle kernel NULL pointer dereference at virtual address 00000014 ... [ 9.942216] mv88e6060_setup from dsa_register_switch+0x814/0xe84 [ 9.948616] dsa_register_switch from mdio_probe+0x2c/0x54 [ 9.954433] mdio_probe from really_probe.part.0+0x98/0x2a0 [ 9.960375] really_probe.part.0 from driver_probe_device+0x30/0x10c [ 9.967029] driver_probe_device from __device_attach_driver+0xb8/0x13c [ 9.973946] __device_attach_driver from bus_for_each_drv+0x90/0xe0 [ 9.980509] bus_for_each_drv from __device_attach+0x110/0x184 [ 9.986632] __device_attach from bus_probe_device+0x8c/0x94 [ 9.992577] bus_probe_device from deferred_probe_work_func+0x78/0xa8 [ 9.999311] deferred_probe_work_func from process_one_work+0x290/0x73c [ 10.006292] process_one_work from worker_thread+0x30/0x4b8 [ 10.012155] worker_thread from kthread+0xd4/0x10c [ 10.017238] kthread from ret_from_fork+0x14/0x3c Fixes: 0abfd494deef ("net: dsa: use dedicated CPU port") CC: Vivien Didelot <vivien.didelot@savoirfairelinux.com> CC: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Sergei Antonov <saproj@gmail.com> Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Link: https://lore.kernel.org/r/20220811070939.1717146-1-saproj@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-12fec: Fix timer capture timing in `fec_ptp_enable_pps()`Csókás Bence
Code reimplements functionality already in `fec_ptp_read()`, but misses check for FEC_QUIRK_BUG_CAPTURE. Replace with function call. Fixes: 28b5f058cf1d ("net: fec: ptp: fix convergence issue to support LinuxPTP stack") Signed-off-by: Csókás Bence <csokas.bence@prolan.hu> Link: https://lore.kernel.org/r/20220811101348.13755-1-csokas.bence@prolan.hu Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-12iavf: Fix deadlock in initializationIvan Vecera
Fix deadlock that occurs when iavf interface is a part of failover configuration. 1. Mutex crit_lock is taken at the beginning of iavf_watchdog_task() 2. Function iavf_init_config_adapter() is called when adapter state is __IAVF_INIT_CONFIG_ADAPTER 3. iavf_init_config_adapter() calls register_netdevice() that emits NETDEV_REGISTER event 4. Notifier function failover_event() then calls net_failover_slave_register() that calls dev_open() 5. dev_open() calls iavf_open() that tries to take crit_lock in end-less loop Stack trace: ... [ 790.251876] usleep_range_state+0x5b/0x80 [ 790.252547] iavf_open+0x37/0x1d0 [iavf] [ 790.253139] __dev_open+0xcd/0x160 [ 790.253699] dev_open+0x47/0x90 [ 790.254323] net_failover_slave_register+0x122/0x220 [net_failover] [ 790.255213] failover_slave_register.part.7+0xd2/0x180 [failover] [ 790.256050] failover_event+0x122/0x1ab [failover] [ 790.256821] notifier_call_chain+0x47/0x70 [ 790.257510] register_netdevice+0x20f/0x550 [ 790.258263] iavf_watchdog_task+0x7c8/0xea0 [iavf] [ 790.259009] process_one_work+0x1a7/0x360 [ 790.259705] worker_thread+0x30/0x390 To fix the situation we should check the current adapter state after first unsuccessful mutex_trylock() and return with -EBUSY if it is __IAVF_INIT_CONFIG_ADAPTER. Fixes: 226d528512cf ("iavf: fix locking of critical sections") Signed-off-by: Ivan Vecera <ivecera@redhat.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-08-12iavf: Fix reset error handlingPrzemyslaw Patynowski
Do not call iavf_close in iavf_reset_task error handling. Doing so can lead to double call of napi_disable, which can lead to deadlock there. Removing VF would lead to iavf_remove task being stuck, because it requires crit_lock, which is held by iavf_close. Call iavf_disable_vf if reset fail, so that driver will clean up remaining invalid resources. During rapid VF resets, HW can fail to setup VF mailbox. Wrong error handling can lead to iavf_remove being stuck with: [ 5218.999087] iavf 0000:82:01.0: Failed to init adminq: -53 ... [ 5267.189211] INFO: task repro.sh:11219 blocked for more than 30 seconds. [ 5267.189520] Tainted: G S E 5.18.0-04958-ga54ce3703613-dirty #1 [ 5267.189764] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 5267.190062] task:repro.sh state:D stack: 0 pid:11219 ppid: 8162 flags:0x00000000 [ 5267.190347] Call Trace: [ 5267.190647] <TASK> [ 5267.190927] __schedule+0x460/0x9f0 [ 5267.191264] schedule+0x44/0xb0 [ 5267.191563] schedule_preempt_disabled+0x14/0x20 [ 5267.191890] __mutex_lock.isra.12+0x6e3/0xac0 [ 5267.192237] ? iavf_remove+0xf9/0x6c0 [iavf] [ 5267.192565] iavf_remove+0x12a/0x6c0 [iavf] [ 5267.192911] ? _raw_spin_unlock_irqrestore+0x1e/0x40 [ 5267.193285] pci_device_remove+0x36/0xb0 [ 5267.193619] device_release_driver_internal+0xc1/0x150 [ 5267.193974] pci_stop_bus_device+0x69/0x90 [ 5267.194361] pci_stop_and_remove_bus_device+0xe/0x20 [ 5267.194735] pci_iov_remove_virtfn+0xba/0x120 [ 5267.195130] sriov_disable+0x2f/0xe0 [ 5267.195506] ice_free_vfs+0x7d/0x2f0 [ice] [ 5267.196056] ? pci_get_device+0x4f/0x70 [ 5267.196496] ice_sriov_configure+0x78/0x1a0 [ice] [ 5267.196995] sriov_numvfs_store+0xfe/0x140 [ 5267.197466] kernfs_fop_write_iter+0x12e/0x1c0 [ 5267.197918] new_sync_write+0x10c/0x190 [ 5267.198404] vfs_write+0x24e/0x2d0 [ 5267.198886] ksys_write+0x5c/0xd0 [ 5267.199367] do_syscall_64+0x3a/0x80 [ 5267.199827] entry_SYSCALL_64_after_hwframe+0x46/0xb0 [ 5267.200317] RIP: 0033:0x7f5b381205c8 [ 5267.200814] RSP: 002b:00007fff8c7e8c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [ 5267.201981] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f5b381205c8 [ 5267.202620] RDX: 0000000000000002 RSI: 00005569420ee900 RDI: 0000000000000001 [ 5267.203426] RBP: 00005569420ee900 R08: 000000000000000a R09: 00007f5b38180820 [ 5267.204327] R10: 000000000000000a R11: 0000000000000246 R12: 00007f5b383c06e0 [ 5267.205193] R13: 0000000000000002 R14: 00007f5b383bb880 R15: 0000000000000002 [ 5267.206041] </TASK> [ 5267.206970] Kernel panic - not syncing: hung_task: blocked tasks [ 5267.207809] CPU: 48 PID: 551 Comm: khungtaskd Kdump: loaded Tainted: G S E 5.18.0-04958-ga54ce3703613-dirty #1 [ 5267.208726] Hardware name: Dell Inc. PowerEdge R730/0WCJNT, BIOS 2.11.0 11/02/2019 [ 5267.209623] Call Trace: [ 5267.210569] <TASK> [ 5267.211480] dump_stack_lvl+0x33/0x42 [ 5267.212472] panic+0x107/0x294 [ 5267.213467] watchdog.cold.8+0xc/0xbb [ 5267.214413] ? proc_dohung_task_timeout_secs+0x30/0x30 [ 5267.215511] kthread+0xf4/0x120 [ 5267.216459] ? kthread_complete_and_exit+0x20/0x20 [ 5267.217505] ret_from_fork+0x22/0x30 [ 5267.218459] </TASK> Fixes: f0db78928783 ("i40evf: use netdev variable in reset task") Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com> Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com> Tested-by: Marek Szlosek <marek.szlosek@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-08-12iavf: Fix NULL pointer dereference in iavf_get_link_ksettingsPrzemyslaw Patynowski
Fix possible NULL pointer dereference, due to freeing of adapter->vf_res in iavf_init_get_resources. Previous commit introduced a regression, where receiving IAVF_ERR_ADMIN_QUEUE_NO_WORK from iavf_get_vf_config would free adapter->vf_res. However, netdev is still registered, so ethtool_ops can be called. Calling iavf_get_link_ksettings with no vf_res, will result with: [ 9385.242676] BUG: kernel NULL pointer dereference, address: 0000000000000008 [ 9385.242683] #PF: supervisor read access in kernel mode [ 9385.242686] #PF: error_code(0x0000) - not-present page [ 9385.242690] PGD 0 P4D 0 [ 9385.242696] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC PTI [ 9385.242701] CPU: 6 PID: 3217 Comm: pmdalinux Kdump: loaded Tainted: G S E 5.18.0-04958-ga54ce3703613-dirty #1 [ 9385.242708] Hardware name: Dell Inc. PowerEdge R730/0WCJNT, BIOS 2.11.0 11/02/2019 [ 9385.242710] RIP: 0010:iavf_get_link_ksettings+0x29/0xd0 [iavf] [ 9385.242745] Code: 00 0f 1f 44 00 00 b8 01 ef ff ff 48 c7 46 30 00 00 00 00 48 c7 46 38 00 00 00 00 c6 46 0b 00 66 89 46 08 48 8b 87 68 0e 00 00 <f6> 40 08 80 75 50 8b 87 5c 0e 00 00 83 f8 08 74 7a 76 1d 83 f8 20 [ 9385.242749] RSP: 0018:ffffc0560ec7fbd0 EFLAGS: 00010246 [ 9385.242755] RAX: 0000000000000000 RBX: ffffc0560ec7fc08 RCX: 0000000000000000 [ 9385.242759] RDX: ffffffffc0ad4550 RSI: ffffc0560ec7fc08 RDI: ffffa0fc66674000 [ 9385.242762] RBP: 00007ffd1fb2bf50 R08: b6a2d54b892363ee R09: ffffa101dc14fb00 [ 9385.242765] R10: 0000000000000000 R11: 0000000000000004 R12: ffffa0fc66674000 [ 9385.242768] R13: 0000000000000000 R14: ffffa0fc66674000 R15: 00000000ffffffa1 [ 9385.242771] FS: 00007f93711a2980(0000) GS:ffffa0fad72c0000(0000) knlGS:0000000000000000 [ 9385.242775] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 9385.242778] CR2: 0000000000000008 CR3: 0000000a8e61c003 CR4: 00000000003706e0 [ 9385.242781] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 9385.242784] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 9385.242787] Call Trace: [ 9385.242791] <TASK> [ 9385.242793] ethtool_get_settings+0x71/0x1a0 [ 9385.242814] __dev_ethtool+0x426/0x2f40 [ 9385.242823] ? slab_post_alloc_hook+0x4f/0x280 [ 9385.242836] ? kmem_cache_alloc_trace+0x15d/0x2f0 [ 9385.242841] ? dev_ethtool+0x59/0x170 [ 9385.242848] dev_ethtool+0xa7/0x170 [ 9385.242856] dev_ioctl+0xc3/0x520 [ 9385.242866] sock_do_ioctl+0xa0/0xe0 [ 9385.242877] sock_ioctl+0x22f/0x320 [ 9385.242885] __x64_sys_ioctl+0x84/0xc0 [ 9385.242896] do_syscall_64+0x3a/0x80 [ 9385.242904] entry_SYSCALL_64_after_hwframe+0x46/0xb0 [ 9385.242918] RIP: 0033:0x7f93702396db [ 9385.242923] Code: 73 01 c3 48 8b 0d ad 57 38 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 7d 57 38 00 f7 d8 64 89 01 48 [ 9385.242927] RSP: 002b:00007ffd1fb2bf18 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ 9385.242932] RAX: ffffffffffffffda RBX: 000055671b1d2fe0 RCX: 00007f93702396db [ 9385.242935] RDX: 00007ffd1fb2bf20 RSI: 0000000000008946 RDI: 0000000000000007 [ 9385.242937] RBP: 00007ffd1fb2bf20 R08: 0000000000000003 R09: 0030763066307330 [ 9385.242940] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffd1fb2bf80 [ 9385.242942] R13: 0000000000000007 R14: 0000556719f6de90 R15: 00007ffd1fb2c1b0 [ 9385.242948] </TASK> [ 9385.242949] Modules linked in: iavf(E) xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nft_compat nf_nat_tftp nft_objref nf_conntrack_tftp bridge stp llc nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables rfkill nfnetlink vfat fat irdma ib_uverbs ib_core intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm iTCO_wdt iTCO_vendor_support ice irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl i40e pcspkr intel_cstate joydev mei_me intel_uncore mxm_wmi mei ipmi_ssif lpc_ich ipmi_si acpi_power_meter xfs libcrc32c mgag200 i2c_algo_bit drm_shmem_helper drm_kms_helper sd_mod t10_pi crc64_rocksoft crc64 syscopyarea sg sysfillrect sysimgblt fb_sys_fops drm ixgbe ahci libahci libata crc32c_intel mdio dca wmi dm_mirror dm_region_hash dm_log dm_mod ipmi_devintf ipmi_msghandler fuse [ 9385.243065] [last unloaded: iavf] Dereference happens in if (ADV_LINK_SUPPORT(adapter)) statement Fixes: 209f2f9c7181 ("iavf: Add support for VIRTCHNL_VF_OFFLOAD_VLAN_V2 negotiation") Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com> Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com> Tested-by: Marek Szlosek <marek.szlosek@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-08-12iavf: Fix adminq error handlingPrzemyslaw Patynowski
iavf_alloc_asq_bufs/iavf_alloc_arq_bufs allocates with dma_alloc_coherent memory for VF mailbox. Free DMA regions for both ASQ and ARQ in case error happens during configuration of ASQ/ARQ registers. Without this change it is possible to see when unloading interface: 74626.583369: dma_debug_device_change: device driver has pending DMA allocations while released from device [count=32] One of leaked entries details: [device address=0x0000000b27ff9000] [size=4096 bytes] [mapped with DMA_BIDIRECTIONAL] [mapped as coherent] Fixes: d358aa9a7a2d ("i40evf: init code and hardware support") Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com> Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com> Tested-by: Marek Szlosek <marek.szlosek@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-08-12netfilter: nf_tables: validate NFTA_SET_ELEM_OBJREF based on NFT_SET_OBJECT flagPablo Neira Ayuso
If the NFTA_SET_ELEM_OBJREF netlink attribute is present and NFT_SET_OBJECT flag is set on, report EINVAL. Move existing sanity check earlier to validate that NFT_SET_OBJECT requires NFTA_SET_ELEM_OBJREF. Fixes: 8aeff920dcc9 ("netfilter: nf_tables: add stateful object reference to set elements") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-12net: lan966x: fix checking for return value of platform_get_irq_byname()Li Qiong
The platform_get_irq_byname() returns non-zero IRQ number or negative error number. "if (irq)" always true, chang it to "if (irq > 0)" Signed-off-by: Li Qiong <liqiong@nfschina.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-12net: cxgb3: Fix comment typoJason Wang
The double `the' is duplicated in the comment, remove one. Signed-off-by: Jason Wang <wangborong@cdjrlc.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-12bnx2x: Fix comment typoJason Wang
The double `the' is duplicated in the comment, remove one. Signed-off-by: Jason Wang <wangborong@cdjrlc.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-12net: ipa: Fix comment typoJason Wang
The double `is' is duplicated in the comment, remove one. Signed-off-by: Jason Wang <wangborong@cdjrlc.com> Reviewed-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-12virtio_net: fix endian-ness for RSSMichael S. Tsirkin
Using native endian-ness for device supplied fields is wrong on BE platforms. Sparse warns about this. Fixes: 91f41f01d219 ("drivers/net/virtio_net: Added RSS hash report.") Cc: "Andrew Melnychenko" <andrew@daynix.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-12net/sunrpc: fix potential memory leaks in rpc_sysfs_xprt_state_change()Xin Xiong
The issue happens on some error handling paths. When the function fails to grab the object `xprt`, it simply returns 0, forgetting to decrease the reference count of another object `xps`, which is increased by rpc_sysfs_xprt_kobj_get_xprt_switch(), causing refcount leaks. Also, the function forgets to check whether `xps` is valid before using it, which may result in NULL-dereferencing issues. Fix it by adding proper error handling code when either `xprt` or `xps` is NULL. Fixes: 5b7eb78486cd ("SUNRPC: take a xprt offline using sysfs") Signed-off-by: Xin Xiong <xiongx18@fudan.edu.cn> Signed-off-by: Xin Tan <tanxin.ctf@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-12skfp/h: fix repeated words in commentsJilin Yuan
Delete the redundant word 'the'. Signed-off-by: Jilin Yuan <yuanjilin@cdjrlc.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-12rds: add missing barrier to release_refillMikulas Patocka
The functions clear_bit and set_bit do not imply a memory barrier, thus it may be possible that the waitqueue_active function (which does not take any locks) is moved before clear_bit and it could miss a wakeup event. Fix this bug by adding a memory barrier after clear_bit. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-11Merge tag 'net-6.0-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from bluetooth, bpf, can and netfilter. A little larger than usual but it's all fixes, no late features. It's large partially because of timing, and partially because of follow ups to stuff that got merged a week or so before the merge window and wasn't as widely tested. Maybe the Bluetooth fixes are a little alarming so we'll address that, but the rest seems okay and not scary. Notably we're including a fix for the netfilter Kconfig [1], your WiFi warning [2] and a bluetooth fix which should unblock syzbot [3]. Current release - regressions: - Bluetooth: - don't try to cancel uninitialized works [3] - L2CAP: fix use-after-free caused by l2cap_chan_put - tls: rx: fix device offload after recent rework - devlink: fix UAF on failed reload and leftover locks in mlxsw Current release - new code bugs: - netfilter: - flowtable: fix incorrect Kconfig dependencies [1] - nf_tables: fix crash when nf_trace is enabled - bpf: - use proper target btf when exporting attach_btf_obj_id - arm64: fixes for bpf trampoline support - Bluetooth: - ISO: unlock on error path in iso_sock_setsockopt() - ISO: fix info leak in iso_sock_getsockopt() - ISO: fix iso_sock_getsockopt for BT_DEFER_SETUP - ISO: fix memory corruption on iso_pinfo.base - ISO: fix not using the correct QoS - hci_conn: fix updating ISO QoS PHY - phy: dp83867: fix get nvmem cell fail Previous releases - regressions: - wifi: cfg80211: fix validating BSS pointers in __cfg80211_connect_result [2] - atm: bring back zatm uAPI after ATM had been removed - properly fix old bug making bonding ARP monitor mode not being able to work with software devices with lockless Tx - tap: fix null-deref on skb->dev in dev_parse_header_protocol - revert "net: usb: ax88179_178a needs FLAG_SEND_ZLP" it helps some devices and breaks others - netfilter: - nf_tables: many fixes rejecting cross-object linking which may lead to UAFs - nf_tables: fix null deref due to zeroed list head - nf_tables: validate variable length element extension - bgmac: fix a BUG triggered by wrong bytes_compl - bcmgenet: indicate MAC is in charge of PHY PM Previous releases - always broken: - bpf: - fix bad pointer deref in bpf_sys_bpf() injected via test infra - disallow non-builtin bpf programs calling the prog_run command - don't reinit map value in prealloc_lru_pop - fix UAFs during the read of map iterator fd - fix invalidity check for values in sk local storage map - reject sleepable program for non-resched map iterator - mptcp: - move subflow cleanup in mptcp_destroy_common() - do not queue data on closed subflows - virtio_net: fix memory leak inside XDP_TX with mergeable - vsock: fix memory leak when multiple threads try to connect() - rework sk_user_data sharing to prevent psock leaks - geneve: fix TOS inheriting for ipv4 - tunnels & drivers: do not use RT_TOS for IPv6 flowlabel - phy: c45 baset1: do not skip aneg configuration if clock role is not specified - rose: avoid overflow when /proc displays timer information - x25: fix call timeouts in blocking connects - can: mcp251x: fix race condition on receive interrupt - can: j1939: - replace user-reachable WARN_ON_ONCE() with netdev_warn_once() - fix memory leak of skbs in j1939_session_destroy() Misc: - docs: bpf: clarify that many things are not uAPI - seg6: initialize induction variable to first valid array index (to silence clang vs objtool warning) - can: ems_usb: fix clang 14's -Wunaligned-access warning" * tag 'net-6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (117 commits) net: atm: bring back zatm uAPI dpaa2-eth: trace the allocated address instead of page struct net: add missing kdoc for struct genl_multicast_group::flags nfp: fix use-after-free in area_cache_get() MAINTAINERS: use my korg address for mt7601u mlxsw: minimal: Fix deadlock in ports creation bonding: fix reference count leak in balance-alb mode net: usb: qmi_wwan: Add support for Cinterion MV32 bpf: Shut up kern_sys_bpf warning. net/tls: Use RCU API to access tls_ctx->netdev tls: rx: device: don't try to copy too much on detach tls: rx: device: bound the frag walk net_sched: cls_route: remove from list when handle is 0 selftests: forwarding: Fix failing tests with old libnet net: refactor bpf_sk_reuseport_detach() net: fix refcount bug in sk_psock_get (2) selftests/bpf: Ensure sleepable program is rejected by hash map iter selftests/bpf: Add write tests for sk local storage map iterator selftests/bpf: Add tests for reading a dangling map iter fd bpf: Only allow sleepable program for resched-able iterator ...