summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2022-08-24Merge branch 'j7200-support'David S. Miller
Siddharth Vadapalli says: ==================== J7200: CPSW5G: Add support for QSGMII mode to am65-cpsw driver Add support for QSGMII mode to am65-cpsw driver. Change log: v4-> v5: 1. Move ti,j7200-cpswxg-nuss compatible to the line above the ti,j721e-cpsw-nuss compatible. 2. Add allOf and move if-then statements within it to allow future if-then statements to be added easily. v3 -> v4: 1. Update bindings to disallow ports based on compatible, instead of adding a new if/then statement for the new compatible. 2. Add Else-If condition for RMII mode in the set of supported interfaces. Support for RMII mode is already present in the driver and I had missed out adding a condition for RMII mode in the previous patches. v2 -> v3: 1. In ti,k3-am654-cpsw-nuss.yaml, restrict if/then statement to port nodes. v1 -> v2: 1. Add new compatible for CPSW5G in ti,k3-am654-cpsw-nuss.yaml and extend properties for new compatible. 2. Add extra_modes member to struct am65_cpsw_pdata to be used for QSGMII mode by new compatible. 3. Add check for phylink supported modes to ensure that only one phy mode is advertised as supported. 4. Check if extra_modes supports QSGMII mode in am65_cpsw_nuss_mac_config() for register write. 5. Add check for assigning port->sgmii_base only when extra_modes is valid. v4: https://lore.kernel.org/r/20220816060139.111934-1-s-vadapalli@ti.com/ v3: https://lore.kernel.org/r/20220606110443.30362-1-s-vadapalli@ti.com/ v2: https://lore.kernel.org/r/20220602114558.6204-1-s-vadapalli@ti.com/ v1: https://lore.kernel.org/r/20220531113058.23708-1-s-vadapalli@ti.com/ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-24net: ethernet: ti: am65-cpsw: Move phy_set_mode_ext() to correct locationSiddharth Vadapalli
In TI's J7200 SoC CPSW5G ports, each of the 4 ports can be configured as a QSGMII main or QSGMII-SUB port. This configuration is performed by phy-gmii-sel driver on invoking the phy_set_mode_ext() function. It is necessary for the QSGMII main port to be configured before any of the QSGMII-SUB interfaces are brought up. Currently, the QSGMII-SUB interfaces come up before the QSGMII main port is configured. Fix this by moving the call to phy_set_mode_ext() from am65_cpsw_nuss_ndo_slave_open() to am65_cpsw_nuss_init_slave_ports(), thereby ensuring that the QSGMII main port is configured before any of the QSGMII-SUB ports are brought up. Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-24net: ethernet: ti: am65-cpsw: Add support for J7200 CPSW5GSiddharth Vadapalli
CPSW5G in J7200 supports additional modes like QSGMII and SGMII. Add new compatible for J7200 and enable QSGMII mode in am65-cpsw driver. Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-24dt-bindings: net: ti: k3-am654-cpsw-nuss: Update bindings for J7200 CPSW5GSiddharth Vadapalli
Update bindings for TI K3 J7200 SoC which contains 5 ports (4 external ports) CPSW5G module and add compatible for it. Changes made: - Add new compatible ti,j7200-cpswxg-nuss for CPSW5G. - Extend pattern properties for new compatible. - Change maximum number of CPSW ports to 4 for new compatible. Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com> Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-24net: neigh: don't call kfree_skb() under spin_lock_irqsave()Yang Yingliang
It is not allowed to call kfree_skb() from hardware interrupt context or with interrupts being disabled. So add all skb to a tmp list, then free them after spin_unlock_irqrestore() at once. Fixes: 66ba215cb513 ("neigh: fix possible DoS due to net iface start/stop loop") Suggested-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-24net: skb: prevent the split of kfree_skb_reason() by gccMenglong Dong
Sometimes, gcc will optimize the function by spliting it to two or more functions. In this case, kfree_skb_reason() is splited to kfree_skb_reason and kfree_skb_reason.part.0. However, the function/tracepoint trace_kfree_skb() in it needs the return address of kfree_skb_reason(). This split makes the call chains becomes: kfree_skb_reason() -> kfree_skb_reason.part.0 -> trace_kfree_skb() which makes the return address that passed to trace_kfree_skb() be kfree_skb(). Therefore, introduce '__fix_address', which is the combination of '__noclone' and 'noinline', and apply it to kfree_skb_reason() to prevent to from being splited or made inline. (Is it better to simply apply '__noclone oninline' to kfree_skb_reason? I'm thinking maybe other functions have the same problems) Meanwhile, wrap 'skb_unref()' with 'unlikely()', as the compiler thinks it is likely return true and splits kfree_skb_reason(). Signed-off-by: Menglong Dong <imagedong@tencent.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-24netfilter: nf_defrag_ipv6: allow nf_conntrack_frag6_high_thresh increasesEric Dumazet
Currently, net.netfilter.nf_conntrack_frag6_high_thresh can only be lowered. I found this issue while investigating a probable kernel issue causing flakes in tools/testing/selftests/net/ip_defrag.sh In particular, these sysctl changes were ignored: ip netns exec "${NETNS}" sysctl -w net.netfilter.nf_conntrack_frag6_high_thresh=9000000 >/dev/null 2>&1 ip netns exec "${NETNS}" sysctl -w net.netfilter.nf_conntrack_frag6_low_thresh=7000000 >/dev/null 2>&1 This change is inline with commit 836196239298 ("net/ipfrag: let ip[6]frag_high_thresh in ns be higher than in init_net") Fixes: 8db3d41569bb ("netfilter: nf_defrag_ipv6: use net_generic infra") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-24netfilter: flowtable: fix stuck flows on cleanup due to pending workPablo Neira Ayuso
To clear the flow table on flow table free, the following sequence normally happens in order: 1) gc_step work is stopped to disable any further stats/del requests. 2) All flow table entries are set to teardown state. 3) Run gc_step which will queue HW del work for each flow table entry. 4) Waiting for the above del work to finish (flush). 5) Run gc_step again, deleting all entries from the flow table. 6) Flow table is freed. But if a flow table entry already has pending HW stats or HW add work step 3 will not queue HW del work (it will be skipped), step 4 will wait for the pending add/stats to finish, and step 5 will queue HW del work which might execute after freeing of the flow table. To fix the above, this patch flushes the pending work, then it sets the teardown flag to all flows in the flowtable and it forces a garbage collector run to queue work to remove the flows from hardware, then it flushes this new pending work and (finally) it forces another garbage collector run to remove the entry from the software flowtable. Stack trace: [47773.882335] BUG: KASAN: use-after-free in down_read+0x99/0x460 [47773.883634] Write of size 8 at addr ffff888103b45aa8 by task kworker/u20:6/543704 [47773.885634] CPU: 3 PID: 543704 Comm: kworker/u20:6 Not tainted 5.12.0-rc7+ #2 [47773.886745] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009) [47773.888438] Workqueue: nf_ft_offload_del flow_offload_work_handler [nf_flow_table] [47773.889727] Call Trace: [47773.890214] dump_stack+0xbb/0x107 [47773.890818] print_address_description.constprop.0+0x18/0x140 [47773.892990] kasan_report.cold+0x7c/0xd8 [47773.894459] kasan_check_range+0x145/0x1a0 [47773.895174] down_read+0x99/0x460 [47773.899706] nf_flow_offload_tuple+0x24f/0x3c0 [nf_flow_table] [47773.907137] flow_offload_work_handler+0x72d/0xbe0 [nf_flow_table] [47773.913372] process_one_work+0x8ac/0x14e0 [47773.921325] [47773.921325] Allocated by task 592159: [47773.922031] kasan_save_stack+0x1b/0x40 [47773.922730] __kasan_kmalloc+0x7a/0x90 [47773.923411] tcf_ct_flow_table_get+0x3cb/0x1230 [act_ct] [47773.924363] tcf_ct_init+0x71c/0x1156 [act_ct] [47773.925207] tcf_action_init_1+0x45b/0x700 [47773.925987] tcf_action_init+0x453/0x6b0 [47773.926692] tcf_exts_validate+0x3d0/0x600 [47773.927419] fl_change+0x757/0x4a51 [cls_flower] [47773.928227] tc_new_tfilter+0x89a/0x2070 [47773.936652] [47773.936652] Freed by task 543704: [47773.937303] kasan_save_stack+0x1b/0x40 [47773.938039] kasan_set_track+0x1c/0x30 [47773.938731] kasan_set_free_info+0x20/0x30 [47773.939467] __kasan_slab_free+0xe7/0x120 [47773.940194] slab_free_freelist_hook+0x86/0x190 [47773.941038] kfree+0xce/0x3a0 [47773.941644] tcf_ct_flow_table_cleanup_work Original patch description and stack trace by Paul Blakey. Fixes: c29f74e0df7a ("netfilter: nf_flow_table: hardware offload support") Reported-by: Paul Blakey <paulb@nvidia.com> Tested-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-24netfilter: flowtable: add function to invoke garbage collection immediatelyPablo Neira Ayuso
Expose nf_flow_table_gc_run() to force a garbage collector run from the offload infrastructure. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-24netfilter: nf_tables: disallow binding to already bound chainPablo Neira Ayuso
Update nft_data_init() to report EINVAL if chain is already bound. Fixes: d0e2c7de92c7 ("netfilter: nf_tables: add NFT_CHAIN_BINDING") Reported-by: Gwangun Jung <exsociety@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-24netfilter: nft_tunnel: restrict it to netdev familyPablo Neira Ayuso
Only allow to use this expression from NFPROTO_NETDEV family. Fixes: af308b94a2a4 ("netfilter: nf_tables: add tunnel support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-24netfilter: nft_osf: restrict osf to ipv4, ipv6 and inet familiesPablo Neira Ayuso
As it was originally intended, restrict extension to supported families. Fixes: b96af92d6eaf ("netfilter: nf_tables: implement Passive OS fingerprint module in nft_osf") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-24netfilter: nf_tables: do not leave chain stats enabled on errorPablo Neira Ayuso
Error might occur later in the nf_tables_addchain() codepath, enable static key only after transaction has been created. Fixes: 9f08ea848117 ("netfilter: nf_tables: keep chain counters away from hot path") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-24netfilter: nft_payload: do not truncate csum_offset and csum_typePablo Neira Ayuso
Instead report ERANGE if csum_offset is too long, and EOPNOTSUPP if type is not support. Fixes: 7ec3f7b47b8d ("netfilter: nft_payload: add packet mangling support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-24netfilter: nft_payload: report ERANGE for too long offset and lengthPablo Neira Ayuso
Instead of offset and length are truncation to u8, report ERANGE. Fixes: 96518518cc41 ("netfilter: add nftables") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-24netfilter: nf_tables: make table handle allocation per-netns friendlyPablo Neira Ayuso
mutex is per-netns, move table_netns to the pernet area. *read-write* to 0xffffffff883a01e8 of 8 bytes by task 6542 on cpu 0: nf_tables_newtable+0x6dc/0xc00 net/netfilter/nf_tables_api.c:1221 nfnetlink_rcv_batch net/netfilter/nfnetlink.c:513 [inline] nfnetlink_rcv_skb_batch net/netfilter/nfnetlink.c:634 [inline] nfnetlink_rcv+0xa6a/0x13a0 net/netfilter/nfnetlink.c:652 netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline] netlink_unicast+0x652/0x730 net/netlink/af_netlink.c:1345 netlink_sendmsg+0x643/0x740 net/netlink/af_netlink.c:1921 Fixes: f102d66b335a ("netfilter: nf_tables: use dedicated mutex to guard transactions") Reported-by: Abhishek Shah <abhishek.shah@columbia.edu> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-24netfilter: nf_tables: disallow updates of implicit chainPablo Neira Ayuso
Updates on existing implicit chain make no sense, disallow this. Fixes: d0e2c7de92c7 ("netfilter: nf_tables: add NFT_CHAIN_BINDING") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-23Merge tag 'cgroup-for-6.0-rc2-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fixes from Tejun Heo: - The psi data structure was changed to be allocated dynamically but it wasn't being cleared leading to it reporting garbage values and triggering spurious oom kills. - A deadlock involving cpuset and cpu hotplug. - When a controller is moved across cgroup hierarchies, css->rstat_css_node didn't get RCU drained properly from the previous list. * tag 'cgroup-for-6.0-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: Fix race condition at rebind_subsystems() cgroup: Fix threadgroup_rwsem <-> cpus_read_lock() deadlock sched/psi: Remove redundant cgroup_psi() when !CONFIG_CGROUPS sched/psi: Remove unused parameter nbytes of psi_trigger_create() sched/psi: Zero the memory of struct psi_group
2022-08-23Merge tag 'audit-pr-20220823' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit Pull audit fix from Paul Moore: "A single fix for a potential double-free on a fsnotify error path" * tag 'audit-pr-20220823' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit: audit: fix potential double free on error path from fsnotify_add_inode_mark
2022-08-23Merge tag 'fs.fixes.v6.0-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping Pull file_remove_privs() fix from Christian Brauner: "As part of Stefan's and Jens' work to add async buffered write support to xfs we refactored file_remove_privs() and added __file_remove_privs() to avoid calling __remove_privs() when IOCB_NOWAIT is passed. While debugging a recent performance regression report I found that during review we missed that commit faf99b563558 ("fs: add __remove_file_privs() with flags parameter") accidently changed behavior when dentry_needs_remove_privs() returns zero. Before the commit it would still call inode_has_no_xattr() setting the S_NOSEC bit and thereby avoiding even calling into dentry_needs_remove_privs() the next time this function is called. After that commit inode_has_no_xattr() would only be called if __remove_privs() had to be called. Restore the old behavior. This is likely the cause of the performance regression" * tag 'fs.fixes.v6.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping: fs: __file_remove_privs(): restore call to inode_has_no_xattr()
2022-08-23Merge tag 'mlx5-fixes-2022-08-22' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5 fixes 2022-08-22 This series provides bug fixes to mlx5 driver. * tag 'mlx5-fixes-2022-08-22' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: net/mlx5: Unlock on error in mlx5_sriov_enable() net/mlx5e: Fix use after free in mlx5e_fs_init() net/mlx5e: kTLS, Use _safe() iterator in mlx5e_tls_priv_tx_list_cleanup() net/mlx5: unlock on error path in esw_vfs_changed_event_handler() net/mlx5e: Fix wrong tc flag used when set hw-tc-offload off net/mlx5e: TC, Add missing policer validation net/mlx5e: Fix wrong application of the LRO state net/mlx5: Avoid false positive lockdep warning by adding lock_class_key net/mlx5: Fix cmd error logging for manage pages cmd net/mlx5: Disable irq when locking lag_lock net/mlx5: Eswitch, Fix forwarding decision to uplink net/mlx5: LAG, fix logic over MLX5_LAG_FLAG_NDEVS_READY net/mlx5e: Properly disable vlan strip on non-UL reps ==================== Link: https://lore.kernel.org/r/20220822195917.216025-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-23Merge branch '100GbE' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== ice: xsk: reduced queue count fixes Maciej Fijalkowski says: this small series is supposed to fix the issues around AF_XDP usage with reduced queue count on interface. Due to the XDP rings setup, some configurations can result in sockets not seeing traffic flowing. More about this in description of patch 2. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: ice: xsk: use Rx ring's XDP ring when picking NAPI context ice: xsk: prohibit usage of non-balanced queue id ==================== Link: https://lore.kernel.org/r/20220822163257.2382487-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-23Merge branch 'add-interface-mode-select-and-rmii'Jakub Kicinski
Wei Fang says: ==================== add interface mode select and RMII From: Wei Fang <wei.fang@nxp.com> The patches add the below feature support for both TJA1100 and TJA1101 PHYs cards: - Add MII and RMII mode support. - Add REF_CLK input/output support for RMII mode. ==================== Link: https://lore.kernel.org/r/20220822015949.1569969-1-wei.fang@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-23net: phy: tja11xx: add interface mode and RMII REF_CLK supportWei Fang
Add below features support for both TJA1100 and TJA1101 cards: - Add MII and RMII mode support. - Add REF_CLK input/output support for RMII mode. Signed-off-by: Wei Fang <wei.fang@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-23dt-bindings: net: tja11xx: add nxp,refclk_in propertyWei Fang
TJA110x REF_CLK can be configured as interface reference clock intput or output when the RMII mode enabled. This patch add the property to make the REF_CLK can be configurable. Signed-off-by: Wei Fang <wei.fang@nxp.com> Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-23Merge branch 'mlxsw-introduce-modular-system-support-by-minimal-driver'Jakub Kicinski
Petr Machata says: ==================== mlxsw: Introduce modular system support by minimal driver Vadim Pasternak writes: This patchset adds line cards support in mlxsw_minimal, which is used for monitoring purposes on BMC systems. The BMC is connected to the ASIC over I2C bus, unlike the host CPU that is connected to the ASIC via PCI bus. The BMC system needs to be notified whenever line cards become active or inactive, so that, for example, netdevs will be registered / unregistered by mlxsw_minimal. However, traps cannot be generated towards the BMC over the I2C bus. To overcome that, the I2C bus driver (i.e., mlxsw_i2c) registers an handler for an IRQ that is fired upon specific system wide changes, like line card activation and deactivation. The generated event is handled by mlxsw_core, which checks whether anything changed in the state of available line cards. If a line card becomes active or inactive, interested parties such as mlxsw_minimal are notified via their registered line card event callback. Patch set overview: Patches #1 is preparations. Patches #2-#3 extend mlxsw_core with an infrastructure to handle the previously mentioned system events. Patch #4 extends the I2C bus driver to register an handler for the IRQ fired upon specific system wide changes. Patches #5-#8 gradually add line cards support in mlxsw_minimal by dynamically registering / unregistering netdevs for ports found on line cards, whenever a line card becomes active / inactive. ==================== Link: https://lore.kernel.org/r/cover.1661093502.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-23mlxsw: minimal: Extend to support line card dynamic operationsVadim Pasternak
Implement line card operation callbacks got_active() / got_inactive(). The purpose of these callback to create / remove line card ports after line card is getting active / inactive. Implement line ports_remove_selected() callback to support line card un-provisioning flow through 'devlink'. Add line card operation registration and de-registration APIs. Add module offset for line card. Offset for main board iz zero. For line card in slot #n offset is calculated as (#n - 1) multiplied by maximum modules number. Signed-off-by: Vadim Pasternak <vadimp@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-23mlxsw: minimal: Extend module to port mapping with slot indexVadim Pasternak
The interfaces for ports found on line card are created and removed dynamically after line card is getting active or inactive. Introduce per line card array with module to port mapping. For each port get 'slot_index' through PMLP register and set port mapping for the relevant [slot_index][module] entry. Split module and port allocation into separate routines. Split per line card port creation and removing into separate routines. Motivation to re-use these routines for line card operations. Signed-off-by: Vadim Pasternak <vadimp@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-23mlxsw: minimal: Move ports allocation to separate routineVadim Pasternak
Perform ports allocation in a separate routine. Motivation is to re-use this routine for ports found on line cards. Signed-off-by: Vadim Pasternak <vadimp@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-23mlxsw: minimal: Extend APIs with slot index for modular system supportVadim Pasternak
Add 'slot_index' field to port structure. Replace zero slot_index argument with 'slot_index' in 'ethtool' related APIs. Add 'slot_index' argument to port initialization and de-initialization related APIs. Motivation is to prepare minimal driver for modular system support. Signed-off-by: Vadim Pasternak <vadimp@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-23mlxsw: i2c: Add support for system interrupt handlingVadim Pasternak
Extend i2c bus driver with interrupt handler to support system specific hotplug events, related to line card state change. Provide system IRQ line for interrupt handler. IRQ line Id could be provided through the platform data if available, or could be set to the default value. Signed-off-by: Vadim Pasternak <vadimp@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-23mlxsw: core_linecards: Register a system event handlerVadim Pasternak
Add line card system event handler. Register it with core. It is triggered by system interrupts raised from chassis programmable logic devices to CPU. The purpose is to handle line card state changes over I2C bus. Signed-off-by: Vadim Pasternak <vadimp@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-23mlxsw: core: Add registration APIs for system event handlerVadim Pasternak
The purpose of system event handler is to handle system interrupts. Such interrupts are raised to CPU from system programmable logic devices, upon specific system wide changes, like line card activation and deactivation. The purpose is to create an alternative to trap mechanism, which delivers these events to driver over PCI bus, but not available for the driver working over I2C bus. Mechanism is system dependent and applicable only for the systems equipped with programmable devices with custom logic. Add APIs for event handler registration and un-registration and API which should be invoked from the registered callbacks when system interrupt is raised to CPU. Signed-off-by: Vadim Pasternak <vadimp@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-23mlxsw: core_linecards: Separate line card init and fini flowVadim Pasternak
Currently, each line card is initialized using the following steps: 1. Initializing its various fields (e.g., slot index). 2. Creating the corresponding devlink object. 3. Enabling events (i.e., traps) for changes in line card status. 4. Querying and processing line card status. Unlike traps, the IRQ that notifies the CPU about line card status changes cannot be enabled / disabled on a per line card basis. If a handler is registered before the line cards are initialized, the handler risks accessing uninitialized memory. On the other hand, if the handler is registered after initialization, we risk missing events. For example, in step 4, the driver might see that a line card is in ready state and will tell the device to enable it. When enablement is done, the line card will be activated and the IRQ will be triggered. Since a handler was not registered, the event will be missed. Solve this by splitting the initialization sequence into two steps (1-2 and 3-4). In a subsequent patch, the handler will be registered between both steps. Signed-off-by: Vadim Pasternak <vadimp@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-23docs: netlink: basic introduction to NetlinkJakub Kicinski
Provide a bit of a brain dump of netlink related information as documentation. Hopefully this will be useful to people trying to navigate implementing YAML based parsing in languages we won't be able to help with. I started writing this doc while trying to figure out what it'd take to widen the applicability of YAML to good old rtnl, but the doc grew beyond that as it usually happens. In all honesty a lot of this information is new to me as I usually follow the "copy an existing example, drink to forget" process of writing netlink user space, so reviews will be much appreciated. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/20220819200221.422801-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-23net: improve and fix netlink kdocJakub Kicinski
Subsequent patch will render the kdoc from include/uapi/linux/netlink.h into Documentation. We need to fix the warnings. While at it move the comments on struct nlmsghdr to a proper kdoc comment. Link: https://lore.kernel.org/r/20220819200221.422801-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-23Merge branch 'bnxt_en-bug-fixes'Jakub Kicinski
Michael Chan says: ==================== bnxt_en: Bug fixes This series includes 2 fixes for regressions introduced by the XDP multi-buffer feature, 1 devlink reload bug fix, and 1 SRIOV resource accounting bug fix. ==================== Link: https://lore.kernel.org/r/1661180814-19350-1-git-send-email-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-23bnxt_en: fix LRO/GRO_HW features in ndo_fix_features callbackVikas Gupta
LRO/GRO_HW should be disabled if there is an attached XDP program. BNXT_FLAG_TPA is the current setting of the LRO/GRO_HW. Using BNXT_FLAG_TPA to disable LRO/GRO_HW will cause these features to be permanently disabled once they are disabled. Fixes: 1dc4c557bfed ("bnxt: adding bnxt_xdp_build_skb to build skb from multibuffer xdp_buff") Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-23bnxt_en: fix NQ resource accounting during vf creation on 57500 chipsVikas Gupta
There are 2 issues: 1. We should decrement hw_resc->max_nqs instead of hw_resc->max_irqs with the number of NQs assigned to the VFs. The IRQs are fixed on each function and cannot be re-assigned. Only the NQs are being assigned to the VFs. 2. vf_msix is the total number of NQs to be assigned to the VFs. So we should decrement vf_msix from hw_resc->max_nqs. Fixes: b16b68918674 ("bnxt_en: Add SR-IOV support for 57500 chips.") Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-23bnxt_en: set missing reload flag in devlink featuresVikas Gupta
Add missing devlink_set_features() API for callbacks reload_down and reload_up to function. Fixes: 228ea8c187d8 ("bnxt_en: implement devlink dev reload driver_reinit") Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-23bnxt_en: Use PAGE_SIZE to init buffer when multi buffer XDP is not in usePavan Chebbi
Using BNXT_PAGE_MODE_BUF_SIZE + offset as buffer length value is not sufficient when running single buffer XDP programs doing redirect operations. The stack will complain on missing skb tail room. Fix it by using PAGE_SIZE when calling xdp_init_buff() for single buffer programs. Fixes: b231c3f3414c ("bnxt: refactor bnxt_rx_xdp to separate xdp_init_buff/xdp_prepare_buff") Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-23net: ftmac100: set max_mtu to allow DSA overhead settingSergei Antonov
In case ftmac100 is used with a DSA switch, Linux wants to set MTU to 1504 to accommodate for DSA overhead. With the default max_mtu it leads to the error message: ftmac100 92000000.mac eth0: error -22 setting MTU to 1504 to include DSA overhead ftmac100 supports packet length 1518 (MAX_PKT_SIZE constant), so it is safe to report it in max_mtu. Signed-off-by: Sergei Antonov <saproj@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20220821160844.474277-1-saproj@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-23net: dsa: microchip: make learning configurable and keep it off while standaloneVladimir Oltean
Address learning should initially be turned off by the driver for port operation in standalone mode, then the DSA core handles changes to it via ds->ops->port_bridge_flags(). Leaving address learning enabled while ports are standalone breaks any kind of communication which involves port B receiving what port A has sent. Notably it breaks the ksz9477 driver used with a (non offloaded, ports act as if standalone) bonding interface in active-backup mode, when the ports are connected together through external switches, for redundancy purposes. This fixes a major design flaw in the ksz9477 and ksz8795 drivers, which unconditionally leave address learning enabled even while ports operate as standalone. Fixes: b987e98e50ab ("dsa: add DSA switch driver for Microchip KSZ9477") Link: https://lore.kernel.org/netdev/CAFZh4h-JVWt80CrQWkFji7tZJahMfOToUJQgKS5s0_=9zzpvYQ@mail.gmail.com/ Reported-by: Brian Hutchinson <b.hutchman@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20220818164809.3198039-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-23Merge tag 'parisc-for-6.0-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux Pull parisc fixes from Helge Deller: "Some interesting background to the current patchset: It turned out that the fldw instruction (which loads a 32-bit word from memory into one half of a FP register) failed on unaligned addresses and even trashed some other random FP register instead. It's a trivial one-liner fix in the exception handler but this failure dates back to the very beginnings of the parisc-port. It's strange that it was never noticed before. Another patch fixes an annoyance noticed by Randy Dunlap. Running "make ARCH=parisc64 randconfig" always returned a 32-bit config, although one would expect a 64-bit config. Masahiro Yamada suggested to mimik sparc Kconfig code, which fixed the issue nicely. This allowed to drop some compiler build checks too. Third, it's possible to build an optimized 32-bit kernel for PA8X00 (64-bit) CPUs, which then wouldn't start on 32-bit-only (PA1.x) machines. I've added a bootup check which prevents that and which prints a message to the console. This can be tested with qemu, which currently only supports 32-bit emulation. The other patches are usual clean-up stuff like added return value checks and typo fixes in comments. Summary: - Fix emulation of fldw instruction on unaligned addresses - Fix "make ARCH=parisc64 randconfig" to return a 64-bit config - Prevent boot if trying to boot a 32-bit kernel compiled for PA8X00 CPUs on 32-bit only machines - ccio-dma: Handle kmalloc failure in ccio_init_resources()" * tag 'parisc-for-6.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: parisc: Add runtime check to prevent PA2.0 kernels on PA1.x machines parisc: ccio-dma: Handle kmalloc failure in ccio_init_resources() parisc: led: Move from strlcpy with unused retval to strscpy parisc: ccio-dma: Fix typo in comment Revert "parisc: Show error if wrong 32/64-bit compiler is being used" parisc: Make CONFIG_64BIT available for ARCH=parisc64 only parisc: Fix exception handler for fldw and fstw instructions
2022-08-23Merge tag 'mm-hotfixes-stable-2022-08-22' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "Thirteen fixes, almost all for MM. Seven of these are cc:stable and the remainder fix up the changes which went into this -rc cycle" * tag 'mm-hotfixes-stable-2022-08-22' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: kprobes: don't call disarm_kprobe() for disabled kprobes mm/shmem: shmem_replace_page() remember NR_SHMEM mm/shmem: tmpfs fallocate use file_modified() mm/shmem: fix chattr fsflags support in tmpfs mm/hugetlb: support write-faults in shared mappings mm/hugetlb: fix hugetlb not supporting softdirty tracking mm/uffd: reset write protection when unregister with wp-mode mm/smaps: don't access young/dirty bit if pte unpresent mm: add DEVICE_ZONE to FOR_ALL_ZONES kernel/sys_ni: add compat entry for fadvise64_64 mm/gup: fix FOLL_FORCE COW security issue and remove FOLL_COW Revert "zram: remove double compression logic" get_maintainer: add Alan to .get_maintainer.ignore
2022-08-23Merge tag 'linux-kselftest-kunit-fixes-6.0-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull KUnit fixes from Shuah Khan: "Fix for a mmc test and to load .kunit_test_suites section when CONFIG_KUNIT=m, and not just when KUnit is built-in" * tag 'linux-kselftest-kunit-fixes-6.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: module: kunit: Load .kunit_test_suites section when CONFIG_KUNIT=m mmc: sdhci-of-aspeed: test: Fix dependencies when KUNIT=m
2022-08-23Merge tag 'linux-kselftest-fixes-6.0-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull Kselftest fixes from Shuah Khan: "Fixes to vm and sgx test builds" * tag 'linux-kselftest-fixes-6.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: selftests/vm: fix inability to build any vm tests selftests/sgx: Ignore OpenSSL 3.0 deprecated functions warning
2022-08-23netfilter: nft_tproxy: restrict to prerouting hookFlorian Westphal
TPROXY is only allowed from prerouting, but nft_tproxy doesn't check this. This fixes a crash (null dereference) when using tproxy from e.g. output. Fixes: 4ed8eb6570a4 ("netfilter: nf_tables: Add native tproxy support") Reported-by: Shell Chen <xierch@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de>
2022-08-23cgroup: Fix race condition at rebind_subsystems()Jing-Ting Wu
Root cause: The rebind_subsystems() is no lock held when move css object from A list to B list,then let B's head be treated as css node at list_for_each_entry_rcu(). Solution: Add grace period before invalidating the removed rstat_css_node. Reported-by: Jing-Ting Wu <jing-ting.wu@mediatek.com> Suggested-by: Michal Koutný <mkoutny@suse.com> Signed-off-by: Jing-Ting Wu <jing-ting.wu@mediatek.com> Tested-by: Jing-Ting Wu <jing-ting.wu@mediatek.com> Link: https://lore.kernel.org/linux-arm-kernel/d8f0bc5e2fb6ed259f9334c83279b4c011283c41.camel@mediatek.com/T/ Acked-by: Mukesh Ojha <quic_mojha@quicinc.com> Fixes: a7df69b81aac ("cgroup: rstat: support cgroup1") Cc: stable@vger.kernel.org # v5.13+ Signed-off-by: Tejun Heo <tj@kernel.org>
2022-08-23netfilter: conntrack: work around exceeded receive windowFlorian Westphal
When a TCP sends more bytes than allowed by the receive window, all future packets can be marked as invalid. This can clog up the conntrack table because of 5-day default timeout. Sequence of packets: 01 initiator > responder: [S], seq 171, win 5840, options [mss 1330,sackOK,TS val 63 ecr 0,nop,wscale 1] 02 responder > initiator: [S.], seq 33211, ack 172, win 65535, options [mss 1460,sackOK,TS val 010 ecr 63,nop,wscale 8] 03 initiator > responder: [.], ack 33212, win 2920, options [nop,nop,TS val 068 ecr 010], length 0 04 initiator > responder: [P.], seq 172:240, ack 33212, win 2920, options [nop,nop,TS val 279 ecr 010], length 68 Window is 5840 starting from 33212 -> 39052. 05 responder > initiator: [.], ack 240, win 256, options [nop,nop,TS val 872 ecr 279], length 0 06 responder > initiator: [.], seq 33212:34530, ack 240, win 256, options [nop,nop,TS val 892 ecr 279], length 1318 This is fine, conntrack will flag the connection as having outstanding data (UNACKED), which lowers the conntrack timeout to 300s. 07 responder > initiator: [.], seq 34530:35848, ack 240, win 256, options [nop,nop,TS val 892 ecr 279], length 1318 08 responder > initiator: [.], seq 35848:37166, ack 240, win 256, options [nop,nop,TS val 892 ecr 279], length 1318 09 responder > initiator: [.], seq 37166:38484, ack 240, win 256, options [nop,nop,TS val 892 ecr 279], length 1318 10 responder > initiator: [.], seq 38484:39802, ack 240, win 256, options [nop,nop,TS val 892 ecr 279], length 1318 Packet 10 is already sending more than permitted, but conntrack doesn't validate this (only seq is tested vs. maxend, not 'seq+len'). 38484 is acceptable, but only up to 39052, so this packet should not have been sent (or only 568 bytes, not 1318). At this point, connection is still in '300s' mode. Next packet however will get flagged: 11 responder > initiator: [P.], seq 39802:40128, ack 240, win 256, options [nop,nop,TS val 892 ecr 279], length 326 nf_ct_proto_6: SEQ is over the upper bound (over the window of the receiver) .. LEN=378 .. SEQ=39802 ACK=240 ACK PSH .. Now, a couple of replies/acks comes in: 12 initiator > responder: [.], ack 34530, win 4368, [.. irrelevant acks removed ] 16 initiator > responder: [.], ack 39802, win 8712, options [nop,nop,TS val 296201291 ecr 2982371892], length 0 This ack is significant -- this acks the last packet send by the responder that conntrack considered valid. This means that ack == td_end. This will withdraw the 'unacked data' flag, the connection moves back to the 5-day timeout of established conntracks. 17 initiator > responder: ack 40128, win 10030, ... This packet is also flagged as invalid. Because conntrack only updates state based on packets that are considered valid, packet 11 'did not exist' and that gets us: nf_ct_proto_6: ACK is over upper bound 39803 (ACKed data not seen yet) .. SEQ=240 ACK=40128 WINDOW=10030 RES=0x00 ACK URG Because this received and processed by the endpoints, the conntrack entry remains in a bad state, no packets will ever be considered valid again: 30 responder > initiator: [F.], seq 40432, ack 2045, win 391, .. 31 initiator > responder: [.], ack 40433, win 11348, .. 32 initiator > responder: [F.], seq 2045, ack 40433, win 11348 .. ... all trigger 'ACK is over bound' test and we end up with non-early-evictable 5-day default timeout. NB: This patch triggers a bunch of checkpatch warnings because of silly indent. I will resend the cleanup series linked below to reduce the indent level once this change has propagated to net-next. I could route the cleanup via nf but that causes extra backport work for stable maintainers. Link: https://lore.kernel.org/netfilter-devel/20220720175228.17880-1-fw@strlen.de/T/#mb1d7147d36294573cc4f81d00f9f8dadfdd06cd8 Signed-off-by: Florian Westphal <fw@strlen.de>