Age | Commit message (Collapse) | Author |
|
This patch moves the ksz_chip_data in ksz8795 and ksz9477 to ksz_common.
At present, the dev->chip_id is iterated with the ksz_chip_data and then
copy its value to the ksz_dev structure. These values are declared as
constant.
Instead of copying the values and referencing it, this patch update the
dev->info to the ksz_chip_data based on the chip_id in the init
function. And also update the ksz_chip_data values for the LAN937x based
switches.
Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The port_cnt value in the structure is not used in the switch_init.
Instead it uses the fls(chip->cpu_port), this is due to one of port in
the ksz8794 unavailable. The cpu_port for the 8794 is 0x10, fls(0x10) =
5, hence updating it directly in the ksz_chip_data structure in order to
same with all the other switches in ksz8795.c and ksz9477.c files.
Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Multiport eswitch mode is a LAG mode that allows to add rules that
forward traffic to a specific physical port without being affected by LAG
affinity configuration.
This mode of operation is mutual exclusive with the other LAG modes used
by multipath and bonding.
To make the transition between the modes, we maintain a counter on the
number of rules specifying one of the uplink representors as the target
of mirred egress redirect action.
An example of such rule would be:
$ tc filter add dev enp8s0f0_0 prot all root flower dst_mac \
00:11:22:33:44:55 action mirred egress redirect dev enp8s0f0
If the reference count just grows to one and LAG is not in use, we
create the LAG in multiport eswitch mode. Other mode changes are not
allowed while in this mode. When the reference count reaches zero, we
destroy the LAG and let other modes be used if needed.
logic also changed such that if forwarding to some uplink destination
cannot be guaranteed, we fail the operation so the rule will eventually
be in software and not in hardware.
Signed-off-by: Eli Cohen <elic@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Argument ndev is not used in mlx5_handle_changeupper_event()
Remove it.
Signed-off-by: Eli Cohen <elic@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
LAG state machine is implemented using bit flags. However, all these bit
flags, except for MLX5_LAG_FLAG_HASH_BASED, are really mutual exclusive.
In addition, MLX5_LAG_FLAG_READY is used by bonding to mark if we have
our netdevices successfully added to lag and does not really belong in
the same flags variable as the other flags.
Rename MLX5_LAG_FLAG_READY to MLX5_LAG_FLAG_NDEVS_READY to better
reflect its purpose and put it in a new flags variable.
For the rest of the flags, we introduce a mode enum to hold the state
of the LAG.
Remove the shared fdb boolean flag from struct mlx5_lag and store this
configuration as a mode flag.
Change all flag related operations to use standard Linux APIs.
Signed-off-by: Eli Cohen <elic@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
This patch adds the XDP SQs to the uplink representors steering tables
in swichdev mode and enables XDP usage on them.
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Correct the calculation of maximum channels of rep to better utilize
the hardware resources and allow a larger scale of reps.
This will allow creation of all virtual ports configured.
Fixes: 473baf2e9e8c ("net/mlx5e: Allow profile-specific limitation on max num of channels")
Signed-off-by: Moshe Tal <moshet@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Connection offload is translated to multiple rules over several
hardware flow tables. Unhandled end-cases may cause a hardware
resource leak causing multiple system symptoms such as a host
memory leak, decreased performance and other scale related issues.
Export the current number of firmware FTEs related to the CT table
as a debugfs counter. Also add a dropped packets counter to help
debug packets dropped on restore failure.
To show the offloaded count:
cat /sys/kernel/debug/mlx5/<PCI>/ct_nic/offloaded
To show the dropped count:
cat /sys/kernel/debug/mlx5/<PCI>/ct_nic/rx_dropped
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Roi Dayan <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
|
|
By PCI spec, the config space of the VF always report relaxed ordering
not supported while it inherits this property from its PF. Hence using
pcie_relaxed_ordering_enable(), always disables the relaxed ordering on
all VFs. Remove this check and rely on the firmware which queries the
config space of the PF and set the capability bit accordingly.
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Marina Varshaver <marinav@nvidia.com>
Reviewed-by: Gal Shalom <galshalom@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Offloading outer checksum on tunnels requires GSO partial, add it to
'vlan_features' to allow offloading tunnels over vlans.
For example, running GENEVE over vlan & ipv6 (mandatory UDP checksum)
now allows for hardware TSO instead of software segmentation in GSO
only.
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Aya Levin <ayal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Followup commit
79ce39be1d63 ("net/mlx5e: Improve ethtool rxnfc callback structure")
and handle CONFIG_MLX5_EN_RXNFC enabled/disabled inside the fs layer so
the ethtool callbacks are always available. The fs layer will provide
stubs when CONFIG_MLX5_EN_RXNFC is compiled out.
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Physical continuity is not necessary, and requested allocation size might
be larger than PAGE_SIZE.
Hence, use v-alloc/free API.
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Physical continuity is not necessary, and requested allocation size might
be larger than PAGE_SIZE.
Hence, use v-alloc/free API.
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Physical continuity is not necessary, and requested allocation size might
be larger than PAGE_SIZE.
Hence, use v-alloc/free API.
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Physical continuity is not necessary, and requested allocation size might
be larger than PAGE_SIZE.
Hence, use v-alloc/free API.
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Take the wrapper version which picks default node into a header file.
This reduces the number of exported functions.
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Add syndrome of last command failure per command type to debugfs to ease
debugging of such failure.
last_failed_syndrome - last command failed syndrome returned by FW.
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Removing the annotation resolves the issue for some reason.
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Added support for adaptive IRQ coalescing. It uses net_dim
algorithm to find the suitable delay/IRQ count based on the
current packet rate.
Signed-off-by: Suman Ghosh <sumang@marvell.com>
Link: https://lore.kernel.org/r/20220517044055.876158-1-sumang@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Also removes PEROUT_ENABLE_OUTPUT_MASK
Signed-off-by: Min Li <min.li.xe@renesas.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Link: https://lore.kernel.org/r/1652712427-14703-2-git-send-email-min.li.xe@renesas.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Use TOD_READ_SECONDARY for extts to keep TOD_READ_PRIMARY
for gettime and settime exclusively. Before this change,
TOD_READ_PRIMARY was used for both extts and gettime/settime,
which would result in changing TOD read/write triggers between
operations. Using TOD_READ_SECONDARY would make extts
independent of gettime/settime operation
Signed-off-by: Min Li <min.li.xe@renesas.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Link: https://lore.kernel.org/r/1652712427-14703-1-git-send-email-min.li.xe@renesas.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Fix the following coccicheck warning:
drivers/net/ethernet/smsc/smc911x.c:483:20-22: WARNING opportunity for min()
Signed-off-by: Guo Zhengkui <guozhengkui@vivo.com>
Link: https://lore.kernel.org/r/20220516115627.66363-1-guozhengkui@vivo.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
container_of() will never return NULL, so remove useless code.
Signed-off-by: Haowen Bai <baihaowen@meizu.com>
Link: https://lore.kernel.org/r/1652696212-17516-1-git-send-email-baihaowen@meizu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Use memset_startat() helper to simplify the code, there is no functional
change in this patch.
Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Link: https://lore.kernel.org/r/20220516092337.131653-1-xiujianfeng@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
As per Errata Section 5.1, if EEE is intended to be used, some register
writes must be done once after every hardware reset. This patch now adds
the necessary register writes as listed in the Marvell errata.
Without this fix we experience ethernet problems on some of our boards
equipped with a new version of this ethernet PHY (different supplier).
The fix applies to Marvell Alaska 88E1510/88E1518/88E1512/88E1514
Rev. A0.
Signed-off-by: Leszek Polak <lpolak@arri.de>
Signed-off-by: Stefan Roese <sr@denx.de>
Cc: Marek Behún <kabel@kernel.org>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Heiner Kallweit <hkallweit1@gmail.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: David S. Miller <davem@davemloft.net>
Reviewed-by: Marek Behún <kabel@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20220516070859.549170-1-sr@denx.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Calling synchronize_irq() right before free_irq() is quite useless. On one
hand the IRQ can easily fire again before free_irq() is entered, on the
other hand free_irq() itself calls synchronize_irq() internally (in a race
condition free way), before any state associated with the IRQ is freed.
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn>
Link: https://lore.kernel.org/r/20220516082251.1651350-1-chi.minghao@zte.com.cn
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Calling synchronize_irq() right before free_irq() is quite useless. On one
hand the IRQ can easily fire again before free_irq() is entered, on the
other hand free_irq() itself calls synchronize_irq() internally (in a race
condition free way), before any state associated with the IRQ is freed.
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn>
Link: https://lore.kernel.org/r/20220516081914.1651281-1-chi.minghao@zte.com.cn
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Calling synchronize_irq() right before free_irq() is quite useless. On one
hand the IRQ can easily fire again before free_irq() is entered, on the
other hand free_irq() itself calls synchronize_irq() internally (in a race
condition free way), before any state associated with the IRQ is freed.
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn>
Link: https://lore.kernel.org/r/20220516072646.1651109-1-chi.minghao@zte.com.cn
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Use eth_zero_addr() to clear mac address instead of memset().
Signed-off-by: Lu Wei <luwei32@huawei.com>
Link: https://lore.kernel.org/r/20220516033343.329178-1-luwei32@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
devm_kfree check the pointer, there is no need to check before
devm_kfree call.
This change is to cleanup the code a bit.
Signed-off-by: Bernard Zhao <bernard@vivo.com>
Link: https://lore.kernel.org/r/20220516015208.6526-1-bernard@vivo.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Removed unnecessary:
select COMMON_CLK_SP7021
select RESET_SUNPLUS
select NVMEM_SUNPLUS_OCOTP
from Kconfig.
Reported-by: kernel test robot <yujie.liu@intel.com>
Signed-off-by: Wells Lu <wellslutw@gmail.com>
Link: https://lore.kernel.org/r/1652443036-24731-1-git-send-email-wellslutw@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next
Marc Kleine-Budde says:
====================
pull-request: can-next 2022-05-16
the first 2 patches are by me and target the CAN raw protocol. The 1st
removes an unneeded assignment, the other one adds support for
SO_TXTIME/SCM_TXTIME.
Oliver Hartkopp contributes 2 patches for the ISOTP protocol. The 1st
adds support for transmission without flow control, the other let's
bind() return an error on incorrect CAN ID formatting.
Geert Uytterhoeven contributes a patch to clean up ctucanfd's Kconfig
file.
Vincent Mailhol's patch for the slcan driver uses the proper function
to check for invalid CAN frames in the xmit callback.
The next patch is by Geert Uytterhoeven and makes the interrupt-names
of the renesas,rcar-canfd dt bindings mandatory.
A patch by my update the ctucanfd dt bindings to include the common
CAN controller bindings.
The last patch is by Akira Yokosawa and fixes a breakage the
ctucanfd's documentation.
* tag 'linux-can-next-for-5.19-20220516' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next:
docs: ctucanfd: Use 'kernel-figure' directive instead of 'figure'
dt-bindings: can: ctucanfd: include common CAN controller bindings
dt-bindings: can: renesas,rcar-canfd: Make interrupt-names required
can: slcan: slc_xmit(): use can_dropped_invalid_skb() instead of manual check
can: ctucanfd: Let users select instead of depend on CAN_CTUCANFD
can: isotp: isotp_bind(): return -EINVAL on incorrect CAN ID formatting
can: isotp: add support for transmission without flow control
can: raw: add support for SO_TXTIME/SCM_TXTIME
can: raw: raw_sendmsg(): remove not needed setting of skb->sk
====================
Link: https://lore.kernel.org/r/20220516202625.1129281-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
skb_data_area_size() helper was used to calculate the size of the
DMA mapped buffer passed to the HW. Instead of doing this, use the
size passed to allocate the skbs.
Signed-off-by: Ricardo Martinez <ricardo.martinez@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Lock the regmap during the whole PHY register access routines in
rtl8366rb.
Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk>
Tested-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://lore.kernel.org/r/20220513213618.2742895-1-linus.walleij@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
slcan does a manual check in slc_xmit() to verify if the skb is valid.
This check is incomplete, use instead can_dropped_invalid_skb().
Link: https://lore.kernel.org/all/20220514141650.1109542-2-mailhol.vincent@wanadoo.fr
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
The CTU CAN-FD IP core is only useful when used with one of the
corresponding PCI/PCIe or platform (FPGA, SoC) drivers, which depend on
PCI resp. OF.
Hence make the users select the core driver code, instead of letting
then depend on it. Keep the core code config option visible when
compile-testing, to maintain compile-coverage.
Link: https://lore.kernel.org/all/887b7440446b6244a20a503cc6e8dc9258846706.1652104941.git.geert+renesas@glider.be
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Pavel Pisa <pisa@cmp.felk.cvut.cz>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
Now that it is possible to use .probe without having .driver_data, let
KSZ8061 use the kszphy specific hooks for probe,suspend and resume,
which is preferred.
Switch to using the dedicated kszphy probe/suspend/resume functions.
Signed-off-by: Fabio Estevam <festevam@denx.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20220513114613.762810-2-festevam@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Currently, if the .probe element is present in the phy_driver structure
and the .driver_data is not, a NULL pointer dereference happens.
Allow passing .probe without .driver_data by inserting NULL checks
for priv->type.
Signed-off-by: Fabio Estevam <festevam@denx.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20220513114613.762810-1-festevam@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Calling synchronize_irq() right before free_irq() is quite useless. On one
hand the IRQ can easily fire again before free_irq() is entered, on the
other hand free_irq() itself calls synchronize_irq() internally (in a race
condition free way), before any state associated with the IRQ is freed.
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
t7xx_dl_add_timedout() now return int 'ret', but the return type
is bool. Change the return type to int for furthor errcode upstream.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
vfree(NULL) is safe. NULL check before vfree() is not needed.
Delete them to simplify the code.
Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
octep_init_module misses destroy_workqueue in error path,
this patch fixes that.
Fixes: 862cd659a6fb ("octeon_ep: Add driver framework and device initialization")
Signed-off-by: Zheng Bin <zhengbin13@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Works fine on my HP C3600:
[ 274.452394] tulip0: no phy info, aborting mtable build
[ 274.499041] tulip0: MII transceiver #1 config 1000 status 782d advertising 01e1
[ 274.750691] net eth0: Digital DS21142/43 Tulip rev 65 at MMIO 0xf4008000, 00:30:6e:08:7d:21, IRQ 17
[ 283.104520] net eth0: Setting full-duplex based on MII#1 link partner capability of c1e1
Signed-off-by: Rolf Eike Beer <eike-kernel@sf-tec.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
hinic_pf_to_mgmt_init misses destroy_workqueue in error path,
this patch fixes that.
Fixes: 6dbb89014dc3 ("hinic: fix sending mailbox timeout in aeq event work")
Signed-off-by: Zheng Bin <zhengbin13@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
mlx5 supports LSOv2.
IPv6 gro/tcp stacks insert a temporary Hop-by-Hop header
with JUMBO TLV for big packets.
We need to ignore/skip this HBH header when populating TX descriptor.
Note that ipv6_has_hopopt_jumbo() only recognizes very specific packet
layout, thus mlx5e_sq_xmit_wqe() is taking care of this layout only.
v7: adopt unsafe_memcpy() and MLX5_UNSAFE_MEMCPY_DISCLAIMER
v2: clear hopbyhop in mlx5e_tx_get_gso_ihs()
v4: fix compile error for CONFIG_MLX5_CORE_IPOIB=y
Signed-off-by: Coco Li <lixiaoyan@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Cc: Saeed Mahameed <saeedm@nvidia.com>
Cc: Leon Romanovsky <leon@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
mlx4 supports LSOv2 just fine.
IPv6 stack inserts a temporary Hop-by-Hop header
with JUMBO TLV for big packets.
We need to ignore the HBH header when populating TX descriptor.
Tested:
Before: (not enabling bigger TSO/GRO packets)
ip link set dev eth0 gso_max_size 65536 gro_max_size 65536
netperf -H lpaa18 -t TCP_RR -T2,2 -l 10 -Cc -- -r 70000,70000
MIGRATED TCP REQUEST/RESPONSE TEST from ::0 (::) port 0 AF_INET6 to lpaa18.prod.google.com () port 0 AF_INET6 : first burst 0 : cpu bind
Local /Remote
Socket Size Request Resp. Elapsed Trans. CPU CPU S.dem S.dem
Send Recv Size Size Time Rate local remote local remote
bytes bytes bytes bytes secs. per sec % S % S us/Tr us/Tr
262144 540000 70000 70000 10.00 6591.45 0.86 1.34 62.490 97.446
262144 540000
After: (enabling bigger TSO/GRO packets)
ip link set dev eth0 gso_max_size 185000 gro_max_size 185000
netperf -H lpaa18 -t TCP_RR -T2,2 -l 10 -Cc -- -r 70000,70000
MIGRATED TCP REQUEST/RESPONSE TEST from ::0 (::) port 0 AF_INET6 to lpaa18.prod.google.com () port 0 AF_INET6 : first burst 0 : cpu bind
Local /Remote
Socket Size Request Resp. Elapsed Trans. CPU CPU S.dem S.dem
Send Recv Size Size Time Rate local remote local remote
bytes bytes bytes bytes secs. per sec % S % S us/Tr us/Tr
262144 540000 70000 70000 10.00 8383.95 0.95 1.01 54.432 57.584
262144 540000
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Acked-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Set the TSO driver limit to GSO_MAX_SIZE (512 KB).
This allows the admin/user to set a GSO limit up to this value.
ip link set dev veth10 gso_max_size 200000
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Set the driver limit to GSO_MAX_SIZE (512 KB).
This allows the admin/user to set a GSO limit up to this value.
Tested:
ip link set dev lo gso_max_size 200000
netperf -H ::1 -t TCP_RR -l 100 -- -r 80000,80000 &
tcpdump shows :
18:28:42.962116 IP6 ::1 > ::1: HBH 40051 > 63780: Flags [P.], seq 3626480001:3626560001, ack 3626560001, win 17743, options [nop,nop,TS val 3771179265 ecr 3771179265], length 80000
18:28:42.962138 IP6 ::1.63780 > ::1.40051: Flags [.], ack 3626560001, win 17743, options [nop,nop,TS val 3771179265 ecr 3771179265], length 0
18:28:42.962152 IP6 ::1 > ::1: HBH 63780 > 40051: Flags [P.], seq 3626560001:3626640001, ack 3626560001, win 17743, options [nop,nop,TS val 3771179265 ecr 3771179265], length 80000
18:28:42.962157 IP6 ::1.40051 > ::1.63780: Flags [.], ack 3626640001, win 17743, options [nop,nop,TS val 3771179265 ecr 3771179265], length 0
18:28:42.962180 IP6 ::1 > ::1: HBH 40051 > 63780: Flags [P.], seq 3626560001:3626640001, ack 3626640001, win 17743, options [nop,nop,TS val 3771179265 ecr 3771179265], length 80000
18:28:42.962214 IP6 ::1.63780 > ::1.40051: Flags [.], ack 3626640001, win 17743, options [nop,nop,TS val 3771179266 ecr 3771179265], length 0
18:28:42.962228 IP6 ::1 > ::1: HBH 63780 > 40051: Flags [P.], seq 3626640001:3626720001, ack 3626640001, win 17743, options [nop,nop,TS val 3771179266 ecr 3771179265], length 80000
18:28:42.962233 IP6 ::1.40051 > ::1.63780: Flags [.], ack 3626720001, win 17743, options [nop,nop,TS val 3771179266 ecr 3771179266], length 0
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Allow the gro_max_size to exceed a value larger than 65536.
There weren't really any external limitations that prevented this other
than the fact that IPv4 only supports a 16 bit length field. Since we have
the option of adding a hop-by-hop header for IPv6 we can allow IPv6 to
exceed this value and for IPv4 and non-TCP flows we can cap things at 65536
via a constant rather than relying on gro_max_size.
[edumazet] limit GRO_MAX_SIZE to (8 * 65535) to avoid overflows.
Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The code for gso_max_size was added originally to allow for debugging and
workaround of buggy devices that couldn't support TSO with blocks 64K in
size. The original reason for limiting it to 64K was because that was the
existing limits of IPv4 and non-jumbogram IPv6 length fields.
With the addition of Big TCP we can remove this limit and allow the value
to potentially go up to UINT_MAX and instead be limited by the tso_max_size
value.
So in order to support this we need to go through and clean up the
remaining users of the gso_max_size value so that the values will cap at
64K for non-TCPv6 flows. In addition we can clean up the GSO_MAX_SIZE value
so that 64K becomes GSO_LEGACY_MAX_SIZE and UINT_MAX will now be the upper
limit for GSO_MAX_SIZE.
v6: (edumazet) fixed a compile error if CONFIG_IPV6=n,
in a new sk_trim_gso_size() helper.
netif_set_tso_max_size() caps the requested TSO size
with GSO_MAX_SIZE.
Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|