summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2009-08-31IPVS: Add handling of incoming ICMPV6 messagesJulius Volz
Add handling of incoming ICMPv6 messages. This follows the handling of IPv4 ICMP messages. Amongst ther things this problem allows IPVS to behave sensibly when an ICMPV6_PKT_TOOBIG message is received: This message is received when a realserver sends a packet >PMTU to the client. The hop on this path with insufficient MTU will generate an ICMPv6 Packet Too Big message back to the VIP. The LVS server receives this message, but the call to the function handling this has been missing. Thus, IPVS fails to forward the message to the real server, which then does not adjust the path MTU. This patch adds the missing call to ip_vs_in_icmp_v6() in ip_vs_in() to handle this situation. Thanks to Rob Gallagher from HEAnet for reporting this issue and for testing this patch in production (with direct routing mode). [horms@verge.net.au: tweaked changelog] Signed-off-by: Julius Volz <julius.volz@gmail.com> Tested-by: Rob Gallagher <robert.gallagher@heanet.ie> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-08-31netfilter: ip6t_eui: fix read outside array boundsPatrick McHardy
Use memcmp() instead of open coded comparison that reads one byte past the intended end. Based on patch from Roel Kluin <roel.kluin@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-08-31netfilter: nf_conntrack: netns fix re reliable conntrack event deliveryAlexey Dobriyan
Conntracks in netns other than init_net dying list were never killed. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-08-31ipvs: Use atomic operations atomiclySimon Horman
A pointed out by Shin Hong, IPVS doesn't always use atomic operations in an atomic manner. While this seems unlikely to be manifest in strange behaviour, it seems appropriate to clean this up. Cc: shin hong <hongshin@gmail.com> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-08-25netfilter: nfnetlink: constify message attributes and headersPatrick McHardy
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-08-25netlink: constify nlmsghdr argumentsPatrick McHardy
Consitfy nlmsghdr arguments to a couple of functions as preparation for the next patch, which will constify the netlink message data in all nfnetlink users. Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-08-25netfilter: nf_conntrack: log packets dropped by helpersPatrick McHardy
Log packets dropped by helpers using the netfilter logging API. This is useful in combination with nfnetlink_log to analyze those packets in userspace for debugging. Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-08-24netfilter: bridge: refcount fixEric Dumazet
commit f216f082b2b37c4943f1e7c393e2786648d48f6f ([NETFILTER]: bridge netfilter: deal with martians correctly) added a refcount leak on in_dev. Instead of using in_dev_get(), we can use __in_dev_get_rcu(), as netfilter hooks are running under rcu_read_lock(), as pointed by Patrick. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-08-24netfilter: nf_nat: fix inverted logic for persistent NAT mappingsMaximilian Engelhardt
Kernel 2.6.30 introduced a patch [1] for the persistent option in the netfilter SNAT target. This is exactly what we need here so I had a quick look at the code and noticed that the patch is wrong. The logic is simply inverted. The patch below fixes this. Also note that because of this the default behavior of the SNAT target has changed since kernel 2.6.30 as it now ignores the destination IP in choosing the source IP for nating (which should only be the case if the persistent option is set). [1] http://git.eu.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=98d500d66cb7940747b424b245fc6a51ecfbf005 Signed-off-by: Maximilian Engelhardt <maxi@daemonizer.de> Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-08-24netfilter: xtables: mark initial tables constantJan Engelhardt
The inputted table is never modified, so should be considered const. Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-08-10Merge branch 'master' of git://dev.medozas.de/linuxPatrick McHardy
2009-08-10netfilter: xtables: check for standard verdicts in policiesJan Engelhardt
This adds the second check that Rusty wanted to have a long time ago. :-) Base chain policies must have absolute verdicts that cease processing in the table, otherwise rule execution may continue in an unexpected spurious fashion (e.g. next chain that follows in memory). Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2009-08-10netfilter: xtables: check for unconditionality of policiesJan Engelhardt
This adds a check that iptables's original author Rusty set forth in a FIXME comment. Underflows in iptables are better known as chain policies, and are required to be unconditional or there would be a stochastical chance for the policy rule to be skipped if it does not match. If that were to happen, rule execution would continue in an unexpected spurious fashion. Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2009-08-10netfilter: xtables: ignore unassigned hooks in check_entry_size_and_hooksJan Engelhardt
The "hook_entry" and "underflow" array contains values even for hooks not provided, such as PREROUTING in conjunction with the "filter" table. Usually, the values point to whatever the next rule is. For the upcoming unconditionality and underflow checking patches however, we must not inspect that arbitrary rule. Skipping unassigned hooks seems like a good idea, also because newinfo->hook_entry and newinfo->underflow will then continue to have the poison value for detecting abnormalities. Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2009-08-10netfilter: xtables: use memcmp in unconditional checkJan Engelhardt
Instead of inspecting each u32/char open-coded, clean up and make use of memcmp. On some arches, memcmp is implemented as assembly or GCC's __builtin_memcmp which can possibly take advantages of known alignment. Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2009-08-10netfilter: iptables: remove unused datalen variableJan Engelhardt
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2009-08-10netfilter: xtables: realign struct xt_target_paramJan Engelhardt
This commit gets rid of a padding hole as reported by pahole(1). Saves 8 bytes on x86_64. Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2009-08-10netfilter: xtables: switch table AFs to nfprotoJan Engelhardt
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2009-08-10netfilter: xtables: switch hook PFs to nfprotoJan Engelhardt
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2009-08-10netfilter: conntrack: switch hook PFs to nfprotoJan Engelhardt
Simple substitution to indicate that the fields indeed use the NFPROTO_ space. Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2009-08-10netfilter: xtables: remove redirecting header filesJan Engelhardt
When IPv4 and IPv6 matches were unified approx. 3.5 years ago, they received new header filenames (e.g. xt_CLASSIFY.h). Let's remove the old ones now. Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2009-08-10netfilter: xtables: remove xt_owner v0Jan Engelhardt
Superseded by xt_owner v1 (v2.6.24-2388-g0265ab4). Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2009-08-10netfilter: xtables: remove xt_mark v0Jan Engelhardt
Superseded by xt_mark v1 (v2.6.24-2922-g17b0d7e). Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2009-08-10netfilter: xtables: remove xt_iprange v0Jan Engelhardt
Superseded by xt_iprange v1 (v2.6.24-2928-g1a50c5a1). Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2009-08-10netfilter: xtables: remove xt_conntrack v0Jan Engelhardt
Superseded by xt_conntrack v1 (v2.6.24-2921-g64eb12f). Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2009-08-10netfilter: xtables: remove xt_connmark v0Jan Engelhardt
Superseded by xt_connmark v1 (v2.6.24-2919-g96e3227). Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2009-08-10netfilter: xtables: remove xt_MARK v0, v1Jan Engelhardt
Superseded by xt_MARK v2 (v2.6.24-2918-ge0a812a). Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2009-08-10netfilter: xtables: remove xt_CONNMARK v0Jan Engelhardt
Superseded by xt_CONNMARK v1 (v2.6.24-2917-g0dc8c76). Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2009-08-10netfilter: xtables: remove xt_TOS v0Jan Engelhardt
Superseded by xt_TOS v1 (v2.6.24-2396-g5c350e5). Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2009-08-10netfilter: ebtables: Use %pM conversion specifierTobias Klauser
ebt_log uses its own implementation of print_mac to print MAC addresses. This patch converts it to use the %pM conversion specifier for printk. Signed-off-by: Tobias Klauser <klto@zhaw.ch> Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-08-10netfilter: nf_conntrack: add SCTP support for SO_ORIGINAL_DSTRafael Laufer
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-08-06net: Avoid enqueuing skb for default qdiscsKrishna Kumar
dev_queue_xmit enqueue's a skb and calls qdisc_run which dequeue's the skb and xmits it. In most cases, the skb that is enqueue'd is the same one that is dequeue'd (unless the queue gets stopped or multiple cpu's write to the same queue and ends in a race with qdisc_run). For default qdiscs, we can remove the redundant enqueue/dequeue and simply xmit the skb since the default qdisc is work-conserving. The patch uses a new flag - TCQ_F_CAN_BYPASS to identify the default fast queue. The controversial part of the patch is incrementing qlen when a skb is requeued - this is to avoid checks like the second line below: + } else if ((q->flags & TCQ_F_CAN_BYPASS) && !qdisc_qlen(q) && >> !q->gso_skb && + !test_and_set_bit(__QDISC_STATE_RUNNING, &q->state)) { Results of a 2 hour testing for multiple netperf sessions (1, 2, 4, 8, 12 sessions on a 4 cpu system-X). The BW numbers are aggregate Mb/s across iterations tested with this version on System-X boxes with Chelsio 10gbps cards: ---------------------------------- Size | ORG BW NEW BW | ---------------------------------- 128K | 156964 159381 | 256K | 158650 162042 | ---------------------------------- Changes from ver1: 1. Move sch_direct_xmit declaration from sch_generic.h to pkt_sched.h 2. Update qdisc basic statistics for direct xmit path. 3. Set qlen to zero in qdisc_reset. 4. Changed some function names to more meaningful ones. Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-06mlx4_en: Not using Shared Receive QueuesYevgeny Petrilin
We use 1:1 mapping between QPs and SRQs on receive side, so additional indirection level not required. Allocated the receive buffers for the RSS QPs. Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-06mlx4_en: Using real number of rings as RSS map sizeYevgeny Petrilin
There is no point in using more QPs then actual number of receive rings. If the RSS function for two streams gives the same result modulo number of rings, they will arrive to the same RX ring anyway. Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-06mlx4_en: Adaptive moderation policy changeYevgeny Petrilin
If the net device is identified as "sender" (number of sent packets is higher then the number of received packets and the incoming packets are small), set the moderation time to its low limit. We do it because the incoming packets are acks, and we don't want to delay them Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-06net: smsc911x: switch to new dev_pm_opsDaniel Mack
Hibernation is unsupported for now, which meets the actual implementation in the driver. For free/thaw, the chip's D2 state should be entered. Signed-off-by: Daniel Mack <daniel@caiaq.de> Acked-by: <steve.glendinning@smsc.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-06tc35815: Use 0 RxFragSize.MinFrag value for non-packing modeAtsushi Nemoto
The datasheet say "When not enabling packing, the MinFrag value must remain at 0". Do not set value to RxFragSize register if TC35815_USE_PACKEDBUFFER disabled. This is not a bugfix. No real problem reported on this. Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-06tc35815: Fix rx_missed_errors countAtsushi Nemoto
The Miss_Cnt register is cleared by reading. Accumulate its value to rx_missed_errors count. Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-06tc35815: Increase timeout for mdioAtsushi Nemoto
The current timeout value is too short for very high-load condition which jiffies might jump up in busy-loop. Also add minimum delay before checking completion of MDIO. Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-06tc35815: Improve BLEx / FDAEx handlingAtsushi Nemoto
Clear Int_BLEx / Int_FDAEx after (not before) processing Rx interrupt. This will reduce number of unnecessary interrupts. Also print rx error messages only if netif_msg_rx_err() enabled. Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-06tc35815: Disable StripCRCAtsushi Nemoto
It seems Rx_StripCRC cause trouble on recovering from the BLEx (Buffer List Exhaust) or FDAEx (Free Descriptor Area Exhaust) condition. Do not use it. Also bump version number up. Reported-by: Ralf Roesch <ralf.roesch@rw-gmbh.de> Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-06Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6
2009-08-05net/rds: Use AF_INET for sin_family fieldJulia Lawall
Elsewhere the sin_family field holds a value with a name of the form AF_..., so it seems reasonable to do so here as well. Also the values of PF_INET and AF_INET are the same. The semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ struct sockaddr_in sip; @@ ( sip.sin_family == - PF_INET + AF_INET | sip.sin_family != - PF_INET + AF_INET | sip.sin_family = - PF_INET + AF_INET ) // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-05cxgb3: fix 2 ports 1G regressionDivy Le Ray
commit 88045b3cf0f8981129cb489c7b6bc36c21dd33a7 cxgb3: fix mac index mapping Override the mac index computation for the gen2 adapter, as each port is expected to use index 0. introduces a regression on 2 port 1G adapter as its xauicfg vpd value is null. Add a check on the device id. Signed-off-by: Divy Le Ray <divy@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-05vxge: Version updateBenjamin LaHaise
Update the driver version number for any bug reports from end users. Signed-off-by: Benjamin LaHaise <ben.lahaise@neterion.com> Signed-off-by: Ramkrishna Vepa <ram.vepa@neterion.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-05vxge: prefetch skb->dataBenjamin LaHaise
This patch implements prefetching of skb->data from a copy of the pointer in the descriptor (which is already in the L1 cache at this point). This improves netperf rx performance (netperf -L 0,0 -c -H 192.168.254.2 -- -M 131072 -m 131072) by 4.9% on a P4 Xeon host. Signed-off-by: Benjamin LaHaise <ben.lahaise@neterion.com> Signed-off-by: Sreenivasa Honnur <sreenivasa.honnur@neterion.com> Signed-off-by: Ramkrishna Vepa <ram.vepa@neterion.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-05vxge: prefetch RxD descriptorsBenjamin LaHaise
This patch prefetches RxD descriptors which helps to lower the latency of a cache miss in vxge_hw_ring_rxd_next_completed. This lowers the % of CPU time used by vxge_hw_ring_rxd_next_completed() where the descriptor is accessed in profiling netperf on a P4 Xeon from 1.5% to 1.0%. Signed-off-by: Benjamin LaHaise <ben.lahaise@neterion.com> Signed-off-by: Sreenivasa Honnur <sreenivasa.honnur@neterion.com> Signed-off-by: Ramkrishna Vepa <ram.vepa@neterion.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-05vxge: improve tx performance by using mmiowb() instead of wmb()Benjamin LaHaise
wmb() is extremely heavy on x86. The semantics required in the driver are provided by mmiowb(), so use that and improve tx performance on P4 Xeons by 5-10%. Signed-off-by: Benjamin LaHaise <ben.lahaise@neterion.com> Signed-off-by: Sreenivasa Honnur <sreenivasa.honnur@neterion.com> Signed-off-by: Ramkrishna Vepa <ram.vepa@neterion.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-05vxge: fix pktgen hangs (don't abuse skb->cb[])Benjamin LaHaise
This patch fixes a case in the transmit completion code which was resulting in pktgen hanging at the end of a run. The cause is due to the fact that the ->cb[] area of an skb cannot be used in a network driver's transmit path, as that area belongs to the network protocol. Pktgen hangs, as it sends out the same packet multiple times, and vxge's use of this area of the skb for a temporary list can only add the packet to the temporary list once (while it may be on the queue many times). The fix is to remove this abuse of skb->cb[]. Instead, skb pointers are placed into a temporary stack array, and then free outside of the tx lock. This retains the smp optimization of doing dev_kfree_skb() outside of the tx lock. Signed-off-by: Benjamin LaHaise <ben.lahaise@neterion.com> Signed-off-by: Sreenivasa Honnur <sreenivasa.honnur@neterion.com> Signed-off-by: Ramkrishna Vepa <ram.vepa@neterion.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-05vxge: don't drop frame on tx queue fullBenjamin LaHaise
The vxge driver will drop a packet in its transmit function if the number of TxDs available hits 0. Instead of doing that, simply stop the transmit queue when transmitting a packet with the last available TxD. Signed-off-by: Benjamin LaHaise <ben.lahaise@neterion.com> Signed-off-by: Sreenivasa Honnur <sreenivasa.honnur@neterion.com> Signed-off-by: Ramkrishna Vepa <ram.vepa@neterion.com> Signed-off-by: David S. Miller <davem@davemloft.net>