summaryrefslogtreecommitdiff
path: root/net/ipv6
AgeCommit message (Collapse)Author
2010-03-08tcp: Add SNMP counters for backlog and min_ttl dropsEric Dumazet
Commit 6b03a53a (tcp: use limited socket backlog) added the possibility of dropping frames when backlog queue is full. Commit d218d111 (tcp: Generalized TTL Security Mechanism) added the possibility of dropping frames when TTL is under a given limit. This patch adds new SNMP MIB entries, named TCPBacklogDrop and TCPMinTTLDrop, published in /proc/net/netstat in TcpExt: line netstat -s | egrep "TCPBacklogDrop|TCPMinTTLDrop" TCPBacklogDrop: 0 TCPMinTTLDrop: 0 Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-07ipv6: Optmize translation between IPV6_PREFER_SRC_xxx and RT6_LOOKUP_F_xxx.YOSHIFUJI Hideaki / 吉藤英明
IPV6_PREFER_SRC_xxx definitions: | #define IPV6_PREFER_SRC_TMP 0x0001 | #define IPV6_PREFER_SRC_PUBLIC 0x0002 | #define IPV6_PREFER_SRC_COA 0x0004 RT6_LOOKUP_F_xxx definitions: | #define RT6_LOOKUP_F_SRCPREF_TMP 0x00000008 | #define RT6_LOOKUP_F_SRCPREF_PUBLIC 0x00000010 | #define RT6_LOOKUP_F_SRCPREF_COA 0x00000020 So, we can translate between these two groups by shift operation instead of multiple 'if's. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-05net: backlog functions renameZhu Yi
sk_add_backlog -> __sk_add_backlog sk_add_backlog_limited -> sk_add_backlog Signed-off-by: Zhu Yi <yi.zhu@intel.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-05udp: use limited socket backlogZhu Yi
Make udp adapt to the limited socket backlog change. Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: "Pekka Savola (ipv6)" <pekkas@netcore.fi> Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: Zhu Yi <yi.zhu@intel.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-05tcp: use limited socket backlogZhu Yi
Make tcp adapt to the limited socket backlog change. Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: "Pekka Savola (ipv6)" <pekkas@netcore.fi> Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: Zhu Yi <yi.zhu@intel.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-04IPv6: fix race between cleanup and add/delete addressstephen hemminger
This solves a potential race problem during the cleanup process. The issue is that addrconf_ifdown() needs to traverse address list, but then drop lock to call the notifier. The version in -next could get confused if add/delete happened during this window. Original code (2.6.32 and earlier) was okay because all addresses were always deleted. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-04IPv6: addrconf notify when address is unavailablestephen hemminger
My recent change in net-next to retain permanent addresses caused regression. Device refcount would not go to zero when device was unregistered because left over anycast reference would hold ipv6 dev reference which would hold device references... The correct procedure is to call notify chain when address is no longer available for use. When interface comes back DAD timer will notify back that address is available. Also, link local addresses should be purged when interface is brought down. The address might be changed. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-04IPv6: addrconf timer racestephen hemminger
The Router Solicitation timer races with device state changes because it doesn't lock the device. Use local variable to avoid one repeated dereference. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-04IPv6: addrconf dad timer unnecessary bh_disablestephen hemminger
Timer code runs in bottom half, so there is no need for using _bh form of locking. Also check if device is not ready to avoid race with address that is no longer active. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-03ipsec: Fix bogus bundle flowiHerbert Xu
When I merged the bundle creation code, I introduced a bogus flowi value in the bundle. Instead of getting from the caller, it was instead set to the flow in the route object, which is totally different. The end result is that the bundles we created never match, and we instead end up with an ever growing bundle list. Thanks to Jamal for find this problem. Reported-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Steffen Klassert <steffen.klassert@secunet.com> Acked-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-26Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6
2010-02-26netfilter: xtables: restore indentationJan Engelhardt
Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-02-26ipv6: Use 1280 as min MTU for ipv6 forwardingUlrich Weber
Clients will set their MTU to 1280 if they receive a ICMPV6_PKT_TOOBIG message with an MTU less than 1280. To allow encapsulating of packets over a 1280 link we should always accept packets with a size of 1280 for forwarding even if the path has a lower MTU and fragment the encapsulated packets afterwards. In case a forwarded packet is not going to be encapsulated a ICMPV6_PKT_TOOBIG msg will still be send by ip6_fragment() with the correct MTU. Signed-off-by: Ulrich Weber <uweber@astaro.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-26ipv6: Remove IPV6_ADDR_RESERVEDUlrich Weber
RFC 4291 section 2.4 states that all uncategorized addresses should be considered as Global Unicast. This will remove IPV6_ADDR_RESERVED completely and return IPV6_ADDR_UNICAST in ipv6_addr_type() instead. Signed-off-by: Ulrich Weber <uweber@astaro.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-25Merge branch 'master' of ↵David S. Miller
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
2010-02-24Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6
2010-02-24netfilter: xtables: reduce arguments to translate_tableJan Engelhardt
Just pass in the entire repl struct. In case of a new table (e.g. ip6t_register_table), the repldata has been previously filled with table->name and table->size already (in ip6t_alloc_initial_table). Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-02-24netfilter: xtables: optimize call flow around xt_ematch_foreachJan Engelhardt
Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-02-24netfilter: xtables: replace XT_MATCH_ITERATE macroJan Engelhardt
The macro is replaced by a list.h-like foreach loop. This makes the code more inspectable. Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-02-24netfilter: xtables: optimize call flow around xt_entry_foreachJan Engelhardt
Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-02-24netfilter: xtables: replace XT_ENTRY_ITERATE macroJan Engelhardt
The macro is replaced by a list.h-like foreach loop. This makes the code much more inspectable. Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-02-22xfrm: SA lookups signature with markJamal Hadi Salim
pass mark to all SA lookups to prepare them for when we add code to have them search. Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-19net: Fix sysctl restarts...Eric W. Biederman
Yuck. It turns out that when we restart sysctls we were restarting with the values already changed. Which unfortunately meant that the second time through we thought there was no change and skipped all kinds of work, despite the fact that there was indeed a change. I have fixed this the simplest way possible by restoring the changed values when we restart the sysctl write. One of my coworkers spotted this bug when after disabling forwarding on an interface pings were still forwarded. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-19netfilter: nf_conntrack_reasm: properly handle packets fragmented into a ↵Patrick McHardy
single fragment When an ICMPV6_PKT_TOOBIG message is received with a MTU below 1280, all further packets include a fragment header. Unlike regular defragmentation, conntrack also needs to "reassemble" those fragments in order to obtain a packet without the fragment header for connection tracking. Currently nf_conntrack_reasm checks whether a fragment has either IP6_MF set or an offset != 0, which makes it ignore those fragments. Remove the invalid check and make reassembly handle fragment queues containing only a single fragment. Reported-and-tested-by: Ulrich Weber <uweber@astaro.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-02-18ipv6: drop unused "dev" arg of icmpv6_send()Alexey Dobriyan
Dunno, what was the idea, it wasn't used for a long time. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-18ipv6: use standard lists for FIB walksAlexey Dobriyan
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-18ipv6: remove stale MIB definitionsAlexey Dobriyan
ICMP6 MIB statistics was per-netns for quite a time. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-17IPv6: convert mc_lock to spinlockStephen Hemminger
Only used for writing, so convert to spinlock Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-17net: remove INIT_RCU_HEAD() usageAlexey Dobriyan
call_rcu() will unconditionally reinitialize RCU head anyway. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-16percpu: add __percpu sparse annotations to netTejun Heo
Add __percpu sparse annotations to net. These annotations are to make sparse consider percpu variables to be in a different address space and warn if accessed without going through percpu accessors. This patch doesn't affect normal builds. The macro and type tricks around snmp stats make things a bit interesting. DEFINE/DECLARE_SNMP_STAT() macros mark the target field as __percpu and SNMP_UPD_PO_STATS() macro is updated accordingly. All snmp_mib_*() users which used to cast the argument to (void **) are updated to cast it to (void __percpu **). Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: David S. Miller <davem@davemloft.net> Cc: Patrick McHardy <kaber@trash.net> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Cc: Vlad Yasevich <vladislav.yasevich@hp.com> Cc: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-16Merge branch 'master' of ↵David S. Miller
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
2010-02-16net neigh: Decouple per interface neighbour table controls from binary sysctlsEric W. Biederman
Stop computing the number of neighbour table settings we have by counting the number of binary sysctls. This behaviour was silly and meant that we could not add another neighbour table setting without also adding another binary sysctl. Don't pass the binary sysctl path for neighour table entries into neigh_sysctl_register. These parameters are no longer used and so are just dead code. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-16tunnels: fix netns vs proto registration orderingAlexey Dobriyan
Same stuff as in ip_gre patch: receive hook can be called before netns setup is done, oopsing in net_generic(). Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-16ipcomp: Avoid duplicate calls to ipcomp_destroyHerbert Xu
When ipcomp_tunnel_attach fails we will call ipcomp_destroy twice. This may lead to double-frees on certain structures. As there is no reason to explicitly call ipcomp_destroy, this patch removes it from ipcomp*.c and lets the standard xfrm_state destruction take place. This is based on the discovery and patch by Alexey Dobriyan. Tested-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-16Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6
2010-02-15IPv6: Delete redundant counter of IPSTATS_MIB_REASMFAILSShan Wei
When no more memory can be allocated, fq_find() will return NULL and increase the value of IPSTATS_MIB_REASMFAILS. In this case, ipv6_frag_rcv() also increase the value of IPSTATS_MIB_REASMFAILS. So, the patch deletes redundant counter of IPSTATS_MIB_REASMFAILS in fq_find(). and deletes the unused parameter of idev. Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-15netfilter: nf_conntrack: add support for "conntrack zones"Patrick McHardy
Normally, each connection needs a unique identity. Conntrack zones allow to specify a numerical zone using the CT target, connections in different zones can use the same identity. Example: iptables -t raw -A PREROUTING -i veth0 -j CT --zone 1 iptables -t raw -A OUTPUT -o veth1 -j CT --zone 1 Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-02-15netfilter: nf_conntrack: pass template to l4proto ->error() handlerPatrick McHardy
The error handlers might need the template to get the conntrack zone introduced in the next patches to perform a conntrack lookup. Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-02-15netfilter: xtables: add const qualifiersJan Engelhardt
This should make it easier to remove redundant arguments later. Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2010-02-15netfilter: xtables: constify args in compat copying functionsJan Engelhardt
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2010-02-15netfilter: iptables: remove unused function argumentsJan Engelhardt
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2010-02-12udp: remove redundant variableGerrit Renker
The variable 'copied' is used in udp_recvmsg() to emphasize that the passed 'len' is adjusted to fit the actual datagram length. But the same can be done by adjusting 'len' directly. This patch thus removes the indirection. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-12IPv6: remove trivial nested _bh suffixstephen hemminger
Don't need to disable bottom half it is already down in the previous lock. Move some blank lines to group locking in same context. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-12IPv6: keep permanent addresses on admin downstephen hemminger
Permanent IPV6 addresses should not be removed when the link is set to admin down, only when device is removed. When link is lost permanent addresses should be marked as tentative so that when link comes back they are subject to duplicate address detection (if DAD was enabled for that address). Other routing systems keep manually configured IPv6 addresses when link is set down. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-12ipv6: fib: fix crash when changing large fib while dumping itPatrick McHardy
When the fib size exceeds what can be dumped in a single skb, the dump is suspended and resumed once the last skb has been received by userspace. When the fib is changed while the dump is suspended, the walker might contain stale pointers, causing a crash when the dump is resumed. BUG: unable to handle kernel NULL pointer dereference at 0000000000000018 IP: [<ffffffffa01bce04>] fib6_walk_continue+0xbb/0x124 [ipv6] PGD 5347a067 PUD 65c7067 PMD 0 Oops: 0000 [#1] PREEMPT SMP ... RIP: 0010:[<ffffffffa01bce04>] [<ffffffffa01bce04>] fib6_walk_continue+0xbb/0x124 [ipv6] ... Call Trace: [<ffffffff8104aca3>] ? mutex_spin_on_owner+0x59/0x71 [<ffffffffa01bd105>] inet6_dump_fib+0x11b/0x1b9 [ipv6] [<ffffffff81371af4>] netlink_dump+0x5b/0x19e [<ffffffff8134f288>] ? consume_skb+0x28/0x2a [<ffffffff81373b69>] netlink_recvmsg+0x1ab/0x2c6 [<ffffffff81372781>] ? netlink_unicast+0xfa/0x151 [<ffffffff813483e0>] __sock_recvmsg+0x6d/0x79 [<ffffffff81348a53>] sock_recvmsg+0xca/0xe3 [<ffffffff81066d4b>] ? autoremove_wake_function+0x0/0x38 [<ffffffff811ed1f8>] ? radix_tree_lookup_slot+0xe/0x10 [<ffffffff810b3ed7>] ? find_get_page+0x90/0xa5 [<ffffffff810b5dc5>] ? filemap_fault+0x201/0x34f [<ffffffff810ef152>] ? fget_light+0x2f/0xac [<ffffffff813519e7>] ? verify_iovec+0x4f/0x94 [<ffffffff81349a65>] sys_recvmsg+0x14d/0x223 Store the serial number when beginning to walk the fib and reload pointers when continuing to walk after a change occured. Similar to other dumping functions, this might cause unrelated entries to be missed when entries are deleted. Tested-by: Ben Greear <greearb@candelatech.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-11netfilter: xtables: fix mangle tablesAlexey Dobriyan
In POST_ROUTING hook, calling dev_net(in) is going to oops. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-02-10netfilter: xtables: generate initial table on-demandJan Engelhardt
The static initial tables are pretty large, and after the net namespace has been instantiated, they just hang around for nothing. This commit removes them and creates tables on-demand at runtime when needed. Size shrinks by 7735 bytes (x86_64). Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2010-02-10netfilter: xtables: use xt_table for hook instantiationJan Engelhardt
The respective xt_table structures already have most of the metadata needed for hook setup. Add a 'priority' field to struct xt_table so that xt_hook_link() can be called with a reduced number of arguments. So should we be having more tables in the future, it comes at no static cost (only runtime, as before) - space saved: 6807373->6806555. Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2010-02-10netfilter: xtables: compact table hook functions (2/2)Jan Engelhardt
The calls to ip6t_do_table only show minimal differences, so it seems like a good cleanup to merge them to a single one too. Space saving obtained by both patches: 6807725->6807373 ("Total" column from `size -A`.) Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2010-02-10netfilter: xtables: compact table hook functions (1/2)Jan Engelhardt
This patch combines all the per-hook functions in a given table into a single function. Together with the 2nd patch, further simplifications are possible up to the point of output code reduction. Signed-off-by: Jan Engelhardt <jengelh@medozas.de>