summaryrefslogtreecommitdiff
path: root/net/socket.c
AgeCommit message (Collapse)Author
2020-12-16Merge tag 'for-5.11/io_uring-2020-12-14' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull io_uring updates from Jens Axboe: "Fairly light set of changes this time around, and mostly some bits that were pushed out to 5.11 instead of 5.10, fixes/cleanups, and a few features. In particular: - Cleanups around iovec import (David Laight, Pavel) - Add timeout support for io_uring_enter(2), which enables us to clean up liburing and avoid a timeout sqe submission in the completion path. The big win here is that it allows setups that split SQ and CQ handling into separate threads to avoid locking, as the CQ side will no longer submit when timeouts are needed when waiting for events (Hao Xu) - Add support for socket shutdown, and renameat/unlinkat. - SQPOLL cleanups and improvements (Xiaoguang Wang) - Allow SQPOLL setups for CAP_SYS_NICE, and enable regular (non-fixed) files to be used. - Cancelation improvements (Pavel) - Fixed file reference improvements (Pavel) - IOPOLL related race fixes (Pavel) - Lots of other little fixes and cleanups (mostly Pavel)" * tag 'for-5.11/io_uring-2020-12-14' of git://git.kernel.dk/linux-block: (43 commits) io_uring: fix io_cqring_events()'s noflush io_uring: fix racy IOPOLL flush overflow io_uring: fix racy IOPOLL completions io_uring: always let io_iopoll_complete() complete polled io io_uring: add timeout update io_uring: restructure io_timeout_cancel() io_uring: fix files cancellation io_uring: use bottom half safe lock for fixed file data io_uring: fix miscounting ios_left io_uring: change submit file state invariant io_uring: check kthread stopped flag when sq thread is unparked io_uring: share fixed_file_refs b/w multiple rsrcs io_uring: replace inflight_wait with tctx->wait io_uring: don't take fs for recvmsg/sendmsg io_uring: only wake up sq thread while current task is in io worker context io_uring: don't acquire uring_lock twice io_uring: initialize 'timeout' properly in io_sq_thread() io_uring: refactor io_sq_thread() handling io_uring: always batch cancel in *cancel_files() io_uring: pass files into kill timeouts/poll ...
2020-12-04net: Remove the err argument from sock_from_fileFlorent Revest
Currently, the sock_from_file prototype takes an "err" pointer that is either not set or set to -ENOTSOCK IFF the returned socket is NULL. This makes the error redundant and it is ignored by a few callers. This patch simplifies the API by letting callers deduce the error based on whether the returned socket is NULL or not. Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Florent Revest <revest@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: KP Singh <kpsingh@google.com> Link: https://lore.kernel.org/bpf/20201204113609.1850150-1-revest@google.com
2020-11-23net: don't include ethtool.h from netdevice.hJakub Kicinski
linux/netdevice.h is included in very many places, touching any of its dependecies causes large incremental builds. Drop the linux/ethtool.h include, linux/netdevice.h just needs a forward declaration of struct ethtool_ops. Fix all the places which made use of this implicit include. Acked-by: Johannes Berg <johannes@sipsolutions.net> Acked-by: Shannon Nelson <snelson@pensando.io> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Link: https://lore.kernel.org/r/20201120225052.1427503-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-23net: provide __sys_shutdown_sock() that takes a socketJens Axboe
No functional changes in this patch, needed to provide io_uring support for shutdown(2). Cc: netdev@vger.kernel.org Cc: David S. Miller <davem@davemloft.net> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-11-17net: wan: Delete the DLCI / SDLA driversXie He
The DLCI driver (dlci.c) implements the Frame Relay protocol. However, we already have another newer and better implementation of Frame Relay provided by the HDLC_FR driver (hdlc_fr.c). The DLCI driver's implementation of Frame Relay is used by only one hardware driver in the kernel - the SDLA driver (sdla.c). The SDLA driver provides Frame Relay support for the Sangoma S50x devices. However, the vendor provides their own driver (along with their own multi-WAN-protocol implementations including Frame Relay), called WANPIPE. I believe most users of the hardware would use the vendor-provided WANPIPE driver instead. (The WANPIPE driver was even once in the kernel, but was deleted in commit 8db60bcf3021 ("[WAN]: Remove broken and unmaintained Sangoma drivers.") because the vendor no longer updated the in-kernel WANPIPE driver.) Cc: Mike McLagan <mike.mclagan@linux.org> Signed-off-by: Xie He <xie.he.0141@gmail.com> Link: https://lore.kernel.org/r/20201114150921.685594-1-xie.he.0141@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller
Rejecting non-native endian BTF overlapped with the addition of support for it. The rest were more simple overlapping changes, except the renesas ravb binding update, which had to follow a file move as well as a YAML conversion. Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-02net: add WARN_ONCE in kernel_sendpage() for improper zero-copy sendColy Li
If a page sent into kernel_sendpage() is a slab page or it doesn't have ref_count, this page is improper to send by the zero copy sendpage() method. Otherwise such page might be unexpected released in network code path and causes impredictable panic due to kernel memory management data structure corruption. This path adds a WARN_ON() on the sending page before sends it into the concrete zero-copy sendpage() method, if the page is improper for the zero-copy sendpage() method, a warning message can be observed before the consequential unpredictable kernel panic. This patch does not change existing kernel_sendpage() behavior for the improper page zero-copy send, it just provides hint warning message for following potential panic due the kernel memory heap corruption. Signed-off-by: Coly Li <colyli@suse.de> Cc: Cong Wang <amwang@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: David S. Miller <davem@davemloft.net> Cc: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
We got slightly different patches removing a double word in a comment in net/ipv4/raw.c - picked the version from net. Simple conflict in drivers/net/ethernet/ibm/ibmvnic.c. Use cached values instead of VNIC login response buffer (following what commit 507ebe6444a4 ("ibmvnic: Fix use-after-free of VNIC login response buffer") did). Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-08-27net: Fix some commentsMiaohe Lin
Fix some comments, including wrong function name, duplicated word and so on. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-24io_uring: allow tcp ancillary data for __sys_recvmsg_sock()Luke Hsiao
For TCP tx zero-copy, the kernel notifies the process of completions by queuing completion notifications on the socket error queue. This patch allows reading these notifications via recvmsg to support TCP tx zero-copy. Ancillary data was originally disallowed due to privilege escalation via io_uring's offloading of sendmsg() onto a kernel thread with kernel credentials (https://crbug.com/project-zero/1975). So, we must ensure that the socket type is one where the ancillary data types that are delivered on recvmsg are plain data (no file descriptors or values that are translated based on the identity of the calling process). This was tested by using io_uring to call recvmsg on the MSG_ERRQUEUE with tx zero-copy enabled. Before this patch, we received -EINVALID from this specific code path. After this patch, we could read tcp tx zero-copy completion notifications from the MSG_ERRQUEUE. Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Arjun Roy <arjunroy@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jann Horn <jannh@google.com> Reviewed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Luke Hsiao <lukehsiao@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-10net: Revert "net: optimize the sockptr_t for unified kernel/user address spaces"Christoph Hellwig
This reverts commits 6d04fe15f78acdf8e32329e208552e226f7a8ae6 and a31edb2059ed4e498f9aa8230c734b59d0ad797a. It turns out the idea to share a single pointer for both kernel and user space address causes various kinds of problems. So use the slightly less optimal version that uses an extra bit, but which is guaranteed to be safe everywhere. Fixes: 6d04fe15f78a ("net: optimize the sockptr_t for unified kernel/user address spaces") Reported-by: Eric Dumazet <edumazet@google.com> Reported-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-08net: Convert to use the fallthrough macroMiaohe Lin
Convert the uses of fallthrough comments to fallthrough macro. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-08net: Remove meaningless jump label out_fsMiaohe Lin
The out_fs jump label has nothing to do but goto out. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-08net: Set fput_needed iff FDPUT_FPUT is setMiaohe Lin
We should fput() file iff FDPUT_FPUT is set. So we should set fput_needed accordingly. Fixes: 00e188ef6a7e ("sockfd_lookup_light(): switch to fdget^W^Waway from fget_light") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-08net: Use helper function fdput()Miaohe Lin
Use helper function fdput() to fput() the file iff FDPUT_FPUT is set. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28net: improve the user pointer check in init_user_sockptrChristoph Hellwig
Make sure not just the pointer itself but the whole range lies in the user address space. For that pass the length and then use the access_ok helper to do the check. Fixes: 6d04fe15f78a ("net: optimize the sockptr_t for unified kernel/user address spaces") Reported-by: David Laight <David.Laight@ACULAB.COM> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-24net: optimize the sockptr_t for unified kernel/user address spacesChristoph Hellwig
For architectures like x86 and arm64 we don't need the separate bit to indicate that a pointer is a kernel pointer as the address spaces are unified. That way the sockptr_t can be reduced to a union of two pointers, which leads to nicer calling conventions. The only caveat is that we need to check that users don't pass in kernel address and thus gain access to kernel memory. Thus the USER_SOCKPTR helper is replaced with a init_user_sockptr function that does this check and returns an error if it fails. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-24net: pass a sockptr_t into ->setsockoptChristoph Hellwig
Rework the remaining setsockopt code to pass a sockptr_t instead of a plain user pointer. This removes the last remaining set_fs(KERNEL_DS) outside of architecture specific code. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Stefan Schmidt <stefan@datenfreihafen.org> [ieee802154] Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-24net: switch sock_set_timeout to sockptr_tChristoph Hellwig
Pass a sockptr_t to prepare for set_fs-less handling of the kernel pointer from bpf-cgroup. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-19net: make ->{get,set}sockopt in proto_ops optionalChristoph Hellwig
Just check for a NULL method instead of wiring up sock_no_{get,set}sockopt. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-19net: remove compat_sys_{get,set}sockoptChristoph Hellwig
Now that the ->compat_{get,set}sockopt proto_ops methods are gone there is no good reason left to keep the compat syscalls separate. This fixes the odd use of unsigned int for the compat_setsockopt optlen and the missing sock_use_custom_sol_socket. It would also easily allow running the eBPF hooks for the compat syscalls, but such a large change in behavior does not belong into a consolidation patch like this one. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-19net: streamline __sys_getsockoptChristoph Hellwig
Return early when sockfd_lookup_light fails to reduce a level of indentation for most of the function body. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-19net: streamline __sys_setsockoptChristoph Hellwig
Return early when sockfd_lookup_light fails to reduce a level of indentation for most of the function body. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-13net: socket: Move kerneldoc next to function it documentsAndrew Lunn
Fix the warning "Function parameter or member 'inode' not described in '__sock_release'' due to the kerneldoc being placed before __sock_release() not sock_release(), which does not take an inode parameter. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-04net: use mptcp setsockopt function for SOL_SOCKET on mptcp socketsFlorian Westphal
setsockopt(mptcp_fd, SOL_SOCKET, ...)... appears to work (returns 0), but it has no effect -- this is because the MPTCP layer never has a chance to copy the settings to the subflow socket. Skip the generic handling for the mptcp case and instead call the mptcp specific handler instead for SOL_SOCKET too. Next patch adds more specific handling for SOL_SOCKET to mptcp. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-29net: remove kernel_setsockoptChristoph Hellwig
No users left. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-27net: remove kernel_getsockoptChristoph Hellwig
No users left. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-18ipv4,appletalk: move SIOCADDRT and SIOCDELRT handling into ->compat_ioctlChristoph Hellwig
To prepare removing the global routing_ioctl hack start lifting the code into the ipv4 and appletalk ->compat_ioctl handlers. Unlike the existing handler we don't bother copying in the name - there are no compat issues for char arrays. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-18ipv6: move SIOCADDRT and SIOCDELRT handling into ->compat_ioctlChristoph Hellwig
To prepare removing the global routing_ioctl hack start lifting the code into a newly added ipv6 ->compat_ioctl handler. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-11net: cleanly handle kernel vs user buffers for ->msg_controlChristoph Hellwig
The msg_control field in struct msghdr can either contain a user pointer when used with the recvmsg system call, or a kernel pointer when used with sendmsg. To complicate things further kernel_recvmsg can stuff a kernel pointer in and then use set_fs to make the uaccess helpers accept it. Replace it with a union of a kernel pointer msg_control field, and a user pointer msg_control_user one, and allow kernel_recvmsg operate on a proper kernel pointer using a bitfield to override the normal choice of a user pointer for recvmsg. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30Merge tag 'for-5.7/io_uring-2020-03-29' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull io_uring updates from Jens Axboe: "Here are the io_uring changes for this merge window. Light on new features this time around (just splice + buffer selection), lots of cleanups, fixes, and improvements to existing support. In particular, this contains: - Cleanup fixed file update handling for stack fallback (Hillf) - Re-work of how pollable async IO is handled, we no longer require thread offload to handle that. Instead we rely using poll to drive this, with task_work execution. - In conjunction with the above, allow expendable buffer selection, so that poll+recv (for example) no longer has to be a split operation. - Make sure we honor RLIMIT_FSIZE for buffered writes - Add support for splice (Pavel) - Linked work inheritance fixes and optimizations (Pavel) - Async work fixes and cleanups (Pavel) - Improve io-wq locking (Pavel) - Hashed link write improvements (Pavel) - SETUP_IOPOLL|SETUP_SQPOLL improvements (Xiaoguang)" * tag 'for-5.7/io_uring-2020-03-29' of git://git.kernel.dk/linux-block: (54 commits) io_uring: cleanup io_alloc_async_ctx() io_uring: fix missing 'return' in comment io-wq: handle hashed writes in chains io-uring: drop 'free_pfile' in struct io_file_put io-uring: drop completion when removing file io_uring: Fix ->data corruption on re-enqueue io-wq: close cancel gap for hashed linked work io_uring: make spdxcheck.py happy io_uring: honor original task RLIMIT_FSIZE io-wq: hash dependent work io-wq: split hashing and enqueueing io-wq: don't resched if there is no work io-wq: remove duplicated cancel code io_uring: fix truncated async read/readv and write/writev retry io_uring: dual license io_uring.h uapi header io_uring: io_uring_enter(2) don't poll while SETUP_IOPOLL|SETUP_SQPOLL enabled io_uring: Fix unused function warnings io_uring: add end-of-bits marker and build time verify it io_uring: provide means of removing buffers io_uring: add IOSQE_BUFFER_SELECT support for IORING_OP_RECVMSG ...
2020-03-20io_uring: make sure accept honor rlimit nofileJens Axboe
Just like commit 4022e7af86be, this fixes the fact that IORING_OP_ACCEPT ends up using get_unused_fd_flags(), which checks current->signal->rlim[] for limits. Add an extra argument to __sys_accept4_file() that allows us to pass in the proper nofile limit, and grab it at request prep time. Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-10net: abstract out normal and compat msghdr importJens Axboe
This splits it into two parts, one that imports the message, and one that imports the iovec. This allows a caller to only do the first part, and import the iovec manually afterwards. No functional changes in this patch. Acked-by: David Miller <davem@davemloft.net> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-08socket: fix unused-function warningArnd Bergmann
When procfs is disabled, the fdinfo code causes a harmless warning: net/socket.c:1000:13: error: 'sock_show_fdinfo' defined but not used [-Werror=unused-function] static void sock_show_fdinfo(struct seq_file *m, struct file *f) Move the function definition up so we can use a single #ifdef around it. Fixes: b4653342b151 ("net: Allow to show socket-specific information in /proc/[pid]/fdinfo/[fd]") Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller
Mere overlapping changes in the conflicts here. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-13Merge tag 'io_uring-5.5-20191212' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull io_uring fixes from Jens Axboe: - A tweak to IOSQE_IO_LINK (also marked for stable) to allow links that don't sever if the result is < 0. This is mostly for linked timeouts, where if we ask for a pure timeout we always get -ETIME. This makes links useless for that case, hence allow a case where it works. - Five minor optimizations to fix and improve cases that regressed since v5.4. - An SQTHREAD locking fix. - A sendmsg/recvmsg iov assignment fix. - Net fix where read_iter/write_iter don't honor IOCB_NOWAIT, and subsequently ensuring that works for io_uring. - Fix a case where for an invalid opcode we might return -EBADF instead of -EINVAL, if the ->fd of that sqe was set to an invalid fd value. * tag 'io_uring-5.5-20191212' of git://git.kernel.dk/linux-block: io_uring: ensure we return -EINVAL on unknown opcode io_uring: add sockets to list of files that support non-blocking issue net: make socket read/write_iter() honor IOCB_NOWAIT io_uring: only hash regular files for async work execution io_uring: run next sqe inline if possible io_uring: don't dynamically allocate poll data io_uring: deferred send/recvmsg should assign iov io_uring: sqthread should grab ctx->uring_lock for submissions io-wq: briefly spin for new work after finishing work io-wq: remove worker->wait waitqueue io_uring: allow unbreakable links
2019-12-12net: Allow to show socket-specific information in /proc/[pid]/fdinfo/[fd]Kirill Tkhai
This adds .show_fdinfo to socket_file_ops, so protocols will be able to print their specific data in fdinfo. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-10net: make socket read/write_iter() honor IOCB_NOWAITJens Axboe
The socket read/write helpers only look at the file O_NONBLOCK. not the iocb IOCB_NOWAIT flag. This breaks users like preadv2/pwritev2 and io_uring that rely on not having the file itself marked nonblocking, but rather the iocb itself. Cc: netdev@vger.kernel.org Acked-by: David Miller <davem@davemloft.net> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds
Pull networking fixes from David Miller: 1) More jumbo frame fixes in r8169, from Heiner Kallweit. 2) Fix bpf build in minimal configuration, from Alexei Starovoitov. 3) Use after free in slcan driver, from Jouni Hogander. 4) Flower classifier port ranges don't work properly in the HW offload case, from Yoshiki Komachi. 5) Use after free in hns3_nic_maybe_stop_tx(), from Yunsheng Lin. 6) Out of bounds access in mqprio_dump(), from Vladyslav Tarasiuk. 7) Fix flow dissection in dsa TX path, from Alexander Lobakin. 8) Stale syncookie timestampe fixes from Guillaume Nault. [ Did an evil merge to silence a warning introduced by this pull - Linus ] * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (84 commits) r8169: fix rtl_hw_jumbo_disable for RTL8168evl net_sched: validate TCA_KIND attribute in tc_chain_tmplt_add() r8169: add missing RX enabling for WoL on RTL8125 vhost/vsock: accept only packets with the right dst_cid net: phy: dp83867: fix hfs boot in rgmii mode net: ethernet: ti: cpsw: fix extra rx interrupt inet: protect against too small mtu values. gre: refetch erspan header from skb->data after pskb_may_pull() pppoe: remove redundant BUG_ON() check in pppoe_pernet tcp: Protect accesses to .ts_recent_stamp with {READ,WRITE}_ONCE() tcp: tighten acceptance of ACKs not matching a child socket tcp: fix rejected syncookies due to stale timestamps lpc_eth: kernel BUG on remove tcp: md5: fix potential overestimation of TCP option space net: sched: allow indirect blocks to bind to clsact in TC net: core: rename indirect block ingress cb function net-sysfs: Call dev_hold always in netdev_queue_add_kobject net: dsa: fix flow dissection on Tx path net/tls: Fix return values to avoid ENOTSUPP net: avoid an indirect call in ____sys_recvmsg() ...
2019-12-06net: avoid an indirect call in ____sys_recvmsg()Eric Dumazet
CONFIG_RETPOLINE=y made indirect calls expensive. gcc seems to add an indirect call in ____sys_recvmsg(). Rewriting the code slightly makes sure to avoid this indirection. Alternative would be to not call sock_recvmsg() and instead use security_socket_recvmsg() and sock_recvmsg_nosec(), but this is less readable IMO. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Cc: David Laight <David.Laight@aculab.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-03io_uring: ensure async punted connect requests copy dataJens Axboe
Just like commit f67676d160c6 for read/write requests, this one ensures that the sockaddr data has been copied for IORING_OP_CONNECT if we need to punt the request to async context. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-03io_uring: ensure async punted sendmsg/recvmsg requests copy dataJens Axboe
Just like commit f67676d160c6 for read/write requests, this one ensures that the msghdr data is fully copied if we need to punt a recvmsg or sendmsg system call to async context. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-01Merge tag 'y2038-cleanups-5.5' of ↵Linus Torvalds
git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground Pull y2038 cleanups from Arnd Bergmann: "y2038 syscall implementation cleanups This is a series of cleanups for the y2038 work, mostly intended for namespace cleaning: the kernel defines the traditional time_t, timeval and timespec types that often lead to y2038-unsafe code. Even though the unsafe usage is mostly gone from the kernel, having the types and associated functions around means that we can still grow new users, and that we may be missing conversions to safe types that actually matter. There are still a number of driver specific patches needed to get the last users of these types removed, those have been submitted to the respective maintainers" Link: https://lore.kernel.org/lkml/20191108210236.1296047-1-arnd@arndb.de/ * tag 'y2038-cleanups-5.5' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground: (26 commits) y2038: alarm: fix half-second cut-off y2038: ipc: fix x32 ABI breakage y2038: fix typo in powerpc vdso "LOPART" y2038: allow disabling time32 system calls y2038: itimer: change implementation to timespec64 y2038: move itimer reset into itimer.c y2038: use compat_{get,set}_itimer on alpha y2038: itimer: compat handling to itimer.c y2038: time: avoid timespec usage in settimeofday() y2038: timerfd: Use timespec64 internally y2038: elfcore: Use __kernel_old_timeval for process times y2038: make ns_to_compat_timeval use __kernel_old_timeval y2038: socket: use __kernel_old_timespec instead of timespec y2038: socket: remove timespec reference in timestamping y2038: syscalls: change remaining timeval to __kernel_old_timeval y2038: rusage: use __kernel_old_timeval y2038: uapi: change __kernel_time_t to __kernel_old_time_t y2038: stat: avoid 'time_t' in 'struct stat' y2038: ipc: remove __kernel_time_t reference from headers y2038: vdso: powerpc: avoid timespec references ...
2019-12-01Merge tag 'compat-ioctl-5.5' of ↵Linus Torvalds
git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground Pull removal of most of fs/compat_ioctl.c from Arnd Bergmann: "As part of the cleanup of some remaining y2038 issues, I came to fs/compat_ioctl.c, which still has a couple of commands that need support for time64_t. In completely unrelated work, I spent time on cleaning up parts of this file in the past, moving things out into drivers instead. After Al Viro reviewed an earlier version of this series and did a lot more of that cleanup, I decided to try to completely eliminate the rest of it and move it all into drivers. This series incorporates some of Al's work and many patches of my own, but in the end stops short of actually removing the last part, which is the scsi ioctl handlers. I have patches for those as well, but they need more testing or possibly a rewrite" * tag 'compat-ioctl-5.5' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground: (42 commits) scsi: sd: enable compat ioctls for sed-opal pktcdvd: add compat_ioctl handler compat_ioctl: move SG_GET_REQUEST_TABLE handling compat_ioctl: ppp: move simple commands into ppp_generic.c compat_ioctl: handle PPPIOCGIDLE for 64-bit time_t compat_ioctl: move PPPIOCSCOMPRESS to ppp_generic compat_ioctl: unify copy-in of ppp filters tty: handle compat PPP ioctls compat_ioctl: move SIOCOUTQ out of compat_ioctl.c compat_ioctl: handle SIOCOUTQNSD af_unix: add compat_ioctl support compat_ioctl: reimplement SG_IO handling compat_ioctl: move WDIOC handling into wdt drivers fs: compat_ioctl: move FITRIM emulation into file systems gfs2: add compat_ioctl support compat_ioctl: remove unused convert_in_user macro compat_ioctl: remove last RAID handling code compat_ioctl: remove /dev/raw ioctl translation compat_ioctl: remove PCI ioctl translation compat_ioctl: remove joystick ioctl translation ...
2019-11-26net: disallow ancillary data for __sys_{send,recv}msg_file()Jens Axboe
Only io_uring uses (and added) these, and we want to disallow the use of sendmsg/recvmsg for anything but regular data transfers. Use the newly added prep helper to split the msghdr copy out from the core function, to check for msg_control and msg_controllen settings. If either is set, we return -EINVAL. Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-26net: separate out the msghdr copy from ___sys_{send,recv}msg()Jens Axboe
This is in preparation for enabling the io_uring helpers for sendmsg and recvmsg to first copy the header for validation before continuing with the operation. There should be no functional changes in this patch. Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-25net: add __sys_connect_file() helperJens Axboe
This is identical to __sys_connect(), except it takes a struct file instead of an fd, and it also allows passing in extra file->f_flags flags. The latter is done to support masking in O_NONBLOCK without manipulating the original file flags. No functional changes in this patch. Cc: netdev@vger.kernel.org Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-25Merge tag 'for-5.5/io_uring-20191121' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull io_uring updates from Jens Axboe: "A lot of stuff has been going on this cycle, with improving the support for networked IO (and hence unbounded request completion times) being one of the major themes. There's been a set of fixes done this week, I'll send those out as well once we're certain we're fully happy with them. This contains: - Unification of the "normal" submit path and the SQPOLL path (Pavel) - Support for sparse (and bigger) file sets, and updating of those file sets without needing to unregister/register again. - Independently sized CQ ring, instead of just making it always 2x the SQ ring size. This makes it more flexible for networked applications. - Support for overflowed CQ ring, never dropping events but providing backpressure on submits. - Add support for absolute timeouts, not just relative ones. - Support for generic cancellations. This divorces io_uring from workqueues as well, which additionally gets us one step closer to generic async system call support. - With cancellations, we can support grabbing the process file table as well, just like we do mm context. This allows support for system calls that create file descriptors, like accept4() support that's built on top of that. - Support for io_uring tracing (Dmitrii) - Support for linked timeouts. These abort an operation if it isn't completed by the time noted in the linke timeout. - Speedup tracking of poll requests - Various cleanups making the coder easier to follow (Jackie, Pavel, Bob, YueHaibing, me) - Update MAINTAINERS with new io_uring list" * tag 'for-5.5/io_uring-20191121' of git://git.kernel.dk/linux-block: (64 commits) io_uring: make POLL_ADD/POLL_REMOVE scale better io-wq: remove now redundant struct io_wq_nulls_list io_uring: Fix getting file for non-fd opcodes io_uring: introduce req_need_defer() io_uring: clean up io_uring_cancel_files() io-wq: ensure free/busy list browsing see all items io-wq: ensure we have a stable view of ->cur_work for cancellations io_wq: add get/put_work handlers to io_wq_create() io_uring: check for validity of ->rings in teardown io_uring: fix potential deadlock in io_poll_wake() io_uring: use correct "is IO worker" helper io_uring: fix -ENOENT issue with linked timer with short timeout io_uring: don't do flush cancel under inflight_lock io_uring: flag SQPOLL busy condition to userspace io_uring: make ASYNC_CANCEL work with poll and timeout io_uring: provide fallback request for OOM situations io_uring: convert accept4() -ERESTARTSYS into -EINTR io_uring: fix error clear of ->file_table in io_sqe_files_register() io_uring: separate the io_free_req and io_free_req_find_next interface io_uring: keep io_put_req only responsible for release and put req ...
2019-11-25vfs: mark pipes and sockets as stream-like file descriptorsLinus Torvalds
In commit 3975b097e577 ("convert stream-like files -> stream_open, even if they use noop_llseek") Kirill used a coccinelle script to change "nonseekable_open()" to "stream_open()", which changed the trivial cases of stream-like file descriptors to the new model with FMODE_STREAM. However, the two big cases - sockets and pipes - don't actually have that trivial pattern at all, and were thus never converted to FMODE_STREAM even though it makes lots of sense to do so. That's particularly true when looking forward to the next change: getting rid of FMODE_ATOMIC_POS entirely, and just using FMODE_STREAM to decide whether f_pos updates are needed or not. And if they are, we'll always do them atomically. This came up because KCSAN (correctly) noted that the non-locked f_pos updates are data races: they are clearly benign for the case where we don't care, but it would be good to just not have that issue exist at all. Note that the reason we used FMODE_ATOMIC_POS originally is that only doing it for the minimal required case is "safer" in that it's possible that the f_pos locking can cause unnecessary serialization across the whole write() call. And in the worst case, that kind of serialization can cause deadlock issues: think writers that need readers to empty the state using the same file descriptor. [ Note that the locking is per-file descriptor - because it protects "f_pos", which is obviously per-file descriptor - so it only affects cases where you literally use the same file descriptor to both read and write. So a regular pipe that has separate reading and writing file descriptors doesn't really have this situation even though it's the obvious case of "reader empties what a bit writer concurrently fills" But we want to make pipes as being stream-line anyway, because we don't want the unnecessary overhead of locking, and because a named pipe can be (ab-)used by reading and writing to the same file descriptor. ] There are likely a lot of other cases that might want FMODE_STREAM, and looking for ".llseek = no_llseek" users and other cases that don't have an lseek file operation at all and making them use "stream_open()" might be a good idea. But pipes and sockets are likely to be the two main cases. Cc: Kirill Smelkov <kirr@nexedi.com> Cc: Eic Dumazet <edumazet@google.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: Marco Elver <elver@google.com> Cc: Andrea Parri <parri.andrea@gmail.com> Cc: Paul McKenney <paulmck@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-11-15y2038: socket: use __kernel_old_timespec instead of timespecArnd Bergmann
The 'timespec' type definition and helpers like ktime_to_timespec() or timespec64_to_timespec() should no longer be used in the kernel so we can remove them and avoid introducing y2038 issues in new code. Change the socket code that needs to pass a timespec to user space for backward compatibility to use __kernel_old_timespec instead. This type has the same layout but with a clearer defined name. Slightly reformat tcp_recv_timestamp() for consistency after the removal of timespec64_to_timespec(). Acked-by: Deepa Dinamani <deepa.kernel@gmail.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>