Age | Commit message (Collapse) | Author |
|
connection metadata)""
This reverts commit 80b9e0d4a216cabc94399cc0f7dbb3064fd350ff.
Change-Id: I72edcdc79d93b5b888deb5c210b537cef52fff11
|
|
This reverts commit d07e0df0ca20a1df62100485117df3d90a1a2a80.
Change-Id: I895dad38c5f4c850b7d986e39d61a70ecdea20bf
|
|
Added to satisfy efl/launchpad developers' request.
Tizen code routinely performs the following operation sequence:
1. create kdbus connection
2. update seclabel
3. rely on updated seclabel
CONN_INFO has always returned seclabel collected at HELLO time (behavior
consistent across all kdbus versions and documented in kdbus man). This
would break step 3 of the above sequence.
KDBUS_CMD_UPDATE_METADATA ioctl updates a connection's metadata to
reflect the current state. Metadata is collected in the exact same way
as during HELLO.
Semantics Required by efl/launchpad can be obtained by altering the
sequence like so:
1. create kdbus connection
2. update seclabel
+2b. ioctl(connection_fd, KDBUS_CMD_UPDATE_METADATA);
3. rely on updated seclabel
Change-Id: I4a4b2aea4256f8bfb3bd1c0d3df5e963d243cb52
Signed-off-by: Konrad Lipinski <konrad.l@samsung.com>
|
|
As a consequence, timeouts are now issued in fifo order.
Change-Id: Iacf4e922a37038fa42f95a9d058eeacd0ebce3ec
Signed-off-by: Konrad Lipinski <konrad.l@samsung.com>
|
|
In sync call situation, kdbus_conn_reply() calls kdbus_reply_unlink(reply).
Then, kdbus_conn_entry_sync_attach() also calls kdbus_reply_unlink(reply_wake),
so it is called twice. Even it has no problems because kdbus_reply_unlink()
does't free twice, but it is unnecessary.
Signed-off-by: INSUN PYO <insun.pyo@samsung.com>
Change-Id: Ia2b3552ee2e6a49ff97bb9d8f6e62964fe6d2cbf
|
|
[ This commit re-applies change Ia925fa43bf4c50ab707be98f275b50808782f063 to new kdbus upstream version. ]
Signed-off-by: Karol Lewandowski <k.lewandowsk@samsung.com>
Change-Id: Ia03913a35dba34567c1bd99afbfe138ea7a5d133
|
|
[ This commit re-applies change I34427f381957dd8d366e9e4d837d5a2a34a39cc1 to new kdbus upstream version. ]
Signed-off-by: Karol Lewandowski <k.lewandowsk@samsung.com>
Change-Id: I6155f3599e59cd880267d9c03b0212b47ec9a64c
|
|
[ This commit re-applies change Ifafec44da924ec8ed677629606c92a45e7171636 to new kdbus upstream version. ]
Signed-off-by: Karol Lewandowski <k.lewandowsk@samsung.com>
Change-Id: Ib293d0d864a3a91cf8422bcccc1f7593285868a1
|
|
This commit upgrades kdbus ipc driver from v4 patchset, as posted on lkml
for review by Greg Kroah-Hartman on Mar 09 2015 to commit 0c05fbdc82f from
new upstream kdbus repository (git://github.com/systemd/kdbus).
Summary of major changes:
* message importer rewritten - considerably reduces internal message
processing overhead,
* name registration reworked to follow DBus Specification precisely,
* attached metadata now follow /proc access checks
* reduced in-kernel stack buffer to 256 bytes for small messages
Change-Id: I6d849173b4289e1b684ed1a9b48e6e0b361e5d53
|
|
Change-Id: Ia925fa43bf4c50ab707be98f275b50808782f063
|
|
The dbus1 spec does not place a restriction on who can receive broadcasts.
As long as the sender has a MATCH-rule on itself, it can as well receive
its own broadcasts.
As it turns out, user-space currently relies on this feature. So make sure
to allow this just like dbus1. If we find some client that does not work
with this, we will have to turn it into a HELLO-flag. Until then, just try
to adjust the default behavior.
Signed-off-by: INSUN PYO <insun.pyo@samsung.com>
Change-Id: I83702b59039062967ec2875e268a17d647902a87
|
|
Change-Id: I34427f381957dd8d366e9e4d837d5a2a34a39cc1
|
|
Change-Id: Ifafec44da924ec8ed677629606c92a45e7171636
|
|
Signed-off-by: INSUN PYO <insun.pyo@samsung.com>
Change-Id: I82e1d634c88f2e86a29e7f5d50ce5226943c2e54
|
|
This commit removes support for kdbus-LSM hooks as policy decisions
are handled solely by userspace (libdbuspolicy library). This commit
reverts following:
- 802de9506 ("lsm: smack: smack callbacks for kdbus security hooks")
- f13b7e7bd ("kdbus: use LSM hooks in kdbus code")
- 067afa709 ("lsm: smack: Make ipc/kdbus includes visible so smack callbacks could see them")
- 442f047fd ("lsm: make security_file_receive available for external modules")
- 3b556db4b ("lsm: kdbus security hooks")
Change-Id: Iae90cdb9577a9e706288b28d70bd57574398276e
Signed-off-by: Karol Lewandowski <k.lewandowsk@samsung.com>
Signed-off-by: Seung-Woo Kim <sw0312.kim@samsung.com>
|
|
Possibilities of connections to own, see and talk to well-known names
are already restricted by LSM hooks.
Change-Id: I62d86a506a85e6c48bdd3e0f8b11f1aa5a918c75
Signed-off-by: Lukasz Skalski <l.skalski@samsung.com>
|
|
Change-Id: I5ef09ea4e4389ca41a6ef7afda31fe3a8d9bc507
Signed-off-by: Lukasz Skalski <l.skalski@samsung.com>
|
|
After commit 5d5d56897530 ("make new_sync_{read,write}() static")
->read() cannot be called directly.
kdbus_pool_slice_copy() leads to oops, which can be reproduced by
launching tools/testing/selftests/kdbus/kdbus-test -t message-quota:
[ 1167.146793] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 1167.147554] IP: [< (null)>] (null)
[ 1167.148670] PGD 3a9dd067 PUD 3a841067 PMD 0
[ 1167.149611] Oops: 0010 [#1] SMP
[ 1167.150088] Modules linked in: nfsv3 nfs kdbus lockd grace sunrpc
[ 1167.150771] CPU: 0 PID: 518 Comm: kdbus-test Not tainted 4.0.0-next-20150420-kdbus #62
[ 1167.150771] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 1167.150771] task: ffff88003daed120 ti: ffff88003a800000 task.ti: ffff88003a800000
[ 1167.150771] RIP: 0010:[<0000000000000000>] [< (null)>] (null)
[ 1167.150771] RSP: 0018:ffff88003a803bc0 EFLAGS: 00010286
[ 1167.150771] RAX: ffff8800377fb000 RBX: 00000000000201e8 RCX: ffff88003a803c00
[ 1167.150771] RDX: 0000000000000b40 RSI: ffff8800377fb4c0 RDI: ffff88003d815700
[ 1167.150771] RBP: ffff88003a803c48 R08: ffffffff8139e380 R09: ffff880039d80490
[ 1167.150771] R10: ffff88003a803a90 R11: 00000000000004c0 R12: 00000000002a24c0
[ 1167.150771] R13: 0000000000000b40 R14: ffff88003d815700 R15: ffffffff8139e460
[ 1167.150771] FS: 00007f41dccd4740(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
[ 1167.150771] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1167.150771] CR2: 0000000000000000 CR3: 000000003ccdf000 CR4: 00000000000007b0
[ 1167.150771] Stack:
[ 1167.150771] ffffffffa0065497 ffff88003a803c10 00007ffffffff000 ffff88003aaa67c0
[ 1167.150771] 00000000000004c0 ffff88003aaa6870 ffff88003ca83300 ffffffffa006537d
[ 1167.150771] 00000000000201e8 ffffea0000ddfec0 ffff88003a803c20 0000000000000018
[ 1167.150771] Call Trace:
[ 1167.150771] [<ffffffffa0065497>] ? kdbus_pool_slice_copy+0x127/0x200 [kdbus]
[ 1167.150771] [<ffffffffa006537d>] ? kdbus_pool_slice_copy+0xd/0x200 [kdbus]
[ 1167.150771] [<ffffffffa006670a>] kdbus_queue_entry_move+0xaa/0x180 [kdbus]
[ 1167.150771] [<ffffffffa0059e64>] kdbus_conn_move_messages+0x1e4/0x2c0 [kdbus]
[ 1167.150771] [<ffffffffa006234e>] kdbus_name_acquire+0x31e/0x390 [kdbus]
[ 1167.150771] [<ffffffffa00625c5>] kdbus_cmd_name_acquire+0x125/0x130 [kdbus]
[ 1167.150771] [<ffffffffa005db5d>] kdbus_handle_ioctl+0x4ed/0x610 [kdbus]
[ 1167.150771] [<ffffffff811040e0>] do_vfs_ioctl+0x2e0/0x4e0
[ 1167.150771] [<ffffffff81389750>] ? preempt_schedule_common+0x1f/0x3f
[ 1167.150771] [<ffffffff8110431c>] SyS_ioctl+0x3c/0x80
[ 1167.150771] [<ffffffff8138c36e>] system_call_fastpath+0x12/0x71
[ 1167.150771] Code: Bad RIP value.
[ 1167.150771] RIP [< (null)>] (null)
[ 1167.150771] RSP <ffff88003a803bc0>
[ 1167.150771] CR2: 0000000000000000
[ 1167.168756] ---[ end trace a676bcfa75db5a96 ]---
Use __vfs_read() instead.
Signed-off-by: Sergei Zviagintsev <sergei@s15v.net>
Reviewed-by: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Possibilities of connections to own, see and talk to well-known names
are already restricted by LSM hooks.
Signed-off-by: Lukasz Skalski <l.skalski@samsung.com>
|
|
metadata.h references struct kdbus_pids which is defined in
uapi/linux/kdbus.h
Normally, kdbus/metadata.h is included as one of many other headers
that eventually include uapi/linux/kdbus.h at some point.
When included alone, it causes warning. Also when kdbus/connection.h
is included alone (e.g. in smack_lsm.c) , the same warning is shown
as it includes kdbus/metadata.h.
This patch adds missing inclusion.
Signed-off-by: Paul Osmialowski <p.osmialowsk@samsung.com>
|
|
Originates from:
https://github.com/lmctl/kdbus.git (branch: kdbus-lsm-v4.for-systemd-v212)
commit: aa0885489d19be92fa41c6f0a71df28763228a40
Signed-off-by: Karol Lewandowski <k.lewandowsk@samsung.com>
Signed-off-by: Paul Osmialowski <p.osmialowsk@samsung.com>
|
|
As it was discussed on systemd ML [1], the same connection should be
queued up only once for a given well-known name.
[1] http://lists.freedesktop.org/archives/systemd-devel/2015-April/030494.html
This commit fixes following issue:
[ 243.364270] ------------[ cut here ]------------
[ 243.364352] WARNING: CPU: 1 PID: 223 at ../ipc/kdbus/names.c:137 kdbus_name_entry_replace_owner+0x88/0x8c()
[ 243.364408] Modules linked in:
[ 243.364474] CPU: 1 PID: 223 Comm: kdbus-test Not tainted 4.0.0+ #1
[ 243.364526] Hardware name: Foundation-v8A (DT)
[ 243.364569] Call trace:
[ 243.364639] [<ffff800000089d38>] dump_backtrace+0x0/0x12c
[ 243.364718] [<ffff800000089e74>] show_stack+0x10/0x1c
[ 243.364798] [<ffff8000006642f4>] dump_stack+0x74/0x98
[ 243.364874] [<ffff8000000b282c>] warn_slowpath_common+0x98/0xd0
[ 243.364951] [<ffff8000000b2928>] warn_slowpath_null+0x14/0x20
[ 243.365026] [<ffff8000003cf7a4>] kdbus_name_entry_replace_owner+0x84/0x8c
[ 243.365105] [<ffff8000003cf7e0>] kdbus_name_release_unlocked.isra.5+0x34/0x170
[ 243.365183] [<ffff8000003d0554>] kdbus_cmd_name_release+0x1b8/0x1c8
[ 243.365270] [<ffff8000003cbd28>] kdbus_handle_ioctl+0x5e0/0x690
[ 243.365347] [<ffff8000001b3520>] do_vfs_ioctl+0x31c/0x5c0
[ 243.365423] [<ffff8000001b3844>] SyS_ioctl+0x80/0x98
[ 243.365473] ---[ end trace 5bf3630c98408d38 ]---
Signed-off-by: Lukasz Skalski <l.skalski@samsung.com>
Signed-off-by: Paul Osmialowski <p.osmialowsk@samsung.com>
|
|
I did a routine check for new users of 'timespec', which we are trying to remove
from the kernel in order to survive y2038. kdbus came up and looks particularly
trivial to clean up.
This changes the three ktime_get_ts() variants used in kdbus to ktime_get_ns(),
which aside from removing timespec also simplifies the code and makes it
slightly more efficient by avoiding a two-way conversion.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Paul Osmialowski <p.osmialowsk@samsung.com>
|
|
The patch 5fc8dd5c84fc: "kdbus: add connection, queue handling and
message validation code" from Sep 11, 2014, leads to the following
static checker warning:
ipc/kdbus/connection.c:2000 kdbus_cmd_send()
warn: 'cancel_fd' isn't an ERR_PTR
Fix this by checking for NULL pointers returned from fget().
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Daniel Mack <daniel@zonque.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Paul Osmialowski <p.osmialowsk@samsung.com>
|
|
This patch hooks up the build system to actually compile the files
added by previous patches. It also adds an entry to MAINTAINERS to
direct people to Greg KH, David Herrmann, Djalal Harouni and me for
questions and patches.
Signed-off-by: Daniel Mack <daniel@zonque.org>
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: Djalal Harouni <tixxdz@opendz.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Paul Osmialowski <p.osmialowsk@samsung.com>
|
|
This patch adds the policy database implementation.
A policy database restricts the possibilities of connections to own,
see and talk to well-known names. It can be associated with a bus
(through a policy holder connection) or a custom endpoint.
By default, buses have an empty policy database that is augmented on
demand when a policy holder connection is instantiated.
Policies are set through KDBUS_CMD_HELLO (when creating a policy
holder connection), KDBUS_CMD_CONN_UPDATE (when updating a policy
holder connection), KDBUS_CMD_EP_MAKE (creating a custom endpoint)
or KDBUS_CMD_EP_UPDATE (updating a custom endpoint). In all cases,
the name and policy access information is stored in items of type
KDBUS_ITEM_NAME and KDBUS_ITEM_POLICY_ACCESS.
See kdbus.policy(7) for more details.
Signed-off-by: Daniel Mack <daniel@zonque.org>
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: Djalal Harouni <tixxdz@opendz.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Paul Osmialowski <p.osmialowsk@samsung.com>
|
|
This patch adds the name registry implementation.
Each bus instantiates a name registry to resolve well-known names
into unique connection IDs for message delivery. The registry will
be queried when a message is sent with kdbus_msg.dst_id set to
KDBUS_DST_ID_NAME, or when a registry dump is requested.
It's important to have this registry implemented in the kernel to
implement lookups and take-overs in a race-free way.
Signed-off-by: Daniel Mack <daniel@zonque.org>
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: Djalal Harouni <tixxdz@opendz.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Paul Osmialowski <p.osmialowsk@samsung.com>
|
|
Add the logic to handle the following entities:
Domain:
A domain is an unamed object containing a number of buses. A
domain is automatically created when an instance of kdbusfs
is mounted, and destroyed when it is unmounted.
Every domain offers its own 'control' device node to create
buses. Domains are isolated from each other.
Bus:
A bus is a named object inside a domain. Clients exchange messages
over a bus. Multiple buses themselves have no connection to each
other; messages can only be exchanged on the same bus. The default
entry point to a bus, where clients establish the connection to, is
the "bus" device node /sys/fs/kdbus/<bus name>/bus. Common operating
system setups create one "system bus" per system, and one "user
bus" for every logged-in user. Applications or services may create
their own private named buses.
Endpoint:
An endpoint provides the device node to talk to a bus. Opening an
endpoint creates a new connection to the bus to which the endpoint
belongs. Every bus has a default endpoint called "bus". A bus can
optionally offer additional endpoints with custom names to provide
a restricted access to the same bus. Custom endpoints carry
additional policy which can be used to give sandboxed processes
only a locked-down, limited, filtered access to the same bus.
See kdbus(7), kdbus.bus(7), kdbus.endpoint(7) and kdbus.fs(7)
for more details.
Signed-off-by: Daniel Mack <daniel@zonque.org>
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: Djalal Harouni <tixxdz@opendz.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Paul Osmialowski <p.osmialowsk@samsung.com>
|
|
This patch adds code for matches and notifications.
Notifications are broadcast messages generated by the kernel, which
notify subscribes when connections are created or destroyed, when
well-known-names have been claimed, released or changed ownership,
or when reply messages have timed out.
Matches are used to tell the kernel driver which broadcast messages
a connection is interested in. Matches can either be specific on one
of the kernel-generated notification types, or carry a bloom filter
mask to match against a message from userspace. The latter is a way
to pre-filter messages from other connections in order to mitigate
unnecessary wakeups.
Signed-off-by: Daniel Mack <daniel@zonque.org>
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: Djalal Harouni <tixxdz@opendz.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Paul Osmialowski <p.osmialowsk@samsung.com>
|
|
A connection chooses which metadata it wants to have attached to each
message it receives with kdbus_cmd_hello.attach_flags. The metadata
will be attached as items to the messages. All metadata refers to
information about the sending task at sending time, unless otherwise
stated. Also, the metadata is copied, not referenced, so even if the
sending task doesn't exist anymore at the time the message is received,
the information is still preserved.
In traditional D-Bus, userspace tasks like polkit or journald make a
live lookup in procfs and sysfs to gain information about a sending
task. This is racy, of course, as in a a connection-less system like
D-Bus, the originating peer can go away immediately after sending the
message. As we're moving D-Bus prmitives into the kernel, we have to
provide the same semantics here, and inform the receiving peer on the
live credentials of the sending peer.
Metadata is collected at the following times.
* When a bus is created (KDBUS_CMD_MAKE), information about the
calling task is collected. This data is returned by the kernel
via the KDBUS_CMD_BUS_CREATOR_INFO call.
* When a connection is created (KDBUS_CMD_HELLO), information about
the calling task is collected. Alternatively, a privileged
connection may provide 'faked' information about credentials,
PIDs and security labels which will be stored instead. This data
is returned by the kernel as information on a connection
(KDBUS_CMD_CONN_INFO). Only metadata that a connection allowed to
be sent (by setting its bit in attach_flags_send) will be exported
in this way.
* When a message is sent (KDBUS_CMD_SEND), information about the
sending task and the sending connection are collected. This
metadata will be attached to the message when it arrives in the
receiver's pool. If the connection sending the message installed
faked credentials (see kdbus.connection(7)), the message will not
be augmented by any information about the currently sending task.
Which metadata items are actually delivered depends on the following
sets and masks:
(a) the system-wide kmod creds mask
(module parameter 'attach_flags_mask')
(b) the per-connection send creds mask, set by the connecting client
(c) the per-connection receive creds mask, set by the connecting client
(d) the per-bus minimal creds mask, set by the bus creator
(e) the per-bus owner creds mask, set by the bus creator
(f) the mask specified when querying creds of a bus peer
(g) the mask specified when querying creds of a bus owner
With the following rules:
[1] The creds attached to messages are determined as a & b & c.
[2] When connecting to a bus (KDBUS_CMD_HELLO), and ~b & d != 0,
the call will fail with, -1, and errno is set to ECONNREFUSED.
[3] When querying creds of a bus peer, the creds returned
are a & b & f.
[4] When querying creds of a bus owner, the creds returned
are a & e & g.
See kdbus.metadata(7) and kdbus.item(7) for more details on which
metadata can currently be attached to messages.
Signed-off-by: Daniel Mack <daniel@zonque.org>
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: Djalal Harouni <tixxdz@opendz.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Paul Osmialowski <p.osmialowsk@samsung.com>
|
|
kdbusfs is a filesystem that will expose a fresh kdbus domain context
each time it is mounted. Per mount point, there will be a 'control'
node, which can be used to create buses. fs.c contains the
implementation of that pseudo-fs. Exported inodes of 'file' type have
their i_fop set to either kdbus_handle_control_ops or
kdbus_handle_ep_ops, depending on their type. The actual dispatching
of file operations is done from handle.c
node.c is an implementation of a kdbus object that has an id and
children, organized in an R/B tree. The tree is used by the filesystem
code for lookup and iterator functions, and to deactivate children
once the parent is deactivated. Every inode exported by kdbusfs is
backed by a kdbus_node, hence it is embedded in struct kdbus_ep,
struct kdbus_bus and struct kdbus_domain.
Signed-off-by: Daniel Mack <daniel@zonque.org>
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: Djalal Harouni <tixxdz@opendz.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Paul Osmialowski <p.osmialowsk@samsung.com>
|
|
This patch adds code to create and destroy connections, to validate
incoming messages and to maintain the queue of messages that are
associated with a connection.
Note that connection and queue have a 1:1 relation, the code is only
split in two parts for cleaner separation and better readability.
Signed-off-by: Daniel Mack <daniel@zonque.org>
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: Djalal Harouni <tixxdz@opendz.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Paul Osmialowski <p.osmialowsk@samsung.com>
|
|
A pool for data received from the kernel is installed for every
connection of the bus, and it is used to copy data from the kernel to
userspace clients, for messages and other information.
It is accessed when one of the following ioctls is issued:
* KDBUS_CMD_MSG_RECV, to receive a message
* KDBUS_CMD_NAME_LIST, to dump the name registry
* KDBUS_CMD_CONN_INFO, to retrieve information on a connection
The offsets returned by either one of the aforementioned ioctls
describe offsets inside the pool. Internally, the pool is organized in
slices, that are dynamically allocated on demand. The overall size of
the pool is chosen by the connection when it connects to the bus with
KDBUS_CMD_HELLO.
In order to make the slice available for subsequent calls,
KDBUS_CMD_FREE has to be called on the offset.
To access the memory, the caller is expected to mmap() it to its task.
Signed-off-by: Daniel Mack <daniel@zonque.org>
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: Djalal Harouni <tixxdz@opendz.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Paul Osmialowski <p.osmialowsk@samsung.com>
|
|
Add the basic driver structure.
handle.c is the main ioctl command dispatcher that calls into other parts
of the driver.
main.c contains the code that creates the initial domain at startup, and
util.c has utility functions such as item iterators that are shared with
other files.
limits.h describes limits on things like maximum data structure sizes,
number of messages per users and suchlike. Some of the numbers currently
picked are rough ideas of what what might be sufficient and are probably
rather conservative.
Signed-off-by: Daniel Mack <daniel@zonque.org>
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: Djalal Harouni <tixxdz@opendz.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Paul Osmialowski <p.osmialowsk@samsung.com>
|
|
[ Upstream commit 9b24fef9f0410fb5364245d6cc2bd044cc064007 ]
Commit 53dad6d3a8e5 ("ipc: fix race with LSMs") updated ipc_rcu_putref()
to receive rcu freeing function but used generic ipc_rcu_free() instead
of msg_rcu_free() which does security cleaning.
Running LTP msgsnd06 with kmemleak gives the following:
cat /sys/kernel/debug/kmemleak
unreferenced object 0xffff88003c0a11f8 (size 8):
comm "msgsnd06", pid 1645, jiffies 4294672526 (age 6.549s)
hex dump (first 8 bytes):
1b 00 00 00 01 00 00 00 ........
backtrace:
kmemleak_alloc+0x23/0x40
kmem_cache_alloc_trace+0xe1/0x180
selinux_msg_queue_alloc_security+0x3f/0xd0
security_msg_queue_alloc+0x2e/0x40
newque+0x4e/0x150
ipcget+0x159/0x1b0
SyS_msgget+0x39/0x40
entry_SYSCALL_64_fastpath+0x13/0x8f
Manfred Spraul suggested to fix sem.c as well and Davidlohr Bueso to
only use ipc_rcu_free in case of security allocation failure in newary()
Fixes: 53dad6d3a8e ("ipc: fix race with LSMs")
Link: http://lkml.kernel.org/r/1470083552-22966-1-git-send-email-fabf@skynet.be
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: <stable@vger.kernel.org> [3.12+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
|
|
[ Upstream commit 15db15e2f10ae12d021c9a2e9edd8a03b9238551 ]
commit 1ac0b6dec656f3f78d1c3dd216fad84cb4d0a01e upstream.
remap_file_pages(2) emulation can reach file which represents removed
IPC ID as long as a memory segment is mapped. It breaks expectations of
IPC subsystem.
Test case (rewritten to be more human readable, originally autogenerated
by syzkaller[1]):
#define _GNU_SOURCE
#include <stdlib.h>
#include <sys/ipc.h>
#include <sys/mman.h>
#include <sys/shm.h>
#define PAGE_SIZE 4096
int main()
{
int id;
void *p;
id = shmget(IPC_PRIVATE, 3 * PAGE_SIZE, 0);
p = shmat(id, NULL, 0);
shmctl(id, IPC_RMID, NULL);
remap_file_pages(p, 3 * PAGE_SIZE, 0, 7, 0);
return 0;
}
The patch changes shm_mmap() and code around shm_lock() to propagate
locking error back to caller of shm_mmap().
[1] http://github.com/google/syzkaller
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
|
|
[ Upstream commit d0edd8528362c07216498340e928159510595e7b ]
Considering Linus' past rants about the (ab)use of BUG in the kernel, I
took a look at how we deal with such calls in ipc. Given that any errors
or corruption in ipc code are most likely contained within the set of
processes participating in the broken mechanisms, there aren't really many
strong fatal system failure scenarios that would require a BUG call.
Also, if something is seriously wrong, ipc might not be the place for such
a BUG either.
1. For example, recently, a customer hit one of these BUG_ONs in shm
after failing shm_lock(). A busted ID imho does not merit a BUG_ON,
and WARN would have been better.
2. MSG_COPY functionality of posix msgrcv(2) for checkpoint/restore.
I don't see how we can hit this anyway -- at least it should be IS_ERR.
The 'copy' arg from do_msgrcv is always set by calling prepare_copy()
first and foremost. We could also probably drop this check altogether.
Either way, it does not merit a BUG_ON.
3. No ->fault() callback for the fs getting the corresponding page --
seems selfish to make the system unusable.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
|
|
[ Upstream commit c5c8975b2eb4eb7604e8ce4f762987f56d2a96a2 ]
Upon every shm_lock call, we BUG_ON if an error was returned, indicating
racing either in idr or in shm_destroy. Move this logic into the locking.
[akpm@linux-foundation.org: simplify code]
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
|
|
commit b9a532277938798b53178d5a66af6e2915cb27cf upstream.
As reported by Dmitry Vyukov, we really shouldn't do ipc_addid() before
having initialized the IPC object state. Yes, we initialize the IPC
object in a locked state, but with all the lockless RCU lookup work,
that IPC object lock no longer means that the state cannot be seen.
We already did this for the IPC semaphore code (see commit e8577d1f0329:
"ipc/sem.c: fully initialize sem_array before making it visible") but we
clearly forgot about msg and shm.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 3ed1f8a99d70ea1cd1508910eb107d0edcae5009 upstream.
sem_lock() did not properly pair memory barriers:
!spin_is_locked() and spin_unlock_wait() are both only control barriers.
The code needs an acquire barrier, otherwise the cpu might perform read
operations before the lock test.
As no primitive exists inside <include/spinlock.h> and since it seems
noone wants another primitive, the code creates a local primitive within
ipc/sem.c.
With regards to -stable:
The change of sem_wait_array() is a bugfix, the change to sem_lock() is a
nop (just a preprocessor redefinition to improve the readability). The
bugfix is necessary for all kernels that use sem_wait_array() (i.e.:
starting from 3.10).
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Reported-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Kirill Tkhai <ktkhai@parallels.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
set exits
commit 602b8593d2b4138c10e922eeaafe306f6b51817b upstream.
The current semaphore code allows a potential use after free: in
exit_sem we may free the task's sem_undo_list while there is still
another task looping through the same semaphore set and cleaning the
sem_undo list at freeary function (the task called IPC_RMID for the same
semaphore set).
For example, with a test program [1] running which keeps forking a lot
of processes (which then do a semop call with SEM_UNDO flag), and with
the parent right after removing the semaphore set with IPC_RMID, and a
kernel built with CONFIG_SLAB, CONFIG_SLAB_DEBUG and
CONFIG_DEBUG_SPINLOCK, you can easily see something like the following
in the kernel log:
Slab corruption (Not tainted): kmalloc-64 start=ffff88003b45c1c0, len=64
000: 6b 6b 6b 6b 6b 6b 6b 6b 00 6b 6b 6b 6b 6b 6b 6b kkkkkkkk.kkkkkkk
010: ff ff ff ff 6b 6b 6b 6b ff ff ff ff ff ff ff ff ....kkkk........
Prev obj: start=ffff88003b45c180, len=64
000: 00 00 00 00 ad 4e ad de ff ff ff ff 5a 5a 5a 5a .....N......ZZZZ
010: ff ff ff ff ff ff ff ff c0 fb 01 37 00 88 ff ff ...........7....
Next obj: start=ffff88003b45c200, len=64
000: 00 00 00 00 ad 4e ad de ff ff ff ff 5a 5a 5a 5a .....N......ZZZZ
010: ff ff ff ff ff ff ff ff 68 29 a7 3c 00 88 ff ff ........h).<....
BUG: spinlock wrong CPU on CPU#2, test/18028
general protection fault: 0000 [#1] SMP
Modules linked in: 8021q mrp garp stp llc nf_conntrack_ipv4 nf_defrag_ipv4 ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables binfmt_misc ppdev input_leds joydev parport_pc parport floppy serio_raw virtio_balloon virtio_rng virtio_console virtio_net iosf_mbi crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcspkr qxl ttm drm_kms_helper drm snd_hda_codec_generic i2c_piix4 snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore crc32c_intel virtio_pci virtio_ring virtio pata_acpi ata_generic [last unloaded: speedstep_lib]
CPU: 2 PID: 18028 Comm: test Not tainted 4.2.0-rc5+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014
RIP: spin_dump+0x53/0xc0
Call Trace:
spin_bug+0x30/0x40
do_raw_spin_unlock+0x71/0xa0
_raw_spin_unlock+0xe/0x10
freeary+0x82/0x2a0
? _raw_spin_lock+0xe/0x10
semctl_down.clone.0+0xce/0x160
? __do_page_fault+0x19a/0x430
? __audit_syscall_entry+0xa8/0x100
SyS_semctl+0x236/0x2c0
? syscall_trace_leave+0xde/0x130
entry_SYSCALL_64_fastpath+0x12/0x71
Code: 8b 80 88 03 00 00 48 8d 88 60 05 00 00 48 c7 c7 a0 2c a4 81 31 c0 65 8b 15 eb 40 f3 7e e8 08 31 68 00 4d 85 e4 44 8b 4b 08 74 5e <45> 8b 84 24 88 03 00 00 49 8d 8c 24 60 05 00 00 8b 53 04 48 89
RIP [<ffffffff810d6053>] spin_dump+0x53/0xc0
RSP <ffff88003750fd68>
---[ end trace 783ebb76612867a0 ]---
NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [test:18053]
Modules linked in: 8021q mrp garp stp llc nf_conntrack_ipv4 nf_defrag_ipv4 ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables binfmt_misc ppdev input_leds joydev parport_pc parport floppy serio_raw virtio_balloon virtio_rng virtio_console virtio_net iosf_mbi crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcspkr qxl ttm drm_kms_helper drm snd_hda_codec_generic i2c_piix4 snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore crc32c_intel virtio_pci virtio_ring virtio pata_acpi ata_generic [last unloaded: speedstep_lib]
CPU: 3 PID: 18053 Comm: test Tainted: G D 4.2.0-rc5+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014
RIP: native_read_tsc+0x0/0x20
Call Trace:
? delay_tsc+0x40/0x70
__delay+0xf/0x20
do_raw_spin_lock+0x96/0x140
_raw_spin_lock+0xe/0x10
sem_lock_and_putref+0x11/0x70
SYSC_semtimedop+0x7bf/0x960
? handle_mm_fault+0xbf6/0x1880
? dequeue_task_fair+0x79/0x4a0
? __do_page_fault+0x19a/0x430
? kfree_debugcheck+0x16/0x40
? __do_page_fault+0x19a/0x430
? __audit_syscall_entry+0xa8/0x100
? do_audit_syscall_entry+0x66/0x70
? syscall_trace_enter_phase1+0x139/0x160
SyS_semtimedop+0xe/0x10
SyS_semop+0x10/0x20
entry_SYSCALL_64_fastpath+0x12/0x71
Code: 47 10 83 e8 01 85 c0 89 47 10 75 08 65 48 89 3d 1f 74 ff 7e c9 c3 0f 1f 44 00 00 55 48 89 e5 e8 87 17 04 00 66 90 c9 c3 0f 1f 00 <55> 48 89 e5 0f 31 89 c1 48 89 d0 48 c1 e0 20 89 c9 48 09 c8 c9
Kernel panic - not syncing: softlockup: hung tasks
I wasn't able to trigger any badness on a recent kernel without the
proper config debugs enabled, however I have softlockup reports on some
kernel versions, in the semaphore code, which are similar as above (the
scenario is seen on some servers running IBM DB2 which uses semaphore
syscalls).
The patch here fixes the race against freeary, by acquiring or waiting
on the sem_undo_list lock as necessary (exit_sem can race with freeary,
while freeary sets un->semid to -1 and removes the same sem_undo from
list_proc or when it removes the last sem_undo).
After the patch I'm unable to reproduce the problem using the test case
[1].
[1] Test case used below:
#include <stdio.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/wait.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>
#include <errno.h>
#define NSEM 1
#define NSET 5
int sid[NSET];
void thread()
{
struct sembuf op;
int s;
uid_t pid = getuid();
s = rand() % NSET;
op.sem_num = pid % NSEM;
op.sem_op = 1;
op.sem_flg = SEM_UNDO;
semop(sid[s], &op, 1);
exit(EXIT_SUCCESS);
}
void create_set()
{
int i, j;
pid_t p;
union {
int val;
struct semid_ds *buf;
unsigned short int *array;
struct seminfo *__buf;
} un;
/* Create and initialize semaphore set */
for (i = 0; i < NSET; i++) {
sid[i] = semget(IPC_PRIVATE , NSEM, 0644 | IPC_CREAT);
if (sid[i] < 0) {
perror("semget");
exit(EXIT_FAILURE);
}
}
un.val = 0;
for (i = 0; i < NSET; i++) {
for (j = 0; j < NSEM; j++) {
if (semctl(sid[i], j, SETVAL, un) < 0)
perror("semctl");
}
}
/* Launch threads that operate on semaphore set */
for (i = 0; i < NSEM * NSET * NSET; i++) {
p = fork();
if (p < 0)
perror("fork");
if (p == 0)
thread();
}
/* Free semaphore set */
for (i = 0; i < NSET; i++) {
if (semctl(sid[i], NSEM, IPC_RMID))
perror("IPC_RMID");
}
/* Wait for forked processes to exit */
while (wait(NULL)) {
if (errno == ECHILD)
break;
};
}
int main(int argc, char **argv)
{
pid_t p;
srand(time(NULL));
while (1) {
p = fork();
if (p < 0) {
perror("fork");
exit(EXIT_FAILURE);
}
if (p == 0) {
create_set();
goto end;
}
/* Wait for forked processes to exit */
while (wait(NULL)) {
if (errno == ECHILD)
break;
};
}
end:
return 0;
}
[akpm@linux-foundation.org: use normal comment layout]
Signed-off-by: Herton R. Krzesinski <herton@redhat.com>
Acked-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Rafael Aquini <aquini@redhat.com>
CC: Aristeu Rozanski <aris@redhat.com>
Cc: David Jeffery <djeffery@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
account
commit de54b9ac253787c366bbfb28d901a31954eb3511 upstream.
A while back, the message queue implementation in the kernel was
improved to use btrees to speed up retrieval of messages, in commit
d6629859b36d ("ipc/mqueue: improve performance of send/recv").
That patch introducing the improved kernel handling of message queues
(using btrees) has, as a by-product, changed the meaning of the QSIZE
field in the pseudo-file created for the queue. Before, this field
reflected the size of the user-data in the queue. Since, it also takes
kernel data structures into account. For example, if 13 bytes of user
data are in the queue, on my machine the file reports a size of 61
bytes.
There was some discussion on this topic before (for example
https://lkml.org/lkml/2014/10/1/115). Commenting on a th lkml, Michael
Kerrisk gave the following background
(https://lkml.org/lkml/2015/6/16/74):
The pseudofiles in the mqueue filesystem (usually mounted at
/dev/mqueue) expose fields with metadata describing a message
queue. One of these fields, QSIZE, as originally implemented,
showed the total number of bytes of user data in all messages in
the message queue, and this feature was documented from the
beginning in the mq_overview(7) page. In 3.5, some other (useful)
work happened to break the user-space API in a couple of places,
including the value exposed via QSIZE, which now includes a measure
of kernel overhead bytes for the queue, a figure that renders QSIZE
useless for its original purpose, since there's no way to deduce
the number of overhead bytes consumed by the implementation.
(The other user-space breakage was subsequently fixed.)
This patch removes the accounting of kernel data structures in the
queue. Reporting the size of these data-structures in the QSIZE field
was a breaking change (see Michael's comment above). Without the QSIZE
field reporting the total size of user-data in the queue, there is no
way to deduce this number.
It should be noted that the resource limit RLIMIT_MSGQUEUE is counted
against the worst-case size of the queue (in both the old and the new
implementation). Therefore, the kernel overhead accounting in QSIZE is
not necessary to help the user understand the limitations RLIMIT imposes
on the processes.
Signed-off-by: Marcus Gelderie <redmnic@gmail.com>
Acked-by: Doug Ledford <dledford@redhat.com>
Acked-by: Michael Kerrisk <mtk.manpages@gmail.com>
Acked-by: Davidlohr Bueso <dbueso@suse.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: John Duffy <jb_duffy@btinternet.com>
Cc: Arto Bendiken <arto@bendiken.net>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull fourth vfs update from Al Viro:
"d_inode() annotations from David Howells (sat in for-next since before
the beginning of merge window) + four assorted fixes"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
RCU pathwalk breakage when running into a symlink overmounting something
fix I_DIO_WAKEUP definition
direct-io: only inc/dec inode->i_dio_count for file systems
fs/9p: fix readdir()
VFS: assorted d_backing_inode() annotations
VFS: fs/inode.c helpers: d_inode() annotations
VFS: fs/cachefiles: d_backing_inode() annotations
VFS: fs library helpers: d_inode() annotations
VFS: assorted weird filesystems: d_inode() annotations
VFS: normal filesystems (and lustre): d_inode() annotations
VFS: security/: d_inode() annotations
VFS: security/: d_backing_inode() annotations
VFS: net/: d_inode() annotations
VFS: net/unix: d_backing_inode() annotations
VFS: kernel/: d_inode() annotations
VFS: audit: d_backing_inode() annotations
VFS: Fix up some ->d_inode accesses in the chelsio driver
VFS: Cachefiles should perform fs modifications on the top layer only
VFS: AF_UNIX sockets should call mknod on the top layer only
|
|
The seq_printf return value, because it's frequently misused,
will eventually be converted to void.
See: commit 1f33c41c03da ("seq_file: Rename seq_overflow() to
seq_has_overflowed() and make public")
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Call __set_current_state() instead of assigning the new state directly.
These interfaces also aid CONFIG_DEBUG_ATOMIC_SLEEP environments, keeping
track of who changed the state.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs pile #2 from Al Viro:
"Next pile (and there'll be one or two more).
The large piece in this one is getting rid of /proc/*/ns/* weirdness;
among other things, it allows to (finally) make nameidata completely
opaque outside of fs/namei.c, making for easier further cleanups in
there"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
coda_venus_readdir(): use file_inode()
fs/namei.c: fold link_path_walk() call into path_init()
path_init(): don't bother with LOOKUP_PARENT in argument
fs/namei.c: new helper (path_cleanup())
path_init(): store the "base" pointer to file in nameidata itself
make default ->i_fop have ->open() fail with ENXIO
make nameidata completely opaque outside of fs/namei.c
kill proc_ns completely
take the targets of /proc/*/ns/* symlinks to separate fs
bury struct proc_ns in fs/proc
copy address of proc_ns_ops into ns_common
new helpers: ns_alloc_inum/ns_free_inum
make proc_ns_operations work with struct ns_common * instead of void *
switch the rest of proc_ns_operations to working with &...->ns
netns: switch ->get()/->put()/->install()/->inum() to working with &net->ns
make mntns ->get()/->put()/->install()/->inum() work with &mnt_ns->ns
common object embedded into various struct ....ns
|
|
Andrew Morton noted
http://lkml.kernel.org/r/20141104142027.a7a0d010772d84560b445f59@linux-foundation.org
that the shmdt uses inode->i_size outside of i_mutex being held.
There is one more case in shm.c in shm_destroy(). This converts
both users over to use i_size_read().
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This is a highly-contrived scenario. But, a single shmdt() call can be
induced in to unmapping memory from mulitple shm segments. Example code
is here:
http://www.sr71.net/~dave/intel/shmfun.c
The fix is pretty simple: Record the 'struct file' for the first VMA we
encounter and then stick to it. Decline to unmap anything not from the
same file and thus the same segment.
I found this by inspection and the odds of anyone hitting this in practice
are pretty darn small.
Lightly tested, but it's a pretty small patch.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
SysV can be abused to allocate locked kernel memory. For most systems, a
small limit doesn't make sense, see the discussion with regards to SHMMAX.
Therefore: increase MSGMNI to the maximum supported.
And: If we ignore the risk of locking too much memory, then an automatic
scaling of MSGMNI doesn't make sense. Therefore the logic can be removed.
The code preserves auto_msgmni to avoid breaking any user space applications
that expect that the value exists.
Notes:
1) If an administrator must limit the memory allocations, then he can set
MSGMNI as necessary.
Or he can disable sysv entirely (as e.g. done by Android).
2) MSGMAX and MSGMNB are intentionally not increased, as these values are used
to control latency vs. throughput:
If MSGMNB is large, then msgsnd() just returns and more messages can be queued
before a task switch to a task that calls msgrcv() is forced.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Rafael Aquini <aquini@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|