summaryrefslogtreecommitdiff
path: root/drivers/nvme
AgeCommit message (Collapse)Author
2023-02-28nvme-fabrics: show well known discovery nameDaniel Wagner
The kernel always logs the unique subsystem name for a discovery controller, even in the case user space asked for the well known. This has lead to confusion as the logs of nvme-cli and the kernel logs didn't match. First, nvme-cli connects to the well known discovery controller to figure out if it supports TP8013. If so then nvme-cli disconnects and connects to the unique discovery controller. Currently, the kernel show that user space connected twice to the unique one. To avoid further confusion, show the well known discovery controller if user space asked for it: $ nvme connect-all -v -t tcp -a 192.168.0.1 nvme0: nqn.2014-08.org.nvmexpress.discovery connected nvme0: nqn.2014-08.org.nvmexpress.discovery disconnected nvme0: nqn.discovery connected kernel log: nvme nvme0: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 192.168.0.1:8009 nvme nvme0: Removing ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery" nvme nvme0: new ctrl: NQN "nqn.discovery", addr 192.168.0.1:8009 Fixes: e5ea42faa773 ("nvme: display correct subsystem NQN") Signed-off-by: Daniel Wagner <dwagner@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-02-28nvme-tcp: don't access released socket during error recoveryAkinobu Mita
While the error recovery work is temporarily failing reconnect attempts, running the 'nvme list' command causes a kernel NULL pointer dereference by calling getsockname() with a released socket. During error recovery work, the nvme tcp socket is released and a new one created, so it is not safe to access the socket without proper check. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Fixes: 02c57a82c008 ("nvme-tcp: print actual source IP address through sysfs "address" attr") Reviewed-by: Martin Belanger <martin.belanger@dell.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-02-28nvme-auth: fix an error code in nvme_auth_process_dhchap_challenge()Dan Carpenter
This function was transitioned from returning NVMe status codes to returning traditional kernel error codes. However, this particular return now accidentally returns positive error codes like ENOMEM instead of negative -ENOMEM. Fixes: b0ef1b11d390 ("nvme-auth: don't use NVMe status codes") Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-02-28nvme: bring back auto-removal of deleted namespaces during sequential scanChristoph Hellwig
Bring back the check of the Identify Namespace return value for the legacy NVMe 1.0-style sequential scanning. While NVMe 1.0 does not support namespace management, there are "modern" cloud solutions like Google Cloud Platform that claim the obsolete 1.0 compliance for no good reason while supporting proprietary sideband namespace management. Fixes: 1a893c2bfef4 ("nvme: refactor namespace probing") Reported-by: Nils Hanke <nh@edgeless.systems> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Tested-by: Nils Hanke <nh@edgeless.systems>
2023-02-27nvme: fix sparse warning on effects maskingKeith Busch
The log entries are stored in le32, so use appropriate byte swapping macros. Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/oe-kbuild-all/202302242222.PevBhzvC-lkp@intel.com/ Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-02-14nvme-pci: remove iod use_sglsKeith Busch
It's not used anywhere anymore, so remove it. Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-02-14nvme-pci: fix freeing single sglKeith Busch
There may only be a single DMA mapped entry from multiple physical segments, which means we don't allocate a separte SGL list. Check the number of allocations prior to know if we need to free something. Freeing a single list allocation is the same for both PRP and SGL usages, so we don't need to check the use_sgl flag anymore. Fixes: 01df742d8c5c0 ("nvme-pci: remove SGL segment descriptors") Reported-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Niklas Schnelle <schnelle@linux.ibm.com>
2023-02-07Merge tag 'nvme-6.3-2023-02-07' of git://git.infradead.org/nvme into ↵Jens Axboe
for-6.3/block Pull NVMe updates from Christoph: "nvme updates for Linux 6.3 - small improvements to the logging functionality (Amit Engel) - authentication cleanups (Hannes Reinecke) - cleanup and optimize the DMA mapping cod in the PCIe driver (Keith Busch) - work around the command effects for Format NVM (Keith Busch) - misc cleanups (Keith Busch, Christoph Hellwig)" * tag 'nvme-6.3-2023-02-07' of git://git.infradead.org/nvme: nvme: mask CSE effects for security receive nvme: always initialize known command effects nvmet: for nvme admin set_features cmd, call nvmet_check_data_len_lte() nvme-tcp: add additional info for nvme_tcp_timeout log nvme: add nvme_opcode_str function for all nvme cmd types nvme: remove nvme_execute_passthru_rq nvme-pci: place descriptor addresses in iod nvme-pci: use mapped entries for sgl decision nvme-pci: remove SGL segment descriptors nvme-auth: don't use NVMe status codes nvme-fabrics: clarify AUTHREQ result handling
2023-02-03nvme: use bvec_set_virt to initialize special_vecChristoph Hellwig
Use the bvec_set_virt helper to initialize the special_vec. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20230203150634.3199647-8-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-03nvmet: use bvec_set_page to initialize bvecsChristoph Hellwig
Use the bvec_set_page helper to initialize bvecs. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20230203150634.3199647-7-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-01nvme: mask CSE effects for security receiveKeith Busch
The nvme driver will freeze the IO queues in response to an admin command with CSE bits set. These bits notify the host that the command that's about to be executed needs to be done exclusively, hence the freeze. The Security Receive command is often reported by multiple vendors with CSE bits set. The reason for this is that the result depends on the previous Security Send. This has nothing to do with IO queues, though, so the driver is taking an overly cautious response to seeing this passthrough command, while unable to fufill the intended admin queue action. Rather than freeze IO during this harmless command, mask off the effects. This freezing is observed to cause IO latency spikes when host software periodically validates the security state of the drives. Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-02-01nvme: always initialize known command effectsKeith Busch
Instead of appending command effects flags per IO, set the known effects flags the driver needs to react to just once during initial setup. Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-02-01nvmet: for nvme admin set_features cmd, call nvmet_check_data_len_lte()Amit Engel
This is due to the fact that the host is allowed to pass the controller an sgl describing a buffer that is larger than the payload itself Signed-off-by: Amit Engel <Amit.Engel@dell.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-02-01nvme-tcp: add additional info for nvme_tcp_timeout logAmit Engel
This provides additional details about the rq/cmd that is timed out example log if CONFIG_NVME_VERBOSE_ERRORS is configured: "nvme nvme0: queue 2 timeout cid 0xd058 type 4 opc Write (0x1)" example log if CONFIG_NVME_VERBOSE_ERRORS is not configured: "nvme nvme0: queue 2 timeout cid 0xd058 type 4 opc I/O Cmd (0x1)" Signed-off-by: Amit Engel <Amit.Engel@dell.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-02-01nvme: add nvme_opcode_str function for all nvme cmd typesAmit Engel
nvme_opcode_str will handle io/admin/fabrics ops This improves NVMe errors logging Signed-off-by: Amit Engel <Amit.Engel@dell.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-02-01nvme: remove nvme_execute_passthru_rqChristoph Hellwig
After moving the nvme_passthru_end call to the callers of nvme_execute_passthru_rq, this function has become quite pointless, so remove it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
2023-02-01nvme-pci: place descriptor addresses in iodKeith Busch
The 'struct nvme_iod' space is appended at the end of the preallocated 'struct request', and padded to the cache line size. This leaves some free memory (in most kernel configs) up for grabs. Instead of appending the nvme data descriptor addresses after the scatterlist, inline these for free within struct nvme_iod. There is now enough space in the mempool for 128 possibe segments. And without increasing the size of the preallocated requests, we can hold up to 5 PRP descriptor elements, allowing the driver to increase its max transfer size to 8MB. Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-02-01nvme-pci: use mapped entries for sgl decisionKeith Busch
The driver uses the dma entries for setting up its command's SGL/PRP lists. The dma mapping might have fewer entries than the physical segments, so check the dma mapped count to determine which nvme data layout method is more optimal. Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-02-01nvme-pci: remove SGL segment descriptorsKeith Busch
The max segments this driver can see is 127, well below the 256 threshold needed to add an nvme sgl segment descriptor. Remove all the useless checks and dead code. Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-02-01nvme-auth: don't use NVMe status codesHannes Reinecke
NVMe status codes are part of the wire protocol, and shouldn't be fabricated in the stack. So with this patch the authentication code is switched over to use error codes; as a side effect authentication failures due to internal error won't be retried anymore. But that shouldn't have happened anyway. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-02-01nvme-fabrics: clarify AUTHREQ result handlingHannes Reinecke
The NVMe 2.0 spec defines the ATR and ASCR bits in the AUTHREQ connect response field to be mutually exclusive. So to clarify the handling here switch the AUTHREQ handling to use the bit definitions and check for both bits. And while we're at it, add a message to the user that secure concatenation is not supported (yet). Suggested-by: Mark Lehrer <mark.lehrer@wdc.com> Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-01-29block: introduce bdev_zone_no helperPankaj Raghav
Add a generic bdev_zone_no() helper to calculate zone number for a given sector in a block device. This helper internally uses disk_zone_no() to find the zone number. Use the helper bdev_zone_no() to calculate nr of zones. This lets us make modifications to the math if needed in one place. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Signed-off-by: Pankaj Raghav <p.raghav@samsung.com> Link: https://lore.kernel.org/r/20230110143635.77300-4-p.raghav@samsung.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-29nvme: set REQ_ALLOC_CACHE for uring-passthru requestAnuj Gupta
This patch sets REQ_ALLOC_CACHE flag for uring-passthru requests. This is a prep-patch so that normal / IRQ-driven uring-passthru I/Os can also leverage bio-cache. Signed-off-by: Anuj Gupta <anuj20.g@samsung.com> Signed-off-by: Kanchan Joshi <joshi.k@samsung.com> Link: https://lore.kernel.org/r/20230117120638.72254-2-anuj20.g@samsung.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-27Merge tag 'block-6.2-2023-01-27' of git://git.kernel.dk/linuxLinus Torvalds
Pull block fixes from Jens Axboe: "Minor tweaks for this release: - NVMe pull request via Christoph: - Flush initial scan_work for async probe (Keith Busch) - Fix passthrough csi check (Keith Busch) - Fix nvme-fc initialization order (Ross Lagerwall) - Fix for tearing down non-started device in ublk (Ming)" * tag 'block-6.2-2023-01-27' of git://git.kernel.dk/linux: block: ublk: move ublk_chr_class destroying after devices are removed nvme: fix passthrough csi check nvme-pci: flush initial scan_work for async probe nvme-fc: fix initialization order
2023-01-25nvme: fix passthrough csi checkKeith Busch
The namespace head saves the Command Set Indicator enum, so use that instead of the Command Set Selected. The two values are not the same. Fixes: 831ed60c2aca2d ("nvme: also return I/O command effects from nvme_command_effects") Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-01-24nvme-pci: flush initial scan_work for async probeKeith Busch
The nvme device may have a namespace with the root partition, so make sure we've completed scanning before returning from the async probe. Fixes: eac3ef262941 ("nvme-pci: split the initial probe from the rest path") Reported-by: Klaus Jensen <its@irrelevant.dk> Signed-off-by: Keith Busch <kbusch@kernel.org> Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Tested-by: Klaus Jensen <k.jensen@samsung.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-01-23nvme-fc: fix initialization orderRoss Lagerwall
ctrl->ops is used by nvme_alloc_admin_tag_set() but set by nvme_init_ctrl() so reorder the calls to avoid a NULL pointer dereference. Fixes: 6dfba1c09c10 ("nvme-fc: use the tagset alloc/free helpers") Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-01-20Merge tag 'block-6.2-2023-01-20' of git://git.kernel.dk/linuxLinus Torvalds
Pull block fixes from Jens Axboe: "Various little tweaks all over the place: - NVMe pull request via Christoph: - fix controller shutdown regression in nvme-apple (Janne Grunau) - fix a polling on timeout regression in nvme-pci (Keith Busch) - Fix a bug in the read request side request allocation caching (Pavel) - pktcdvd was brought back after we configured a NULL return on bio splits, make it consistent with the others (me) - BFQ refcount fix (Yu) - Block cgroup policy activation fix (Yu) - Fix for an md regression introduced in the 6.2 cycle (Adrian)" * tag 'block-6.2-2023-01-20' of git://git.kernel.dk/linux: nvme-pci: fix timeout request state check nvme-apple: only reset the controller when RTKit is running nvme-apple: reset controller during shutdown block: fix hctx checks for batch allocation block/rnbd-clt: fix wrong max ID in ida_alloc_max blk-cgroup: fix missing pd_online_fn() while activating policy pktcdvd: check for NULL returna fter calling bio_split_to_limits() block, bfq: switch 'bfqg->ref' to use atomic refcount apis md: fix incorrect declaration about claim_rdev in md_import_device
2023-01-19nvme-pci: fix timeout request state checkKeith Busch
Polling the completion can progress the request state to IDLE, either inline with the completion, or through softirq. Either way, the state may not be COMPLETED, so don't check for that. We only care if the state isn't IN_FLIGHT. This is fixing an issue where the driver aborts an IO that we just completed. Seeing the "aborting" message instead of "polled" is very misleading as to where the timeout problem resides. Fixes: bf392a5dc02a9b ("nvme-pci: Remove tag from process cq") Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-01-19nvme-apple: only reset the controller when RTKit is runningJanne Grunau
NVMe controller register access hangs indefinitely when the co-processor is not running. A missed reset is preferable over a hanging thread since it could be recoverable. Signed-off-by: Janne Grunau <j@jannau.net> Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-01-19nvme-apple: reset controller during shutdownJanne Grunau
This is a functional revert of c76b8308e4c9 ("nvme-apple: fix controller shutdown in apple_nvme_disable"). The commit broke suspend/resume since apple_nvme_reset_work() tries to disable the controller on resume. This does not work for the apple NVMe controller since register access only works while the co-processor firmware is running. Disabling the NVMe controller in the shutdown path is also required for shutting the co-processor down. The original code was appropriate for this hardware. Add a comment to prevent a similar breaking changes in the future. Fixes: c76b8308e4c9 ("nvme-apple: fix controller shutdown in apple_nvme_disable") Reported-by: Janne Grunau <j@jannau.net> Link: https://lore.kernel.org/all/20230110174745.GA3576@jannau.net/ Signed-off-by: Janne Grunau <j@jannau.net> [hch: updated with a more descriptive comment from Hector Martin] Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-01-13Merge tag 'block-6.2-2023-01-13' of git://git.kernel.dk/linuxLinus Torvalds
Pull block fixes from Jens Axboe: "Nothing major in here, just a collection of NVMe fixes and dropping a wrong might_sleep() that static checkers tripped over but which isn't valid" * tag 'block-6.2-2023-01-13' of git://git.kernel.dk/linux: MAINTAINERS: stop nvme matching for nvmem files nvme: don't allow unprivileged passthrough on partitions nvme: replace the "bool vec" arguments with flags in the ioctl path nvme: remove __nvme_ioctl nvme-pci: fix error handling in nvme_pci_enable() nvme-pci: add NVME_QUIRK_IDENTIFY_CNS quirk to Apple T2 controllers nvme-apple: add NVME_QUIRK_IDENTIFY_CNS quirk to fix regression block: Drop spurious might_sleep() from blk_put_queue()
2023-01-10nvme: don't allow unprivileged passthrough on partitionsChristoph Hellwig
Passthrough commands can always access the entire device, and thus submitting them on partitions is an privelege escalation. In hindsight we should have never allowed any passthrough commands on partitions, but it's probably too late to change that decision now. Fixes: e4fbcf32c860 ("nvme: identify-namespace without CAP_SYS_ADMIN") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
2023-01-10nvme: replace the "bool vec" arguments with flags in the ioctl pathChristoph Hellwig
To prepare for passing down more information, replace the boolean vec argument with a more extensible flags one. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
2023-01-10nvme: remove __nvme_ioctlChristoph Hellwig
Open code __nvme_ioctl in the two callers to make future changes that pass down additional paramters in the ioctl path easier. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
2023-01-10nvme-pci: fix error handling in nvme_pci_enable()Tong Zhang
There are two issues in nvme_pci_enable(): 1) If pci_alloc_irq_vectors() fails, device is left enabled. Fix this by adding a goto disable statement. 2) nvme_pci_configure_admin_queue could return -ENODEV, in this case, we will need to free IRQ properly. Otherwise the following warning could be triggered: [ 5.286752] WARNING: CPU: 0 PID: 33 at kernel/irq/irqdomain.c:253 irq_domain_remove+0x12d/0x140 [ 5.290547] Call Trace: [ 5.290626] <TASK> [ 5.290695] msi_remove_device_irq_domain+0xc9/0xf0 [ 5.290843] msi_device_data_release+0x15/0x80 [ 5.290978] release_nodes+0x58/0x90 [ 5.293788] WARNING: CPU: 0 PID: 33 at kernel/irq/msi.c:276 msi_device_data_release+0x76/0x80 [ 5.297573] Call Trace: [ 5.297651] <TASK> [ 5.297719] release_nodes+0x58/0x90 [ 5.297831] devres_release_all+0xef/0x140 [ 5.298339] device_unbind_cleanup+0x11/0xc0 [ 5.298479] really_probe+0x296/0x320 Fixes: a6ee7f19ebfd ("nvme-pci: call nvme_pci_configure_admin_queue from nvme_pci_enable") Co-developed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Tong Zhang <ztong0001@gmail.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-01-10nvme-pci: add NVME_QUIRK_IDENTIFY_CNS quirk to Apple T2 controllersHector Martin
This mirrors the quirk added to Apple Silicon controllers in apple.c. These controllers do not support the Active NS ID List command and behave identically to the SoC version judging by existing user reports/syslogs, so will need the same fix. This quirk reverts back to NVMe 1.0 behavior and disables the broken commands. Fixes: 811f4de0344d ("nvme: avoid fallback to sequential scan due to transient issues") Signed-off-by: Hector Martin <marcan@marcan.st> Tested-by: Orlando Chamberlain <orlandoch.dev@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-01-10nvme-apple: add NVME_QUIRK_IDENTIFY_CNS quirk to fix regressionHector Martin
From the get-go, this driver and the ANS syslog have been complaining about namespace identification. In 6.2-rc1, commit 811f4de0344d ("nvme: avoid fallback to sequential scan due to transient issues") regressed the driver by no longer allowing fallback to sequential namespace scans, leaving us with no namespaces. It turns out that the real problem is that this controller claiming NVMe 1.1 compat is treating the CNS field as a binary field, as in NVMe 1.0. This already has a quirk, NVME_QUIRK_IDENTIFY_CNS, so set it for the controller to fix all this nonsense (including other errors triggered by other CNS commands). Fixes: 811f4de0344d ("nvme: avoid fallback to sequential scan due to transient issues") Fixes: 5bd2927aceba ("nvme-apple: Add initial Apple SoC NVMe driver") Signed-off-by: Hector Martin <marcan@marcan.st> Reviewed-by: Sven Peter <sven@svenpeter.dev> Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-01-06Merge tag 'block-2023-01-06' of git://git.kernel.dk/linuxLinus Torvalds
Pull block fixes from Jens Axboe: "The big change here is obviously the revert of the pktcdvd driver removal. Outside of that, just minor tweaks. In detail: - Re-instate the pktcdvd driver, which necessitates adding back bio_copy_data_iter() and the fops->devnode() hook for now (me) - Fix for splitting of a bio marked as NOWAIT, causing either nowait reads or writes to error with EAGAIN even if parts of the IO completed (me) - Fix for ublk, punting management commands to io-wq as they can all easily block for extended periods of time (Ming) - Removal of SRCU dependency for the block layer (Paul)" * tag 'block-2023-01-06' of git://git.kernel.dk/linux: block: Remove "select SRCU" Revert "pktcdvd: remove driver." Revert "block: remove devnode callback from struct block_device_operations" Revert "block: bio_copy_data_iter" ublk: honor IO_URING_F_NONBLOCK for handling control command block: don't allow splitting of a REQ_NOWAIT bio block: handle bio_split_to_limits() NULL return
2023-01-04block: handle bio_split_to_limits() NULL returnJens Axboe
This can't happen right now, but in preparation for allowing bio_split_to_limits() returning NULL if it ended the bio, check for it in all the callers. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-12-29Merge tag 'block-6.2-2022-12-29' of git://git.kernel.dk/linuxLinus Torvalds
Pull block fixes from Jens Axboe: "Mostly just NVMe, but also a single fixup for BFQ for a regression that happened during the merge window. In detail: - NVMe pull requests via Christoph: - Fix doorbell buffer value endianness (Klaus Jensen) - Fix Linux vs NVMe page size mismatch (Keith Busch) - Fix a potential use memory access beyong the allocation limit (Keith Busch) - Fix a multipath vs blktrace NULL pointer dereference (Yanjun Zhang) - Fix various problems in handling the Command Supported and Effects log (Christoph Hellwig) - Don't allow unprivileged passthrough of commands that don't transfer data but modify logical block content (Christoph Hellwig) - Add a features and quirks policy document (Christoph Hellwig) - Fix some really nasty code that was correct but made smatch complain (Sagi Grimberg) - Use-after-free regression in BFQ from this merge window (Yu)" * tag 'block-6.2-2022-12-29' of git://git.kernel.dk/linux: nvme-auth: fix smatch warning complaints nvme: consult the CSE log page for unprivileged passthrough nvme: also return I/O command effects from nvme_command_effects nvmet: don't defer passthrough commands with trivial effects to the workqueue nvmet: set the LBCC bit for commands that modify data nvmet: use NVME_CMD_EFFECTS_CSUPP instead of open coding it nvme: fix the NVME_CMD_EFFECTS_CSE_MASK definition docs, nvme: add a feature and quirk policy document nvme-pci: update sqsize when adjusting the queue depth nvme: fix setting the queue depth in nvme_alloc_io_tag_set block, bfq: fix uaf for bfqq in bfq_exit_icq_bfqq nvme: fix multipath crash caused by flush request when blktrace is enabled nvme-pci: fix page size checks nvme-pci: fix mempool alloc size nvme-pci: fix doorbell buffer value endianness
2022-12-28nvme-auth: fix smatch warning complaintsSagi Grimberg
When initializing auth context, there may be no secrets passed by the user. Make return code explicit when returning successfully. smatch warnings: drivers/nvme/host/auth.c:950 nvme_auth_init_ctrl() warn: missing error code? 'ret' Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-12-28nvme: consult the CSE log page for unprivileged passthroughChristoph Hellwig
Commands like Write Zeros can change the contents of a namespaces without actually transferring data. To protect against this, check the Commands Supported and Effects log is supported by the controller for any unprivileg command passthrough and refuse unprivileged passthrough if the command has any effects that can change data or metadata. Note: While the Commands Support and Effects log page has only been mandatory since NVMe 2.0, it is widely supported because Windows requires it for any command passthrough from userspace. Fixes: e4fbcf32c860 ("nvme: identify-namespace without CAP_SYS_ADMIN") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
2022-12-28nvme: also return I/O command effects from nvme_command_effectsChristoph Hellwig
To be able to use the Commands Supported and Effects Log for allowing unprivileged passtrough, it needs to be corretly reported for I/O commands as well. Return the I/O command effects from nvme_command_effects, and also add a default list of effects for the NVM command set. For other command sets, the Commands Supported and Effects log is required to be present already. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
2022-12-28nvmet: don't defer passthrough commands with trivial effects to the workqueueChristoph Hellwig
Mask out the "Command Supported" and "Logical Block Content Change" bits and only defer execution of commands that have non-trivial effects to the workqueue for synchronous execution. This allows to execute admin commands asynchronously on controllers that provide a Command Supported and Effects log page, and will keep allowing to execute Write commands asynchronously once command effects on I/O commands are taken into account. Fixes: c1fef73f793b ("nvmet: add passthru code to process commands") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
2022-12-28nvmet: set the LBCC bit for commands that modify dataChristoph Hellwig
Write, Write Zeroes, Zone append and a Zone Reset through Zone Management Send modify the logical block content of a namespace, so make sure the LBCC bit is reported for them. Fixes: b5d0b38c0475 ("nvmet: add Command Set Identifier support") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
2022-12-28nvmet: use NVME_CMD_EFFECTS_CSUPP instead of open coding itChristoph Hellwig
Use NVME_CMD_EFFECTS_CSUPP instead of open coding it and assign a single value to multiple array entries instead of repeated assignments. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
2022-12-26nvme-pci: update sqsize when adjusting the queue depthChristoph Hellwig
Update the core sqsize field in addition to the PCIe-specific q_depth field as the core tagset allocation helpers rely on it. Fixes: 0da7feaa5913 ("nvme-pci: use the tagset alloc/free helpers") Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Hugh Dickins <hughd@google.com> Link: https://lore.kernel.org/r/20221225103234.226794-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-12-26nvme: fix setting the queue depth in nvme_alloc_io_tag_setChristoph Hellwig
While the CAP.MQES field in NVMe is a 0s based filed with a natural one off, we also need to account for the queue wrap condition and fix undo the one off again in nvme_alloc_io_tag_set. This was never properly done by the fabrics drivers, but they don't seem to care because there is no actual physical queue that can wrap around, but it became a problem when converting over the PCIe driver. Also add back the BLK_MQ_MAX_DEPTH check that was lost in the same commit. Fixes: 0da7feaa5913 ("nvme-pci: use the tagset alloc/free helpers") Reported-by: Hugh Dickins <hughd@google.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Hugh Dickins <hughd@google.com> Link: https://lore.kernel.org/r/20221225103234.226794-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-12-22Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds
Pull more SCSI updates from James Bottomley: "Mostly small bug fixes and small updates. The only things of note is a qla2xxx fix for crash on hotplug and timeout and the addition of a user exposed abstraction layer for persistent reservation error return handling (which necessitates the conversion of nvme.c as well as SCSI)" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: qla2xxx: Fix crash when I/O abort times out nvme: Convert NVMe errors to PR errors scsi: sd: Convert SCSI errors to PR errors scsi: core: Rename status_byte to sg_status_byte block: Add error codes for common PR failures scsi: sd: sd_zbc: Trace zone append emulation scsi: libfc: Include the correct header